# 4 Doing Data Analysis with the Multilevel Model for Change

“We now delve deeper into the specification, estimation, and interpretation of the multilevel model for change” (Singer & Willett, 2003, p. 75).

## 4.1 Example: Changes in adolescent alcohol use

Load the data.

```
library(tidyverse)

alcohol1_pp <- read_csv("data/alcohol1_pp.csv")

head(alcohol1_pp)
```

```
## # A tibble: 6 x 9
## id age coa male age_14 alcuse peer cpeer ccoa
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 14 1 0 0 1.73 1.26 0.247 0.549
## 2 1 15 1 0 1 2 1.26 0.247 0.549
## 3 1 16 1 0 2 2 1.26 0.247 0.549
## 4 2 14 1 1 0 0 0.894 -0.124 0.549
## 5 2 15 1 1 1 0 0.894 -0.124 0.549
## 6 2 16 1 1 2 1 0.894 -0.124 0.549
```

Do note we already have an \((\text{age} - 14)\) variable in the data, `age_14`.

Here’s our version of Figure 4.1, using `stat_smooth()` to get the exploratory OLS trajectories.

```
alcohol1_pp %>%
  filter(id %in% c(4, 14, 23, 32, 41, 56, 65, 82)) %>%
  ggplot(aes(x = age, y = alcuse)) +
  stat_smooth(method = "lm", se = F) +
  geom_point() +
  coord_cartesian(xlim = c(13, 17),
                  ylim = c(-1, 4)) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~id, ncol = 4)
```

Based on this figure, Singer and Willett suggested a simple linear level-1 submodel of the form

\[\begin{align*} \text{alcuse}_{ij} & = \pi_{0i} + \pi_{1i} (\text{age}_{ij} - 14) + \epsilon_{ij}\\ \epsilon_{ij} & \sim \operatorname{Normal}(0, \sigma_\epsilon^2), \end{align*}\]

where \(\pi_{0i}\) is the initial status of participant \(i\), \(\pi_{1i}\) is participant \(i\)’s rate of change, and \(\epsilon_{ij}\) is the variation in participant \(i\)’s data not accounted for in the model.
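To see that this level-1 submodel for any single participant is just simple regression, here’s a quick exploratory sketch of my own (not from the text), fitting the OLS trajectory for the participant with `id == 1`.

```
# exploratory OLS version of the level-1 submodel for one participant
alcohol1_pp %>%
  filter(id == 1) %>%
  lm(formula = alcuse ~ 1 + age_14, data = .)
```

The intercept and slope from this little `lm()` fit are crude empirical versions of that participant’s \(\pi_{0i}\) and \(\pi_{1i}\).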

Singer and Willett made their Figure 4.2 “with a random sample of 32 of the adolescents” (p. 78). If we just wanted a random sample of rows, the `sample_n()` function would do the job. But since we’re working with long data, we’ll need some `group_by()` + `nest()` mojo. I got the trick from Jenny Bryan’s *Sample from groups, n varies by group*. Setting the seed makes the results from `sample_n()` reproducible. Here are the top panels.

```
set.seed(4)

alcohol1_pp %>%
  group_by(id) %>%
  nest() %>%
  sample_n(size = 32, replace = T) %>%
  unnest(data) %>%
  mutate(coa = ifelse(coa == 0, "coa = 0", "coa = 1")) %>%
  ggplot(aes(x = age, y = alcuse, group = id)) +
  stat_smooth(method = "lm", se = F, size = 1/4) +
  coord_cartesian(xlim = c(13, 17),
                  ylim = c(-1, 4)) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~coa)
```

We have similar data wrangling needs for the bottom panels.

```
set.seed(4)

alcohol1_pp %>%
  group_by(id) %>%
  nest() %>%
  ungroup() %>%
  sample_n(size = 32, replace = T) %>%
  unnest(data) %>%
  mutate(hp = ifelse(peer < mean(peer), "low peer", "high peer")) %>%
  mutate(hp = factor(hp, levels = c("low peer", "high peer"))) %>%
  ggplot(aes(x = age, y = alcuse, group = id)) +
  stat_smooth(method = "lm", se = F, size = 1/4) +
  coord_cartesian(xlim = c(13, 17),
                  ylim = c(-1, 4)) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~hp)
```

Based on the exploratory analyses, Singer and Willett posited the initial level-2 submodel might take the form

\[\begin{align*} \pi_{0i} & = \gamma_{00} + \gamma_{01} \text{coa}_i + \zeta_{0i}\\ \pi_{1i} & = \gamma_{10} + \gamma_{11} \text{coa}_i + \zeta_{1i} \\ \begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} & \sim \operatorname{Normal} \begin{pmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_0^2 & \sigma_{01}\\ \sigma_{01} & \sigma_1^2 \end{bmatrix} \end{pmatrix}, \end{align*}\]

where \(\gamma_{00}\) and \(\gamma_{10}\) are the level-2 intercepts, the population averages when \(\text{coa} = 0\); \(\gamma_{01}\) and \(\gamma_{11}\) are the level-2 slopes expressing the differences when \(\text{coa} = 1\); and \(\zeta_{0i}\) and \(\zeta_{1i}\) are the unexplained variation across the \(\text{id}\)-level intercepts and slopes. Since we’ll be fitting the model with `brms::brm()`, the \(\Sigma\) matrix will be parameterized in the \(\sigma\) metric. So we might reexpress the model as

\[\begin{align*} \pi_{0i} & = \gamma_{00} + \gamma_{01} \text{coa}_i + \zeta_{0i}\\ \pi_{1i} & = \gamma_{10} + \gamma_{11} \text{coa}_i + \zeta_{1i} \\ \begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} & \sim \operatorname{Normal} \begin{pmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_0 & \rho_{01}\\ \rho_{01} & \sigma_1 \end{bmatrix} \end{pmatrix}. \end{align*}\]

## 4.2 The composite specification of the multilevel model for change

With a little algebra, we can combine the level-1 and level-2 submodels into the composite multilevel model for change, which follows the form

\[\begin{align*} \text{alcuse}_{ij} & = \big [ \gamma_{00} + \gamma_{10} \text{age_14}_{ij} + \gamma_{01} \text{coa}_i + \gamma_{11} (\text{coa}_i \times \text{age_14}_{ij}) \big ] \\ & \;\;\;\;\; + [ \zeta_{0i} + \zeta_{1i} \text{age_14}_{ij} + \epsilon_{ij} ] \\ \epsilon_{ij} & \sim \operatorname{Normal} (0, \sigma_\epsilon^2) \\ \begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} & \sim \operatorname{Normal} \begin{pmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{bmatrix} \end{pmatrix}, \end{align*}\]

where the brackets in the first line partition the structural model (i.e., the model for \(\mu\)) and the stochastic components (i.e., the \(\sigma\) terms). We should note that this is the format that most closely mirrors what we use in the `formula` argument in `brms::brm()`. Because our time variable, `age_14`, is not mean-centered, our **brms** syntax would be: `formula = alcuse ~ 0 + Intercept + age_14 + coa + age_14:coa + (1 + age_14 | id)`.

### 4.2.1 The structural component of the composite model.

Although their interpretation is identical, the \(\gamma\)s in the composite model describe patterns of change in a different way. Rather than postulating first how ALCUSE is related to TIME and the individual growth parameters, and second how the individual growth parameters are related to COA, the composite specification in equation 4.3 postulates that ALCUSE depends *simultaneously* on: (1) the level-1 predictor, TIME; (2) the level-2 predictor, COA; and (3) the *cross-level* interaction, COA by TIME. From this perspective, the composite model’s structural portion strongly resembles a regular regression model with predictors, TIME and COA, appearing as main effects (associated with \(\gamma_{10}\) and \(\gamma_{01}\), respectively) and in a *cross-level* interaction (associated with \(\gamma_{11}\)). (p. 82, emphasis in the original)

### 4.2.2 The stochastic component of the composite model.

A distinctive feature of the composite multilevel model is its composite residual, the three terms in the second set of brackets on the right of equation 4.3 that combine together the level-1 residual and the two level-2 residuals:

\[\text{Composite residual: } [ \zeta_{0i} + \zeta_{1i} \text{age_14}_{ij} + \epsilon_{ij} ].\] The composite residual is not a simple sum. Instead, the second level-2 residual, \(\zeta_{1i}\), is multiplied by the level-1 predictor, \([\text{age_14}_{ij}]\), before joining its siblings. Despite its unusual construction, the interpretation of the composite residual is straightforward: it describes the difference between the observed and expected value of \([\text{alcuse}]\) for individual \(i\) on occasion \(j\).

The mathematical form of the composite residual reveals two important properties about the occasion-specific residuals not readily apparent in the level-1/level-2 specification: they can be both *autocorrelated* and *heteroscedastic* within person. (p. 84, emphasis in the original)

## 4.3 Methods of estimation, revisited

In this section, the authors introduced generalized least squares (GLS) estimation and iterative generalized least squares (IGLS) estimation and then distinguished between full and restricted maximum likelihood estimation. Since our goal is to fit these models as Bayesians, we won’t be using or discussing any of these in this project. There are, of course, different ways to approach Bayesian estimation. Though we’re using Hamiltonian Monte Carlo, we could use other algorithms, such as the Gibbs sampler. However, all that is outside of the scope of this project.

I suppose the only thing to add is that whereas GLS estimates come from minimizing a weighted function of the residuals and maximum likelihood estimates come from maximizing the log-likelihood function, the results of our Bayesian analyses (i.e., the posterior distribution) come from the consequences of Bayes’ theorem,

\[ p(\theta | d) = \frac{p(d | \theta)\ p(\theta)}{p(d)}. \]

If you really want to dive into the details of this, I suggest referencing a proper introductory Bayesian textbook, such as McElreath (2020a, 2015), Kruschke (2015), or Gelman et al. (2013). I haven’t had time to check it out, but I’ve heard Lambert’s (2018) text is good, too. And for details specific to Stan, and thus **brms**, you might check out the documentation resources at https://mc-stan.org/users/documentation/.
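To make the theorem concrete, here is a toy grid approximation in **R**; a sketch of my own, not from the text. We pretend we observed a single datum \(y = 1.2\) from a normal likelihood with known \(\sigma = 1\), and put a \(\operatorname{Normal}(0, 2)\) prior on the mean.

```
# toy grid approximation: posterior is proportional to likelihood x prior
theta      <- seq(from = -4, to = 6, length.out = 1e3)  # grid over the parameter
prior      <- dnorm(theta, mean = 0, sd = 2)
likelihood <- dnorm(1.2, mean = theta, sd = 1)          # one datum, y = 1.2
posterior  <- likelihood * prior / sum(likelihood * prior)

# posterior mean sits between the prior mean (0) and the datum (1.2)
sum(theta * posterior)  # about 0.96
```

HMC does something far more sophisticated than this brute-force grid, but the logic of combining a prior with a likelihood is the same.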

## 4.4 First steps: Fitting two unconditional multilevel models for change

Singer and Willett recommended that before you fit your full theoretical multilevel model of change–the one with all the interesting covariates–you should fit two simpler preliminary models. The first is the *unconditional means model*. The second is the *unconditional growth model*.

I agree. In addition to the reasons they cover in the text, this is just good pragmatic data analysis. Start simple and build up to the more complicated models only after you’re confident you understand what’s going on with the simpler ones. And if you’re new to them, you’ll discover this is especially so with Bayesian methods.

### 4.4.1 The unconditional means model.

The likelihood for the unconditional means model follows the formula

\[\begin{align*} \text{alcuse}_{ij} & = \gamma_{00} + \zeta_{0i} + \epsilon_{ij} \\ \epsilon_{ij} & \sim \operatorname{Normal}(0, \sigma_\epsilon^2) \\ \zeta_{0i} & \sim \operatorname{Normal}(0, \sigma_0^2). \end{align*}\]

Let’s open **brms**.

`library(brms)`

Up till this point, we haven’t focused on priors. It would have been reasonable to wonder if we’d been using them at all. Yes, we have. Even if you don’t specify priors in the `brm()` function, it’ll compute default weakly-informative priors for you. You might be wondering, *What might these default priors look like?* The `get_prior()` function lets us take a look.

```
get_prior(data = alcohol1_pp,
          family = gaussian,
          alcuse ~ 1 + (1 | id))
```

```
## prior class coef group resp dpar nlpar bound source
## student_t(3, 1, 2.5) Intercept default
## student_t(3, 0, 2.5) sd default
## student_t(3, 0, 2.5) sd id (vectorized)
## student_t(3, 0, 2.5) sd Intercept id (vectorized)
## student_t(3, 0, 2.5) sigma default
```

For this model, all three priors are based on Student’s \(t\)-distribution. In case you’re rusty, the normal distribution is just a special case of Student’s \(t\)-distribution. Whereas the normal is defined by two parameters (\(\mu\) and \(\sigma\)), the \(t\) distribution is defined by \(\nu\), \(\mu\), and \(\sigma\). In frequentist circles, \(\nu\) is often called the degrees of freedom. More generally, it’s also referred to as a normality parameter. We’ll examine the prior more closely in a bit.
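You can see that special-case relation numerically; as \(\nu\) grows, the \(t\) density converges on the normal. A quick check of my own, not from the text:

```
# with a huge nu, Student's t at zero is essentially the standard normal density
dt(0, df = 1e6)  # very close to...
dnorm(0)         # 1 / sqrt(2 * pi), about 0.3989
```

Small \(\nu\) values, like the 3 in the default priors above, make for much heavier tails than the normal.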

For now, let’s practice setting our priors by manually specifying them within `brm()`. You do this with the `prior` argument. There are actually several ways to do it. To explore all the options, check out the `set_prior` section of the **brms** reference manual (Bürkner, 2021d). I typically define my individual priors with the `prior()` function. When there is more than one prior to define, I bind them together within `c()`.

Other than the addition of our fancy `prior` statement, the rest of the settings within `brm()` are much like those in prior chapters. Let’s fit the model.

```
fit4.1 <-
  brm(data = alcohol1_pp,
      family = gaussian,
      alcuse ~ 1 + (1 | id),
      prior = c(prior(student_t(3, 1, 2.5), class = Intercept),
                prior(student_t(3, 0, 2.5), class = sd),
                prior(student_t(3, 0, 2.5), class = sigma)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "fits/fit04.01")
```

Here are the results.

`print(fit4.1)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: alcuse ~ 1 + (1 | id)
## Data: alcohol1_pp (Number of observations: 246)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~id (Number of levels: 82)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.77 0.08 0.62 0.95 1.00 1371 1912
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.93 0.10 0.74 1.12 1.00 2189 2730
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.75 0.04 0.68 0.84 1.00 3336 3084
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

Compare the results to those listed under “Model A” in Table 4.1. It’s important to keep in mind that **brms** returns `sigma` and `sd(Intercept)` in the standard-deviation metric rather than the variance metric. “*But I want them in the variance metric like in the text!*” you say. Okay, fine. The best way to do the transformations is after saving the results from `posterior_samples()`.

```
post <- posterior_samples(fit4.1)

glimpse(post[, 1:12])
```

```
## Rows: 4,000
## Columns: 12
## $ b_Intercept <dbl> 0.9349322, 1.0902473, 1.0219575, 1.0663572, 1.0948424, 1.0471974, 0.92…
## $ sd_id__Intercept <dbl> 0.9410277, 0.8930986, 0.9624571, 0.9504499, 0.7335754, 0.7998159, 0.69…
## $ sigma <dbl> 0.7580042, 0.7856404, 0.8205403, 0.7976149, 0.7371046, 0.7363369, 0.73…
## $ `r_id[1,Intercept]` <dbl> 0.19640274, 0.82798501, 0.76418594, 0.83805543, 0.48204370, 1.08439530…
## $ `r_id[2,Intercept]` <dbl> -0.73363212, -0.46020853, -0.22635920, -1.08612043, -0.22180192, -0.47…
## $ `r_id[3,Intercept]` <dbl> 0.5971165, 0.7772418, 0.8627380, 0.4136766, 0.9689730, 0.8233887, 0.85…
## $ `r_id[4,Intercept]` <dbl> 0.177266827, 0.024099486, 0.108291616, 0.466467539, -0.076456909, -0.0…
## $ `r_id[5,Intercept]` <dbl> -0.6506294, -1.1014401, -1.5194744, -1.1974478, -0.4586913, -0.6625854…
## $ `r_id[6,Intercept]` <dbl> 1.8737452, 1.3319428, 1.3820165, 1.4637499, 1.4144191, 2.2642202, 1.59…
## $ `r_id[7,Intercept]` <dbl> 0.45311614, 0.65350870, 0.63161138, 0.79237393, 0.09608294, 0.65337681…
## $ `r_id[8,Intercept]` <dbl> -0.70054529, -0.71342199, -0.71191598, -0.32824433, -1.37075472, -0.30…
## $ `r_id[9,Intercept]` <dbl> 0.48923182, 0.47989961, 1.06564969, 0.55648330, 0.05212787, 0.97201593…
```

Since all we’re interested in are the variance components, we’ll `select()` out the relevant columns from `post`, compute the squared versions, and save the results in a mini data frame, `v`.

```
v <-
  post %>%
  select(sigma, sd_id__Intercept) %>%
  mutate(sigma_2_epsilon = sigma^2,
         sigma_2_0 = sd_id__Intercept^2)

head(v)
```

```
## sigma sd_id__Intercept sigma_2_epsilon sigma_2_0
## 1 0.7580042 0.9410277 0.5745703 0.8855331
## 2 0.7856404 0.8930986 0.6172308 0.7976251
## 3 0.8205403 0.9624571 0.6732864 0.9263236
## 4 0.7976149 0.9504499 0.6361896 0.9033550
## 5 0.7371046 0.7335754 0.5433233 0.5381329
## 6 0.7363369 0.7998159 0.5421921 0.6397055
```

We can view their distributions like this.

```
v %>%
  pivot_longer(everything()) %>%
  ggplot(aes(x = value)) +
  geom_vline(xintercept = c(.25, .5, .75, 1), color = "white") +
  geom_density(size = 0, fill = "black") +
  scale_x_continuous(NULL, limits = c(0, 1.25),
                     breaks = seq(from = 0, to = 1.25, by = .25)) +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~name, scales = "free_y")
```

In case it’s hard to follow what just happened, the estimates in the **brms**-default standard-deviation metric are the two panels on the top. Those on the bottom are in the Singer-and-Willett style variance metric. Like we discussed toward the end of last chapter, the variance parameters won’t often be Gaussian. In my experience, they’re typically skewed to the right. There’s nothing wrong with that. This is a recurrent pattern among distributions that are constrained to be zero and above.

If you’re interested, you can summarize those posteriors like so.

```
v %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise(mean = mean(value),
            median = median(value),
            sd = sd(value),
            ll = quantile(value, prob = .025),
            ul = quantile(value, prob = .975)) %>%
  # this last bit just rounds the output
  mutate_if(is.double, round, digits = 3)
```

```
## # A tibble: 4 x 6
## name mean median sd ll ul
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 sd_id__Intercept 0.77 0.765 0.084 0.617 0.946
## 2 sigma 0.755 0.752 0.042 0.678 0.844
## 3 sigma_2_0 0.599 0.586 0.131 0.381 0.894
## 4 sigma_2_epsilon 0.571 0.565 0.065 0.459 0.713
```

For this model, our posterior medians are closer to the estimates in the text (Table 4.1) than the means. However, our posterior standard deviations are pretty close to the standard errors in the text.

One of the advantages of our Bayesian method is that when we compute something like the intraclass correlation coefficient \(\rho\), we get an entire distribution for the parameter rather than a measly point estimate. This is always the case with Bayes. The algebraic transformations of the posterior distribution are themselves distributions. Before we compute \(\rho\), do pay close attention to the formula,

\[ \rho = \frac{\sigma_0^2}{\sigma_0^2 + \sigma_\epsilon^2}. \]
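Before we compute the full posterior for \(\rho\), here is a back-of-the-envelope version of my own, plugging in the posterior medians for \(\sigma_0^2\) and \(\sigma_\epsilon^2\) from the summary table above.

```
# point-estimate ICC from the posterior medians reported above
0.586 / (0.586 + 0.565)  # about .51
```

Handy for a sanity check, but a single number like this throws away the uncertainty we get for free from the posterior.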

Even though our **brms** output yields the variance parameters in the standard-deviation metric, the formula for \(\rho\) demands we use variances. That’s nothing a little squaring can’t fix. Here’s what our \(\rho\) looks like.

```
v %>%
  transmute(rho = sd_id__Intercept^2 / (sd_id__Intercept^2 + sigma^2)) %>%
  ggplot(aes(x = rho)) +
  geom_density(size = 0, fill = "black") +
  scale_x_continuous(expression(rho), limits = 0:1) +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank())
```

Though the posterior for \(\rho\) is indeed centered around .5, look at how wide and uncertain that distribution is. The bulk of the posterior mass takes up almost half of the parameter space. If you wanted the summary statistics, you might do what we did for the variance parameters, above.

```
v %>%
  transmute(rho = sd_id__Intercept^2 / (sd_id__Intercept^2 + sigma^2)) %>%
  summarise(mean = mean(rho),
            median = median(rho),
            sd = sd(rho),
            ll = quantile(rho, prob = .025),
            ul = quantile(rho, prob = .975)) %>%
  mutate_if(is.double, round, digits = 3)
```

```
## mean median sd ll ul
## 1 0.508 0.51 0.064 0.375 0.628
```

Concerning \(\rho\), Singer and Willett pointed out

it summarizes the size of the residual autocorrelation in the composite unconditional means model….

Each person has a different composite residual on each occasion of measurement. But notice the difference in the subscripts of the pieces of the composite residual: while the level-1 residual, \(\epsilon_{ij}\), has two subscripts (\(i\) and \(j\)), the level-2 residual, \(\zeta_{0i}\), has only one (\(i\)). Each person can have a different \(\epsilon_{ij}\) on each occasion, but has only one \(\zeta_{0i}\) across every occasion. The repeated presence of \(\zeta_{0i}\) in individual \(i\)’s composite residual links his or her composite residuals across occasions. The error autocorrelation coefficient quantifies the magnitude of this linkage; in the unconditional means model, the error autocorrelation coefficient *is* the intraclass correlation coefficient. Thus, we estimate that, for each person, the average correlation between any pair of composite residuals–between occasions 1 and 2, or 2 and 3, or 1 and 3–is [.5]. (pp. 96–97, emphasis in the original)

Because of the differences in how they’re estimated with and presented by `brm()`, we focused right on the variance components. But before we move on to the next section, we should back up a bit. On page 93, Singer and Willett discussed their estimate for \(\gamma_{00}\). Here’s ours.

`fixef(fit4.1)`

```
## Estimate Est.Error Q2.5 Q97.5
## Intercept 0.9278335 0.09723568 0.7368138 1.117278
```

They talked about how squaring that value puts it back to the natural metric the data were originally collected in. [Recall that, as discussed earlier in the text, the `alcuse` variable was square-root transformed because of excessive skew.] If you want a quick and dirty look, you can square our results, too.

`fixef(fit4.1)^2 `

```
## Estimate Est.Error Q2.5 Q97.5
## Intercept 0.860875 0.009454777 0.5428946 1.24831
```

However, I do not recommend this method. Though it did okay at transforming the posterior mean (i.e., Estimate), it’s not a great way to get the other summary statistics correct. To do that, you’ll need to work with the posterior samples themselves. Remember how we saved them as `post`? Let’s refresh ourselves and look at the first few columns.

`post[1:6, 1:3]`

```
## b_Intercept sd_id__Intercept sigma
## 1 0.9349322 0.9410277 0.7580042
## 2 1.0902473 0.8930986 0.7856404
## 3 1.0219575 0.9624571 0.8205403
## 4 1.0663572 0.9504499 0.7976149
## 5 1.0948424 0.7335754 0.7371046
## 6 1.0471974 0.7998159 0.7363369
```

See that `b_Intercept` column there? That contains our posterior draws from \(\gamma_{00}\). If you want proper summary statistics from the transformed estimate, get them after transforming that column.

```
post %>%
  transmute(gamma_00_squared = b_Intercept^2) %>%
  summarise(mean = mean(gamma_00_squared),
            median = median(gamma_00_squared),
            sd = sd(gamma_00_squared),
            ll = quantile(gamma_00_squared, prob = .025),
            ul = quantile(gamma_00_squared, prob = .975)) %>%
  mutate_if(is.double, round, digits = 3) %>%
  pivot_longer(everything())
```

```
## # A tibble: 5 x 2
## name value
## <chr> <dbl>
## 1 mean 0.87
## 2 median 0.862
## 3 sd 0.181
## 4 ll 0.543
## 5 ul 1.25
```

And one last bit before we move on to the next section. Remember how we discovered what the `brm()` default priors were for our model with the handy `get_prior()` function? Let’s refresh ourselves on how that worked.

```
get_prior(data = alcohol1_pp,
          family = gaussian,
          alcuse ~ 1 + (1 | id))
```

```
## prior class coef group resp dpar nlpar bound source
## student_t(3, 1, 2.5) Intercept default
## student_t(3, 0, 2.5) sd default
## student_t(3, 0, 2.5) sd id (vectorized)
## student_t(3, 0, 2.5) sd Intercept id (vectorized)
## student_t(3, 0, 2.5) sigma default
```

We inserted the data and the model and `get_prior()` returned the default priors. Especially for new Bayesians, or even for experienced Bayesians working with unfamiliar models, it can be handy to plot your priors to get a sense of them.

Base **R** has an array of functions based on the \(t\) distribution (e.g., `rt()`, `dt()`). These functions are limited in that while they allow users to select the desired \(\nu\) values (i.e., degrees of freedom), they fix \(\mu = 0\) and \(\sigma = 1\). If you want to stick with the base **R** functions, you can find tricky ways around this. To avoid overwhelming anyone new to Bayes or the multilevel model or **R** or some exasperating combination, let’s just make things simpler and use a different function. As it turns out, the **metRology** package (Ellison, 2018) contains a `dt.scaled()` function that allows users to define all three parameters for Student’s \(t\).
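For the curious, one of those “tricky ways” with base **R** is to shift and scale `dt()` by hand. The `scaled_t_density()` helper below is just an illustrative name of my own, not part of any package.

```
# hand-rolled scaled Student-t density from base R's dt()
scaled_t_density <- function(x, nu, mu, sigma) {
  dt((x - mu) / sigma, df = nu) / sigma
}

# should agree with metRology::dt.scaled(0, df = 3, mean = 1, sd = 2.5)
scaled_t_density(0, nu = 3, mu = 1, sigma = 2.5)
```

The division by `sigma` is the usual change-of-variables adjustment that keeps the density integrating to 1.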

We’ll start with the default intercept prior, \(t(\nu = 3, \mu = 1, \sigma = 2.5)\). Here’s the density in the range \([-20, 20]\).

```
tibble(x = seq(from = -20, to = 20, length.out = 1e3)) %>%
mutate(density = metRology::dt.scaled(x, df = 3, mean = 1, sd = 2.5)) %>%
ggplot(aes(x = x, y = density)) +
geom_vline(xintercept = 1, color = "white") +
geom_line() +
labs(title = expression(paste("prior for ", gamma[0][0])),
x = "parameter space") +
theme(panel.grid = element_blank())
```

Though it’s centered on 1, the bulk of the mass seems to range from -10 to 10. Given the model estimate ended up about 0.9, it looks like that was a pretty broad and minimally-informative prior. However, the prior isn’t flat and it does help guard against wasting time and HMC iterations sampling from ridiculous regions of the parameter space such as -10,000 or +500,000,000. No adolescent is drinking that much (or that little–how does one drink a negative value?).
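If you’d like to quantify that impression, you might compute how much prior mass sits in a substantively plausible range. Here’s a sketch using what I believe is `dt.scaled()`’s CDF companion in **metRology**, `pt.scaled()`.

```
# proportion of the t(3, 1, 2.5) prior between 0 and 4 on the alcuse scale
metRology::pt.scaled(4, df = 3, mean = 1, sd = 2.5) -
  metRology::pt.scaled(0, df = 3, mean = 1, sd = 2.5)
```

The remainder of the prior mass is spread thinly over values the data can easily overrule.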

Here’s the shape of the variance priors.

```
tibble(x = seq(from = 0, to = 20, length.out = 1e3)) %>%
mutate(density = metRology::dt.scaled(x, df = 3, mean = 0, sd = 2.5)) %>%
ggplot(aes(x = x, y = density)) +
geom_vline(xintercept = 0, color = "white") +
geom_line() +
labs(title = expression(paste("prior for both ", sigma[0], " and ", sigma[epsilon])),
x = "parameter space") +
theme(panel.grid = element_blank())
```

Recall that by **brms** default, the variance parameters have a lower limit of 0. So specifying a Student’s \(t\) or other Gaussian-like prior on them ends up cutting the distribution off at 0. Given that our estimates were both below 1, it appears these priors were minimally informative. But again, they did help prevent `brm()` from sampling from negative values or from obscenely-large values.

*These priors look kinda silly*, you might say. *Anyone with a little common sense can do better*. Well, sure. Probably. Maybe. But keep in mind we’re still getting the layout of the land. And plus, this was a pretty simple model. Selecting high-quality priors gets tricky as the models get more complicated. In other chapters, we’ll explore other ways to specify priors for our multilevel models. But to keep things simple for now, let’s keep practicing inspecting and using the defaults with `get_prior()` and so on.

### 4.4.2 The unconditional growth model.

Using the composite formula, our next model, the unconditional growth model, follows the form

\[\begin{align*} \text{alcuse}_{ij} & = \gamma_{00} + \gamma_{10} \text{age_14}_{ij} + \zeta_{0i} + \zeta_{1i} \text{age_14}_{ij} + \epsilon_{ij} \\ \epsilon_{ij} & \sim \operatorname{Normal} (0, \sigma_\epsilon^2) \\ \begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} & \sim \operatorname{Normal} \begin{pmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{bmatrix} \end{pmatrix}. \end{align*}\]

With it, we now have a full composite stochastic model. Let’s query the `brms::brm()` default priors when we apply this model to our data.

```
get_prior(data = alcohol1_pp,
          family = gaussian,
          alcuse ~ 0 + Intercept + age_14 + (1 + age_14 | id))
```

```
## prior class coef group resp dpar nlpar bound source
## (flat) b default
## (flat) b age_14 (vectorized)
## (flat) b Intercept (vectorized)
## lkj(1) cor default
## lkj(1) cor id (vectorized)
## student_t(3, 0, 2.5) sd default
## student_t(3, 0, 2.5) sd id (vectorized)
## student_t(3, 0, 2.5) sd age_14 id (vectorized)
## student_t(3, 0, 2.5) sd Intercept id (vectorized)
## student_t(3, 0, 2.5) sigma default
```

Several things of note: First, notice how we continue to use `student_t(3, 0, 2.5)` for all three of our standard-deviation-metric variance parameters. Since we’re now estimating \(\sigma_0\) and \(\sigma_1\), which themselves have a correlation, \(\rho_{01}\), we have a prior of `class = cor`. I’m going to put off explaining what is meant by the name `lkj`, but for the moment just realize that this prior is essentially noninformative within this context.

There’s a major odd development with this output. Notice how the `prior` column is `(flat)` for the rows for our two coefficients of class `b`. And if you’re a little confused, recall that because our predictor `age_14` is not mean-centered, we’ve used the `0 + Intercept` syntax, which switches the model intercept parameter to the class of `b`. From the `set_prior` section of the reference manual for **brms** version 2.12.0, we read: “The default prior for population-level effects (including monotonic and category specific effects) is an improper flat prior over the reals” (p. 179). At present, these priors are uniform across the entire parameter space. They’re not just weak, they’re entirely noninformative. That is, the likelihood dominates the posterior for those parameters.
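If flat priors make you nervous, nothing stops you from assigning your own to the class-`b` parameters. As a sketch, the scale below is just a guess of mine, not a recommendation from the text; you could fold something like this into the `prior` argument:

```
# hypothetical weakly-regularizing alternative to the flat class-b defaults
c(prior(normal(0, 2), class = b, coef = Intercept),
  prior(normal(0, 2), class = b, coef = age_14))
```

We’ll stick with the defaults for now so our results stay comparable to the text.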

Here’s how to fit the model with these priors.

```
fit4.2 <-
  brm(data = alcohol1_pp,
      family = gaussian,
      alcuse ~ 0 + Intercept + age_14 + (1 + age_14 | id),
      prior = c(prior(student_t(3, 0, 2.5), class = sd),
                prior(student_t(3, 0, 2.5), class = sigma),
                prior(lkj(1), class = cor)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "fits/fit04.02")
```

How did we do?

`print(fit4.2, digits = 3)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: alcuse ~ 0 + Intercept + age_14 + (1 + age_14 | id)
## Data: alcohol1_pp (Number of observations: 246)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~id (Number of levels: 82)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.789 0.103 0.590 0.994 1.006 662 1341
## sd(age_14) 0.358 0.102 0.110 0.530 1.016 323 444
## cor(Intercept,age_14) -0.099 0.283 -0.505 0.664 1.007 489 526
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.648 0.107 0.433 0.868 1.001 1547 2510
## age_14 0.271 0.063 0.144 0.392 1.001 3160 2924
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.608 0.054 0.511 0.722 1.017 384 817
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

If you compare our results with those in the “Model B” column in Table 4.1, you’ll see our summary results match well with those in the text. Our \(\gamma\)s (i.e., ‘Population-Level Effects’) are near identical. The leftmost panel in Figure 4.3 shows the prototypical trajectory, based on the \(\gamma\)s. A quick way to get that within our **brms** framework is with the `conditional_effects()` function. Here’s the default output.

`conditional_effects(fit4.2)`

Staying with `conditional_effects()` allows users some flexibility for customizing the plot(s). For example, the default behavior is to depict the trajectory in terms of its 95% intervals and posterior median. If you’d prefer the 80% intervals and the posterior mean, customize it like so.

```
conditional_effects(fit4.2,
robust = F,
prob = .8)
```

We’ll explore more options with `brms::conditional_effects()` with Model C. For now, let’s turn our focus to the stochastic elements in the model. Here we extract the posterior samples and do the conversions to see how they compare with Singer and Willett’s.

```
post <- posterior_samples(fit4.2)

v <-
  post %>%
  transmute(sigma_2_epsilon = sigma^2,
            sigma_2_0 = sd_id__Intercept^2,
            sigma_2_1 = sd_id__age_14^2,
            sigma_01 = sd_id__Intercept * cor_id__Intercept__age_14 * sd_id__age_14)

head(v)
```

```
## sigma_2_epsilon sigma_2_0 sigma_2_1 sigma_01
## 1 0.3433511 0.6012546 0.24841245 -0.05128060
## 2 0.2694387 0.5909976 0.14888499 -0.08606933
## 3 0.4132530 0.4624629 0.12995714 -0.01888894
## 4 0.3911052 0.8243339 0.12940449 -0.10634270
## 5 0.4167656 0.7979866 0.05441465 -0.07212476
## 6 0.3804809 0.5958712 0.05215451 0.00962985
```

This time, our `v` object only contains the stochastic components, in the variance metric. Let’s plot.

```
v %>%
  pivot_longer(everything()) %>%
  ggplot(aes(x = value)) +
  geom_density(size = 0, fill = "black") +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~name, scales = "free")
```

For each, their posterior mass is centered near the point estimates Singer and Willett reported in the text. Here are the summary statistics.

```
v %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise(mean   = mean(value),
            median = median(value),
            sd     = sd(value),
            ll     = quantile(value, prob = .025),
            ul     = quantile(value, prob = .975)) %>%
  mutate_if(is.double, round, digits = 3)
```

```
## # A tibble: 4 x 6
## name mean median sd ll ul
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 sigma_01 -0.049 -0.044 0.077 -0.224 0.082
## 2 sigma_2_0 0.632 0.616 0.165 0.348 0.987
## 3 sigma_2_1 0.138 0.135 0.069 0.012 0.28
## 4 sigma_2_epsilon 0.372 0.363 0.067 0.262 0.521
```

Happily, they’re quite comparable to those in the text.

We’ve been pulling the posterior samples for all parameters with `posterior_samples()` and subsetting to a few variables of interest, such as the variance parameters. But if our primary interest is just the iterations for the variance parameters, we can extract them in a more focused way with the `VarCorr()` function. Here’s how we’d do so for `fit4.2`.

```
VarCorr(fit4.2, summary = F) %>%
str()
```

```
## List of 2
## $ id :List of 3
## ..$ sd : num [1:4000, 1:2] 0.775 0.769 0.68 0.908 0.893 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ iterations: NULL
## .. .. ..$ parameters: chr [1:2] "Intercept" "age_14"
## ..$ cor: num [1:4000, 1:2, 1:2] 1 1 1 1 1 1 1 1 1 1 ...
## .. ..- attr(*, "dimnames")=List of 3
## .. .. ..$ : NULL
## .. .. ..$ : chr [1:2] "Intercept" "age_14"
## .. .. ..$ : chr [1:2] "Intercept" "age_14"
## ..$ cov: num [1:4000, 1:2, 1:2] 0.601 0.591 0.462 0.824 0.798 ...
## .. ..- attr(*, "dimnames")=List of 3
## .. .. ..$ : NULL
## .. .. ..$ : chr [1:2] "Intercept" "age_14"
## .. .. ..$ : chr [1:2] "Intercept" "age_14"
## $ residual__:List of 1
## ..$ sd: num [1:4000, 1] 0.586 0.519 0.643 0.625 0.646 ...
## .. ..- attr(*, "dimnames")=List of 2
## .. .. ..$ iterations: NULL
## .. .. ..$ parameters: chr ""
```

That last part, the contents of the second higher-level list indexed by `$ residual__`, contains the contents for \(\sigma_\epsilon\). On page 100 in the text, Singer and Willett compared \(\sigma_\epsilon^2\) from the first model to that from the second. We might do that like so.

```
cbind(VarCorr(fit4.1, summary = F)[[2]][[1]],
      VarCorr(fit4.2, summary = F)[[2]][[1]]) %>%
  data.frame() %>%
  mutate_all(~ .^2) %>%
  set_names(str_c("fit4.", 1:2)) %>%
  mutate(`fit4.1 - fit4.2` = fit4.1 - fit4.2) %>%
  pivot_longer(everything()) %>%
  mutate(name = factor(name, levels = c("fit4.1", "fit4.2", "fit4.1 - fit4.2"))) %>%
  
  ggplot(aes(x = value)) +
  geom_vline(xintercept = .5, color = "white") +
  geom_density(fill = "grey25", color = "transparent") +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab(expression(sigma[epsilon]^2)) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~name, scales = "free_y", ncol = 3)
```

To compute a formal summary of the decline in \(\sigma_\epsilon^2\) after adding time to the model, we might summarize like before.

```
cbind(VarCorr(fit4.1, summary = F)[[2]][[1]],
      VarCorr(fit4.2, summary = F)[[2]][[1]]) %>%
  data.frame() %>%
  mutate_all(~ .^2) %>%
  set_names(str_c("fit4.", 1:2)) %>%
  mutate(proportion_decline = (fit4.1 - fit4.2) / fit4.1) %>%
  summarise(mean   = mean(proportion_decline),
            median = median(proportion_decline),
            sd     = sd(proportion_decline),
            ll     = quantile(proportion_decline, prob = .025),
            ul     = quantile(proportion_decline, prob = .975)) %>%
  mutate_if(is.double, round, digits = 3)
```

```
## mean median sd ll ul
## 1 0.341 0.357 0.14 0.017 0.564
```

In case it wasn’t clear, when we presented `fit4.1 - fit4.2` in the density plot, that was a simple difference score. However, we computed `proportion_decline` above by dividing that difference score by `fit4.1`; that’s what put the difference in a proportion metric. Anyway, Singer and Willett’s method led them to summarize the decline as .40. Our method was a more conservative .34-ish. And very happily, our method allows us to describe the proportion decline with summary statistics for the full posterior, such as with the \(SD\) and the 95% intervals.

```
post %>%
  ggplot(aes(x = cor_id__Intercept__age_14)) +
  geom_vline(xintercept = 0, color = "white") +
  geom_density(fill = "grey25", color = "transparent") +
  scale_x_continuous(expression(rho[0][1]), limits = c(-1, 1)) +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank())
```

The estimate Singer and Willett hand-computed in the text, -.22, is near the mean of our posterior distribution for \(\rho_{01}\). However, our distribution provides a full expression of the uncertainty in the parameter. As are many other values within the parameter space, zero is indeed a credible value for \(\rho_{01}\).
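For a numeric complement to the density, here’s one way we might summarize the posterior for \(\rho_{01}\), sketched with the `post` data frame from above.

```
# numeric summary of the posterior for the level-2 correlation
post %>%
  summarise(mean   = mean(cor_id__Intercept__age_14),
            median = median(cor_id__Intercept__age_14),
            ll     = quantile(cor_id__Intercept__age_14, prob = .025),
            ul     = quantile(cor_id__Intercept__age_14, prob = .975))
```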

On page 101, we get the generic formula for computing the residual variance for a given occasion \(j\),

\[ \sigma_{\text{Residual}_j}^2 = \sigma_0^2 + \sigma_1^2 \text{time}_j + 2 \sigma_{01} \text{time}_j + \sigma_\epsilon^2. \]

If we were just interested in applying it to one of our `age` values, say 14, we might apply the formula to the posterior like this.

```
post %>%
  transmute(sigma_2_residual_j = sd_id__Intercept^2 +
              sd_id__age_14^2 * 0 +
              2 * sd_id__Intercept * cor_id__Intercept__age_14 * sd_id__age_14 * 0 +
              sigma^2) %>%
  head()
```

```
## sigma_2_residual_j
## 1 0.9446057
## 2 0.8604362
## 3 0.8757159
## 4 1.2154391
## 5 1.2147522
## 6 0.9763522
```

But given we’d like to do so over several values of `age`, it might be better to wrap the equation in a custom function. Let’s call it `make_s2rj()`.

```
make_s2rj <- function(x) {
  post %>%
    transmute(sigma_2_residual_j = sd_id__Intercept^2 + sd_id__age_14^2 * x + 2 * sd_id__Intercept * cor_id__Intercept__age_14 * sd_id__age_14 * x + sigma^2) %>%
    pull()
}
```

Now we can put our custom `make_s2rj()` function to work within the `purrr::map()` paradigm. We’ll plot the results.

```
tibble(age = 14:16) %>%
  mutate(age_c = age - 14) %>%
  mutate(s2rj = map(age_c, make_s2rj)) %>%
  unnest(s2rj) %>%
  mutate(label = str_c("age = ", age)) %>%
  
  ggplot(aes(x = s2rj)) +
  geom_density(fill = "grey25", color = "transparent") +
  # just for reference
  geom_vline(xintercept = 1, color = "grey92", linetype = 2) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(title = "Behold the shape of longitudinal heteroscedasticity.",
       x = expression(sigma[italic(Residual[j])]^2)) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~label, scales = "free_y", ncol = 1)
```

We see a subtle increase over time, particularly from `age = 15` to `age = 16`. Yep, that’s heteroscedasticity. It is indeed “beyond the bland homoscedasticity we assume of residuals in cross-sectional data” (p. 101).
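If you’d like posterior medians to accompany those densities, here’s a quick sketch reusing `make_s2rj()`.

```
# posterior medians for the composite residual variance at each age
tibble(age = 14:16) %>%
  mutate(s2rj = map(age - 14, make_s2rj)) %>%
  unnest(s2rj) %>%
  group_by(age) %>%
  summarise(median = median(s2rj))
```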

We might also be interested in computing the autocorrelation between the composite residuals on occasions \(j\) and \(j'\), which follows the formula

\[ \rho_{\text{Residual}_j, \text{Residual}_{j'}} = \frac{\sigma_0^2 + \sigma_{01} (\text{time}_j + \text{time}_{j'}) + \sigma_1^2 \text{time}_j \text{time}_{j'}} {\sqrt{\sigma_{\text{Residual}_j}^2 \sigma_{\text{Residual}_{j'}}^2 }}. \]

We only want to do that by hand once. Let’s make a custom function following the formula.

```
make_rho_rj_rjp <- function(j, jp) {
  
  # define the elements in the denominator
  s2rj_j  <- make_s2rj(j)
  s2rj_jp <- make_s2rj(jp)
  
  # compute
  post %>%
    transmute(r = (sd_id__Intercept^2 +
                     sd_id__Intercept * cor_id__Intercept__age_14 * sd_id__age_14 * (j + jp) +
                     sd_id__age_14^2 * j * jp) /
                sqrt(s2rj_j * s2rj_jp)) %>%
    pull()
}
```

If you only cared about measures of central tendency, such as the posterior median, you could use the function like this.

`make_rho_rj_rjp(0, 1) %>% median()`

`## [1] 0.5686782`

`make_rho_rj_rjp(1, 2) %>% median()`

`## [1] 0.7204235`

`make_rho_rj_rjp(0, 2) %>% median()`

`## [1] 0.5131335`

Here are the full posteriors.

```
tibble(occasion = 1:3) %>%
  mutate(age_c = occasion - 1,
         j     = c(1, 2, 1) - 1,
         jp    = c(2, 3, 3) - 1) %>%
  mutate(r = map2(j, jp, make_rho_rj_rjp)) %>%
  unnest(r) %>%
  mutate(label = str_c("occasions ", j + 1, " and ", jp + 1)) %>%
  
  ggplot(aes(x = r)) +
  # just for reference
  geom_vline(xintercept = c(.5, .75), color = "white") +
  geom_density(fill = "grey25", color = "transparent") +
  scale_x_continuous(expression(rho[Residual[italic(j)]][Residual[italic(j*minute)]]), limits = 0:1) +
  scale_y_continuous(NULL, breaks = NULL) +
  ggtitle("Behold the shapes of our autocorrelations!") +
  theme(panel.grid = element_blank()) +
  facet_wrap(~label, scales = "free_y", ncol = 1)
```

`## Warning: Removed 1 rows containing non-finite values (stat_density).`

### 4.4.3 Quantifying the proportion of outcome variation “explained.”

Because of the way the multilevel model partitions off variance into different sources (e.g., \(\sigma_0^2\), \(\sigma_1^2\), and \(\sigma_\epsilon^2\) in the unconditional growth model), the conventional \(R^2\) is not applicable for evaluating models in the traditional OLS sense of percent of variance explained. Several pseudo \(R^2\) statistics are frequently used instead. Be warned, “statisticians have yet to agree on appropriate summaries (I. G. Kreft & de Leeuw, 1998; Snijders & Bosker, 1994)” (p. 102). See also Jaeger et al. (2017), Jason D. Rights & Cole (2018), and Jason D. Rights & Sterba (2020). To my eye, none of the solutions presented in this section are magic bullets.
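As an aside, **brms** also offers the `bayes_R2()` function, which implements the Bayesian \(R^2\) of Gelman and colleagues. It is a different quantity from the pseudo-\(R^2\)'s discussed in this section, but it can make a handy point of comparison.

```
# the brms Bayesian R-squared; not one of the pseudo-R2's from
# the text, but a related summary of variance explained
bayes_R2(fit4.2)
```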

#### 4.4.3.1 An overall summary of total outcome variability explained.

In multiple regression, one simple way of computing a summary \(R^2\) statistic is to square the sample correlation between observed and predicted values of the outcome. The same approach can be used in the multilevel model for change. All you need to do is: (1) compute the predicted outcome value for each person on each occasion of measurement; and (2) square the sample correlation between observed and predicted values. The resultant pseudo-\(R^2\) statistic assesses the proportion of total outcome variation “explained” by the multilevel model’s specific contribution of predictors. (p. 102, *emphasis* added)

Singer and Willett called this \(R_{y, \hat y}^2\). They then walked through an example with their Model B (`fit4.2`), the unconditional growth model. Within our **brms** paradigm, we typically use the `fitted()` function to return predicted outcome values for cases within the data. The default option for the `fitted()` function is to return these predictions after accounting for the level-2 clustering. As we will see, Singer and Willett’s \(R_{y, \hat y}^2\) statistic only accounts for predictors (i.e., `age_14`, in this case), not clustering variables (i.e., `id`, in this case). To follow Singer and Willett’s specification, we need to set `re_formula = NA`, which will instruct `fitted()` to return the expected values without reference to the level-2 clustering. Here’s a look at the first six rows of that output.

```
fitted(fit4.2, re_formula = NA) %>%
head()
```

```
## Estimate Est.Error Q2.5 Q97.5
## [1,] 0.6477610 0.10745878 0.4332701 0.8678844
## [2,] 0.9192549 0.09816102 0.7181607 1.1161539
## [3,] 1.1907487 0.12474947 0.9388241 1.4320267
## [4,] 0.6477610 0.10745878 0.4332701 0.8678844
## [5,] 0.9192549 0.09816102 0.7181607 1.1161539
## [6,] 1.1907487 0.12474947 0.9388241 1.4320267
```

Within our Bayesian/**brms** paradigm, our expected values come with expressions of uncertainty in terms of the posterior standard deviation and percentile-based 95% intervals. If we followed Singer and Willett’s method in the text, we’d only work with the posterior means as presented within the `Estimate` column. But since we’re Bayesians, we should attempt to work with the model uncertainty. One approach is to set `summary = F`.

```
f <-
  fitted(fit4.2,
         summary = F,
         re_formula = NA) %>%
  as_tibble() %>%
  set_names(1:ncol(.)) %>%
  rownames_to_column("iter")

head(f)
```

```
## # A tibble: 6 x 247
## iter `1` `2` `3` `4` `5` `6` `7` `8` `9` `10` `11` `12` `13` `14` `15`
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 0.739 0.968 1.20 0.739 0.968 1.20 0.739 0.968 1.20 0.739 0.968 1.20 0.739 0.968 1.20
## 2 2 0.610 0.922 1.23 0.610 0.922 1.23 0.610 0.922 1.23 0.610 0.922 1.23 0.610 0.922 1.23
## 3 3 0.515 0.829 1.14 0.515 0.829 1.14 0.515 0.829 1.14 0.515 0.829 1.14 0.515 0.829 1.14
## 4 4 0.611 0.867 1.12 0.611 0.867 1.12 0.611 0.867 1.12 0.611 0.867 1.12 0.611 0.867 1.12
## 5 5 0.735 1.06 1.38 0.735 1.06 1.38 0.735 1.06 1.38 0.735 1.06 1.38 0.735 1.06 1.38
## 6 6 0.642 0.861 1.08 0.642 0.861 1.08 0.642 0.861 1.08 0.642 0.861 1.08 0.642 0.861 1.08
## # … with 231 more variables: 16 <dbl>, 17 <dbl>, 18 <dbl>, 19 <dbl>, 20 <dbl>, 21 <dbl>, 22 <dbl>,
## # 23 <dbl>, 24 <dbl>, 25 <dbl>, 26 <dbl>, 27 <dbl>, 28 <dbl>, 29 <dbl>, 30 <dbl>, 31 <dbl>,
## # 32 <dbl>, 33 <dbl>, 34 <dbl>, 35 <dbl>, 36 <dbl>, 37 <dbl>, 38 <dbl>, 39 <dbl>, 40 <dbl>,
## # 41 <dbl>, 42 <dbl>, 43 <dbl>, 44 <dbl>, 45 <dbl>, 46 <dbl>, 47 <dbl>, 48 <dbl>, 49 <dbl>,
## # 50 <dbl>, 51 <dbl>, 52 <dbl>, 53 <dbl>, 54 <dbl>, 55 <dbl>, 56 <dbl>, 57 <dbl>, 58 <dbl>,
## # 59 <dbl>, 60 <dbl>, 61 <dbl>, 62 <dbl>, 63 <dbl>, 64 <dbl>, 65 <dbl>, 66 <dbl>, 67 <dbl>,
## # 68 <dbl>, 69 <dbl>, 70 <dbl>, 71 <dbl>, 72 <dbl>, 73 <dbl>, 74 <dbl>, 75 <dbl>, 76 <dbl>,
## # 77 <dbl>, 78 <dbl>, 79 <dbl>, 80 <dbl>, 81 <dbl>, 82 <dbl>, 83 <dbl>, 84 <dbl>, 85 <dbl>,
## # 86 <dbl>, 87 <dbl>, 88 <dbl>, 89 <dbl>, 90 <dbl>, 91 <dbl>, 92 <dbl>, 93 <dbl>, 94 <dbl>,
## # 95 <dbl>, 96 <dbl>, 97 <dbl>, 98 <dbl>, 99 <dbl>, 100 <dbl>, 101 <dbl>, 102 <dbl>, 103 <dbl>,
## # 104 <dbl>, 105 <dbl>, 106 <dbl>, 107 <dbl>, 108 <dbl>, 109 <dbl>, 110 <dbl>, 111 <dbl>,
## # 112 <dbl>, 113 <dbl>, 114 <dbl>, 115 <dbl>, …
```

With those settings, `fitted()` returned a \(4,000 \times 246\) numeric array. The 4,000 rows corresponded to the 4,000 post-warmup HMC draws. Each of the 246 columns corresponded to one of the 246 rows in the original `alcohol1_pp` data. To make the output more useful, we converted it to a data frame, named the columns by the row numbers corresponding to the original `alcohol1_pp` data, and converted the row names to an `iter` column.

In the next code block, we’ll convert `f` to the long format and use `left_join()` to join it with the relevant subset of the `alcohol1_pp` data.

```
f <-
  f %>%
  pivot_longer(-iter,
               names_to = "row",
               values_to = "fitted") %>%
  mutate(row = row %>% as.integer()) %>%
  left_join(
    alcohol1_pp %>%
      mutate(row = 1:n()) %>%
      select(row, alcuse),
    by = "row"
  )

f
```

```
## # A tibble: 984,000 x 4
## iter row fitted alcuse
## <chr> <int> <dbl> <dbl>
## 1 1 1 0.739 1.73
## 2 1 2 0.968 2
## 3 1 3 1.20 2
## 4 1 4 0.739 0
## 5 1 5 0.968 0
## 6 1 6 1.20 1
## 7 1 7 0.739 1
## 8 1 8 0.968 2
## 9 1 9 1.20 3.32
## 10 1 10 0.739 0
## # … with 983,990 more rows
```

If we collapse the distinction across the 4,000 HMC draws, here is the squared correlation between `fitted` and `alcuse`.

```
f %>%
  summarise(r  = cor(fitted, alcuse),
            r2 = cor(fitted, alcuse)^2)
```

```
## # A tibble: 1 x 2
## r r2
## <dbl> <dbl>
## 1 0.186 0.0347
```

This is close to the \(R_{y, \hat y}^2 = .043\) Singer and Willett reported in the text. It might seem unsatisfying how this seemingly ignores model uncertainty by collapsing across HMC iterations. Here’s a look at what happens if we compute the \(R_{y, \hat y}^2\) separately for each iteration.

```
f %>%
  mutate(iter = iter %>% as.double()) %>%
  group_by(iter) %>%
  summarise(r  = cor(fitted, alcuse),
            r2 = cor(fitted, alcuse)^2)
```

```
## # A tibble: 4,000 x 3
## iter r r2
## <dbl> <dbl> <dbl>
## 1 1 0.208 0.0434
## 2 2 0.208 0.0434
## 3 3 0.208 0.0434
## 4 4 0.208 0.0434
## 5 5 0.208 0.0434
## 6 6 0.208 0.0434
## 7 7 0.208 0.0434
## 8 8 0.208 0.0434
## 9 9 0.208 0.0434
## 10 10 0.208 0.0434
## # … with 3,990 more rows
```

Now for every level of `iter`, \(R_{y, \hat y}^2 = .0434\), which matches up nicely with the text. But it seems odd that the value should be the same for each of the 4,000 HMC draws. Sadly, my efforts to debug my workflow have been unsuccessful. If you see a flaw in this method, please share on GitHub.
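One tentative explanation for the constancy: within any single draw, the population-level fitted values are `b_Intercept + b_age_14 * age_14`, a linear transformation of `age_14`. Since Pearson correlations are invariant to positive linear transformations, every draw with a positive slope should return the very same correlation, namely the sample correlation between `age_14` and `alcuse`. If that reasoning holds, this one-liner should reproduce the per-iteration value.

```
# sanity check: the squared correlation between the raw time
# variable and the outcome should match the per-iteration r2
cor(alcohol1_pp$age_14, alcohol1_pp$alcuse)^2
```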

Just for kicks, here’s a more compact alternative to our `fitted()` + `left_join()` approach that more closely resembles the workflow Singer and Willett showed on pages 102 and 103.

```
tibble(age_14 = 0:2) %>%
  mutate(fitted = map(age_14, ~ post$b_Intercept + post$b_age_14 * .)) %>%
  full_join(alcohol1_pp %>% select(id, age_14, alcuse),
            by = "age_14") %>%
  mutate(row = 1:n()) %>%
  unnest(fitted) %>%
  mutate(iter = rep(1:4000, times = alcohol1_pp %>% nrow())) %>%
  group_by(iter) %>%
  summarise(r  = cor(fitted, alcuse),
            r2 = cor(fitted, alcuse)^2)
```

```
## # A tibble: 4,000 x 3
## iter r r2
## <int> <dbl> <dbl>
## 1 1 0.208 0.0434
## 2 2 0.208 0.0434
## 3 3 0.208 0.0434
## 4 4 0.208 0.0434
## 5 5 0.208 0.0434
## 6 6 0.208 0.0434
## 7 7 0.208 0.0434
## 8 8 0.208 0.0434
## 9 9 0.208 0.0434
## 10 10 0.208 0.0434
## # … with 3,990 more rows
```

Either way, our results agree with those in the text: about “4.3% of the total variability in *ALCUSE* is associated with linear time” (p. 103, *emphasis* in the original).

#### 4.4.3.2 Pseudo-\(R^2\) statistics computed from the variance components.

Residual variation–that portion of the outcome variation unexplained by a model’s predictors–provides another criterion for comparison. When you fit a series of models, you hope that added predictors further explain unexplained outcome variation, causing residual variation to decline. The magnitude of this decline quantifies the improvement in fit. A large decline suggests that the predictors make a big difference; a small, or zero, decline suggests that they do not. To assess these declines on a common scale, we compute the *proportional reduction in residual variance* as we add predictors. Each unconditional model yields residual variances that serve as yardsticks for comparison. The unconditional means model provides a baseline estimate of \(\sigma_\epsilon^2\); the unconditional growth model provides baseline estimates of \(\sigma_0^2\) and \(\sigma_1^2\). Each leads to its own pseudo-\(R^2\) statistic. (p. 103, *emphasis* in the original)

This provides three more pseudo-\(R^2\) statistics: \(R_\epsilon^2\), \(R_0^2\), and \(R_1^2\). The formula for the first is

\[ R_\epsilon^2 = \frac{\sigma_\epsilon^2 (\text{unconditional means model}) - \sigma_\epsilon^2 (\text{unconditional growth model})}{\sigma_\epsilon^2 (\text{unconditional means model})}. \]

We’ve actually already computed this one, above, when we referred to it as the decline in \(\sigma_\epsilon^2\) after adding time to the model. Here it is again.

```
cbind(VarCorr(fit4.1, summary = F)[[2]][[1]],
      VarCorr(fit4.2, summary = F)[[2]][[1]]) %>%
  data.frame() %>%
  mutate_all(~ .^2) %>%
  set_names(str_c("fit4.", 1:2)) %>%
  mutate(r_2_epsilon = (fit4.1 - fit4.2) / fit4.1) %>%
  summarise(mean   = mean(r_2_epsilon),
            median = median(r_2_epsilon),
            sd     = sd(r_2_epsilon),
            ll     = quantile(r_2_epsilon, prob = .025),
            ul     = quantile(r_2_epsilon, prob = .975)) %>%
  mutate_if(is.double, round, digits = 3)
```

```
## mean median sd ll ul
## 1 0.341 0.357 0.14 0.017 0.564
```

Here’s a look at the full distribution for our \(R_\epsilon^2\).

```
cbind(VarCorr(fit4.1, summary = F)[[2]][[1]],
      VarCorr(fit4.2, summary = F)[[2]][[1]]) %>%
  data.frame() %>%
  mutate_all(~ .^2) %>%
  set_names(str_c("fit4.", 1:2)) %>%
  mutate(r_2_epsilon = (fit4.1 - fit4.2) / fit4.1) %>%
  
  ggplot(aes(x = r_2_epsilon)) +
  geom_vline(xintercept = 0, color = "white") +
  geom_density(fill = "grey25", color = "transparent") +
  scale_x_continuous(expression(Pseudo~italic(R)[epsilon]^2), limits = c(-1, 1)) +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank())
```

When we use the full posteriors of our two \(\sigma_\epsilon^2\) parameters, we end up with a slightly smaller statistic than the one in the text. So our conclusion is that about 35% of the intraindividual variance is accounted for by time.

If we consider additional models with predictors for the \(\zeta\)s, we can examine similar pseudo \(R^2\) statistics following the generic form

\[ R_\zeta^2 = \frac{\sigma_\zeta^2 (\text{unconditional growth model}) - \sigma_\zeta^2 (\text{subsequent model})}{\sigma_\zeta^2 (\text{unconditional growth model})}, \]

where \(\zeta\) could refer to \(\zeta_{0i}\), \(\zeta_{1i}\), and so on. If you look back up at the shape of the full posterior of \(R_\epsilon^2\), you’ll notice part of the left tail crosses zero. “Unlike traditional \(R^2\) statistics, which will always be positive (or zero), some of these statistics can be negative” (p. 104)! If you compute them, interpret pseudo-\(R^2\) statistics with a grain of salt.
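To quantify that observation, we might compute the posterior probability that \(R_\epsilon^2\) is negative, reusing the workflow from above.

```
# what proportion of the posterior for R2_epsilon falls below zero?
cbind(VarCorr(fit4.1, summary = F)[[2]][[1]],
      VarCorr(fit4.2, summary = F)[[2]][[1]]) %>%
  data.frame() %>%
  mutate_all(~ .^2) %>%
  set_names(str_c("fit4.", 1:2)) %>%
  mutate(r_2_epsilon = (fit4.1 - fit4.2) / fit4.1) %>%
  summarise(p_negative = mean(r_2_epsilon < 0))
```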

## 4.5 Practical data analytic strategies for model building

A sound statistical model includes all necessary predictors and no unnecessary ones. But how do you separate the wheat from the chaff? We suggest you rely on a combination of substantive theory, research questions, and statistical evidence. *Never* let a computer select predictors mechanically. (pp. 104–105, *emphasis* in the original)

### 4.5.1 A taxonomy of statistical models.

We suggest that you base decisions to enter, retain, and remove predictors on a combination of logic, theory, and prior research, supplemented by judicious [parameter evaluation] and comparison of model fit. At the outset, you might examine the effect of each predictor individually. You might then focus on predictors of primary interest (while including others whose effects you want to control). As in regular regression, you can add predictors singly or in groups and you can address issues of functional form using interactions and transformations. As you develop the taxonomy, you will progress toward a “final model” whose interpretation addresses your research questions. We place quotes around this term to emphasize that we believe no statistical model is *ever* final; it is simply a placeholder until a better model is found. (p. 105, *emphasis* in the original)

### 4.5.2 Interpreting fitted models.

You need not interpret every model you fit, especially those designed to guide interim decision making. When writing up findings for presentation and publication, we suggest that you identify a manageable subset of models that, taken together, tells a persuasive story parsimoniously. At a minimum, this includes the unconditional means model, the unconditional growth model, and a “final model.” You may also want to present intermediate models that either provide important building blocks or tell interesting stories in their own right. (p. 106)

In the dawn of the post-replication crisis era, it’s astonishing to reread and transcribe this section and the one above. I like a lot of what the authors had to say. Much of it seems like good pragmatic advice. But if they were to rewrite these sections again, I wonder what changes they’d make. Would they recommend researchers preregister their primary hypothesis, variables of interest, and perhaps their model building strategy (Nosek et al., 2018)? Would they be interested in a multiverse analysis (Steegen et al., 2016)? Would they still recommend sharing only a subset of one’s analyses in the era of sharing platforms like GitHub and the Open Science Framework? Would they weigh in on developments in causal inference (Pearl et al., 2016)?

#### 4.5.2.1 Model C: The uncontrolled effects of COA.

The default priors for Model C are the same as for the unconditional growth model. All we’ve done is add parameters of `class = b`. As these default to improper flat priors, we have nothing to add to the `prior` argument to include them. Feel free to check with `get_prior()`. For the sake of practice, this model follows the form

\[\begin{align*} \text{alcuse}_{ij} & = \gamma_{00} + \gamma_{01} \text{coa}_i + \gamma_{10} \text{age_14}_{ij} + \gamma_{11} \text{coa}_i \times \text{age_14}_{ij} + \zeta_{0i} + \zeta_{1i} \text{age_14}_{ij} + \epsilon_{ij} \\ \epsilon_{ij} & \sim \text{Normal} (0, \sigma_\epsilon^2) \\ \begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} & \sim \text{Normal} \begin{pmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{bmatrix} \end{pmatrix}. \end{align*}\]

Fit the model.

```
fit4.3 <-
  brm(data = alcohol1_pp,
      family = gaussian,
      alcuse ~ 0 + Intercept + age_14 + coa + age_14:coa + (1 + age_14 | id),
      prior = c(prior(student_t(3, 0, 2.5), class = sd),
                prior(student_t(3, 0, 2.5), class = sigma),
                prior(lkj(1), class = cor)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "fits/fit04.03")
```

Check the summary.

`print(fit4.3, digits = 3)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: alcuse ~ 0 + Intercept + age_14 + coa + age_14:coa + (1 + age_14 | id)
## Data: alcohol1_pp (Number of observations: 246)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~id (Number of levels: 82)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.696 0.100 0.500 0.895 1.001 905 1629
## sd(age_14) 0.361 0.099 0.123 0.536 1.011 275 222
## cor(Intercept,age_14) -0.084 0.297 -0.520 0.731 1.004 487 364
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept 0.317 0.135 0.053 0.582 1.000 2193 2603
## age_14 0.294 0.085 0.130 0.466 1.000 3416 3194
## coa 0.744 0.201 0.347 1.140 1.000 2443 2745
## age_14:coa -0.052 0.128 -0.313 0.194 1.002 3399 2783
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.609 0.053 0.516 0.726 1.003 403 601
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

Our \(\gamma\)s are quite similar to those presented in the text. Our \(\sigma_\epsilon\) for this model is about the same as with `fit4.2`. Let’s practice with `conditional_effects()` to plot the consequences of this model.

`conditional_effects(fit4.3)`

This time we got back three plots. The first two were of the lower-order parameters \(\gamma_{10}\) and \(\gamma_{01}\). Note how the plot for `coa` treated it as a continuous variable. This is because the variable was saved as an integer in the original data set.

```
fit4.3$data %>%
  glimpse()
```

```
## Rows: 246
## Columns: 5
## $ alcuse <dbl> 1.732051, 2.000000, 2.000000, 0.000000, 0.000000, 1.000000, 1.000000, 2.000000, …
## $ Intercept <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ age_14 <dbl> 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2,…
## $ coa <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ id <dbl> 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9,…
```

Coding it as an integer further complicated things for the third plot returned by `conditional_effects()`, the one for the interaction of `age_14` and `coa`, \(\gamma_{11}\).

Since `coa` is binary, the natural way to express its interaction with `age_14` would be with `age_14` on the \(x\)-axis and two separate trajectories, one for each value of `coa`. That’s what Singer and Willett very sensibly did with the middle panel of Figure 4.3. However, the `conditional_effects()` function defaults to expressing interactions such that the first variable in the term–in this case, `age_14`–is on the \(x\)-axis and the second variable in the term–`coa`, treated as an integer–is depicted in three lines corresponding to its mean and its mean \(\pm\) one standard deviation. This is great for continuous variables, but incoherent for categorical ones. The fix is to adjust the data and refit the model.
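Before refitting, it’s worth noting a possible shortcut: `conditional_effects()` has an `int_conditions` argument, which lets you name the exact values at which to evaluate the moderator. If I understand its behavior correctly, something like the following would display separate trajectories at `coa` values of 0 and 1 without touching the data.

```
# a possible alternative to refitting: evaluate coa at exactly 0 and 1
conditional_effects(fit4.3,
                    effects = "age_14:coa",
                    int_conditions = list(coa = 0:1))
```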

```
fit4.4 <-
  update(fit4.3,
         newdata = alcohol1_pp %>% mutate(coa = factor(coa)),
         iter = 2000, warmup = 1000, chains = 4, cores = 4,
         seed = 4,
         file = "fits/fit04.04")
```

We might compare the updated model with its predecessor. To get a focused look, we can use the `posterior_summary()` function with a little subsetting.

`posterior_summary(fit4.3)[1:4, ] %>% round(digits = 3)`

```
## Estimate Est.Error Q2.5 Q97.5
## b_Intercept 0.317 0.135 0.053 0.582
## b_age_14 0.294 0.085 0.130 0.466
## b_coa 0.744 0.201 0.347 1.140
## b_age_14:coa -0.052 0.128 -0.313 0.194
```

`posterior_summary(fit4.4)[1:4, ] %>% round(digits = 3)`

```
## Estimate Est.Error Q2.5 Q97.5
## b_Intercept 0.317 0.135 0.053 0.582
## b_age_14 0.294 0.085 0.130 0.466
## b_coa1 0.744 0.201 0.347 1.140
## b_age_14:coa1 -0.052 0.128 -0.313 0.194
```

The results are about the same. The payoff comes when we try again with `conditional_effects()`.

`conditional_effects(fit4.4)`

Much better. Now the plot for \(\gamma_{01}\) treats `coa` as binary and our plot for the interaction between `age_14` and `coa` is much closer to the one in Figure 4.3. Since we’re already on a `conditional_effects()` tangent, we may as well go further. When working with models like `fit4.3` where you have multiple fixed effects, sometimes you only want the plots for a subset of those effects. For example, if our main goal is to do a good job tastefully reproducing the middle plot in Figure 4.3, we only need the interaction plot. In such a case, use the `effects` argument.

`conditional_effects(fit4.4, effects = "age_14:coa")`

Earlier we discussed how `conditional_effects()` lets users adjust some of the output. But if you want an extensive overhaul, it’s better to save the output of `conditional_effects()` as an object and manipulate that object with the `plot()` function.

```
ce <- conditional_effects(fit4.4, effects = "age_14:coa")

str(ce)
```

```
## List of 1
## $ age_14:coa:'data.frame': 200 obs. of 12 variables:
## ..$ age_14 : num [1:200] 0 0 0.0202 0.0202 0.0404 ...
## ..$ coa : Factor w/ 2 levels "0","1": 1 2 1 2 1 2 1 2 1 2 ...
## ..$ alcuse : num [1:200] 0.922 0.922 0.922 0.922 0.922 ...
## ..$ Intercept : num [1:200] 1 1 1 1 1 1 1 1 1 1 ...
## ..$ id : logi [1:200] NA NA NA NA NA NA ...
## ..$ cond__ : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
## ..$ effect1__ : num [1:200] 0 0 0.0202 0.0202 0.0404 ...
## ..$ effect2__ : Factor w/ 2 levels "0","1": 1 2 1 2 1 2 1 2 1 2 ...
## ..$ estimate__: num [1:200] 0.318 1.062 0.324 1.067 0.33 ...
## ..$ se__ : num [1:200] 0.133 0.148 0.132 0.147 0.131 ...
## ..$ lower__ : num [1:200] 0.0535 0.7689 0.0624 0.7743 0.0703 ...
## ..$ upper__ : num [1:200] 0.582 1.349 0.585 1.352 0.59 ...
## ..- attr(*, "effects")= chr [1:2] "age_14" "coa"
## ..- attr(*, "response")= chr "alcuse"
## ..- attr(*, "surface")= logi FALSE
## ..- attr(*, "categorical")= logi FALSE
## ..- attr(*, "ordinal")= logi FALSE
## ..- attr(*, "points")='data.frame': 246 obs. of 6 variables:
## .. ..$ age_14 : num [1:246] 0 1 2 0 1 2 0 1 2 0 ...
## .. ..$ coa : Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## .. ..$ resp__ : num [1:246] 1.73 2 2 0 0 ...
## .. ..$ cond__ : Factor w/ 1 level "1": 1 1 1 1 1 1 1 1 1 1 ...
## .. ..$ effect1__: num [1:246] 0 1 2 0 1 2 0 1 2 0 ...
## .. ..$ effect2__: Factor w/ 2 levels "0","1": 2 2 2 2 2 2 2 2 2 2 ...
## - attr(*, "class")= chr "brms_conditional_effects"
```

Our `ce` is an object of class `brms_conditional_effects`, which contains a list of a single data frame. Had we omitted our `effects` argument, above, we’d have a list of 3 instead. Anyway, these data frames contain the necessary information to produce the plot. The advantage of saving `ce` this way is we can now insert it into the `plot()` function. The simple output is the same as before.

`plot(ce)`

The `plot()`

function will allow us to do other things, like add in the original data or omit the white grid lines.

```
ce %>%
  plot(points = T,
       point_args = list(size = 1/4, alpha = 1/4, width = .05, height = .05, color = "black"),
       theme = theme(panel.grid = element_blank()))
```

And for even more control, you can tack on typical **ggplot2** functions. When you do, make sure to set the `plot = FALSE` argument and then subset the result (e.g., with `[[1]]`) after the closing parenthesis of the `plot()` function.

```
plot(ce,
     theme = theme(legend.position = "none",
                   panel.grid = element_blank()),
     plot = FALSE)[[1]] +
  annotate(geom = "text",
           x = 2.1, y = c(.95, 1.55),
           label = str_c("coa = ", 0:1),
           hjust = 0, size = 3.5) +
  scale_fill_brewer(type = "qual") +
  scale_color_brewer(type = "qual") +
  scale_x_continuous("age", limits = c(-1, 3), labels = 13:17) +
  scale_y_continuous(limits = c(0, 2), breaks = 0:2)
```

But anyway, let's get back on track and talk about the variance components. Singer and Willett contrasted \(\sigma_\epsilon^2\) from Model B with the new one from Model C. We might use `VarCorr()` to do the same.

`VarCorr(fit4.2)[[2]]`

```
## $sd
## Estimate Est.Error Q2.5 Q97.5
## 0.607636 0.05420805 0.5113789 0.7215826
```

`VarCorr(fit4.3)[[2]]`

```
## $sd
## Estimate Est.Error Q2.5 Q97.5
## 0.608819 0.05302776 0.5159619 0.7256104
```

We could have also extracted that information by subsetting `posterior_summary()`

.

`posterior_summary(fit4.2)["sigma", ]`

```
## Estimate Est.Error Q2.5 Q97.5
## 0.60763599 0.05420805 0.51137894 0.72158262
```

`posterior_summary(fit4.3)["sigma", ]`

```
## Estimate Est.Error Q2.5 Q97.5
## 0.60881898 0.05302776 0.51596194 0.72561043
```

Anyway, to get these in a variance metric, just square their posterior samples and summarize.
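As a quick illustration of that move, here's a toy sketch. The draws below are simulated stand-ins for the posterior `sigma` draws from `fit4.2`; with a real fit you'd pull them with `posterior_samples()`.

```r
library(dplyr)

# simulated stand-ins for the posterior SD draws (sigma) from fit4.2
set.seed(4)
draws <- tibble(sigma = rnorm(4000, mean = 0.608, sd = 0.054))

# square to convert to the variance metric, then summarize
draws %>%
  mutate(sigma_2 = sigma^2) %>%
  summarise(mean = mean(sigma_2),
            ll   = quantile(sigma_2, prob = .025),
            ul   = quantile(sigma_2, prob = .975))
```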

Our next task is to formally compare `fit4.2`

and `fit4.3`

in terms of declines in \(\sigma_0^2\) and \(\sigma_1^2\).

```
bind_cols(
  posterior_samples(fit4.2) %>%
    transmute(fit2_sigma_2_0 = sd_id__Intercept^2,
              fit2_sigma_2_1 = sd_id__age_14^2),
  posterior_samples(fit4.3) %>%
    transmute(fit3_sigma_2_0 = sd_id__Intercept^2,
              fit3_sigma_2_1 = sd_id__age_14^2)
) %>%
  mutate(`decline~'in'~sigma[0]^2` = (fit2_sigma_2_0 - fit3_sigma_2_0) / fit2_sigma_2_0,
         `decline~'in'~sigma[1]^2` = (fit2_sigma_2_1 - fit3_sigma_2_1) / fit2_sigma_2_1) %>%
  pivot_longer(contains("decline")) %>%
  ggplot(aes(x = value)) +
  geom_vline(xintercept = 0, color = "white") +
  geom_density(fill = "grey25", color = "transparent") +
  scale_x_continuous(NULL, limits = c(-5, 2)) +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~name, labeller = label_parsed, ncol = 1)
```

Here are the percentages of decline in variance from `fit4.2` to `fit4.3`.

```
bind_cols(
  posterior_samples(fit4.2) %>%
    transmute(fit2_sigma_2_0 = sd_id__Intercept^2,
              fit2_sigma_2_1 = sd_id__age_14^2),
  posterior_samples(fit4.3) %>%
    transmute(fit3_sigma_2_0 = sd_id__Intercept^2,
              fit3_sigma_2_1 = sd_id__age_14^2)
) %>%
  mutate(`decline~'in'~sigma[0]^2` = (fit2_sigma_2_0 - fit3_sigma_2_0) / fit2_sigma_2_0,
         `decline~'in'~sigma[1]^2` = (fit2_sigma_2_1 - fit3_sigma_2_1) / fit2_sigma_2_1) %>%
  pivot_longer(contains("decline")) %>%
  group_by(name) %>%
  summarise(mean   = mean(value),
            median = median(value),
            sd     = sd(value),
            ll     = quantile(value, prob = .025),
            ul     = quantile(value, prob = .975))
```

```
## # A tibble: 2 x 6
## name mean median sd ll ul
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 decline~'in'~sigma[0]^2 0.161 0.219 0.344 -0.665 0.651
## 2 decline~'in'~sigma[1]^2 -2.03 -0.0271 27.9 -10.7 0.913
```

In this case, we end up with massive uncertainty when working with the full posteriors. This is particularly the case with the difference in \(\sigma_1^2\), which is left skewed for days. Here are the results when we only use point estimates.

```
bind_cols(
  posterior_samples(fit4.2) %>%
    transmute(fit2_sigma_2_0 = sd_id__Intercept^2,
              fit2_sigma_2_1 = sd_id__age_14^2),
  posterior_samples(fit4.3) %>%
    transmute(fit3_sigma_2_0 = sd_id__Intercept^2,
              fit3_sigma_2_1 = sd_id__age_14^2)
) %>%
  summarise_all(median) %>%
  transmute(`% decline in sigma_2_0` = 100 * (fit2_sigma_2_0 - fit3_sigma_2_0) / fit2_sigma_2_0,
            `% decline in sigma_2_1` = 100 * (fit2_sigma_2_1 - fit3_sigma_2_1) / fit2_sigma_2_1)
```

```
## % decline in sigma_2_0 % decline in sigma_2_1
## 1 21.39691 -1.38481
```

“These variance components are now called *partial* or *conditional* variances because they quantify the interindividual differences in change that remain unexplained by the model’s predictors” (p. 108, emphasis in the original).

#### 4.5.2.2 Model D: The controlled effects of COA.

This model follows the form

\[\begin{align*} \text{alcuse}_{ij} & = \gamma_{00} + \gamma_{01} \text{coa}_i + \gamma_{02} \text{peer}_i + \gamma_{10} \text{age_14}_{ij} \\ & \;\;\; + \gamma_{11} \text{coa}_i \times \text{age_14}_{ij} + \gamma_{12} \text{peer}_i \times \text{age_14}_{ij} \\ & \;\;\; + \zeta_{0i} + \zeta_{1i} \text{age_14}_{ij} + \epsilon_{ij} \\ \epsilon_{ij} & \sim \text{Normal} (0, \sigma_\epsilon^2) \\ \begin{bmatrix} \zeta_{0i} \\ \zeta_{1i} \end{bmatrix} & \sim \text{Normal} \begin{pmatrix} \begin{bmatrix} 0 \\ 0 \end{bmatrix}, \begin{bmatrix} \sigma_0^2 & \sigma_{01} \\ \sigma_{01} & \sigma_1^2 \end{bmatrix} \end{pmatrix}. \end{align*}\]

Fit that joint.

```
fit4.5 <-
  brm(data = alcohol1_pp,
      family = gaussian,
      alcuse ~ 0 + Intercept + age_14 + coa + peer + age_14:coa + age_14:peer + (1 + age_14 | id),
      prior = c(prior(student_t(3, 0, 2.5), class = sd),
                prior(student_t(3, 0, 2.5), class = sigma),
                prior(lkj(1), class = cor)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "fits/fit04.05")
```

`print(fit4.5, digits = 3)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: alcuse ~ 0 + Intercept + age_14 + coa + peer + age_14:coa + age_14:peer + (1 + age_14 | id)
## Data: alcohol1_pp (Number of observations: 246)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~id (Number of levels: 82)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.481 0.101 0.283 0.668 1.003 574 1327
## sd(age_14) 0.363 0.080 0.207 0.514 1.006 314 683
## cor(Intercept,age_14) 0.123 0.341 -0.400 0.895 1.010 264 425
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -0.317 0.152 -0.611 -0.016 1.001 1775 2632
## age_14 0.435 0.117 0.203 0.668 1.000 2058 2555
## coa 0.580 0.168 0.259 0.912 1.001 2116 2518
## peer 0.693 0.114 0.471 0.917 1.000 1758 2227
## age_14:coa -0.018 0.126 -0.266 0.223 1.001 2847 2915
## age_14:peer -0.151 0.088 -0.324 0.020 1.000 2210 2882
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.607 0.047 0.520 0.704 1.003 414 1318
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

All our \(\gamma\) estimates are similar to those presented in Table 4.1. Let's compute the variances and the covariance, \(\sigma_{01}\). Here are the plots.

```
v <-
  posterior_samples(fit4.5) %>%
  transmute(sigma_2_epsilon = sigma^2,
            sigma_2_0 = sd_id__Intercept^2,
            sigma_2_1 = sd_id__age_14^2,
            sigma_01  = sd_id__Intercept * cor_id__Intercept__age_14 * sd_id__age_14)

v %>%
  pivot_longer(everything()) %>%
  ggplot(aes(x = value)) +
  geom_density(size = 0, fill = "black") +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~name, scales = "free")
```

And now we compute the summary statistics.

```
v %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise(mean   = mean(value),
            median = median(value),
            sd     = sd(value),
            ll     = quantile(value, prob = .025),
            ul     = quantile(value, prob = .975)) %>%
  mutate_if(is.double, round, digits = 3)
```

```
## # A tibble: 4 x 6
## name mean median sd ll ul
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 sigma_01 0.006 0.011 0.053 -0.109 0.09
## 2 sigma_2_0 0.241 0.235 0.097 0.08 0.446
## 3 sigma_2_1 0.138 0.132 0.058 0.043 0.265
## 4 sigma_2_epsilon 0.37 0.365 0.058 0.27 0.496
```

Like the \(\gamma\)’s, our variance components are all similar to those in the text.

```
bind_cols(
  posterior_samples(fit4.2) %>%
    transmute(fit4.2_sigma_2_epsilon = sigma^2,
              fit4.2_sigma_2_0 = sd_id__Intercept^2,
              fit4.2_sigma_2_1 = sd_id__age_14^2),
  posterior_samples(fit4.5) %>%
    transmute(fit4.5_sigma_2_epsilon = sigma^2,
              fit4.5_sigma_2_0 = sd_id__Intercept^2,
              fit4.5_sigma_2_1 = sd_id__age_14^2)
) %>%
  summarise_all(median) %>%
  mutate(`% decline in sigma_2_epsilon` = 100 * (fit4.2_sigma_2_epsilon - fit4.5_sigma_2_epsilon) / fit4.2_sigma_2_epsilon,
         `% decline in sigma_2_0` = 100 * (fit4.2_sigma_2_0 - fit4.5_sigma_2_0) / fit4.2_sigma_2_0,
         `% decline in sigma_2_1` = 100 * (fit4.2_sigma_2_1 - fit4.5_sigma_2_1) / fit4.2_sigma_2_1) %>%
  pivot_longer(contains("%")) %>%
  select(name, value)
```

```
## # A tibble: 3 x 2
## name value
## <chr> <dbl>
## 1 % decline in sigma_2_epsilon -0.670
## 2 % decline in sigma_2_0 61.8
## 3 % decline in sigma_2_1 2.19
```

The percentages by which our variance components declined relative to the unconditional growth model are of similar orders of magnitude as those presented in the text.

#### 4.5.2.3 Model E: A tentative “final model” for the controlled effects of `coa`.

This model is just like the last, but with the simple omission of the \(\gamma_{12}\) parameter.

```
fit4.6 <-
  brm(data = alcohol1_pp,
      family = gaussian,
      alcuse ~ 0 + Intercept + age_14 + coa + peer + age_14:peer + (1 + age_14 | id),
      prior = c(prior(student_t(3, 0, 2.5), class = sd),
                prior(student_t(3, 0, 2.5), class = sigma),
                prior(lkj(1), class = cor)),
      iter = 2000, warmup = 1000, chains = 4, cores = 4,
      seed = 4,
      file = "fits/fit04.06")
```

`print(fit4.6, digits = 3)`

```
## Family: gaussian
## Links: mu = identity; sigma = identity
## Formula: alcuse ~ 0 + Intercept + age_14 + coa + peer + age_14:peer + (1 + age_14 | id)
## Data: alcohol1_pp (Number of observations: 246)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Group-Level Effects:
## ~id (Number of levels: 82)
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept) 0.483 0.100 0.280 0.676 1.009 654 1453
## sd(age_14) 0.361 0.080 0.202 0.517 1.012 334 843
## cor(Intercept,age_14) 0.122 0.338 -0.398 0.877 1.012 254 423
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept -0.316 0.148 -0.623 -0.025 1.002 2132 2435
## age_14 0.423 0.107 0.209 0.636 1.001 1935 2117
## coa 0.574 0.152 0.277 0.869 1.000 2531 2942
## peer 0.697 0.113 0.476 0.915 1.002 2118 2484
## age_14:peer -0.151 0.085 -0.315 0.019 1.000 2020 2528
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma 0.605 0.049 0.515 0.704 1.011 462 1530
##
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

The \(\gamma\)’s all look well behaved. Here are the variance component summaries.

```
v <-
  posterior_samples(fit4.6) %>%
  transmute(sigma_2_epsilon = sigma^2,
            sigma_2_0 = sd_id__Intercept^2,
            sigma_2_1 = sd_id__age_14^2,
            sigma_01  = sd_id__Intercept * cor_id__Intercept__age_14 * sd_id__age_14)

v %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise(mean   = mean(value),
            median = median(value),
            sd     = sd(value),
            ll     = quantile(value, prob = .025),
            ul     = quantile(value, prob = .975)) %>%
  mutate_if(is.double, round, digits = 3)
```

```
## # A tibble: 4 x 6
## name mean median sd ll ul
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 sigma_01 0.006 0.01 0.053 -0.111 0.094
## 2 sigma_2_0 0.244 0.236 0.097 0.078 0.456
## 3 sigma_2_1 0.136 0.13 0.058 0.041 0.267
## 4 sigma_2_epsilon 0.368 0.365 0.059 0.265 0.495
```

### 4.5.3 Displaying prototypical change trajectories.

On page 111, Singer and Willett computed the various levels of the \(\pi\) coefficients when `coa == 0`

or `coa == 1`

. To follow along, we’ll want to work directly with the posterior samples of `fit4.3`

.

```
post <- posterior_samples(fit4.3)

post %>%
  select(starts_with("b_")) %>%
  head()
```

```
## b_Intercept b_age_14 b_coa b_age_14:coa
## 1 0.3412878 0.2504153 0.6080214 0.046260014
## 2 0.2573570 0.1868070 0.7253951 -0.029303681
## 3 0.4711266 0.3424222 0.4388561 0.001097776
## 4 0.5060308 0.2622469 0.8187884 0.044836155
## 5 0.3116978 0.3577098 0.5366854 -0.049801869
## 6 0.2716599 0.3201766 0.4702723 -0.087098213
```

Here we apply the formulas to the posterior samples and then summarize with posterior means.

```
post %>%
  select(starts_with("b_")) %>%
  transmute(pi_0_coa0 = b_Intercept + b_coa * 0,
            pi_1_coa0 = b_age_14 + `b_age_14:coa` * 0,
            pi_0_coa1 = b_Intercept + b_coa * 1,
            pi_1_coa1 = b_age_14 + `b_age_14:coa` * 1) %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise(posterior_mean = mean(value) %>% round(digits = 3))
```

```
## # A tibble: 4 x 2
## name posterior_mean
## <chr> <dbl>
## 1 pi_0_coa0 0.317
## 2 pi_0_coa1 1.06
## 3 pi_1_coa0 0.294
## 4 pi_1_coa1 0.242
```

We already plotted these trajectories and their 95% intervals a few sections up. If we want to work with the full composite model to predict \(Y_{ij}\) (i.e., `alcuse`) directly, we multiply the `b_coa`, `b_age_14`, and `b_age_14:coa` vectors by the appropriate values of `coa` and `age_14`. For example, here's what you'd code if you wanted the initial `alcuse` status for when `coa == 1`.

```
post %>%
  select(starts_with("b_")) %>%
  mutate(y = b_Intercept + b_coa * 1 + b_age_14 * 0 + `b_age_14:coa` * 0 * 1) %>%
  head()
```

```
## b_Intercept b_age_14 b_coa b_age_14:coa y
## 1 0.3412878 0.2504153 0.6080214 0.046260014 0.9493092
## 2 0.2573570 0.1868070 0.7253951 -0.029303681 0.9827521
## 3 0.4711266 0.3424222 0.4388561 0.001097776 0.9099826
## 4 0.5060308 0.2622469 0.8187884 0.044836155 1.3248192
## 5 0.3116978 0.3577098 0.5366854 -0.049801869 0.8483832
## 6 0.2716599 0.3201766 0.4702723 -0.087098213 0.7419322
```

If you were to take the mean of that new `y`

column, you’d discover it’s the same as the mean of our `pi_0_coa1`

, above.

```
post %>%
  select(starts_with("b_")) %>%
  mutate(y = b_Intercept + b_coa * 1 + b_age_14 * 0 + `b_age_14:coa` * 0 * 1) %>%
  summarise(pi_0_coa1 = mean(y))
```

```
## pi_0_coa1
## 1 1.06155
```

Singer and Willett suggested four strategies to help researchers pick the prototypical values of the predictors to focus on:

- Substantively interesting values (e.g., typical ages, values corresponding to transition points)
- A range of percentiles (e.g., 25th, 50th, and 75th)
- Just the sample mean
- The sample mean \(\pm\) something like 1 standard deviation
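The last two strategies above are easy to sketch in code. Here the vector `x` is a simulated stand-in for a time-invariant predictor like `peer`, not the actual `alcohol1_pp` column.

```r
# simulated stand-in for a time-invariant predictor such as peer
set.seed(4)
x <- rexp(82, rate = 1)

# a range of percentiles
quantile(x, probs = c(.25, .5, .75))

# the sample mean +/- 1 standard deviation
c(low = mean(x) - sd(x), mean = mean(x), high = mean(x) + sd(x))
```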

They next discuss the right panel of Figure 4.3. We could continue to work directly with the `posterior_samples()` output to make our version of that figure. But if you want to accompany the posterior mean trajectories with their 95% intervals, and I hope you do, the `posterior_samples()` method will get tedious. Happily, **brms** offers users an alternative with the `fitted()` function. Since the right panel is somewhat complicated, it'll behoove us to practice with the simpler left panel, first.

In `fit4.2` (i.e., Model C), `age_14` is the only predictor. Here we'll specify values along its range in the original data, from 0 to 2. However, we end up specifying a bunch of values within that range in addition to the two endpoints. This is because the 95% intervals typically have a bow-tie shape. To depict that shape well, we need more than a couple values. We save those values as a tibble called `nd` (i.e., new data) and make use of them within `fitted()` via the `newdata = nd` argument.

Since we're only interested in the general trajectory, the consequence of the \(\gamma\)'s, we end up coding `re_formula = NA`. In so doing, we ask `fitted()` to ignore the group-level effects. In this example, that means we are ignoring the `id`-level deviations from the overall trajectories. If you're confused by what that means, don't worry. That part of the model should become clearer as we go along in the text.

Since `fitted()` returns an array, we then convert the results into a data frame for use within the **tidyverse** framework. For plotting, it's handy to bind those results together with `nd`, the predictor values we used to compute the fitted values. In the final wrangling step, we use our `age_14` values to compute the `age` values.

```
nd <-
  tibble(age_14 = seq(from = 0, to = 2, length.out = 30))

f <-
  fitted(fit4.2,
         newdata = nd,
         re_formula = NA) %>%
  data.frame() %>%
  bind_cols(nd) %>%
  mutate(age = age_14 + 14)

head(f)
```

```
## Estimate Est.Error Q2.5 Q97.5 age_14 age
## 1 0.6477610 0.1074588 0.4332701 0.8678844 0.00000000 14.00000
## 2 0.6664848 0.1056593 0.4569521 0.8826825 0.06896552 14.06897
## 3 0.6852085 0.1040080 0.4767514 0.8974762 0.13793103 14.13793
## 4 0.7039322 0.1025120 0.4986159 0.9121689 0.20689655 14.20690
## 5 0.7226559 0.1011784 0.5198120 0.9262679 0.27586207 14.27586
## 6 0.7413796 0.1000134 0.5398018 0.9420339 0.34482759 14.34483
```

Since we only had one predictor, `age_14`

, for which we specified 30 specific values, we ended up with 30 rows in our output. By default, `fitted()`

summarized the fitted values with posterior means (`Estimate`

), standard deviations (`Est.Error`

), and percentile-based 95% intervals (`Q2.5`

and `Q97.5`

). The other columns are the values we bound to them. Here’s how we might use these to make our `fitted()`

version of the leftmost panel of Figure 4.3.

```
f %>%
  ggplot(aes(x = age)) +
  geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5),
              fill = "grey75", alpha = 3/4) +
  geom_line(aes(y = Estimate)) +
  scale_y_continuous("alcuse", breaks = 0:2, limits = c(0, 2)) +
  coord_cartesian(xlim = c(13, 17)) +
  theme(panel.grid = element_blank())
```

With `fit4.6`

(i.e., Model E), we now have three predictors. We’d like to see the full range across `age_14`

for four combinations of `coa`

and `peer`

values. To my mind, the easiest way to get those values right is with a little `crossing()`

and `expand()`

.

```
nd <-
  crossing(coa  = 0:1,
           peer = c(.655, 1.381)) %>%
  expand(nesting(coa, peer),
         age_14 = seq(from = 0, to = 2, length.out = 30))

head(nd, n = 10)
```

```
## # A tibble: 10 x 3
## coa peer age_14
## <int> <dbl> <dbl>
## 1 0 0.655 0
## 2 0 0.655 0.0690
## 3 0 0.655 0.138
## 4 0 0.655 0.207
## 5 0 0.655 0.276
## 6 0 0.655 0.345
## 7 0 0.655 0.414
## 8 0 0.655 0.483
## 9 0 0.655 0.552
## 10 0 0.655 0.621
```

Now we use `fitted()`

much like before.

```
f <-
  fitted(fit4.6,
         newdata = nd,
         re_formula = NA) %>%
  data.frame() %>%
  bind_cols(nd) %>%
  # a little wrangling will make plotting much easier
  mutate(age  = age_14 + 14,
         coa  = ifelse(coa == 0, "coa = 0", "coa = 1"),
         peer = factor(peer))

glimpse(f)
```

```
## Rows: 120
## Columns: 8
## $ Estimate <dbl> 0.1400448, 0.1624334, 0.1848220, 0.2072107, 0.2295993, 0.2519879, 0.2743765, 0.2…
## $ Est.Error <dbl> 0.1096933, 0.1081919, 0.1068906, 0.1057966, 0.1049165, 0.1042556, 0.1038182, 0.1…
## $ Q2.5 <dbl> -0.082564660, -0.058429140, -0.034447621, -0.008760705, 0.017104627, 0.039855143…
## $ Q97.5 <dbl> 0.3527086, 0.3713115, 0.3905097, 0.4106447, 0.4325743, 0.4526990, 0.4736675, 0.4…
## $ coa <chr> "coa = 0", "coa = 0", "coa = 0", "coa = 0", "coa = 0", "coa = 0", "coa = 0", "co…
## $ peer <fct> 0.655, 0.655, 0.655, 0.655, 0.655, 0.655, 0.655, 0.655, 0.655, 0.655, 0.655, 0.6…
## $ age_14 <dbl> 0.00000000, 0.06896552, 0.13793103, 0.20689655, 0.27586207, 0.34482759, 0.413793…
## $ age <dbl> 14.00000, 14.06897, 14.13793, 14.20690, 14.27586, 14.34483, 14.41379, 14.48276, …
```

For our version of the right panel of Figure 4.3, most of the action is in `ggplot()`

, `geom_ribbon()`

, `geom_line()`

, and `facet_wrap()`

. All the rest is cosmetic.

```
f %>%
  ggplot(aes(x = age, color = peer, fill = peer)) +
  geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5),
              size = 0, alpha = 1/4) +
  geom_line(aes(y = Estimate, size = peer)) +
  scale_size_manual(values = c(1/2, 1)) +
  scale_fill_manual(values = c("blue3", "red3")) +
  scale_color_manual(values = c("blue3", "red3")) +
  scale_y_continuous("alcuse", breaks = 0:2) +
  labs(subtitle = "High peer values are in red; low ones are in blue.") +
  coord_cartesian(xlim = c(13, 17)) +
  theme(legend.position = "none",
        panel.grid = element_blank()) +
  facet_wrap(~coa)
```

In my opinion, it works better to split the plot into two when you include the 95% intervals.

### 4.5.4 Recentering predictors to improve interpretation.

The easiest strategy for recentering a time-invariant predictor is to subtract its sample mean from each observed value. When we center a predictor on its sample mean, the level-2 fitted intercepts represent the average fitted values of initial status (or rate of change). We can also recenter a time-invariant predictor by subtracting another meaningful value… Recentering works best when the centering constant is substantively meaningful. (pp. 113–114)

As we’ll see later, centering can also make it easier to select meaningful priors on the model intercept. If you look at our `alcohol1_pp`

data, you’ll see we already have centered versions of our time-invariant predictors. They’re the last two columns, `cpeer`

and `ccoa`

.

```
alcohol1_pp %>%
  glimpse()
```

```
## Rows: 246
## Columns: 9
## $ id <dbl> 1, 1, 1, 2, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 9, 10…
## $ age <dbl> 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 16,…
## $ coa <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ male <dbl> 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0,…
## $ age_14 <dbl> 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0,…
## $ alcuse <dbl> 1.732051, 2.000000, 2.000000, 0.000000, 0.000000, 1.000000, 1.000000, 2.000000, 3.3…
## $ peer <dbl> 1.2649111, 1.2649111, 1.2649111, 0.8944272, 0.8944272, 0.8944272, 0.8944272, 0.8944…
## $ cpeer <dbl> 0.2469111, 0.2469111, 0.2469111, -0.1235728, -0.1235728, -0.1235728, -0.1235728, -0…
## $ ccoa <dbl> 0.549, 0.549, 0.549, 0.549, 0.549, 0.549, 0.549, 0.549, 0.549, 0.549, 0.549, 0.549,…
```

If you wanted to center them by hand, you’d just execute something like this.

```
alcohol1_pp %>%
  mutate(peer_c = peer - mean(peer))
```

```
## # A tibble: 246 x 10
## id age coa male age_14 alcuse peer cpeer ccoa peer_c
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 1 14 1 0 0 1.73 1.26 0.247 0.549 0.247
## 2 1 15 1 0 1 2 1.26 0.247 0.549 0.247
## 3 1 16 1 0 2 2 1.26 0.247 0.549 0.247
## 4 2 14 1 1 0 0 0.894 -0.124 0.549 -0.123
## 5 2 15 1 1 1 0 0.894 -0.124 0.549 -0.123
## 6 2 16 1 1 2 1 0.894 -0.124 0.549 -0.123
## 7 3 14 1 1 0 1 0.894 -0.124 0.549 -0.123
## 8 3 15 1 1 1 2 0.894 -0.124 0.549 -0.123
## 9 3 16 1 1 2 3.32 0.894 -0.124 0.549 -0.123
## 10 4 14 1 1 0 0 1.79 0.771 0.549 0.771
## # … with 236 more rows
```

Did you notice how our `peer_c` values, above, deviated slightly from those in `cpeer`? That's because `peer_c` was based on the exact sample mean. The values in `cpeer` are based on the sample mean as provided in the text, 1.018, which introduces rounding error. For the sake of simplicity, we'll go with the centered variables matching up with the text.

Here we’ll hastily fit the models with help from the `update()`

function.

```
fit4.7 <-
  update(fit4.6,
         newdata = alcohol1_pp,
         alcuse ~ 0 + Intercept + age_14 + coa + cpeer + age_14:cpeer + (1 + age_14 | id),
         iter = 2000, warmup = 1000, chains = 4, cores = 4,
         seed = 4,
         file = "fits/fit04.07")

fit4.8 <-
  update(fit4.6,
         newdata = alcohol1_pp,
         alcuse ~ 0 + Intercept + age_14 + ccoa + peer + age_14:peer + (1 + age_14 | id),
         iter = 2000, warmup = 1000, chains = 4, cores = 4,
         seed = 4,
         file = "fits/fit04.08")
```

Here we reproduce the \(\gamma\)'s from `fit4.6` and compare them to the updates from `fit4.7` and `fit4.8`.

`fixef(fit4.6) %>% round(digits = 3)`

```
## Estimate Est.Error Q2.5 Q97.5
## Intercept -0.316 0.148 -0.623 -0.025
## age_14 0.423 0.107 0.209 0.636
## coa 0.574 0.152 0.277 0.869
## peer 0.697 0.113 0.476 0.915
## age_14:peer -0.151 0.085 -0.315 0.019
```

`fixef(fit4.7) %>% round(digits = 3)`

```
## Estimate Est.Error Q2.5 Q97.5
## Intercept 0.397 0.107 0.190 0.605
## age_14 0.270 0.062 0.149 0.390
## coa 0.569 0.152 0.273 0.866
## cpeer 0.694 0.116 0.468 0.918
## age_14:cpeer -0.152 0.087 -0.325 0.018
```

`fixef(fit4.8) %>% round(digits = 3)`

```
## Estimate Est.Error Q2.5 Q97.5
## Intercept -0.061 0.139 -0.333 0.216
## age_14 0.427 0.106 0.211 0.634
## ccoa 0.569 0.152 0.276 0.868
## peer 0.699 0.112 0.480 0.918
## age_14:peer -0.152 0.084 -0.318 0.010
```

## 4.6 Comparing models using deviance statistics

As you will see, we will also make use of deviance within our Bayesian Stan-based paradigm. But we’ll do so a little differently than what Singer and Willett presented.

### 4.6.1 The deviance statistic.

As it turns out, we Bayesians use the log-likelihood (LL), too. Recall how the numerator in the right-hand side of Bayes’ Theorem was \(p(\text{data} | \theta) p(\theta)\)? That first part, \(p(\text{data} | \theta)\), is the likelihood. In words, the likelihood is the *probability of the data given the parameters*. We generally work with the log of the likelihood rather than the likelihood itself because it’s easier to work with statistically.

When you’re working with **brms**, you can extract the LL with the `log_lik()`

function. Here’s an example with `fit4.1`

, our unconditional means model.

```
log_lik(fit4.1) %>%
str()
```

```
## num [1:4000, 1:246] -0.956 -0.706 -0.723 -0.716 -0.636 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : NULL
```

You may have noticed we didn’t just get a single value back. Rather, we got an array of 4,000 rows and 246 columns. The reason we got 4,000 rows is because that’s how many post-warmup iterations we drew from the posterior. I.e., we set `brm(..., iter = 2000, warmup = 1000, chains = 4)`

. With respect to the 246 columns, that’s how many rows there are in the `alcohol1_pp`

data. So for each case in the data, we get an entire posterior distribution of LL values.

With the multilevel model, we can define deviance for a given model as its LL times -2,

\[ \text{Deviance} = -2 LL_\text{current model}. \]

Here that is in code for `fit4.1`

.

```
ll <-
  log_lik(fit4.1) %>%
  data.frame() %>%
  mutate(ll = rowSums(.)) %>%
  mutate(deviance = -2 * ll) %>%
  select(ll, deviance, everything())

dim(ll)
```

`## [1] 4000 248`

Because we used HMC, deviance is a distribution rather than a single number. Here’s what it looks like for `fit4.1`

.

```
ll %>%
  ggplot(aes(x = deviance)) +
  geom_density(fill = "grey25", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  theme(panel.grid = element_blank())
```

Much like the frequentists, we Bayesians generally prefer models with smaller deviance distributions.

The reason frequentists multiply the LL by -2 is that, after doing so, the difference in deviances between two models follows a \(\chi^2\) distribution, and the \(\chi^2\)-difference test is widely used in frequentist statistics. Bayesians often just go ahead and use the -2 multiplication, too, largely out of tradition. But as we'll see, some contemporary Bayesians are challenging that tradition.

### 4.6.2 When and how can you compare deviance statistics?

As with the frequentists, deviance values/distributions in the Bayesian context are only meaningful in the relative sense. You cannot directly interpret a single model's deviance distribution by the magnitude or sign of its central tendency. But you can compare two or more models by the relative locations of their deviance distributions. When doing so, they must have been computed using the same data (i.e., no differences in missingness in the predictors) and the models must be nested.

However, in contemporary Bayesian practice we don’t tend to compare models with deviance. For details on why, check out Chapter 6 in McElreath’s *Statistical Rethinking*. McElreath also covered the topic in several online lectures (e.g., here and here).

### 4.6.3 Implementing deviance-based hypothesis tests.

In this project, we are not going to practice comparing deviances using frequentist \(\chi^2\) tests. We will, however, cover Bayesian information criteria.

### 4.6.4 ~~AIC and BIC~~ WAIC and LOO statistics: Comparing nonnested models using information criteria [and cross validation].

We do not use the AIC or the BIC within the Stan ecosystem. The AIC is frequentist and cannot handle models with priors. The BIC is interesting in that it's a double misnomer: it is neither Bayesian nor a proper information criterion, though it does scale like one. However, it might be useful for our purposes to walk through the AIC a bit. It'll ground our discussion of the WAIC and LOO. From Spiegelhalter, Best, Carlin, and van der Linde (2014), we read:

Suppose that we have a given set of candidate models, and we would like a criterion to assess which is ‘better’ in a defined sense. Assume that a model for observed data \(y\) postulates a density \(p(y | \theta)\) (which may include covariates etc.), and call \(D(\theta) = -2 \log p(y | \theta)\) the deviance, here considered as a function of \(\theta\). Classical model choice uses hypothesis testing for comparing nested models, e.g. the deviance (likelihood ratio) test in generalized linear models. For non-nested models, alternatives include the Akaike information criterion

\[AIC = -2 \log {p(y | \hat{\theta})} + 2k\]

where \(\hat{\theta}\) is the maximum likelihood estimate and \(k\) is the number of parameters in the model (dimension of \(\Theta\)).

AIC is built with the aim of favouring models that are likely to make good predictions. Since we generally do not have independent validation data, we can assess which model best predicts the *observed* data by using the deviance, but if parameters have been estimated we need some penalty for this double use of the data. AIC’s penalty of \(2k\) has been shown to be asymptotically equivalent to leave-one-out cross-validation. However, AIC does not work in models with informative prior information, such as hierarchical models, since the prior effectively acts to ‘restrict’ the freedom of the model parameters, so the appropriate ‘number of parameters’ is generally unclear. (pp. 485–486, emphasis in the original)
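As an aside, we can verify the AIC formula by hand with a toy frequentist model. This sketch leans on base R's `lm()`, `logLik()`, and `AIC()` with made-up data; it's not an example from the text.

```r
# simulate a small data set and fit a toy OLS model
set.seed(4)
d <- data.frame(x = rnorm(100))
d$y <- 2 * d$x + rnorm(100)

fit <- lm(y ~ x, data = d)

ll <- as.numeric(logLik(fit))  # the maximized log likelihood
k  <- attr(logLik(fit), "df")  # k = 3: intercept, slope, residual SD

# AIC = -2 * log-likelihood + 2k, which matches stats::AIC()
c(by_hand = -2 * ll + 2 * k, with_AIC = AIC(fit))
```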

For the past two decades, the Deviance Information Criterion [DIC; Spiegelhalter et al. (2002)] has been a popular information criterion among Bayesians. Let’s define \(D\) as the posterior distribution of deviance values and \(\bar D\) as its mean. If you compute deviance based on the posterior mean, you have \(\hat D\). Within a multi-parameter model, this would be the deviance based on the collection of the posterior mean of each parameter. With these, we define the DIC as

\[\text{DIC} = \bar D + (\bar D - \hat D) = \bar D + p_D,\]

where \(p_D\) is the number of effective parameters in the model, which is also sometimes referred to as the penalty term. As McElreath pointed out in *Statistical Rethinking*, the \(p_D\)

is just the expected distance between the deviance in-sample and the deviance out-of-sample. In the case of flat priors, DIC reduces directly to AIC, because the expected distance is just the number of parameters. But more generally, \(p_D\) will be some fraction of the number of parameters, because regularizing priors constrain a model’s flexibility. (p. 191)
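To make the definition concrete, here's a minimal sketch of the DIC arithmetic with simulated numbers standing in for \(\bar D\) and \(\hat D\), not values from any model in this chapter.

```r
# simulated posterior distribution of deviance values
set.seed(4)
D <- rnorm(4000, mean = 600, sd = 10)

D_bar <- mean(D)  # posterior mean deviance
D_hat <- 595      # stand-in for the deviance at the posterior means

p_D <- D_bar - D_hat  # effective number of parameters (the penalty)
DIC <- D_bar + p_D    # equivalently, 2 * D_bar - D_hat

c(p_D = p_D, DIC = DIC)
```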

As you’ll see, you can get the \(p_D\) for `brms::brm()`

models. However, the DIC is limited in that it requires a multivariate Gaussian posterior and I’m not aware of a convenience function within **brms** that will compute the DIC. Which is fine. The DIC has been overshadowed in recent years by newer methods. But for a great talk on the DIC, check out the authoritative David Spiegelhalter’s *Retrospective read paper: Bayesian measure of model complexity and fit*.

#### 4.6.4.1 The Widely Applicable Information Criterion (WAIC).

The main information criterion within our Stan ecosystem paradigm is the Widely Applicable Information Criterion [WAIC; Watanabe (2010)]. From McElreath, again, we read:

It does not require a multivariate Gaussian posterior, and it is often more accurate than DIC. There are types of models for which it is hard to define at all, however. We’ll discuss that issue more, after defining WAIC.

The distinguishing feature of WAIC is that it is *pointwise*. This means that uncertainty in prediction is considered case-by-case, or point-by-point, in the data. This is useful, because some observations are much harder to predict than others and may also have different uncertainty… You can think of WAIC as handling uncertainty where it actually matters: for each independent observation.

Define \(\text{Pr} (y_i)\) as the average likelihood of observation \(i\) in the training sample. This means we compute the likelihood of \(y_i\) for each set of parameters sampled from the posterior distribution. Then we average the likelihoods for each observation \(i\) and finally sum over all observations. This produces the first part of WAIC, the log-pointwise-predictive-density,

\[\text{lppd} = \sum_{i = 1}^N \text{log Pr} (y_i)\]

You might say this out loud as:

The log-pointwise-predictive-density is the total across observations of the logarithm of the average likelihood of each observation.

The lppd is just a pointwise analog of deviance, averaged over the posterior distribution. If you multiplied it by -2, it’d be similar to the deviance, in fact.

The second piece of WAIC is the effective number of parameters \(p_\text{WAIC}\). Define \(V(y_i)\) as the variance in log-likelihood for observation \(i\) in the training sample. This means we compute the log-likelihood of \(y_i\) for each sample from the posterior distribution. Then we take the variance of those values. This is \(V(y_i)\). Now \(p_\text{WAIC}\) is defined as:

\[p_\text{WAIC} = \sum_{i=1}^N V(y_i)\]

Now WAIC is defined as:

\[\text{WAIC} = -2(\text{lppd} - p_\text{WAIC})\]

And this value is yet another estimate of out-of-sample deviance. (pp. 191–192, emphasis in the original)
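Those two pieces are easy to compute by hand once you have an \(S \times N\) log-likelihood matrix (one row per posterior draw, one column per observation). Here's a minimal sketch using a simulated matrix; the numbers are made up for illustration, not pulled from any of our fits.

```
set.seed(4)

# a fake log-likelihood matrix: 4000 posterior draws by 10 observations
log_lik <- matrix(rnorm(4000 * 10, mean = -1, sd = 0.1), nrow = 4000)

# lppd: log of the average likelihood (average first, then log), summed over observations
lppd <- sum(log(colMeans(exp(log_lik))))

# p_waic: the variance of the log likelihood for each observation, summed
p_waic <- sum(apply(log_lik, 2, var))

waic <- -2 * (lppd - p_waic)

c(lppd = lppd, p_waic = p_waic, waic = waic)
```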

In Chapter 6 of my ebook translating McElreath’s *Statistical Rethinking* into **brms** and **tidyverse** code, I walk out how to hand compute the WAIC for a `brm()`

fit. I’m not going to repeat the exercise, here. But do see the project and McElreath’s text if you’re interested. Rather, I’d like to get down to business. In **brms**, you can get a model’s WAIC with the `waic()`

function.

`waic(fit4.1)`

```
##
## Computed from 4000 by 246 log-likelihood matrix
##
## Estimate SE
## elpd_waic -311.8 12.2
## p_waic 55.1 4.8
## waic 623.6 24.4
##
## 41 (16.7%) p_waic estimates greater than 0.4. We recommend trying loo instead.
```

We’ll come back to that warning message, later. For now, notice the main output is a \(3 \times 2\) data frame with named rows. For the statistic in each row, you get a point estimate and a standard error. The WAIC is on the bottom. The effective number of parameters, the \(p_\text{WAIC}\), is in the middle. Notice the `elpd_waic`

on the top. That’s what you get without the \(-2 \times ...\) in the formula. Remember how that part is just to put things in a metric amenable to \(\chi^2\)-difference testing? Well, not all Bayesians like that and within the Stan ecosystem you’ll also see the WAIC expressed instead as the \(\text{elpd}_\text{WAIC}\).

The current recommended workflow within **brms** is to attach the WAIC information to the model fit. You do it with the `add_criterion()`

function.

`fit4.1 <- add_criterion(fit4.1, "waic")`

And now you can access that information directly with good-old `$`

indexing.

`fit4.1$criteria$waic`

```
##
## Computed from 4000 by 246 log-likelihood matrix
##
## Estimate SE
## elpd_waic -311.8 12.2
## p_waic 55.1 4.8
## waic 623.6 24.4
##
## 41 (16.7%) p_waic estimates greater than 0.4. We recommend trying loo instead.
```

You might notice how that value is similar to the AIC and BIC values for Model A in Table 4.1. But it’s not identical and we shouldn’t expect it to be. It was computed by a different formula that accounts for priors. For our purposes, this is much better than the frequentist AIC and BIC. We need statistics that can handle priors.

#### 4.6.4.2 Leave-one-out cross-validation (LOO-CV).

We have another big option for model comparison within the Stan ecosystem. It involves leave-one-out cross-validation (LOO-CV). It’s often the case that we aren’t just interested in modeling the data we have in hand. The hope is our findings would generalize to other data we could have collected or may collect in the future. We’d like our findings to tell us something more general about the world at large. But unless you’re studying something highly uniform like the weights of hydrogen atoms, chances are your data have idiosyncrasies that won’t generalize well to other data. Sure, if we had all the information on all the relevant variables, we could explain the discrepancies across samples with hidden moderators and such. But we don’t have all the data and we typically don’t even know what all the relevant variables are.

Welcome to science.

To address this problem, you might recommend we collect data from two samples for each project. Starting with sample A, we’d fit a series of models and settle on one or a small subset that both speak to our scientific hypothesis and seem to fit the sample A data well. Then we’d switch to sample B and rerun our primary model(s) from A to make sure our findings generalize. In this paradigm, we might call the A data *in sample* and the B data *out of sample*–or out of sample A, anyways.

The problem is we often have time and funding constraints. We only have sample A and we may never collect sample B. So we’ll need to make the most out of A. Happily, tricky statisticians have our back. Instead, what we might do is divide our data into \(k\) equally-sized subsets. Call those subsets *folds*. If we leave one of the folds out, we can fit the model with the remaining data and then see how well that model speaks to the left-out fold. After doing this for every fold, we can get an average performance across folds.

Note how as \(k\) increases, the number of cases within a fold gets smaller. In the extreme, \(k = N\), the number of cases within the data. At that point, \(k\)-fold cross-validation turns into leave-one-out cross-validation (LOO-CV).
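Here's a minimal sketch of that workflow with OLS and the built-in `mtcars` data (again, not part of this chapter), just to show the mechanics. Setting `k` to the number of rows would turn this into LOO-CV.

```
set.seed(4)

k <- 8  # number of folds; k <- nrow(mtcars) would give LOO-CV
folds <- sample(rep(1:k, length.out = nrow(mtcars)))

# for each fold: fit on the remaining folds, then score the left-out fold
mse <- sapply(1:k, function(f) {
  train <- mtcars[folds != f, ]
  test  <- mtcars[folds == f, ]
  fit   <- lm(mpg ~ wt, data = train)
  mean((test$mpg - predict(fit, newdata = test))^2)
})

# average out-of-sample performance across the folds
mean(mse)
```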

But there’s a practical difficulty with LOO-CV: it’s costly. As you may have noticed, it takes some time to fit a Bayesian multilevel model. For large data and/or complicated models, sometimes it takes hours or days. Most of us just don’t have enough time or computational resources to fit that many models. Happily, we have an approximation to pure LOO-CV. Vehtari, Gelman, and Gabry (2017) proposed Pareto smoothed importance-sampling leave-one-out cross-validation (PSIS-LOO) as an efficient way to approximate true LOO-CV. At this point, it’s probably best to let the statisticians speak for themselves:

To maintain comparability with the given dataset and to get easier interpretation of the differences in scale of effective number of parameters, we define a measure of predictive accuracy for the \(n\) data points taken one at a time:

\[\begin{align*} \text{elpd} & = \text{expected log pointwise predictive density for a new dataset} \\ & = \sum_{i = 1}^n \int p_t (\tilde{y}_i) \log p (\tilde{y}_i | y) d \tilde{y}_i, \end{align*}\]

where \(p_t (\tilde{y}_i)\) is the distribution representing the true data-generating process for \(\tilde{y}_i\). The \(p_t (\tilde{y}_i)\)’s are unknown, and we will use cross-validation or WAIC to approximate. In a regression, these distributions are also implicitly conditioned on any predictors in the model…

The Bayesian LOO estimate of out-of-sample predictive fit is

\[\text{elpd}_\text{loo} = \sum_{i = 1}^n \log p (y_i | y_{-i}),\]

where

\[p (y_i | y_{-i}) = \int p (y_i | \theta) p (\theta | y_{-i}) d \theta\]

is the leave-one-out predictive density given the data without the ith data point. (pp. 2–3)

For the rest of the details, check out the original paper. Our goal is to practice using the PSIS-LOO. Since this is the only version of the LOO we’ll be using in this project, I’m just going to refer to it as the LOO from here on. To use the LOO to evaluate a `brm()`

fit, you just use the `loo()`

function. Though you don’t have to save the results as an object, we’ll be forward thinking and do so here.

```
l_fit4.1 <- loo(fit4.1)

print(l_fit4.1)
```

```
##
## Computed from 4000 by 246 log-likelihood matrix
##
## Estimate SE
## elpd_loo -314.9 12.4
## p_loo 58.2 5.1
## looic 629.8 24.9
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 225 91.5% 480
## (0.5, 0.7] (ok) 19 7.7% 116
## (0.7, 1] (bad) 2 0.8% 148
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
```

Remember that warning message we got from the `waic()`

a while back? We get more information along those lines from the `loo()`

. As it turns out, a few of the cases in the data were unduly influential in the model fit. Within the `loo()`

paradigm, those are indexed by the `pareto_k`

values. As it turns out, the Pareto \(k\) can be used as a diagnostic tool. Each case in the data gets its own \(k\) value and we like it when those \(k\)’s are low. We typically get worried when those \(k\)’s exceed 0.7 and the `loo()`

function spits out a warning message when they do.

If you didn’t know, the **brms** functions like the `waic()`

and `loo()`

actually come from the **loo** package (Vehtari et al., 2017; Vehtari, Gabry, et al., 2019; Yao et al., 2018). Explicitly loading **loo** will buy us some handy convenience functions.

`library(loo)`

We’ll be leveraging those \(k\) values with the `pareto_k_table()`

and `pareto_k_ids()`

functions. Both functions take objects created by the `loo()`

or `psis()`

functions. Let’s take a look at the `pareto_k_table()`

function first.

`pareto_k_table(l_fit4.1)`

```
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 225 91.5% 480
## (0.5, 0.7] (ok) 19 7.7% 116
## (0.7, 1] (bad) 2 0.8% 148
## (1, Inf) (very bad) 0 0.0% <NA>
```

This is the same table that popped out earlier after using the `loo()`

. Recall that this data set has 246 observations (i.e., execute `count(alcohol1_pp)`

). With `pareto_k_table()`

, we see how the Pareto \(k\) values have been categorized into bins ranging from “good” to “very bad.” Clearly, we like nice and low \(k\)’s. In this example, most of our observations are “good” or “ok.” Two are in the “bad” \(k\) range. We can take a closer look by placing our `loo()`

object into `plot()`

.

`plot(l_fit4.1)`

We got back a nice diagnostic plot of those \(k\) values, ordered by row number. We can see that the \(k\) values of our two “bad” observations are just a smidge above the recommended threshold. If we wanted to verify which observations those were, we’d use the `pareto_k_ids()`

function.

`pareto_k_ids(l_fit4.1, threshold = .7)`

`## [1] 27 229`

Note our use of the `threshold`

argument. Play around with it to see how it works. In case you’re curious, here are those rows.

`alcohol1_pp[c(27, 229), ]`

```
## # A tibble: 2 x 9
## id age coa male age_14 alcuse peer cpeer ccoa
## <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 9 16 1 1 2 3.46 0 -1.02 0.549
## 2 77 14 0 1 0 0 0.894 -0.124 -0.451
```

If you want an explicit look at those \(k\) values, execute `l_fit4.1$diagnostics$pareto_k`

. For the sake of space, I’m going to omit the output.

`l_fit4.1$diagnostics$pareto_k`

The `pareto_k`

values can be used to examine cases that are overly influential on the model parameters, something like a Cook’s \(D_i\). See, for example, this discussion on stackoverflow.com in which several members of the Stan team weighed in. The issue is also discussed in Vehtari et al. (2017) and in this presentation by Aki Vehtari.

Anyway, the implication of all this is that these \(k\) values suggest `fit4.1`

(i.e., Model A) might not be the best model of the data. Happily, we have other models to compare it to. That leads into the next section:

#### 4.6.4.3 You can compare Bayesian models with the WAIC and LOO.

Remember how we used the `add_criterion()`

function, above? That’ll work for both the WAIC and the LOO. Let’s do that for Models A through E.

```
fit4.1 <- add_criterion(fit4.1, c("loo", "waic"))
fit4.2 <- add_criterion(fit4.2, c("loo", "waic"))
fit4.3 <- add_criterion(fit4.3, c("loo", "waic"))
fit4.5 <- add_criterion(fit4.5, c("loo", "waic"))
```

And to refresh, we can pull the WAIC and LOO information with `$`

indexing. Here’s how to get the LOO info for `fit4.2`

.

`fit4.2$criteria$loo`

```
##
## Computed from 4000 by 246 log-likelihood matrix
##
## Estimate SE
## elpd_loo -290.5 12.7
## p_loo 95.5 7.6
## looic 581.1 25.4
## ------
## Monte Carlo SE of elpd_loo is NA.
##
## Pareto k diagnostic values:
## Count Pct. Min. n_eff
## (-Inf, 0.5] (good) 107 43.5% 156
## (0.5, 0.7] (ok) 110 44.7% 46
## (0.7, 1] (bad) 29 11.8% 9
## (1, Inf) (very bad) 0 0.0% <NA>
## See help('pareto-k-diagnostic') for details.
```

Sigh. Turns out there are even more overly-influential cases in the unconditional growth model. In the case of a real data analysis, this might suggest we need a more robust model. One possible solution might be switching out our Gaussian likelihood for the robust Student’s \(t\)-distribution. For an introduction, you might check out my blog post on the topic, *Robust Linear Regression with Student’s \(t\)-Distribution*. But that’ll take us farther afield than I want to go, right now.

The point to focus on, here, is we can use the `loo_compare()`

function to compare fits by their WAIC or LOO. Let’s practice with the WAIC.

```
ws <- loo_compare(fit4.1, fit4.2, fit4.3, fit4.5, criterion = "waic")

print(ws)
```

```
## elpd_diff se_diff
## fit4.5 0.0 0.0
## fit4.3 -6.8 3.7
## fit4.2 -6.8 4.2
## fit4.1 -41.0 8.2
```

Remember how we said that some contemporary Bayesians aren’t fans of putting Bayesian information criteria in a \(\chi^2\) metric? Well, it turns out Aki Vehtari, of the Stan team and **loo** package fame (and also the primary author of that PSIS-LOO paper from before), is one of those Bayesians. So rather than difference scores in the WAIC metric, we get them in the \(\text{elpd}_\text{WAIC}\) metric. But remember, if you prefer these estimates in the traditional metric, just multiply by -2.

```
cbind(waic_diff = ws[, 1] * -2,
se = ws[, 2] * 2)
```

```
## waic_diff se
## fit4.5 0.00000 0.000000
## fit4.3 13.55325 7.387772
## fit4.2 13.65795 8.484507
## fit4.1 81.92893 16.393535
```

The reason we multiplied the `se_diff`

column (i.e., the standard errors for the difference estimates) by 2 rather than -2 is that you can’t have negative standard errors. That’d be silly.

But anyway, notice that the `brm()`

fits have been rank ordered with the smallest differences at the top. Each row in the output is the difference of one of the fits compared to the best-fitting fit. Since `fit4.5`

apparently had the lowest WAIC value, it was ranked at the top. And notice how its `waic_diff`

is 0. That, of course, is because \(x - x = 0\). So all the other difference scores follow the formula \(\text{Difference}_x = \text{WAIC}_{\text{fit}_x} - \text{WAIC}_\text{fit4.5}\).

Concerning our `ws`

object, we can get more information on our models’ WAIC information if we include a `simplify = F`

argument within `print()`

.

`print(ws, simplify = F)`

```
## elpd_diff se_diff elpd_waic se_elpd_waic p_waic se_p_waic waic se_waic
## fit4.5 0.0 0.0 -270.8 11.1 73.5 5.6 541.7 22.2
## fit4.3 -6.8 3.7 -277.6 11.7 81.5 6.4 555.2 23.3
## fit4.2 -6.8 4.2 -277.7 11.7 82.6 6.5 555.3 23.5
## fit4.1 -41.0 8.2 -311.8 12.2 55.1 4.8 623.6 24.4
```

Their WAIC estimates and the associated standard errors are in the final two columns. In the two before that, we get the \(p_\text{WAIC}\) estimates and their standard errors. We can get similar information for the LOO.

```
loo_compare(fit4.1, fit4.2, fit4.3, fit4.5, criterion = "loo") %>%
print(simplify = F)
```

```
## elpd_diff se_diff elpd_loo se_elpd_loo p_loo se_p_loo looic se_looic
## fit4.5 0.0 0.0 -281.6 11.9 84.3 6.6 563.3 23.9
## fit4.2 -8.9 4.6 -290.5 12.7 95.5 7.6 581.1 25.4
## fit4.3 -9.2 3.9 -290.9 12.7 94.7 7.5 581.8 25.3
## fit4.1 -33.3 8.4 -314.9 12.4 58.2 5.1 629.8 24.9
```

If you wanted a more focused comparison, say between `fit4.1` and `fit4.2`, you’d just simplify your input.

```
loo_compare(fit4.1, fit4.2, criterion = "loo") %>%
print(simplify = F)
```

```
## elpd_diff se_diff elpd_loo se_elpd_loo p_loo se_p_loo looic se_looic
## fit4.2 0.0 0.0 -290.5 12.7 95.5 7.6 581.1 25.4
## fit4.1 -24.4 7.6 -314.9 12.4 58.2 5.1 629.8 24.9
```

We’ll get more practice with these methods as we go along. But for your own edification, you might check out the vignettes put out by the **loo** team.

## 4.7 Using Wald statistics to test composite hypotheses about fixed effects

I’m not going to address issues of composite null-hypothesis tests using the Wald statistic. However, we can address some of these issues from a different, more estimation-based perspective. Consider the initial question posed on page 123:

Suppose, for example, you wanted to test whether the entire true change trajectory for a particular type of adolescent–say, a child of non-alcoholic parents with an average value of PEER–differs from a “null” trajectory (one with zero intercept and zero slope). This is tantamount to asking whether the average child of non-alcoholic parents drinks no alcohol at age 14 and remains abstinent over time.

Singer and Willett then expressed their joint null hypothesis as

\[H_0: \gamma_{00} = 0 \; \text{and} \; \gamma_{10} = 0.\]

This is a substantive question we can address more informatively with `fitted()`

. First, let’s provide the necessary values for our predictor variables, `coa`

, `peer`

, and `age_14`

.

```
mu_peer <- mean(alcohol1_pp$peer)

nd <-
  tibble(coa    = 0,
         peer   = mu_peer,
         age_14 = seq(from = 0, to = 2, length.out = 30))

head(nd)
```

```
## # A tibble: 6 x 3
## coa peer age_14
## <dbl> <dbl> <dbl>
## 1 0 1.02 0
## 2 0 1.02 0.0690
## 3 0 1.02 0.138
## 4 0 1.02 0.207
## 5 0 1.02 0.276
## 6 0 1.02 0.345
```

Now we use `fitted()`

to examine the model-implied trajectory for a child of non-alcoholic parents and average `peer`

values.

```
f <-
  fitted(fit4.6,
         newdata = nd,
         re_formula = NA) %>%
  data.frame() %>%
  bind_cols(nd) %>%
  mutate(age = age_14 + 14)

f %>%
  ggplot(aes(x = age)) +
  geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5),
              size = 0, alpha = 1/4) +
  geom_line(aes(y = Estimate)) +
  scale_y_continuous("alcuse", breaks = 0:2, limits = c(0, 2)) +
  labs(subtitle = "Zero is credible for neither\nthe intercept nor the slope.") +
  coord_cartesian(xlim = c(13, 17)) +
  theme(legend.position = "none",
        panel.grid = element_blank())
```

Recall that the results of our Bayesian analyses are the probabilities of the parameters given the data, \(p(\theta | d)\). Based on our plot, there is much less than a .05 probability that either the intercept or the slope for teens of this demographic is zero. If you really wanted to fixate on zero, you could even use `geom_hline()`

to insert a horizontal line at zero in the figure. But all that fixating on zero detracts from what to my mind are the more important parts of the model. The intercept at `age = 14`

is about 1/3 and the endpoint when `age = 16`

is almost at \(1\). Those are our effect sizes. If you wanted to quantify those effect sizes more precisely, just query our `fitted()`

object, `f`

.

```
f %>%
  select(age, Estimate, Q2.5, Q97.5) %>%
  filter(age %in% c(14, 16)) %>%
  mutate_all(round, digits = 2)
```

```
## age Estimate Q2.5 Q97.5
## 1 14 0.39 0.18 0.6
## 2 16 0.93 0.66 1.2
```

Works like a champ. But we haven’t fully covered part of Singer and Willett’s joint hypothesis test. They proposed a joint null that included \(\gamma_{10} = 0\). Though it’s clear from the plot that the trajectory increases, we can address the issue more directly with a difference score. For our difference, we’ll subtract the estimate at `age = 14`

from the one at `age = 15`

. But to do that, we’ll have to return to `fitted()`

. So far, we’ve been using the default output which returns summaries of the posterior. To compute a proper difference score, we’ll need to work with all the posterior draws in order to approximate the full distribution. We do that by setting `summary = F`

. And since we’re only interested in the estimates from these two `age`

values, we’ll streamline our `nd`

data.

```
nd <-
  tibble(coa    = 0,
         peer   = mu_peer,
         age_14 = 0:1)

f <-
  fitted(fit4.6,
         newdata = nd,
         re_formula = NA,
         summary = F) %>%
  data.frame()

str(f)
```

```
## 'data.frame': 4000 obs. of 2 variables:
## $ X1: num 0.487 0.389 0.313 0.473 0.411 ...
## $ X2: num 0.767 0.69 0.503 0.788 0.741 ...
```

Now our `f`

object has 4,000 rows and 2 columns. Each of the rows corresponds to one of the 4,000 post-warmup posterior draws. The columns correspond to the two rows in our `nd`

data. To get a slope based on this combination of predictor values, we simply subtract the first column from the second.

```
f <-
  f %>%
  transmute(difference = X2 - X1)

f %>%
  ggplot(aes(x = difference)) +
  geom_density(size = 0, fill = "grey25") +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(subtitle = "Based on 4,000 posterior draws, not a single one\nsuggests the slope is even close to zero. Rather, the\nposterior mass is concentrated around 0.25.",
       x = expression(paste(gamma[0][1], " (i.e., the difference between the two time points)"))) +
  coord_cartesian(xlim = 0:1) +
  theme(panel.grid = element_blank())
```

Here are the posterior mean and 95% intervals.

```
f %>%
  summarise(mean = mean(difference),
            ll   = quantile(difference, probs = .025),
            ul   = quantile(difference, probs = .975))
```

```
## mean ll ul
## 1 0.2699514 0.1438935 0.3903433
```

On page 125, Singer and Willett further mused:

When we examined the OLS estimated change trajectories in figure 4.2, we noticed that among children of non-alcoholic parents, those with low values of CPEER tended to have a lower initial status and steeper slopes than those with high values of CPEER. We might therefore ask whether the former group “catches up” to the latter. This is a question about the “vertical” separation between these two groups[’] true change trajectories at some later age, say 16.

Within their joint hypothesis testing paradigm, they pose this as testing

\[H_0: 0\gamma_{00} + 0\gamma_{01} + 1\gamma_{02} + 0\gamma_{10} + 2\gamma_{12} = 0.\]

From our perspective, this is a differences of differences analysis. That is, first we’ll compute the model implied `alcuse`

estimates for the four combinations of the two levels of `age`

and `peer`

, holding `coa`

constant at 0. Second, we’ll compute the differences between the two `peer`

levels at each `age`

. Third and finally, we’ll compute a difference of those differences.

For our first step, recall it was `fit4.7`

that used the `cpeer`

variable.

```
# first step
nd <-
  crossing(age_14 = c(0, 2),
           cpeer  = c(-.363, .363)) %>%
  mutate(coa = 0)

head(nd)
```

```
## # A tibble: 4 x 3
## age_14 cpeer coa
## <dbl> <dbl> <dbl>
## 1 0 -0.363 0
## 2 0 0.363 0
## 3 2 -0.363 0
## 4 2 0.363 0
```

```
f <-
  fitted(fit4.7,
         newdata = nd,
         re_formula = NA,
         summary = F) %>%
  data.frame()

head(f)
```

```
## X1 X2 X3 X4
## 1 0.00992433 0.6948407 0.7379298 1.0116666
## 2 0.13634339 0.7560264 0.7624353 1.1095622
## 3 0.24484096 0.7109585 0.8619811 1.2328534
## 4 0.17783401 0.7005683 0.8674297 1.2747157
## 5 0.23737032 0.7686043 0.7545304 0.9414084
## 6 0.20631357 0.6908925 0.8356393 1.0215177
```

For our initial difference scores, we’ll subtract the estimates for the lower level of `cpeer`

from the higher ones.

```
# step 2
f <-
  f %>%
  transmute(`difference at 14` = X2 - X1,
            `difference at 16` = X4 - X3)

head(f)
```

```
## difference at 14 difference at 16
## 1 0.6849164 0.2737369
## 2 0.6196831 0.3471269
## 3 0.4661175 0.3708723
## 4 0.5227343 0.4072860
## 5 0.5312340 0.1868780
## 6 0.4845789 0.1858784
```

For our final difference score, we’ll subtract the first difference score from the second.

```
# step 3
f <-
  f %>%
  mutate(`difference in differences` = `difference at 16` - `difference at 14`)

head(f)
```

```
## difference at 14 difference at 16 difference in differences
## 1 0.6849164 0.2737369 -0.41117950
## 2 0.6196831 0.3471269 -0.27255618
## 3 0.4661175 0.3708723 -0.09524523
## 4 0.5227343 0.4072860 -0.11544834
## 5 0.5312340 0.1868780 -0.34435600
## 6 0.4845789 0.1858784 -0.29870052
```

Here we’ll plot all three.

```
f %>%
  pivot_longer(everything()) %>%
  ggplot(aes(x = value)) +
  geom_density(size = 0, fill = "grey25") +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab("different differences") +
  theme(panel.grid = element_blank()) +
  facet_wrap(~name, scales = "free_y")
```

Singer and Willett concluded they could “reject the null hypothesis at any conventional level of significance” (p. 126). If we must appeal to the Null, here are the posterior means and 95% intervals for our differences.

```
f %>%
  pivot_longer(everything()) %>%
  group_by(name) %>%
  summarise(mean = mean(value),
            ll   = quantile(value, probs = .025),
            ul   = quantile(value, probs = .975)) %>%
  mutate_if(is.double, round, digits = 3)
```

```
## # A tibble: 3 x 4
## name mean ll ul
## <chr> <dbl> <dbl> <dbl>
## 1 difference at 14 0.504 0.34 0.666
## 2 difference at 16 0.283 0.055 0.509
## 3 difference in differences -0.221 -0.471 0.026
```

Our results contrast a bit with Singer and Willett’s. Though the bulk of our posterior mass is concentrated around -0.22, zero is a credible value within the difference-in-differences density. Our best bet is that the differences begin to converge over time. However, the rate of that convergence is subtle and somewhat imprecise relative to the effect size. Interpret with caution.

## 4.8 Evaluating the tenability of a model’s assumptions

“Whenever you fit a statistical model, you invoke assumptions” (p. 127). This is the case for multilevel Bayesian models, too.

### 4.8.1 Checking functional form.

We’ve already checked the functional form at level-1 with our version of Figure 4.1. When we made our version of Figure 4.1, we relied on `ggplot2::stat_smooth()`

to compute the `id`

-level OLS trajectories. To make our variants of Figure 4.4, we’ll have to back up and compute them externally with `lm()`

. Here we’ll do so in bulk with a nested data frame. The **broom** package will help us extract the results.

```
library(broom)

o <-
  alcohol1_pp %>%
  nest(-id, -coa, -peer) %>%
  mutate(ols = map(data, ~lm(data = ., alcuse ~ 1 + age_14))) %>%
  mutate(tidy = map(ols, tidy)) %>%
  unnest(tidy) %>%
  # this is unnecessary, but will help with plotting
  mutate(term = factor(term,
                       levels = c("(Intercept)", "age_14"),
                       labels = c("pi[0]", "pi[1]")))

head(o)
```

```
## # A tibble: 6 x 10
## id coa peer data ols term estimate std.error statistic p.value
## <dbl> <dbl> <dbl> <list> <list> <fct> <dbl> <dbl> <dbl> <dbl>
## 1 1 1 1.26 <tibble [3 × 6]> <lm> pi[0] 1.78 0.0999 17.8 0.0357
## 2 1 1 1.26 <tibble [3 × 6]> <lm> pi[1] 0.134 0.0774 1.73 0.333
## 3 2 1 0.894 <tibble [3 × 6]> <lm> pi[0] -0.167 0.373 -0.447 0.732
## 4 2 1 0.894 <tibble [3 × 6]> <lm> pi[1] 0.5 0.289 1.73 0.333
## 5 3 1 0.894 <tibble [3 × 6]> <lm> pi[0] 0.947 0.118 8.03 0.0789
## 6 3 1 0.894 <tibble [3 × 6]> <lm> pi[1] 1.16 0.0914 12.7 0.0501
```

Now plot.

```
o %>%
  select(coa:peer, term:estimate) %>%
  pivot_longer(coa:peer) %>%
  ggplot(aes(x = value, y = estimate)) +
  geom_hline(yintercept = 0, color = "white") +
  geom_point(alpha = 2/3) +
  theme(panel.grid = element_blank(),
        strip.text = element_text(size = 11)) +
  facet_grid(term~name, scales = "free", labeller = label_parsed)
```

With a little more wrangling, we can extract the Pearson’s correlation coefficients for each panel.

```
o %>%
  select(coa:peer, term:estimate) %>%
  pivot_longer(coa:peer) %>%
  group_by(term, name) %>%
  nest() %>%
  mutate(r = map_dbl(data, ~cor(.)[2, 1] %>% round(digits = 2)))
```

```
## # A tibble: 4 x 4
## # Groups: term, name [4]
## term name data r
## <fct> <chr> <list> <dbl>
## 1 pi[0] coa <tibble [82 × 2]> 0.39
## 2 pi[0] peer <tibble [82 × 2]> 0.580
## 3 pi[1] coa <tibble [82 × 2]> -0.04
## 4 pi[1] peer <tibble [82 × 2]> -0.19
```

### 4.8.2 Checking normality.

The basic multilevel model of change yields three kinds of residuals, \(\epsilon_{ij}\), \(\zeta_{0i}\), and \(\zeta_{1i}\). Each measurement occasion receives a model-implied estimate of the first and each participant receives estimates of the latter two. Singer and Willett referred to those estimates as \(\hat{\epsilon}_{ij}\), \(\hat{\zeta}_{0i}\), and \(\hat{\zeta}_{1i}\). As with frequentist software, our Bayesian software **brms** will return these estimates.

To extract our Bayesian draws for the \(\hat{\epsilon}_{ij}\)’s, we use the `residuals()`

function.

```
e <- residuals(fit4.6)

str(e)
```

```
## num [1:246, 1:4] 0.2837 0.2488 -0.0541 -0.3956 -0.5964 ...
## - attr(*, "dimnames")=List of 2
## ..$ : NULL
## ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
```

`head(e)`

```
## Estimate Est.Error Q2.5 Q97.5
## [1,] 0.28373687 0.3300760 -0.4185307 0.920251089
## [2,] 0.24881275 0.3009132 -0.3481427 0.849632979
## [3,] -0.05406059 0.4344582 -0.8915095 0.772798204
## [4,] -0.39558137 0.3450541 -1.0184425 0.326134254
## [5,] -0.59640144 0.3066759 -1.1850149 0.006306177
## [6,] 0.20277848 0.4533126 -0.6902499 1.098015341
```

For our `fit4.6`

, the `residuals()`

function returned a \(246 \times 4\) numeric array. Each row corresponded to one of the rows of the original data set. The four vectors are the familiar summaries `Estimate`

, `Est.Error`

, `Q2.5`

, and `Q97.5`

. If we’d like to work with these in a **ggplot2**-made plot, we’ll have to convert our `e`

object to a data frame.

After we make the conversion, we then make the top left panel of Figure 4.5.

```
e <-
  e %>%
  data.frame()

e %>%
  ggplot(aes(sample = Estimate)) +
  geom_hline(yintercept = 0, color = "white") +
  geom_qq() +
  ylim(-2, 2) +
  labs(x = "Normal score",
       y = expression(hat(epsilon)[italic(ij)])) +
  theme(panel.grid = element_blank())
```

For the right plot on the top, we need to add an `id`

index. That’s as easy as appending the one from the original data. If you followed closely with the text, you may have also noticed this panel is of the standardized residuals. That just means we’ll have to hand-standardize ours before plotting.

```
e %>%
  bind_cols(alcohol1_pp %>% select(id)) %>%
  mutate(z = (Estimate - mean(Estimate)) / sd(Estimate)) %>%
  ggplot(aes(x = id, y = z)) +
  geom_hline(yintercept = 0, color = "white") +
  geom_point() +
  scale_y_continuous(expression(italic(std)~hat(epsilon)[italic(ij)]), limits = c(-2, 2)) +
  theme(panel.grid = element_blank())
```

We’ll need to use the `ranef()`

function to return the estimates for the \(\zeta\)’s.

```
z <- ranef(fit4.6)

str(z)
```

```
## List of 1
## $ id: num [1:82, 1:4, 1:2] 0.309 -0.485 0.33 -0.367 -0.563 ...
## ..- attr(*, "dimnames")=List of 3
## .. ..$ : chr [1:82] "1" "2" "3" "4" ...
## .. ..$ : chr [1:4] "Estimate" "Est.Error" "Q2.5" "Q97.5"
## .. ..$ : chr [1:2] "Intercept" "age_14"
```

```
z[[1]][1:6, , "Intercept"]
```

```
## Estimate Est.Error Q2.5 Q97.5
## 1 0.3091684 0.3290936 -0.3218206 0.99439914
## 2 -0.4853620 0.3432303 -1.1918178 0.12254696
## 3 0.3297689 0.3345132 -0.3378441 0.99601895
## 4 -0.3670263 0.3559662 -1.1386357 0.27133749
## 5 -0.5633450 0.3328465 -1.2421447 0.07577136
## 6 0.8312605 0.3630722 0.1813145 1.57052650
```

```
z[[1]][1:6, , "age_14"]
```

```
## Estimate Est.Error Q2.5 Q97.5
## 1 0.07022920 0.2429429 -0.43362902 0.5402339
## 2 -0.08770292 0.2581588 -0.59347433 0.4310441
## 3 0.45532257 0.2627952 -0.02425597 0.9979194
## 4 0.16111199 0.2679601 -0.31403103 0.7194043
## 5 -0.31372182 0.2463843 -0.79775604 0.1819019
## 6 0.24741360 0.2584950 -0.28084090 0.7306963
```

For our `fit4.6`, the `ranef()` function returned a list of 1, indexed by `id`. Therein lay a 3-dimensional array. The first two dimensions are the same as what we got from `residuals()`, above. The third dimension had two levels: `Intercept` and `age_14`. In other words, the third dimension is the one that differentiated between \(\hat{\zeta}_{0i}\) and \(\hat{\zeta}_{1i}\). To make this thing a little more useful, let's convert it to a long-formatted data frame.
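If the 3-dimensional indexing feels unfamiliar, here's a tiny base-R sketch using a made-up array with the same layout (levels × summaries × parameters).

```r
# a toy array with the same layout as the ranef() output
a <- array(1:12,
           dim = c(3, 2, 2),
           dimnames = list(c("1", "2", "3"),
                           c("Estimate", "Est.Error"),
                           c("Intercept", "age_14")))

# slicing on the third dimension returns a levels-by-summaries matrix
a[, , "Intercept"]

# rbind() stacks the two slices on top of one another
dim(rbind(a[, , "Intercept"], a[, , "age_14"]))
```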

```
z <-
  rbind(z[[1]][ , , "Intercept"],
        z[[1]][ , , "age_14"]) %>%
  data.frame() %>%
  mutate(ranef = rep(c("hat(zeta)[0][italic(i)]", "hat(zeta)[1][italic(i)]"), each = n() / 2))

glimpse(z)
```

```
## Rows: 164
## Columns: 5
## $ Estimate <dbl> 0.30916842, -0.48536203, 0.32976885, -0.36702629, -0.56334498, 0.83126048, 0.224…
## $ Est.Error <dbl> 0.3290936, 0.3432303, 0.3345132, 0.3559662, 0.3328465, 0.3630722, 0.3424155, 0.3…
## $ Q2.5 <dbl> -0.32182062, -1.19181776, -0.33784412, -1.13863572, -1.24214475, 0.18131446, -0.…
## $ Q97.5 <dbl> 0.99439914, 0.12254696, 0.99601895, 0.27133749, 0.07577136, 1.57052650, 0.958441…
## $ ranef <chr> "hat(zeta)[0][italic(i)]", "hat(zeta)[0][italic(i)]", "hat(zeta)[0][italic(i)]",…
```

Now we’re ready to plot the remaining panels on the left of Figure 4.5.

```
z %>%
  ggplot(aes(sample = Estimate)) +
  geom_hline(yintercept = 0, color = "white") +
  geom_qq() +
  ylim(-1, 1) +
  labs(x = "Normal score",
       y = NULL) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~ranef, labeller = label_parsed, ncol = 1)
```

Here are the ones on the right.

```
z %>%
  bind_cols(
    bind_rows(
      alcohol1_pp %>% distinct(id),
      alcohol1_pp %>% distinct(id)
    )
  ) %>%
  mutate(ranef = str_c("italic(std)~", ranef)) %>%
  # note we have to group them before standardizing
  group_by(ranef) %>%
  mutate(z = (Estimate - mean(Estimate)) / sd(Estimate)) %>%
  ggplot(aes(x = id, y = z)) +
  geom_hline(yintercept = 0, color = "white") +
  geom_point() +
  scale_y_continuous(NULL, limits = c(-3, 3)) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~ranef, labeller = label_parsed, ncol = 1)
```

If you were paying close attention, you may have noticed that all three of our deviation estimates were summarized not only by a posterior mean, but by a standard deviation and 95% interval, too. To give a sense of what that means, here are those last two plots again, but this time including vertical bars defined by the 95% intervals.

```
z %>%
  bind_cols(
    bind_rows(
      alcohol1_pp %>% distinct(id),
      alcohol1_pp %>% distinct(id)
    )
  ) %>%
  ggplot(aes(x = id, y = Estimate, ymin = Q2.5, ymax = Q97.5)) +
  geom_hline(yintercept = 0, color = "white") +
  geom_pointrange(shape = 20, size = 1/3) +
  ylab(NULL) +
  theme(panel.grid = element_blank()) +
  facet_wrap(~ranef, labeller = label_parsed, ncol = 1)
```

When you go Bayesian, even your residuals get full posterior distributions.

### 4.8.3 Checking homoscedasticity.

Here we examine the homoscedasticity assumption by plotting the residual estimates against our predictors. We’ll start with the upper left panel of Figure 4.6.

```
e %>%
  bind_cols(alcohol1_pp) %>%
  ggplot(aes(x = age, y = Estimate)) +
  geom_hline(yintercept = 0, color = "white") +
  geom_point(alpha = 1/4) +
  ylab(expression(hat(epsilon)[italic(ij)])) +
  coord_cartesian(xlim = c(13, 17),
                  ylim = c(-2, 2)) +
  theme(panel.grid = element_blank())
```

Here’s a quick way to get the remaining four panels.

```
z %>%
  bind_cols(
    bind_rows(
      alcohol1_pp %>% distinct(id, coa, peer),
      alcohol1_pp %>% distinct(id, coa, peer)
    )
  ) %>%
  select(Estimate, ranef, coa, peer) %>%
  pivot_longer(-c(Estimate, ranef)) %>%
  ggplot(aes(x = value, y = Estimate)) +
  geom_hline(yintercept = 0, color = "white") +
  geom_point(alpha = 1/3) +
  ylim(-1, 1) +
  labs(x = "covariate value",
       y = NULL) +
  theme(panel.grid = element_blank(),
        strip.text = element_text(size = 10)) +
  facet_grid(ranef~name, labeller = label_parsed, scales = "free")
```

## 4.9 Model-based (Empirical Bayes) estimates of the individual growth parameters

In this section, the authors discussed two methods for constructing `id`-level trajectories: (a) use a weighted average of the OLS and multilevel estimates, or (b) rely solely on the multilevel model by making use of the three sources of residual variation. Our method will be the latter.

Here are the data for `id == 23`.

```
alcohol1_pp %>%
  select(id:coa, cpeer, alcuse) %>%
  filter(id == 23)
```

```
## # A tibble: 3 x 5
## id age coa cpeer alcuse
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 23 14 1 -1.02 1
## 2 23 15 1 -1.02 1
## 3 23 16 1 -1.02 1.73
```

```
post_23 <-
  posterior_samples(fit4.7) %>%
  select(starts_with("b_")) %>%
  # make our pis
  mutate(`pi[0][",23"]` = b_Intercept + b_coa * 1 + b_cpeer * -1.018,
         `pi[1][",23"]` = b_age_14 + `b_age_14:cpeer` * -1.018)

head(post_23)
```

```
## b_Intercept b_age_14 b_coa b_cpeer b_age_14:cpeer pi[0][",23"] pi[1][",23"]
## 1 0.3523825 0.2612078 0.4530122 0.9434110 -0.28318148 -0.15499770 0.5494866
## 2 0.4461849 0.2449069 0.4672898 0.8535579 -0.18771087 0.04455272 0.4359966
## 3 0.4778997 0.2847588 0.5093376 0.6420351 -0.06559589 0.33364551 0.3515354
## 4 0.4392012 0.3159357 0.5000807 0.7200197 -0.07950987 0.20630183 0.3968768
## 5 0.5029873 0.1724910 0.2822409 0.7317273 -0.23715978 0.04032987 0.4139197
## 6 0.4486030 0.2399877 0.3343177 0.6674641 -0.20571661 0.10344228 0.4494073
```

It doesn't help us much now, but the reason we've formatted the names of our two \(\pi\) columns so oddly is that those names will work much more nicely in the figure we'll make below. Just wait and see.

Anyway, rather than a couple of point estimates, we returned draws from the full posterior distribution. We might summarize them.
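The summary statistics are just functions of those draws. Here's the same idea with simulated stand-in draws (`rnorm()` playing the role of real posterior draws; the mean and spread are made up for illustration).

```r
set.seed(4)

# simulated stand-in for 4,000 posterior draws of a single parameter
draws <- rnorm(4000, mean = 0.25, sd = 0.17)

# posterior mean and percentile-based 95% interval
c(mean = mean(draws),
  ll   = unname(quantile(draws, probs = .025)),
  ul   = unname(quantile(draws, probs = .975)))
```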

```
post_23 %>%
  pivot_longer(starts_with("pi")) %>%
  group_by(name) %>%
  summarise(mean = mean(value),
            ll = quantile(value, probs = .025),
            ul = quantile(value, probs = .975)) %>%
  mutate_if(is.double, round, digits = 3)
```
```

```
## # A tibble: 2 x 4
## name mean ll ul
## <chr> <dbl> <dbl> <dbl>
## 1 "pi[0][\",23\"]" 0.259 -0.072 0.606
## 2 "pi[1][\",23\"]" 0.425 0.207 0.638
```

Or we could plot them.

```
post_23 %>%
  pivot_longer(starts_with("pi")) %>%
  ggplot(aes(x = value)) +
  geom_density(size = 0, fill = "grey25") +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab("participant-specific parameter estimates") +
  theme(panel.grid = element_blank()) +
  facet_wrap(~name, labeller = label_parsed, scales = "free_y")
```

Yet this approach neglects the \(\zeta\)'s. We've been extracting the \(\zeta\)'s with `ranef()`. We also get them when we use `posterior_samples()`. Here we'll extract both the \(\gamma\)'s and the \(\zeta\)'s for `id == 23`.

```
post_23 <-
  posterior_samples(fit4.7) %>%
  select(starts_with("b_"), contains("23"))

glimpse(post_23)
```

```
## Rows: 4,000
## Columns: 7
## $ b_Intercept <dbl> 0.3523825, 0.4461849, 0.4778997, 0.4392012, 0.5029873, 0.4486030, 0.4…
## $ b_age_14 <dbl> 0.2612078, 0.2449069, 0.2847588, 0.3159357, 0.1724910, 0.2399877, 0.1…
## $ b_coa <dbl> 0.4530122, 0.4672898, 0.5093376, 0.5000807, 0.2822409, 0.3343177, 0.4…
## $ b_cpeer <dbl> 0.9434110, 0.8535579, 0.6420351, 0.7200197, 0.7317273, 0.6674641, 0.8…
## $ `b_age_14:cpeer` <dbl> -0.283181475, -0.187710867, -0.065595888, -0.079509873, -0.237159783,…
## $ `r_id[23,Intercept]` <dbl> 0.37595666, 0.14315079, 0.13486424, 0.13912766, 0.52783700, 0.2523174…
## $ `r_id[23,age_14]` <dbl> 0.157401993, 0.054495658, 0.103729538, 0.333214663, 0.264059431, 0.21…
```

With the `r_id` prefix, **brms** tells you these are residual estimates for the levels of the `id` grouping variable. Within the brackets, we learn these particular columns are for `id == 23`, the first with respect to the `Intercept` and the second with respect to the `age_14` parameter. Let's put them to use.
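In equation form, the computation in the next code block follows the composite definitions (assuming the \(\gamma\) subscripts follow Singer and Willett's convention for this model)

\[\begin{align*} \pi_{0, 23} & = \gamma_{00} + \gamma_{01} \text{coa}_{23} + \gamma_{02} \text{cpeer}_{23} + \zeta_{0, 23}\\ \pi_{1, 23} & = \gamma_{10} + \gamma_{11} \text{cpeer}_{23} + \zeta_{1, 23}, \end{align*}\]

where \(\text{coa}_{23} = 1\) and \(\text{cpeer}_{23} = -1.018\).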

```
post_23 <-
  post_23 %>%
  mutate(`pi[0][",23"]` = b_Intercept + b_coa * 1 + b_cpeer * -1.018 + `r_id[23,Intercept]`,
         `pi[1][",23"]` = b_age_14 + `b_age_14:cpeer` * -1.018 + `r_id[23,age_14]`)

glimpse(post_23)
```

```
## Rows: 4,000
## Columns: 9
## $ b_Intercept <dbl> 0.3523825, 0.4461849, 0.4778997, 0.4392012, 0.5029873, 0.4486030, 0.4…
## $ b_age_14 <dbl> 0.2612078, 0.2449069, 0.2847588, 0.3159357, 0.1724910, 0.2399877, 0.1…
## $ b_coa <dbl> 0.4530122, 0.4672898, 0.5093376, 0.5000807, 0.2822409, 0.3343177, 0.4…
## $ b_cpeer <dbl> 0.9434110, 0.8535579, 0.6420351, 0.7200197, 0.7317273, 0.6674641, 0.8…
## $ `b_age_14:cpeer` <dbl> -0.283181475, -0.187710867, -0.065595888, -0.079509873, -0.237159783,…
## $ `r_id[23,Intercept]` <dbl> 0.37595666, 0.14315079, 0.13486424, 0.13912766, 0.52783700, 0.2523174…
## $ `r_id[23,age_14]` <dbl> 0.157401993, 0.054495658, 0.103729538, 0.333214663, 0.264059431, 0.21…
## $ `pi[0][",23"]` <dbl> 0.22095896, 0.18770352, 0.46850975, 0.34542949, 0.56816687, 0.3557597…
## $ `pi[1][",23"]` <dbl> 0.70688858, 0.49049224, 0.45526491, 0.73009146, 0.67797911, 0.6647450…
```

Here are our updated summaries.

```
post_23 %>%
  pivot_longer(starts_with("pi")) %>%
  group_by(name) %>%
  summarise(mean = mean(value),
            ll = quantile(value, probs = .025),
            ul = quantile(value, probs = .975)) %>%
  mutate_if(is.double, round, digits = 3)
```
```

```
## # A tibble: 2 x 4
## name mean ll ul
## <chr> <dbl> <dbl> <dbl>
## 1 "pi[0][\",23\"]" 0.573 -0.064 1.26
## 2 "pi[1][\",23\"]" 0.51 -0.01 0.996
```

And here are the updated density plots.

```
post_23 %>%
  pivot_longer(starts_with("pi")) %>%
  ggplot(aes(x = value)) +
  geom_density(size = 0, fill = "grey25") +
  scale_y_continuous(NULL, breaks = NULL) +
  xlab("participant-specific parameter estimates") +
  theme(panel.grid = element_blank()) +
  facet_wrap(~name, labeller = label_parsed, scales = "free_y")
```

We've been focusing on the \(\pi\) parameters. Notice that when we turn our attention to Figure 4.7, we shift focus slightly to the consequences of those parameters: we're now attending to trajectories. It's important to pick up on this distinction because it has consequences for our programming workflow. If we wanted to keep a parameter-centric workflow, we could continue expanding on our `posterior_samples()` output by applying the full composite formula to explicitly compute predictions for various levels of `age_14`. And we could do that separately or in bulk for the eight participants highlighted in the figure.

However pedagogically useful that might be, it'd be very tedious. If we instead take a trajectory-centric perspective, it'll be more natural and efficient to work with a `fitted()`-based workflow. Let's define our `nd` data.

```
nd <-
  alcohol1_pp %>%
  select(id:coa, age_14:alcuse, cpeer) %>%
  filter(id %in% c(4, 14, 23, 32, 41, 56, 65, 82)) %>%
  # these next two lines will make plotting easier
  mutate(id_label = ifelse(id < 10, str_c("0", id), id)) %>%
  mutate(id_label = str_c("id = ", id_label))

glimpse(nd)
```

```
## Rows: 24
## Columns: 7
## $ id <dbl> 4, 4, 4, 14, 14, 14, 23, 23, 23, 32, 32, 32, 41, 41, 41, 56, 56, 56, 65, 65, 65, …
## $ age <dbl> 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 1…
## $ coa <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ age_14 <dbl> 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2
## $ alcuse <dbl> 0.000000, 2.000000, 1.732051, 2.828427, 3.605551, 2.828427, 1.000000, 1.000000, 1…
## $ cpeer <dbl> 0.7708544, 0.7708544, 0.7708544, 0.9820000, 0.9820000, 0.9820000, -1.0180000, -1.…
## $ id_label <chr> "id = 04", "id = 04", "id = 04", "id = 14", "id = 14", "id = 14", "id = 23", "id …
```

We've isolated the relevant predictor variables for our eight focal participants. Next we'll pump them through `fitted()` and wrangle as usual.

```
f <-
  fitted(fit4.7,
         newdata = nd) %>%
  data.frame() %>%
  bind_cols(nd)

glimpse(f)
```

```
## Rows: 24
## Columns: 11
## $ Estimate <dbl> 1.1319324, 1.4437820, 1.7556316, 2.3761707, 2.6937915, 3.0114123, 0.5732711, 1.0…
## $ Est.Error <dbl> 0.3577853, 0.3077347, 0.4528754, 0.3705983, 0.3143068, 0.4559860, 0.3378569, 0.2…
## $ Q2.5 <dbl> 0.39665637, 0.84234417, 0.85411503, 1.69396153, 2.06601789, 2.14739227, -0.06413…
## $ Q97.5 <dbl> 1.775991, 2.053892, 2.651391, 3.130042, 3.293335, 3.902585, 1.258433, 1.662780, …
## $ id <dbl> 4, 4, 4, 14, 14, 14, 23, 23, 23, 32, 32, 32, 41, 41, 41, 56, 56, 56, 65, 65, 65,…
## $ age <dbl> 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, …
## $ coa <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ age_14 <dbl> 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2
## $ alcuse <dbl> 0.000000, 2.000000, 1.732051, 2.828427, 3.605551, 2.828427, 1.000000, 1.000000, …
## $ cpeer <dbl> 0.7708544, 0.7708544, 0.7708544, 0.9820000, 0.9820000, 0.9820000, -1.0180000, -1…
## $ id_label <chr> "id = 04", "id = 04", "id = 04", "id = 14", "id = 14", "id = 14", "id = 23", "id…
```

Notice how this time we omitted the `re_formula = NA` argument. The default is `re_formula = NULL`, which means the output is based on all the parameters in the multilevel model, not just the \(\gamma\)'s. Here's what the trajectories look like.

```
f %>%
  ggplot(aes(x = age, y = Estimate)) +
  geom_line(size = 1) +
  scale_y_continuous("alcuse", breaks = 0:4, limits = c(-1, 4)) +
  xlim(13, 17) +
  theme(legend.position = "none",
        panel.grid = element_blank()) +
  facet_wrap(~id_label, ncol = 4)
```

Now that we've warmed up, let's add in the data and the other lines to make the full version of Figure 4.7. Before we do, we'll revisit `fitted()`. Notice the return of the `re_formula = NA` argument. The trajectories in our `f_gamma_only` data frame will only be sensitive to the \(\gamma\)'s.

```
f_gamma_only <-
  fitted(fit4.7,
         newdata = nd,
         re_formula = NA) %>%
  data.frame() %>%
  bind_cols(nd)

glimpse(f_gamma_only)
```

```
## Rows: 24
## Columns: 11
## $ Estimate <dbl> 1.4999323, 1.6525461, 1.8051599, 1.6463847, 1.7669051, 1.8874254, 0.2591678, 0.6…
## $ Est.Error <dbl> 0.1391460, 0.1353891, 0.1827906, 0.1539855, 0.1489680, 0.2054983, 0.1739917, 0.1…
## $ Q2.5 <dbl> 1.22575777, 1.38896091, 1.45621733, 1.34111869, 1.47902735, 1.49699488, -0.07245…
## $ Q97.5 <dbl> 1.7645873, 1.9147073, 2.1649615, 1.9464004, 2.0551021, 2.2881997, 0.6056240, 1.0…
## $ id <dbl> 4, 4, 4, 14, 14, 14, 23, 23, 23, 32, 32, 32, 41, 41, 41, 56, 56, 56, 65, 65, 65,…
## $ age <dbl> 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, 16, 14, 15, …
## $ coa <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
## $ age_14 <dbl> 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2, 0, 1, 2
## $ alcuse <dbl> 0.000000, 2.000000, 1.732051, 2.828427, 3.605551, 2.828427, 1.000000, 1.000000, …
## $ cpeer <dbl> 0.7708544, 0.7708544, 0.7708544, 0.9820000, 0.9820000, 0.9820000, -1.0180000, -1…
## $ id_label <chr> "id = 04", "id = 04", "id = 04", "id = 14", "id = 14", "id = 14", "id = 23", "id…
```

Let’s plot!

```
f %>%
  ggplot(aes(x = age)) +
  # `id`-specific lines
  geom_line(aes(y = Estimate),
            size = 1) +
  # gamma-centric lines
  geom_line(data = f_gamma_only,
            aes(y = Estimate),
            size = 1/2) +
  # OLS lines
  stat_smooth(data = nd,
              aes(y = alcuse),
              method = "lm", se = F,
              color = "black", linetype = 2, size = 1/2) +
  # data points
  geom_point(data = nd,
             aes(y = alcuse)) +
  scale_y_continuous("alcuse", breaks = 0:4, limits = c(-1, 4)) +
  xlim(13, 17) +
  theme(legend.position = "none",
        panel.grid = element_blank()) +
  facet_wrap(~id_label, ncol = 4)
```

Though our purpose was largely to reproduce Figure 4.7, we might push ourselves a little further. Our Bayesian estimates came with measures of uncertainty: the posterior standard deviations and the 95% intervals. Whenever possible, it's good form to include some expression of our uncertainty in our plots. Here let's focus on the `id`-specific trajectories.

```
f %>%
  ggplot(aes(x = age, y = Estimate)) +
  # `id`-specific 95% intervals
  geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5),
              fill = "grey75") +
  # `id`-specific lines
  geom_line(size = 1) +
  # data points
  geom_point(data = nd,
             aes(y = alcuse)) +
  scale_y_continuous("alcuse", breaks = 0:4, limits = c(-1, 4)) +
  xlim(13, 17) +
  theme(legend.position = "none",
        panel.grid = element_blank()) +
  facet_wrap(~id_label, ncol = 4)
```

This also clarifies an important visualization point. If you only care about plotting straight lines, you only need two points per line. However, if you want to express shapes with curves, such as the typically bowtie-shaped 95% intervals, you need estimates over a larger number of predictor values. Back to `fitted()`!
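Here's the kind of grid we're after, sketched with base R's `seq()` (30 values is an arbitrary choice; anything dense enough to render smoothly will do).

```r
# a fine grid of predictor values spanning the observed age_14 range
age_14_grid <- seq(from = 0, to = 2, length.out = 30)

length(age_14_grid)
range(age_14_grid)
```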

```
# we need an expanded version of the `nd`
nd_expanded <-
  alcohol1_pp %>%
  select(id, coa, cpeer) %>%
  filter(id %in% c(4, 14, 23, 32, 41, 56, 65, 82)) %>%
  # this part is important!
  expand(nesting(id, coa, cpeer),
         age_14 = seq(from = 0, to = 2, length.out = 30)) %>%
  mutate(id_label = ifelse(id < 10, str_c("0", id), id)) %>%
  mutate(id_label = str_c("id = ", id_label),
         age = age_14 + 14)

# pump our `nd_expanded` into `fitted()`
f <-
  fitted(fit4.7,
         newdata = nd_expanded) %>%
  data.frame() %>%
  bind_cols(nd_expanded)

glimpse(f)
```

```
## Rows: 240
## Columns: 10
## $ Estimate <dbl> 1.131932, 1.153439, 1.174946, 1.196453, 1.217960, 1.239467, 1.260974, 1.282480, …
## $ Est.Error <dbl> 0.3577853, 0.3479936, 0.3389280, 0.3306484, 0.3232152, 0.3166879, 0.3111236, 0.3…
## $ Q2.5 <dbl> 0.3966564, 0.4369660, 0.4785531, 0.5258237, 0.5647610, 0.6037199, 0.6487955, 0.6…
## $ Q97.5 <dbl> 1.775991, 1.783496, 1.799556, 1.810161, 1.822221, 1.833439, 1.851313, 1.866417, …
## $ id <dbl> 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4, 4,…
## $ coa <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ cpeer <dbl> 0.7708544, 0.7708544, 0.7708544, 0.7708544, 0.7708544, 0.7708544, 0.7708544, 0.7…
## $ age_14 <dbl> 0.00000000, 0.06896552, 0.13793103, 0.20689655, 0.27586207, 0.34482759, 0.413793…
## $ id_label <chr> "id = 04", "id = 04", "id = 04", "id = 04", "id = 04", "id = 04", "id = 04", "id…
## $ age <dbl> 14.00000, 14.06897, 14.13793, 14.20690, 14.27586, 14.34483, 14.41379, 14.48276, …
```

Notice how we now have many more rows. Let’s plot.

```
f %>%
  ggplot(aes(x = age, y = Estimate)) +
  # `id`-specific 95% intervals
  geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5),
              fill = "grey75") +
  # `id`-specific lines
  geom_line(size = 1) +
  # data points
  geom_point(data = nd,
             aes(y = alcuse)) +
  scale_y_continuous("alcuse", breaks = 0:4, limits = c(-1, 4)) +
  xlim(13, 17) +
  theme(legend.position = "none",
        panel.grid = element_blank()) +
  facet_wrap(~id_label, ncol = 4)
```
```

Singer and Willett pointed out that one of the ways the multilevel model is more parsimonious than a series of `id`-specific single-level models is that all `id` levels share the same \(\sigma_\epsilon\) parameter. It's worth pointing out that it's possible to relax this assumption with modern Bayesian software, such as **brms**. For ideas on how, check out Donald Williams's work (e.g., Williams et al., 2019).

## Session info

```
sessionInfo()
```

```
## R version 4.0.4 (2021-02-15)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.0/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] broom_0.7.5 loo_2.4.1 brms_2.15.0 Rcpp_1.0.6 forcats_0.5.1 stringr_1.4.0
## [7] dplyr_1.0.5 purrr_0.3.4 readr_1.4.0 tidyr_1.1.3 tibble_3.1.0 ggplot2_3.3.3
## [13] tidyverse_1.3.0
##
## loaded via a namespace (and not attached):
## [1] readxl_1.3.1 backports_1.2.1 metRology_0.9-28-1 plyr_1.8.6
## [5] igraph_1.2.6 splines_4.0.4 crosstalk_1.1.0.1 TH.data_1.0-10
## [9] rstantools_2.1.1 inline_0.3.17 digest_0.6.27 htmltools_0.5.1.1
## [13] rsconnect_0.8.16 fansi_0.4.2 checkmate_2.0.0 magrittr_2.0.1
## [17] modelr_0.1.8 RcppParallel_5.0.2 matrixStats_0.57.0 xts_0.12.1
## [21] sandwich_3.0-0 prettyunits_1.1.1 colorspace_2.0-0 rvest_0.3.6
## [25] haven_2.3.1 xfun_0.22 callr_3.5.1 crayon_1.4.1
## [29] jsonlite_1.7.2 lme4_1.1-25 survival_3.2-10 zoo_1.8-8
## [33] glue_1.4.2 gtable_0.3.0 emmeans_1.5.2-1 V8_3.4.0
## [37] pkgbuild_1.2.0 rstan_2.21.2 DEoptimR_1.0-8 abind_1.4-5
## [41] scales_1.1.1 mvtnorm_1.1-1 DBI_1.1.0 miniUI_0.1.1.1
## [45] xtable_1.8-4 stats4_4.0.4 StanHeaders_2.21.0-7 DT_0.16
## [49] htmlwidgets_1.5.2 httr_1.4.2 threejs_0.3.3 RColorBrewer_1.1-2
## [53] ellipsis_0.3.1 pkgconfig_2.0.3 farver_2.0.3 dbplyr_2.0.0
## [57] utf8_1.1.4 tidyselect_1.1.0 labeling_0.4.2 rlang_0.4.10
## [61] reshape2_1.4.4 later_1.1.0.1 munsell_0.5.0 cellranger_1.1.0
## [65] tools_4.0.4 cli_2.3.1 generics_0.1.0 ggridges_0.5.2
## [69] evaluate_0.14 fastmap_1.0.1 processx_3.4.5 knitr_1.31
## [73] fs_1.5.0 robustbase_0.93-6 nlme_3.1-152 mime_0.10
## [77] projpred_2.0.2 xml2_1.3.2 compiler_4.0.4 bayesplot_1.8.0
## [81] shinythemes_1.1.2 rstudioapi_0.13 gamm4_0.2-6 curl_4.3
## [85] reprex_0.3.0 statmod_1.4.35 stringi_1.5.3 highr_0.8
## [89] ps_1.6.0 Brobdingnag_1.2-6 lattice_0.20-41 Matrix_1.3-2
## [93] nloptr_1.2.2.2 markdown_1.1 shinyjs_2.0.0 vctrs_0.3.6
## [97] pillar_1.5.1 lifecycle_1.0.0 bridgesampling_1.0-0 estimability_1.3
## [101] httpuv_1.5.4 R6_2.5.0 bookdown_0.21 promises_1.1.1
## [105] gridExtra_2.3 codetools_0.2-18 boot_1.3-26 colourpicker_1.1.0
## [109] MASS_7.3-53 gtools_3.8.2 assertthat_0.2.1 withr_2.4.1
## [113] shinystan_2.5.0 multcomp_1.4-16 mgcv_1.8-33 parallel_4.0.4
## [117] hms_0.5.3 grid_4.0.4 coda_0.19-4 minqa_1.2.4
## [121] rmarkdown_2.7 numDeriv_2016.8-1.1 shiny_1.5.0 lubridate_1.7.9.2
## [125] base64enc_0.1-3 dygraphs_1.1.1.6
```