12 Multilevel Models

Multilevel models… remember features of each cluster in the data as they learn about all of the clusters. Depending upon the variation among clusters, which is learned from the data as well, the model pools information across clusters. This pooling tends to improve estimates about each cluster. This improved estimation leads to several, more pragmatic sounding, benefits of the multilevel approach. (McElreath, 2015, p. 356)

These benefits include:

  • improved estimates for repeated sampling (i.e., in longitudinal data)
  • improved estimates when there are imbalances among subsamples
  • estimates of the variation across subsamples
  • avoiding simplistic averaging by retaining variation across subsamples

All of these benefits flow out of the same strategy and model structure. You learn one basic design and you get all of this for free.

When it comes to regression, multilevel regression deserves to be the default approach. There are certainly contexts in which it would be better to use an old-fashioned single-level model. But the contexts in which multilevel models are superior are much more numerous. It is better to begin to build a multilevel analysis, and then realize it’s unnecessary, than to overlook it. And once you grasp the basic multilevel stragety, it becomes much easier to incorporate related tricks such as allowing for measurement error in the data and even model missing data itself (Chapter 14). (p. 356)

I’m totally on board with this. After learning about the multilevel model, I see it everywhere. For more on the sentiment it should be the default, check out McElreath’s blog post, Multilevel regression as default.

12.1 Example: Multilevel tadpoles

Let’s load the reedfrogs data (see Vonesh & Bolker, 2005).

library(rethinking)
data(reedfrogs)
d <- reedfrogs

Detach rethinking and load brms.

rm(reedfrogs)
detach(package:rethinking, unload = T)
library(brms)

Go ahead and acquaint yourself with the reedfrogs.

library(tidyverse)

d %>%
  glimpse()
## Rows: 48
## Columns: 5
## $ density  <int> 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 10, 25, 25, 25, 25, …
## $ pred     <fct> no, no, no, no, no, no, no, no, pred, pred, pred, pred, pred, pred, pred, pred, …
## $ size     <fct> big, big, big, big, small, small, small, small, big, big, big, big, small, small…
## $ surv     <int> 9, 10, 7, 10, 9, 9, 10, 9, 4, 9, 7, 6, 7, 5, 9, 9, 24, 23, 22, 25, 23, 23, 23, 2…
## $ propsurv <dbl> 0.9000000, 1.0000000, 0.7000000, 1.0000000, 0.9000000, 0.9000000, 1.0000000, 0.9…

Making the tank cluster variable is easy.

d <- 
  d %>%
  mutate(tank = 1:nrow(d))

Here’s the formula for the un-pooled model in which each tank gets its own intercept:

\[\begin{align*} \text{surv}_i & \sim \operatorname{Binomial}(n_i, p_i) \\ \operatorname{logit}(p_i) & = \alpha_{\text{tank}_i} \\ \alpha_{\text{tank}} & \sim \operatorname{Normal}(0, 5), \end{align*}\]

where \(n_i\) is indexed by the density column. It’s values are distributed like so:

d %>% 
  count(density)
##   density  n
## 1      10 16
## 2      25 16
## 3      35 16

Now fit this simple aggregated binomial model much like we practiced in Chapter 10.

b12.1 <- 
  brm(data = d, 
      family = binomial,
      surv | trials(density) ~ 0 + factor(tank),
      prior(normal(0, 5), class = b),
      iter = 2000, warmup = 500, chains = 4, cores = 4,
      seed = 12,
      file = "fits/b12.01")

We don’t need a depth=2 argument to discover we have 48 different intercepts. Just good old print() will do.

print(b12.1)
##  Family: binomial 
##   Links: mu = logit 
## Formula: surv | trials(density) ~ 0 + factor(tank) 
##    Data: d (Number of observations: 48) 
## Samples: 4 chains, each with iter = 2000; warmup = 500; thin = 1;
##          total post-warmup samples = 6000
## 
## Population-Level Effects: 
##              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## factortank1      2.51      1.15     0.66     5.12 1.00    10150     3855
## factortank2      5.71      2.77     1.72    12.36 1.00     9137     4133
## factortank3      0.93      0.71    -0.37     2.36 1.00    14814     3763
## factortank4      5.69      2.71     1.75    11.97 1.00     7958     4121
## factortank5      2.53      1.19     0.62     5.25 1.00     9659     3651
## factortank6      2.54      1.22     0.61     5.35 1.00     9470     3273
## factortank7      5.73      2.77     1.72    12.30 1.00     7031     3812
## factortank8      2.50      1.15     0.65     5.15 1.00    10034     4001
## factortank9     -0.43      0.66    -1.75     0.85 1.00    12724     4055
## factortank10     2.52      1.17     0.63     5.21 1.00     9822     3832
## factortank11     0.94      0.71    -0.34     2.43 1.00    13051     4235
## factortank12     0.44      0.68    -0.85     1.83 1.00    14674     3944
## factortank13     0.93      0.73    -0.42     2.47 1.00    13469     4168
## factortank14    -0.00      0.65    -1.29     1.28 1.00    12924     4110
## factortank15     2.51      1.19     0.59     5.29 1.00    11602     4060
## factortank16     2.53      1.18     0.67     5.27 1.00     8948     3669
## factortank17     3.49      1.15     1.72     6.33 1.00     9215     3462
## factortank18     2.61      0.79     1.27     4.37 1.00    11287     3279
## factortank19     2.11      0.64     1.00     3.51 1.00    11620     3680
## factortank20     6.44      2.64     2.68    12.72 1.00     6870     3836
## factortank21     2.61      0.78     1.28     4.33 1.00    13123     3943
## factortank22     2.61      0.76     1.33     4.33 1.00    12035     3918
## factortank23     2.62      0.80     1.29     4.41 1.00    12911     3948
## factortank24     1.74      0.57     0.73     2.94 1.00    13647     3853
## factortank25    -1.20      0.48    -2.20    -0.32 1.00    14569     4183
## factortank26     0.08      0.42    -0.74     0.93 1.00    14635     4038
## factortank27    -1.74      0.57    -2.97    -0.74 1.00    12882     4120
## factortank28    -0.60      0.42    -1.45     0.22 1.00    13423     4481
## factortank29     0.08      0.41    -0.72     0.90 1.00    13110     3865
## factortank30     1.45      0.51     0.52     2.53 1.00    11813     3975
## factortank31    -0.78      0.43    -1.65     0.02 1.00    12764     3790
## factortank32    -0.42      0.40    -1.23     0.39 1.00    11634     4549
## factortank33     3.81      1.09     2.13     6.35 1.00     8831     3918
## factortank34     2.97      0.77     1.69     4.68 1.00    11227     4148
## factortank35     2.98      0.80     1.65     4.76 1.00    11561     3514
## factortank36     2.13      0.54     1.19     3.29 1.00    12922     4662
## factortank37     2.13      0.55     1.18     3.28 1.00    13053     4465
## factortank38     6.66      2.61     2.99    12.67 1.00     7449     4355
## factortank39     2.97      0.77     1.68     4.75 1.00    11221     3592
## factortank40     2.48      0.64     1.38     3.87 1.00    10042     3928
## factortank41    -2.13      0.56    -3.31    -1.13 1.00    13565     4214
## factortank42    -0.67      0.37    -1.41     0.04 1.00    13587     3968
## factortank43    -0.55      0.36    -1.28     0.14 1.00    12460     4379
## factortank44    -0.42      0.35    -1.11     0.29 1.00    14907     3938
## factortank45     0.54      0.35    -0.15     1.24 1.00    14973     4588
## factortank46    -0.67      0.36    -1.38     0.03 1.00    13068     4119
## factortank47     2.13      0.56     1.14     3.33 1.00    13153     4527
## factortank48    -0.05      0.33    -0.71     0.60 1.00    13864     4462
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Do you remember the dummy variables models from back in Chapter 5? This model is like one of those on steroids. It’ll be instructive to take a look at their distributions in density plots. We’ll plot them in both their log-odds and probability metrics.

For kicks and giggles, let’s use a FiveThirtyEight-like theme for this chapter’s plots. An easy way to do so is with help from the ggthemes package.

library(ggthemes) 

tibble(estimate = fixef(b12.1)[, 1]) %>% 
  mutate(p = inv_logit_scaled(estimate)) %>% 
  gather() %>% 
  mutate(key = if_else(key == "p", "expected survival probability", "expected survival log-odds")) %>% 
  
  ggplot(aes(x = value, fill = key)) +
  geom_density(size = 0) +
  scale_fill_manual(values = c("orange1", "orange4")) +
  scale_y_continuous(breaks = NULL) +
  labs(title = "Tank-level intercepts from the no-pooling model",
       subtitle = "Notice now inspecting the distributions of the posterior means can offer insights you\nmight not get if you looked at them one at a time.") +
  theme_fivethirtyeight() +
  theme(legend.position = "none",
        panel.grid.major = element_blank()) +
  facet_wrap(~key, scales = "free")

Even though it seems like we can derive important insights from how the tank-level intercepts are distributed, that information is not explicitly encoded in the statistical model. Keep that in mind as we now consider the multilevel alternative. Its formula is

\[\begin{align*} \text{surv}_i & \sim \operatorname{Binomial}(n_i, p_i) \\ \operatorname{logit}(p_i) & = \alpha_{\text{tank}_i} \\ \alpha_\text{tank} & \sim \operatorname{Normal}(\color{blue}\alpha, \color{blue}\sigma) \\ \color{blue}\alpha & \color{blue}\sim \color{blue}{\operatorname{Normal}(0, 1)} \\ \color{blue}\sigma & \color{blue}\sim \color{blue}{\operatorname{HalfCauchy}(0, 1)}. \end{align*}\]

The Gaussian distribution with mean \(\alpha\) and standard deviation \(\sigma\) is the prior for each tank’s intercept. But that prior itself has priors for \(\alpha\) and \(\sigma\). So there are two levels in the model, each resembling a simpler model. (p. 359, emphasis in the original)

You specify the corresponding multilevel model like this.

b12.2 <- 
  brm(data = d, 
      family = binomial,
      surv | trials(density) ~ 1 + (1 | tank),
      prior = c(prior(normal(0, 1), class = Intercept),
                prior(cauchy(0, 1), class = sd)),
      iter = 4000, warmup = 1000, chains = 4, cores = 4,
      seed = 12,
      file = "fits/b12.02")

The syntax for the varying effects follows the lme4 style, ( <varying parameter(s)> | <grouping variable(s)> ). In this case (1 | tank) indicates only the intercept, 1, varies by tank. The extent to which parameters vary is controlled by the prior, prior(cauchy(0, 1), class = sd), which is parameterized in the standard deviation metric. Do note that last part. It’s common in multilevel software to model in the variance metric, instead. For technical reasons we won’t really get into until Chapter 13, Stan parameterizes this as a standard deviation.

Let’s perform the WAIC comparisons.

b12.1 <- add_criterion(b12.1, "waic")
b12.2 <- add_criterion(b12.2, "waic")

w <- loo_compare(b12.1, b12.2, criterion = "waic")

print(w, simplify = F)
##       elpd_diff se_diff elpd_waic se_elpd_waic p_waic se_p_waic waic   se_waic
## b12.2    0.0       0.0  -100.0       3.6         20.9    0.8     200.0    7.2 
## b12.1   -0.2       2.2  -100.2       4.7         22.1    0.6     200.4    9.3

The se_diff is large relative to the elpd_diff. If we convert the \(\text{elpd}\) difference to the WAIC metric, the message stays the same.

cbind(waic_diff = w[, 1] * -2,
      se        = w[, 2] *  2)
##       waic_diff       se
## b12.2 0.0000000 0.000000
## b12.1 0.4182852 4.399343

Here are the WAIC weights.

model_weights(b12.1, b12.2, weights = "waic") %>% 
  round(digits = 2)
## b12.1 b12.2 
##  0.45  0.55

Let’s check he model summary.

print(b12.2)
##  Family: binomial 
##   Links: mu = logit 
## Formula: surv | trials(density) ~ 1 + (1 | tank) 
##    Data: d (Number of observations: 48) 
## Samples: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup samples = 12000
## 
## Group-Level Effects: 
## ~tank (Number of levels: 48) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.63      0.22     1.25     2.11 1.00     2792     5654
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.30      0.25     0.81     1.81 1.00     2018     4168
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

This time we don’t get a list of 48 separate tank-level parameters. However, we do get a description of their distribution interns of a mean (i.e., Intercept) and standard deviation (i.e., sd(Intercept)). If you’d like the actual tank-level parameters, don’t worry; they’re coming in Figure 12.1. We’ll need to do a little prep work, though.

post <- posterior_samples(b12.2, add_chain = T)

post_mdn <- 
  coef(b12.2, robust = T)$tank[, , ] %>% 
  as_tibble() %>% 
  bind_cols(d) %>%
  mutate(post_mdn = inv_logit_scaled(Estimate))

post_mdn
## # A tibble: 48 x 11
##    Estimate Est.Error   Q2.5 Q97.5 density pred  size   surv propsurv  tank post_mdn
##       <dbl>     <dbl>  <dbl> <dbl>   <int> <fct> <fct> <int>    <dbl> <int>    <dbl>
##  1    2.06      0.834  0.598 3.96       10 no    big       9      0.9     1    0.886
##  2    2.96      1.06   1.18  5.48       10 no    big      10      1       2    0.951
##  3    0.974     0.662 -0.257 2.40       10 no    big       7      0.7     3    0.726
##  4    2.95      1.08   1.13  5.61       10 no    big      10      1       4    0.950
##  5    2.06      0.855  0.613 4.09       10 no    small     9      0.9     5    0.887
##  6    2.06      0.845  0.586 3.99       10 no    small     9      0.9     6    0.886
##  7    2.96      1.06   1.21  5.50       10 no    small    10      1       7    0.951
##  8    2.06      0.849  0.602 4.04       10 no    small     9      0.9     8    0.887
##  9   -0.170     0.603 -1.39  0.994      10 pred  big       4      0.4     9    0.458
## 10    2.07      0.848  0.597 4.01       10 pred  big       9      0.9    10    0.888
## # … with 38 more rows

Here’s the ggplot2 code to reproduce Figure 12.1.

post_mdn %>%
  ggplot(aes(x = tank)) +
  geom_hline(yintercept = inv_logit_scaled(median(post$b_Intercept)), linetype = 2, size = 1/4, 
             color = ggthemes_data[["fivethirtyeight"]][1, 2] %>% pull()) +
  geom_vline(xintercept = c(16.5, 32.5), size = 1, 
             color = ggthemes_data[["fivethirtyeight"]][2, 2] %>% pull()) +
  geom_point(aes(y = propsurv), color = "orange2") +
  geom_point(aes(y = post_mdn), shape = 1) +
  annotate(geom = "text", x = c(8, 16 + 8, 32 + 8), y = 0, 
           label = c("small tanks", "medium tanks", "large tanks")) +
  scale_x_continuous(breaks = c(1, 16, 32, 48)) +
  labs(title = "Multilevel shrinkage!",
       subtitle = "The empirical proportions are in orange while the model-\nimplied proportions are the black circles. The dashed line is\nthe model-implied average survival proportion.") +
  coord_cartesian(ylim = c(0, 1)) +
  theme_fivethirtyeight() +
  theme(panel.grid.major = element_blank())

Here is the code for our version of Figure 12.2.a, where we visualize the model-implied population distribution of log-odds survival (i.e., the population distribution yielding all the tank-level intercepts).

# this makes the output of `sample_n()` reproducible
set.seed(12)

p1 <-
  post %>% 
  sample_n(100) %>% 
  expand(nesting(iter, b_Intercept, sd_tank__Intercept),
         x = seq(from = -4, to = 5, length.out = 100)) %>% 

  ggplot(aes(x = x, group = iter)) +
  geom_line(aes(y = dnorm(x, b_Intercept, sd_tank__Intercept)),
            alpha = .2, color = "orange2") +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(title = "Population survival distribution",
       subtitle = "log-odds scale") +
  coord_cartesian(xlim = c(-3, 4))

Now we make our Figure 12.2.b and then bind the two subplots with patchwork.

p2 <-
  ggplot(data = post, 
         aes(x = rnorm(n    = nrow(post), 
                       mean = b_Intercept, 
                       sd   = sd_tank__Intercept) %>% 
               inv_logit_scaled())) +
  geom_density(size = 0, fill = "orange2") +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(title = "Probability of survival",
       subtitle = "transformed by the inverse-logit function")

library(patchwork)

(p1 + p2) &
  theme_fivethirtyeight() &
  theme(plot.title = element_text(size = 12),
        plot.subtitle = element_text(size = 10))

In the left plot, notice the uncertainty in terms of both location \(\alpha\) and scale \(\sigma\). For both pots, note how we sampled 12,000 imaginary tanks rather than McElreath’s 8,000. This is because we had 12,000 HMC iterations (i.e., execute nrow(post)).

The aes() code in that last block was a bit much. To get a sense of how it worked, consider this.

set.seed(12)

rnorm(n    = 1, 
      mean = post$b_Intercept, 
      sd   = post$sd_tank__Intercept) %>% 
  inv_logit_scaled()
## [1] 0.353363

First, we took one random draw from a normal distribution with a mean of the first row in post$b_Intercept and a standard deviation of the value from the first row in post$sd_tank__Intercept, and passed it through the inv_logit_scaled() function. By replacing the 1 with nrow(post), we do this nrow(post) times (i.e., 12,000). Our orange density, then, is the summary of that process.

12.1.0.1 Rethinking: Varying intercepts as over-dispersion.

In the previous chapter (page 346), the beta-binomial and gamma-Poisson models were presented as ways for coping with over-dispersion of count data. Varying intercepts accomplish the same thing, allowing count outcomes to be over-dispersed. They accomplish this, because when each observed count gets its own unique intercept, but these intercepts are pooled through a common distribution, the predictions expect over-dispersion just like a beta-binomial or gamma-Poisson model would. Compared to a beta-binomial or gamma-Poisson model, a binomial or Poisson model with a varying intercept on every observed outcome will often be easier to estimate and easier to extend. (p. 363, emphasis in the original)

12.1.0.2 Overthinking: Prior for variance components.

Yep, you can use the exponential distribution for your priors in brms, too. Here it is for model b12.2.

b12.2b <- 
  update(b12.2,
         prior = c(prior(normal(0, 1), class = Intercept),
                   prior(exponential(1), class = sd)),
         seed = 12,
         file = "fits/b12.02b")

The model summary:

print(b12.2b)
##  Family: binomial 
##   Links: mu = logit 
## Formula: surv | trials(density) ~ 1 + (1 | tank) 
##    Data: d (Number of observations: 48) 
## Samples: 4 chains, each with iter = 4000; warmup = 1000; thin = 1;
##          total post-warmup samples = 12000
## 
## Group-Level Effects: 
## ~tank (Number of levels: 48) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.62      0.21     1.25     2.09 1.00     2652     5353
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.30      0.25     0.80     1.80 1.00     2153     3760
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

If you’re curious how the exponential prior compares to the posterior, you might just plot.

tibble(x = seq(from = 0, to = 6, by = .01)) %>% 
  
  ggplot() +
  # the prior
  geom_ribbon(aes(x = x, ymin = 0, ymax = dexp(x, rate = 1)),
              fill = "orange2", alpha = 1/3) +
  # the posterior
  geom_density(data = posterior_samples(b12.2b),
               aes(x = sd_tank__Intercept), 
               fill = "orange2", size = 0) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(title = "Bonus prior/posterior plot\nfor sd_tank__Intercept",
       subtitle = "The prior is the semitransparent ramp in the\nbackground. The posterior is the solid orange\nmound.") +
  coord_cartesian(xlim = c(0, 5)) +
  theme_fivethirtyeight()

12.2 Varying effects and the underfitting/overfitting trade-off

Varying intercepts are just regularized estimates, but adaptively regularized by estimating how diverse the clusters are while estimating the features of each cluster. This fact is not easy to grasp….

A major benefit of using varying effects estimates, instead of the empirical raw estimates, is that they provide more accurate estimates of the individual cluster (tank) intercepts. On average, the varying effects actually provide a better estimate of the individual tank (cluster) means. The reason that the varying intercepts provides better estimates is that they do a better job trading off underfitting and overfitting. (p. 364)

In this section, we explicate this by contrasting three perspectives:

  • Complete pooling (i.e., a single-\(\alpha\) model)
  • No pooling (i.e., the single-level \(\alpha_{\text{tank}_i}\) model)
  • Partial pooling (i.e., the multilevel model for which \(\alpha_\text{tank} \sim \operatorname{Normal}(\alpha, \sigma)\))

To demonstrate [the magic of the multilevel model], we’ll simulate some tadpole data. That way, we’ll know the true per-pond survival probabilities. Then we can compare the no-pooling estimates to the partial pooling estimates, by computing how close each gets to the true values they are trying to estimate. The rest of this section shows how to do such a simulation. (p. 365)

12.2.1 The model.

The simulation formula should look familiar.

\[\begin{align*} \text{surv}_i & \sim \operatorname{Binomial}(n_i, p_i) \\ \operatorname{logit}(p_i) & = \alpha_{\text{pond}_i} \\ \alpha_{\text{pond}} & \sim \operatorname{Normal}(\alpha, \sigma) \\ \alpha & \sim \operatorname{Normal}(0, 1) \\ \sigma & \sim \operatorname{HalfCauchy}(0, 1) \end{align*}\]

12.2.2 Assign values to the parameters.

Here we follow along with McElreath and “assign specific values representative of the actual tadpole data” (p. 366). However, our values will differ a little from his. Because he did not show a set.seed() line, we don’t know what seed was used to generate pseudo random draws from the rnorm() function.

a       <-  1.4
sigma   <-  1.5
n_ponds <- 60

set.seed(12)

dsim <- 
  tibble(pond   = 1:n_ponds,
         ni     = rep(c(5, 10, 25, 35), each = n_ponds / 4) %>% as.integer(),
         true_a = rnorm(n = n_ponds, mean = a, sd = sigma))

head(dsim)
## # A tibble: 6 x 3
##    pond    ni  true_a
##   <int> <int>   <dbl>
## 1     1     5 -0.821 
## 2     2     5  3.77  
## 3     3     5 -0.0351
## 4     4     5  0.0200
## 5     5     5 -1.60  
## 6     6     5  0.992

McElreath twice urged us to inspect the contents of the simulation. In addition to looking at the data with head(), we might well plot.

library(tidybayes)

dsim %>% 
  mutate(ni = factor(ni)) %>% 
  
  ggplot(aes(x = true_a, y = ni)) +
  stat_halfeye(.width = .5, fill = "orange2") +
  ggtitle("Log-odds varying by # tadpoles per pond") +
  theme_fivethirtyeight() +
  theme(plot.title = element_text(size = 14))

12.2.3 Sumulate survivors.

Each pond \(i\) has \(n_i\) potential survivors, and nature flips each tadpole’s coin, so to speak, with probability of survival \(p_i\). This probability \(p_i\) is implied by the model definition, and is equal to:

\[p_i = \frac{\exp (\alpha_i)}{1 + \exp (\alpha_i)}\]

The model uses a logit link, and so the probability is defined by the [inv_logit_scaled()] function. (p. 367)

set.seed(12)
(
  dsim <-
  dsim %>%
  mutate(si = rbinom(n = n(), prob = inv_logit_scaled(true_a), size = ni))
)
## # A tibble: 60 x 4
##     pond    ni  true_a    si
##    <int> <int>   <dbl> <int>
##  1     1     5 -0.821      0
##  2     2     5  3.77       5
##  3     3     5 -0.0351     4
##  4     4     5  0.0200     3
##  5     5     5 -1.60       0
##  6     6     5  0.992      5
##  7     7     5  0.927      5
##  8     8     5  0.458      3
##  9     9     5  1.24       5
## 10    10     5  2.04       5
## # … with 50 more rows

12.2.4 Compute the no-pooling estimates.

The no-pooling estimates (i.e., \(\alpha_{\text{tank}_i}\)) are the results of simple algebra.

(
  dsim <-
  dsim %>%
  mutate(p_nopool = si / ni)
)
## # A tibble: 60 x 5
##     pond    ni  true_a    si p_nopool
##    <int> <int>   <dbl> <int>    <dbl>
##  1     1     5 -0.821      0      0  
##  2     2     5  3.77       5      1  
##  3     3     5 -0.0351     4      0.8
##  4     4     5  0.0200     3      0.6
##  5     5     5 -1.60       0      0  
##  6     6     5  0.992      5      1  
##  7     7     5  0.927      5      1  
##  8     8     5  0.458      3      0.6
##  9     9     5  1.24       5      1  
## 10    10     5  2.04       5      1  
## # … with 50 more rows

“These are the same no-pooling estimates you’d get by fitting a model with a dummy variable for each pond and flat priors that induce no regularization” (p. 367). That is, these are the same kinds of estimates we got back when we fit b12.1.

12.2.5 Compute the partial-pooling estimates.

To follow along with McElreath, set chains = 1, cores = 1 to fit with one chain.

b12.3 <- 
  brm(data = dsim, 
      family = binomial,
      si | trials(ni) ~ 1 + (1 | pond),
      prior = c(prior(normal(0, 1), class = Intercept),
                prior(cauchy(0, 1), class = sd)),
      iter = 10000, warmup = 1000, chains = 1, cores = 1,
      seed = 12,
      file = "fits/b12.03")

Here’s our standard brms summary.

print(b12.3)
##  Family: binomial 
##   Links: mu = logit 
## Formula: si | trials(ni) ~ 1 + (1 | pond) 
##    Data: dsim (Number of observations: 60) 
## Samples: 1 chains, each with iter = 10000; warmup = 1000; thin = 1;
##          total post-warmup samples = 9000
## 
## Group-Level Effects: 
## ~pond (Number of levels: 60) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.30      0.19     0.98     1.71 1.00     3070     4565
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.28      0.20     0.90     1.67 1.00     2314     3819
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

I’m not aware that you can use McElreath’s depth=2 trick in brms for summary() or print(). But can get most of that information and more with the Stan-like summary using the $fit syntax.

b12.3$fit
## Inference for Stan model: f8a39cc6e808c27e785886e2fdb3b44f.
## 1 chains, each with iter=10000; warmup=1000; thin=1; 
## post-warmup draws per chain=9000, total post-warmup draws=9000.
## 
##                         mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
## b_Intercept             1.28    0.00 0.20    0.90    1.15    1.28    1.41    1.67  2309    1
## sd_pond__Intercept      1.30    0.00 0.19    0.98    1.17    1.29    1.42    1.71  3043    1
## r_pond[1,Intercept]    -2.34    0.01 0.87   -4.20   -2.88   -2.30   -1.75   -0.77  9329    1
## r_pond[2,Intercept]     1.02    0.01 0.99   -0.76    0.33    0.97    1.64    3.16 12083    1
## r_pond[3,Intercept]     0.17    0.01 0.87   -1.45   -0.42    0.13    0.72    2.04 13266    1
## r_pond[4,Intercept]    -0.52    0.01 0.83   -2.09   -1.07   -0.53    0.02    1.13 14254    1
## r_pond[5,Intercept]    -2.34    0.01 0.89   -4.24   -2.88   -2.29   -1.74   -0.72 12076    1
## r_pond[6,Intercept]     1.03    0.01 1.01   -0.74    0.34    0.96    1.65    3.22 11240    1
## r_pond[7,Intercept]     1.02    0.01 1.02   -0.76    0.32    0.94    1.66    3.21 12049    1
## r_pond[8,Intercept]    -0.51    0.01 0.80   -2.06   -1.04   -0.53    0.01    1.12 12366    1
## r_pond[9,Intercept]     1.03    0.01 1.01   -0.78    0.34    0.96    1.67    3.19 13201    1
## r_pond[10,Intercept]    1.04    0.01 1.01   -0.74    0.33    0.96    1.68    3.20 12346    1
## r_pond[11,Intercept]   -0.51    0.01 0.81   -2.06   -1.06   -0.52    0.02    1.13 13109    1
## r_pond[12,Intercept]   -0.51    0.01 0.79   -2.01   -1.05   -0.52    0.01    1.12 11039    1
## r_pond[13,Intercept]   -0.51    0.01 0.80   -2.04   -1.05   -0.51    0.02    1.06 13173    1
## r_pond[14,Intercept]    0.17    0.01 0.87   -1.45   -0.42    0.14    0.73    1.98 17063    1
## r_pond[15,Intercept]    0.18    0.01 0.86   -1.42   -0.42    0.14    0.74    1.98 13870    1
## r_pond[16,Intercept]   -0.65    0.01 0.64   -1.90   -1.08   -0.66   -0.24    0.63 10695    1
## r_pond[17,Intercept]    1.43    0.01 0.94   -0.19    0.77    1.36    2.04    3.45 11419    1
## r_pond[18,Intercept]    0.71    0.01 0.80   -0.72    0.16    0.67    1.21    2.41 12569    1
## r_pond[19,Intercept]    0.71    0.01 0.81   -0.75    0.15    0.66    1.21    2.43 11048    1
## r_pond[20,Intercept]    0.72    0.01 0.79   -0.73    0.18    0.67    1.21    2.40 10995    1
## r_pond[21,Intercept]    0.72    0.01 0.81   -0.73    0.18    0.67    1.23    2.43 11548    1
## r_pond[22,Intercept]    1.42    0.01 0.92   -0.19    0.78    1.36    2.00    3.38 10515    1
## r_pond[23,Intercept]    1.42    0.01 0.93   -0.21    0.77    1.34    2.01    3.43 11380    1
## r_pond[24,Intercept]   -0.64    0.01 0.64   -1.89   -1.07   -0.65   -0.24    0.68 11247    1
## r_pond[25,Intercept]   -1.70    0.01 0.63   -2.96   -2.11   -1.68   -1.28   -0.53 12571    1
## r_pond[26,Intercept]    0.18    0.01 0.70   -1.13   -0.29    0.15    0.63    1.63 11964    1
## r_pond[27,Intercept]    0.17    0.01 0.70   -1.14   -0.32    0.14    0.62    1.62 11733    1
## r_pond[28,Intercept]    0.71    0.01 0.78   -0.73    0.17    0.67    1.21    2.38 12259    1
## r_pond[29,Intercept]    0.71    0.01 0.81   -0.73    0.15    0.67    1.22    2.40 13098    1
## r_pond[30,Intercept]    0.17    0.01 0.70   -1.11   -0.32    0.15    0.62    1.65 13107    1
## r_pond[31,Intercept]    1.42    0.01 0.71    0.15    0.93    1.37    1.87    2.93 12759    1
## r_pond[32,Intercept]    1.42    0.01 0.71    0.15    0.92    1.38    1.86    2.98 10361    1
## r_pond[33,Intercept]   -1.08    0.01 0.43   -1.94   -1.37   -1.08   -0.79   -0.25  7366    1
## r_pond[34,Intercept]   -1.08    0.00 0.43   -1.94   -1.37   -1.08   -0.79   -0.25  7679    1
## r_pond[35,Intercept]   -0.45    0.01 0.45   -1.32   -0.75   -0.45   -0.15    0.44  7530    1
## r_pond[36,Intercept]   -1.08    0.00 0.43   -1.93   -1.37   -1.09   -0.79   -0.24  7465    1
## r_pond[37,Intercept]    0.40    0.01 0.53   -0.60    0.02    0.37    0.75    1.48  8609    1
## r_pond[38,Intercept]   -0.94    0.01 0.44   -1.81   -1.23   -0.93   -0.64   -0.07  7487    1
## r_pond[39,Intercept]    1.42    0.01 0.70    0.17    0.92    1.38    1.87    2.92 12071    1
## r_pond[40,Intercept]   -1.38    0.01 0.42   -2.22   -1.66   -1.38   -1.09   -0.57  7076    1
## r_pond[41,Intercept]    0.15    0.01 0.51   -0.81   -0.20    0.13    0.47    1.19  8493    1
## r_pond[42,Intercept]   -1.39    0.01 0.43   -2.25   -1.68   -1.38   -1.08   -0.57  7359    1
## r_pond[43,Intercept]   -0.07    0.01 0.49   -1.01   -0.41   -0.08    0.25    0.91  7532    1
## r_pond[44,Intercept]   -2.38    0.01 0.48   -3.37   -2.69   -2.36   -2.04   -1.47  8945    1
## r_pond[45,Intercept]   -0.61    0.00 0.45   -1.49   -0.92   -0.62   -0.30    0.28  8347    1
## r_pond[46,Intercept]    1.30    0.01 0.61    0.19    0.87    1.25    1.70    2.61 10122    1
## r_pond[47,Intercept]   -1.36    0.00 0.39   -2.12   -1.61   -1.35   -1.10   -0.59  6712    1
## r_pond[48,Intercept]    1.30    0.01 0.61    0.20    0.87    1.27    1.68    2.60  8673    1
## r_pond[49,Intercept]    1.70    0.01 0.69    0.48    1.22    1.65    2.13    3.21 10573    1
## r_pond[50,Intercept]   -0.80    0.00 0.38   -1.56   -1.06   -0.80   -0.54   -0.05  6647    1
## r_pond[51,Intercept]    0.14    0.01 0.45   -0.71   -0.17    0.13    0.44    1.06  7550    1
## r_pond[52,Intercept]    0.14    0.01 0.44   -0.72   -0.17    0.13    0.43    1.03  7163    1
## r_pond[53,Intercept]    0.98    0.01 0.55   -0.01    0.61    0.96    1.33    2.14  9067    1
## r_pond[54,Intercept]    2.21    0.01 0.81    0.83    1.64    2.14    2.70    4.01  8750    1
## r_pond[55,Intercept]   -1.36    0.00 0.38   -2.10   -1.61   -1.36   -1.10   -0.60  7062    1
## r_pond[56,Intercept]    0.31    0.01 0.47   -0.58    0.00    0.30    0.62    1.28  7959    1
## r_pond[57,Intercept]    0.73    0.01 0.52   -0.22    0.38    0.70    1.07    1.81  8153    1
## r_pond[58,Intercept]   -1.69    0.00 0.39   -2.46   -1.96   -1.68   -1.43   -0.95  7118    1
## r_pond[59,Intercept]   -0.30    0.01 0.41   -1.08   -0.58   -0.31   -0.03    0.54  6262    1
## r_pond[60,Intercept]    0.14    0.01 0.44   -0.70   -0.16    0.13    0.44    1.04  6953    1
## lp__                 -189.94    0.17 7.66 -205.99 -194.95 -189.73 -184.63 -175.66  2042    1
## 
## Samples were drawn using NUTS(diag_e) at Wed Oct  7 16:03:32 2020.
## For each parameter, n_eff is a crude measure of effective sample size,
## and Rhat is the potential scale reduction factor on split chains (at 
## convergence, Rhat=1).

As an aside, notice how this summary still reports the old-style n_eff values, rather than the updated Bulk_ESS and Tail_ESS values. I suspect this will change sometime soon. In the meantime, here’s a thread on the Stan Forums featuring members of the Stan team discussing how.

Let’s get ready for the diagnostic plot of Figure 12.3. First we add the partially-pooled estimates, as summarized by their posterior means, to the dsim data. Then we compute error values.

# we could have included this step in the block of code below, if we wanted to
p_partpool <- 
  coef(b12.3)$pond[, , ] %>% 
  as_tibble() %>%
  transmute(p_partpool = inv_logit_scaled(Estimate))

dsim <- 
  dsim %>%
  bind_cols(p_partpool) %>% 
  mutate(p_true = inv_logit_scaled(true_a)) %>%
  mutate(nopool_error   = abs(p_nopool   - p_true),
         partpool_error = abs(p_partpool - p_true))

dsim %>% 
  glimpse()
## Rows: 60
## Columns: 9
## $ pond           <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21,…
## $ ni             <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 10, 10, 1…
## $ true_a         <dbl> -0.82085139, 3.76575421, -0.03511672, 0.01999213, -1.59646315, 0.99155593,…
## $ si             <int> 0, 5, 4, 3, 0, 5, 5, 3, 5, 5, 3, 3, 3, 4, 4, 6, 10, 9, 9, 9, 9, 10, 10, 6,…
## $ p_nopool       <dbl> 0.00, 1.00, 0.80, 0.60, 0.00, 1.00, 1.00, 0.60, 1.00, 1.00, 0.60, 0.60, 0.…
## $ p_partpool     <dbl> 0.2565227, 0.9089706, 0.8091216, 0.6817224, 0.2565872, 0.9093443, 0.908726…
## $ p_true         <dbl> 0.3055830, 0.9773737, 0.4912217, 0.5049979, 0.1684765, 0.7293951, 0.716461…
## $ nopool_error   <dbl> 0.305582963, 0.022626343, 0.308778278, 0.095002134, 0.168476520, 0.2706048…
## $ partpool_error <dbl> 0.049060241, 0.068403053, 0.317899899, 0.176724579, 0.088110642, 0.1799491…

Here is our code for Figure 12.3. The extra data processing for dfline is how we get the values necessary for the horizontal summary lines.

dfline <- 
  dsim %>%
  select(ni, nopool_error:partpool_error) %>%
  gather(key, value, -ni) %>%
  group_by(key, ni) %>%
  summarise(mean_error = mean(value)) %>%
  mutate(x    = c( 1, 16, 31, 46),
         xend = c(15, 30, 45, 60))
  
dsim %>% 
  ggplot(aes(x = pond)) +
  geom_vline(xintercept = c(15.5, 30.5, 45.4), 
             color = "white", size = 2/3) +
  geom_point(aes(y = nopool_error), color = "orange2") +
  geom_point(aes(y = partpool_error), shape = 1) +
  geom_segment(data = dfline, 
               aes(x = x, xend = xend, 
                   y = mean_error, yend = mean_error),
               color = rep(c("orange2", "black"), each = 4),
               linetype = rep(1:2, each = 4)) +
  annotate("text", x = c(15 - 7.5, 30 - 7.5, 45 - 7.5, 60 - 7.5), y = .45, 
           label = c("tiny (5)", "small (10)", "medium (25)", "large (35)")) +
  scale_x_continuous(breaks = c(1, 10, 20, 30, 40, 50, 60)) +
  labs(title = "Estimate error by model type",
       subtitle = "The horizontal axis displays pond number. The vertical axis measures\nthe absolute error in the predicted proportion of survivors, compared to\nthe true value used in the simulation. The higher the point, the worse\nthe estimate. No-pooling shown in orange. Partial pooling shown in black.\nThe orange and dashed black lines show the average error for each kind\nof estimate, across each initial density of tadpoles (pond size). Smaller\nponds produce more error, but the partial pooling estimates are better\non average, especially in smaller ponds.",
       y = "absolute error") +
  theme_fivethirtyeight() +
  theme(panel.grid.major = element_blank(),
        plot.subtitle = element_text(size = 10))

If you wanted to quantify the difference in simple summaries, you might do something like this.

dsim %>%
  select(ni, nopool_error:partpool_error) %>%
  gather(key, value, -ni) %>%
  group_by(key) %>%
  summarise(mean_error   = mean(value) %>% round(digits = 3),
            median_error = median(value) %>% round(digits = 3))
## # A tibble: 2 x 3
##   key            mean_error median_error
##   <chr>               <dbl>        <dbl>
## 1 nopool_error        0.078         0.05
## 2 partpool_error      0.067         0.05

I originally learned about the multilevel in order to work with longitudinal data. In that context, I found the basic principles of a multilevel structure quite intuitive. The concept of partial pooling, however, took me some time to wrap my head around. If you’re struggling with this, be patient and keep chipping away.

When McElreath lectured on this topic in 2015, he traced partial pooling to statistician Charles M. Stein. In 1977, Efron and Morris wrote the now classic (1977) paper, Stein’s paradox in statistics, which does a nice job breaking down why partial pooling can be so powerful. One of the primary examples they used in the paper was of 1970 batting-average data. If you’d like more practice seeing how partial pooling works–or if you just like baseball–, check out my blog post, Stein’s paradox and what partial pooling can do for you.

12.2.5.1 Overthinking: Repeating the pond simulation.

Within the brms workflow, we can reuse a compiled model with update(). But first, we’ll simulate new data.

a       <-  1.4
sigma   <-  1.5
n_ponds <- 60

set.seed(1999)  # for new data, set a new seed
new_dsim <- 
  tibble(pond   = 1:n_ponds,
         ni     = rep(c(5, 10, 25, 35), each = n_ponds / 4) %>% as.integer(),
         true_a = rnorm(n = n_ponds, mean = a, sd = sigma)) %>% 
    mutate(si = rbinom(n = n(), prob = inv_logit_scaled(true_a), size = ni)) %>% 
    mutate(p_nopool = si / ni)

glimpse(new_dsim)
## Rows: 60
## Columns: 5
## $ pond     <int> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 2…
## $ ni       <int> 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 5, 10, 10, 10, 10, 10, 10, 10, 10, 10,…
## $ true_a   <dbl> 2.4990087, 1.3432554, 3.2045137, 3.6047030, 1.6005354, 2.1797409, 0.5759270, -0.…
## $ si       <int> 4, 4, 5, 4, 4, 4, 2, 4, 3, 5, 4, 5, 2, 2, 5, 10, 7, 10, 10, 8, 10, 9, 5, 10, 10,…
## $ p_nopool <dbl> 0.80, 0.80, 1.00, 0.80, 0.80, 0.80, 0.40, 0.80, 0.60, 1.00, 0.80, 1.00, 0.40, 0.…

Fit the new model.

b12.3_new <- 
  update(b12.3,
         newdata = new_dsim,
         seed = 12,
         file = "fits/b12.03_new")
print(b12.3_new)
##  Family: binomial 
##   Links: mu = logit 
## Formula: si | trials(ni) ~ 1 + (1 | pond) 
##    Data: new_dsim (Number of observations: 60) 
## Samples: 1 chains, each with iter = 10000; warmup = 1000; thin = 1;
##          total post-warmup samples = 9000
## 
## Group-Level Effects: 
## ~pond (Number of levels: 60) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     1.26      0.18     0.95     1.65 1.00     3155     5291
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     1.52      0.20     1.15     1.93 1.00     3295     4515
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Why not plot the first simulation versus the second one?

bind_rows(posterior_samples(b12.3),
          posterior_samples(b12.3_new)) %>%
  mutate(model = rep(c("b12.3", "b12.3_new"), each = n() / 2)) %>% 

  ggplot(aes(x = b_Intercept, y = sd_pond__Intercept)) +
  stat_density_2d(geom = "raster", 
                  aes(fill = stat(density)), 
                  contour = F, n = 200) +
  geom_vline(xintercept = a,     color = "orange3", linetype = 3) +
  geom_hline(yintercept = sigma, color = "orange3", linetype = 3) +
  scale_fill_gradient(low = "grey25", high = "orange3") +
  ggtitle("Our simulation posteriors contrast a bit",
          subtitle = expression(alpha*" is on the x and "*sigma*" is on the y, both in log-odds. The dotted lines intersect at the true values.")) +
  coord_cartesian(xlim = c(.7, 2),
                  ylim = c(.8, 1.9)) +
  theme_fivethirtyeight() +
  theme(legend.position = "none",
        panel.grid.major = element_blank()) +
  facet_wrap(~model, ncol = 2)

If you’d like the stanfit portion of your brm() object, subset with $fit. Take b12.3, for example. You might check out its structure via b12.3$fit %>% str(). Here’s the actual Stan code.

b12.3$fit@stanmodel
## S4 class stanmodel 'f8a39cc6e808c27e785886e2fdb3b44f' coded as follows:
## // generated with brms 2.13.5
## functions {
## }
## data {
##   int<lower=1> N;  // number of observations
##   int Y[N];  // response variable
##   int trials[N];  // number of trials
##   // data for group-level effects of ID 1
##   int<lower=1> N_1;  // number of grouping levels
##   int<lower=1> M_1;  // number of coefficients per level
##   int<lower=1> J_1[N];  // grouping indicator per observation
##   // group-level predictor values
##   vector[N] Z_1_1;
##   int prior_only;  // should the likelihood be ignored?
## }
## transformed data {
## }
## parameters {
##   real Intercept;  // temporary intercept for centered predictors
##   vector<lower=0>[M_1] sd_1;  // group-level standard deviations
##   vector[N_1] z_1[M_1];  // standardized group-level effects
## }
## transformed parameters {
##   vector[N_1] r_1_1;  // actual group-level effects
##   r_1_1 = (sd_1[1] * (z_1[1]));
## }
## model {
##   // initialize linear predictor term
##   vector[N] mu = Intercept + rep_vector(0, N);
##   for (n in 1:N) {
##     // add more terms to the linear predictor
##     mu[n] += r_1_1[J_1[n]] * Z_1_1[n];
##   }
##   // priors including all constants
##   target += normal_lpdf(Intercept | 0, 1);
##   target += cauchy_lpdf(sd_1 | 0, 1)
##     - 1 * cauchy_lccdf(0 | 0, 1);
##   target += std_normal_lpdf(z_1[1]);
##   // likelihood including all constants
##   if (!prior_only) {
##     target += binomial_logit_lpmf(Y | trials, mu);
##   }
## }
## generated quantities {
##   // actual population-level intercept
##   real b_Intercept = Intercept;
## }
## 

And you can get the data of a given brm() fit object like so.

b12.3$data %>% 
  head()
##   si ni pond
## 1  0  5    1
## 2  5  5    2
## 3  4  5    3
## 4  3  5    4
## 5  0  5    5
## 6  5  5    6

12.3 More than one type of cluster

“We can use and often should use more than one type of cluster in the same model” (p. 370).

12.3.0.1 Rethinking: Cross-classification and hierarchy.

The kind of data structure in data(chimpanzees) is usually called a cross-classified multilevel model. It is cross-classified, because actors are not nested within unique blocks. If each chimpanzee had instead done all of his or her pulls on a single day, within a single block, then the data structure would instead be hierarchical. (p. 371, emphasis in the original)

12.3.1 Multilevel chimpanzees.

The initial multilevel update from model b10.4 from Chapter 10 follows the statistical formula

\[\begin{align*} \text{left_pull}_i & \sim \operatorname{Binomial}(n_i = 1, p_i) \\ \operatorname{logit}(p_i) & = \alpha + \color{blue}{\alpha_{\text{actor}_i}} + (\beta_1 + \beta_2 \text{condition}_i) \text{prosoc_left}_i \\ \color{blue}{\alpha_\text{actor}} & \color{blue}\sim \color{blue}{\operatorname{Normal}(0, \sigma_\text{actor})} \\ \alpha & \sim \operatorname{Normal}(0, 10) \\ \beta_1 & \sim \operatorname{Normal}(0, 10) \\ \beta_2 & \sim \operatorname{Normal}(0, 10) \\ \color{blue}{\sigma_\text{actor}} & \color{blue}\sim \color{blue}{\operatorname{HalfCauchy}(0, 1)}. \end{align*}\]

Notice that \(\alpha\) is inside the linear model, not inside the Gaussian prior for \(\alpha_\text{actor}\). This is mathematically equivalent to what [we] did with the tadpoles earlier in the chapter. You can always take the mean out of a Gaussian distribution and treat that distribution as a constant plus a Gaussian distribution centered on zero.

This might seem a little weird at first, so it might help train your intuition by experimenting in R. (p. 371)

Behold our two identical Gaussians in a tidy tibble.

set.seed(12)
two_gaussians <- 
  tibble(y1 =      rnorm(n = 1e4, mean = 10, sd = 1),
         y2 = 10 + rnorm(n = 1e4, mean = 0,  sd = 1))

Let’s follow McElreath’s advice to make sure they are same by superimposing the density of one on the other.

two_gaussians %>%
  
  ggplot() +
  geom_density(aes(x = y1), 
               size = 0, fill = "orange1", alpha = 1/3) +
  geom_density(aes(x = y2), 
               size = 0, fill = "orange4", alpha = 1/3) +
  scale_y_continuous(NULL, breaks = NULL) +
  ggtitle("Our simulated Gaussians") +
  theme_fivethirtyeight()

Yep, those Gaussians look about the same.

Let’s get the chimpanzees data from rethinking.

library(rethinking)
data(chimpanzees)
d <- chimpanzees

Detach rethinking and reload brms.

rm(chimpanzees)
detach(package:rethinking, unload = T)
library(brms)

For our brms model with varying intercepts for actor but not block, we employ the pulled_left ~ 1 + ... + (1 | actor) syntax, specifically omitting a (1 | block) section.

b12.4 <- 
  brm(data = d, 
      family = binomial,
      pulled_left | trials(1) ~ 1 + prosoc_left + prosoc_left:condition + (1 | actor),
      prior = c(prior(normal(0, 10), class = Intercept),
                prior(normal(0, 10), class = b),
                prior(cauchy(0, 1), class = sd)),
      # I'm using 4 cores, instead of the `cores=3` in McElreath's code
      iter = 5000, warmup = 1000, chains = 4, cores = 4,  
      control = list(adapt_delta = 0.95),
      seed = 12,
      file = "fits/b12.04")

The initial solutions came with a few divergent transitions. Increasing adapt_delta to 0.95 solved the problem.

McElreath encouraged us to inspect the trace plots. Here they are.

library(bayesplot)
color_scheme_set("orange")

post <- posterior_samples(b12.4, add_chain = T)

post %>% 
  select(-lp__, -iter) %>% 
  mcmc_trace(facet_args = list(ncol = 4)) +
  scale_x_continuous(breaks = c(0, 2500, 5000)) +
  theme_fivethirtyeight() +
  theme(legend.direction = "vertical",
        legend.key.size = unit(0.5, "cm"),
        legend.position = c(.96, .1))

They look great. Here’s the posterior distribution for \(\sigma_\text{actor}\).

posterior_samples(b12.4) %>% 
  ggplot(aes(x = sd_actor__Intercept, y = 0)) +
  stat_halfeye(.width = .95, fill = "orange2") +
  scale_y_continuous(NULL, breaks = NULL) +
  ggtitle(expression(sigma[actor])) +
  theme_fivethirtyeight()

We can inspect the \(\widehat R\) and effective sample size values for all parameters with the bayesplot::rhat() function. Here we’ll put it in a data frame for easy viewing.

data.frame(rhat = rhat(b12.4))
##                             rhat
## b_Intercept             1.000274
## b_prosoc_left           1.000068
## b_prosoc_left:condition 1.000108
## sd_actor__Intercept     1.001286
## r_actor[1,Intercept]    1.000232
## r_actor[2,Intercept]    1.000406
## r_actor[3,Intercept]    1.000304
## r_actor[4,Intercept]    1.000263
## r_actor[5,Intercept]    1.000344
## r_actor[6,Intercept]    1.000308
## r_actor[7,Intercept]    1.000159
## lp__                    1.001914

I’m not aware that brms or bayesplot offer a convenient way to compute the Bulk_ESS or Tail_ESS values for arbitrary combinations of parameters in a data frame. However you can do so with help from the posterior package (Bürkner et al., 2020), which has not made its way to CRAN, yet, but can be downloaded directly from GitHub.

# install the beta release with this
install.packages("posterior", repos = c("https://mc-stan.org/r-packages/", getOption("repos")))

# install the latest development version with this instead
install.packages("remotes")
remotes::install_github("stan-dev/posterior")

For our purposes, the function of interest is posterior::summarise_draws(), which will take the output from posterior_samples() as input.

library(posterior)

posterior_samples(b12.4) %>% 
  summarise_draws()
## # A tibble: 12 x 10
##    variable                    mean   median    sd   mad       q5       q95  rhat ess_bulk ess_tail
##    <chr>                      <dbl>    <dbl> <dbl> <dbl>    <dbl>     <dbl> <dbl>    <dbl>    <dbl>
##  1 b_Intercept                0.438    0.413 0.971 0.829   -1.05     2.03    1.00    3298.    4578.
##  2 b_prosoc_left              0.825    0.825 0.261 0.258    0.398    1.25    1.00    8076.    8829.
##  3 b_prosoc_left:condition   -0.136   -0.131 0.296 0.297   -0.623    0.344   1.00    7684.    8893.
##  4 sd_actor__Intercept        2.26     2.06  0.918 0.718    1.22     3.99    1.00    3516.    6096.
##  5 r_actor[1,Intercept]      -1.16    -1.13  0.986 0.856   -2.77     0.359   1.00    3389.    4839.
##  6 r_actor[2,Intercept]       4.17     3.90  1.65  1.35     2.10     7.17    1.00    5672.    6378.
##  7 r_actor[3,Intercept]      -1.46    -1.42  0.988 0.839   -3.08     0.0482  1.00    3422.    4823.
##  8 r_actor[4,Intercept]      -1.46    -1.42  0.990 0.860   -3.07     0.0746  1.00    3415.    4721.
##  9 r_actor[5,Intercept]      -1.15    -1.13  0.989 0.855   -2.78     0.363   1.00    3388.    4684.
## 10 r_actor[6,Intercept]      -0.212   -0.186 0.988 0.859   -1.81     1.32    1.00    3362.    4836.
## 11 r_actor[7,Intercept]       1.32     1.34  1.01  0.887   -0.313    2.90    1.00    3594.    4566.
## 12 lp__                    -283.    -283.    2.83  2.75  -288.    -279.      1.00    3615.    6322.

Note how the last three columns are the rhat, the ess_bulk, and the ess_tail. Here we summarize those two effective sample size columns with histograms.

posterior_samples(b12.4) %>% 
  summarise_draws() %>% 
  select(starts_with("ess")) %>% 
  gather() %>% 
  
  ggplot(aes(x = value)) +
  geom_histogram(binwidth = 100, fill = "blue") +
  xlim(0, NA) +
  theme_fivethirtyeight() +
  facet_wrap(~key)

McElreath pointed out it’s important to understand the actor-level parameters, as in the summary above, are deviations from the grand mean. Here’s one way to add them together.

post %>%
  select(starts_with("r_actor")) %>% 
  gather() %>%
  # this is how we might add the grand mean to the actor-level deviations
  mutate(value = value + post$b_Intercept) %>% 
  group_by(key) %>%
  summarise(mean = mean(value) %>% round(digits = 2))
## # A tibble: 7 x 2
##   key                   mean
##   <chr>                <dbl>
## 1 r_actor[1,Intercept] -0.72
## 2 r_actor[2,Intercept]  4.6 
## 3 r_actor[3,Intercept] -1.02
## 4 r_actor[4,Intercept] -1.02
## 5 r_actor[5,Intercept] -0.71
## 6 r_actor[6,Intercept]  0.23
## 7 r_actor[7,Intercept]  1.76

Here’s another way to get at the same information, this time using coef() and a little formatting help from the stringr::str_c() function. Just for kicks, we’ll throw in the 95% intervals, too.

coef(b12.4)$actor[, c(1, 3:4), 1] %>%
  as_tibble() %>% 
  round(digits = 2) %>%
  # here we put the credible intervals in an APA-6-style format
  mutate(`95% CIs` = str_c("[", Q2.5, ", ", Q97.5, "]"),
         actor     = str_c("chimp #", 1:7)) %>%
  rename(mean = Estimate) %>%
  select(actor, mean, `95% CIs`) %>% 
  knitr::kable()
actor mean 95% CIs
chimp #1 -0.72 [-1.26, -0.2]
chimp #2 4.60 [2.55, 8.44]
chimp #3 -1.02 [-1.59, -0.48]
chimp #4 -1.02 [-1.57, -0.49]
chimp #5 -0.71 [-1.25, -0.19]
chimp #6 0.23 [-0.3, 0.76]
chimp #7 1.76 [1.05, 2.56]

If you prefer the posterior median to the mean, just add a robust = T argument inside the coef() function.

12.3.2 Two types of cluster.

The full cross-classified statistical model follows the form

\[\begin{align*} \text{left_pull}_i & \sim \operatorname{Binomial}(n_i = 1, p_i) \\ \operatorname{logit}(p_i) & = \alpha + \alpha_{\text{actor}_i} + \color{blue}{\alpha_{\text{block}_i}} + (\beta_1 + \beta_2 \text{condition}_i) \text{prosoc_left}_i \\ \alpha_{\text{actor}} & \sim \operatorname{Normal}(0, \sigma_{\text{actor}}) \\ \color{blue}{\alpha_{\text{block}}} & \color{blue}\sim \color{blue}{\operatorname{Normal}(0, \sigma_{\text{actor}})} \\ \alpha & \sim \operatorname{Normal}(0, 10) \\ \beta_1 & \sim \operatorname{Normal}(0, 10) \\ \beta_2 & \sim \operatorname{Normal}(0, 10) \\ \sigma_{\text{actor}} & \sim \operatorname{HalfCauchy}(0, 1) \\ \color{blue}{\sigma_{\text{block}}} & \color{blue}\sim \color{blue}{\operatorname{HalfCauchy}(0, 1)}. \end{align*}\]

Our brms model with varying intercepts for both actor and block now employs the ... (1 | actor) + (1 | block) syntax.

b12.5 <- 
  update(b12.4,
         newdata = d,
         formula = pulled_left | trials(1) ~ 1 + prosoc_left + prosoc_left:condition + 
           (1 | actor) + (1 | block),
         iter = 6000, warmup = 1000, cores = 4, chains = 4, 
         control = list(adapt_delta = .9),
         seed = 12,
         file = "fits/b12.05")

This time we increased the adapt_delta parameter to .9 to avoid divergent transitions. We can look at the primary coefficients with print(). McElreath urged us again to inspect the trace plots.

post <- posterior_samples(b12.5, add_chain = T)

post %>% 
  select(-lp__, -iter) %>% 
  mcmc_trace(facet_args = list(ncol = 4))+
  scale_x_continuous(breaks = c(0, 2500, 5000)) +
  theme_fivethirtyeight() +
  theme(legend.position = c(.75, .06))

The trace plots look great. Here are the other diagnostics along with the parameter summaries, once again using posterior::summarise_draws().

posterior_samples(b12.5) %>% 
  summarise_draws() %>% 
  select(-median, -mad) %>% 
  mutate_if(is.double, round, digits = 2)
## # A tibble: 19 x 8
##    variable                   mean    sd       q5     q95  rhat ess_bulk ess_tail
##    <chr>                     <dbl> <dbl>    <dbl>   <dbl> <dbl>    <dbl>    <dbl>
##  1 b_Intercept                0.42  0.98   -1.07     2.02     1    4870.    6337.
##  2 b_prosoc_left              0.83  0.26    0.4      1.26     1   16831.   13420 
##  3 b_prosoc_left:condition   -0.14  0.3    -0.63     0.35     1   17292.   14274.
##  4 sd_actor__Intercept        2.27  0.92    1.23     3.97     1    5313.    7345.
##  5 sd_block__Intercept        0.22  0.18    0.02     0.56     1    6444.    7694.
##  6 r_actor[1,Intercept]      -1.14  0.99   -2.75     0.38     1    4984.    6789.
##  7 r_actor[2,Intercept]       4.22  1.68    2.12     7.2      1    8561.   10305.
##  8 r_actor[3,Intercept]      -1.44  0.99   -3.06     0.06     1    4998.    6469.
##  9 r_actor[4,Intercept]      -1.44  0.99   -3.07     0.07     1    4985.    6364.
## 10 r_actor[5,Intercept]      -1.13  0.99   -2.77     0.36     1    4954.    6732.
## 11 r_actor[6,Intercept]      -0.19  0.99   -1.8      1.31     1    4927.    6435.
## 12 r_actor[7,Intercept]       1.36  1.01   -0.27     2.94     1    5167.    6777.
## 13 r_block[1,Intercept]      -0.18  0.23   -0.63     0.08     1   10896.   15636.
## 14 r_block[2,Intercept]       0.04  0.19   -0.24     0.36     1   19606.   16900.
## 15 r_block[3,Intercept]       0.05  0.19   -0.22     0.4      1   18098.   16705.
## 16 r_block[4,Intercept]       0.01  0.18   -0.290    0.31     1   18597.   16831.
## 17 r_block[5,Intercept]      -0.03  0.18   -0.35     0.24     1   18333    17668.
## 18 r_block[6,Intercept]       0.11  0.2    -0.14     0.49     1   14348.   16673.
## 19 lp__                    -293.    3.78 -300.    -287.       1    4989.    9170.

All our \(\widehat R\) are excellent and the values in both our ess_. columns are generally quite higher than the n_eff values McElreath presented in the text.

If we’d like to restrict our focus to the parameter summaries, we can also use brms::ranef() to get those depth=2-type estimates. With ranef(), you get the group-specific estimates in a deviance metric. The coef() function, in contrast, yields the group-specific estimates in what you might call the natural metric. We’ll get more language for this in the next chapter.

ranef(b12.5)$actor[, , "Intercept"] %>% 
  round(digits = 2)
##   Estimate Est.Error  Q2.5 Q97.5
## 1    -1.14      0.99 -3.22  0.75
## 2     4.22      1.68  1.81  8.19
## 3    -1.44      0.99 -3.53  0.47
## 4    -1.44      0.99 -3.54  0.43
## 5    -1.13      0.99 -3.21  0.79
## 6    -0.19      0.99 -2.25  1.71
## 7     1.36      1.01 -0.72  3.33
ranef(b12.5)$block[, , "Intercept"] %>% 
  round(digits = 2)
##   Estimate Est.Error  Q2.5 Q97.5
## 1    -0.18      0.23 -0.75  0.13
## 2     0.04      0.19 -0.33  0.46
## 3     0.05      0.19 -0.31  0.50
## 4     0.01      0.18 -0.38  0.41
## 5    -0.03      0.18 -0.44  0.33
## 6     0.11      0.20 -0.21  0.60

We might make the coefficient plot of Figure 12.4.a with mcmc_plot().

mcmc_plot(b12.5, pars = c("^r_", "^b_", "^sd_")) +
  theme_fivethirtyeight() +
  theme(axis.text.y = element_text(hjust = 0))

Once we get the posterior samples, it’s easy to compare the random variances as in Figure 12.4.b.

post %>%
  ggplot(aes(x = sd_actor__Intercept)) +
  geom_density(size = 0, fill = "orange1", alpha = 3/4) +
  geom_density(aes(x = sd_block__Intercept), 
               size = 0, fill = "orange4", alpha = 3/4)  +
  annotate(geom = "text", x = 2/3, y = 2, label = "block", color = "orange4") +
  annotate(geom = "text", x = 2, y = 3/4, label = "actor", color = "orange1") +
  scale_y_continuous(NULL, breaks = NULL) +
  ggtitle(expression(sigma["<group>"])) +
  coord_cartesian(xlim = c(0, 4)) +
  theme_fivethirtyeight()

We might compare our models by their PSIS-LOO values.

b12.4 <- add_criterion(b12.4, "loo")
b12.5 <- add_criterion(b12.5, "loo")

loo_compare(b12.4, b12.5) %>% 
  print(simplify = F)
##       elpd_diff se_diff elpd_loo se_elpd_loo p_loo  se_p_loo looic  se_looic
## b12.4    0.0       0.0  -265.7      9.8         8.2    0.4    531.5   19.5  
## b12.5   -0.6       0.9  -266.4      9.8        10.4    0.5    532.7   19.7
model_weights(b12.4, b12.5, weights = "loo") %>% 
  round(digits = 2)
## b12.4 b12.5 
##  0.65  0.35

The two models yield nearly-equivalent information criteria values. Yet recall what McElreath wrote: “There is nothing to gain here by selecting either model. The comparison of the two models tells a richer story” (p. 367).

12.3.3 Even more clusters.

Adding more types of clusters proceeds the same way. At some point the model may become too complex to reliably fit to data. But Hamiltonian Monte Carlo is very capable with varying effects. It can easily handle tens of thousands of varying effect parameters. Sampling will be slow in such cases, but it will work.

So don’t be shy–if you have a good theoretical reason to include a cluster variable, then you also have a good theoretical reason toe partially pool its parameters. (p. 376)

This section just hints at a historical software difficulty. In short, it’s not uncommon to have a theory-based model that includes multiple sources of clustering (i.e., requiring many ( <varying parameter(s)> | <grouping variable(s)> ) parts in the model formula). This can make for all kinds of computational difficulties and result in software error messages, inadmissible solutions, and so on. One of the practical solutions to difficulties like these has been to simplify the statistical models by removing some of the clustering terms. Even though such simpler models were not the theory-based ones, at least they yielded solutions. Nowadays, Stan (via brms or otherwise) is making it easier to fit the full theoretically-based model. To learn more about this topic, check out this nice blog post by Michael Frank, Mixed effects models: Is it time to go Bayesian by default?. Make sure to check out the discussion in the comments section, which includes all-stars like Bürkner and Douglas Bates. You can get more context for the issue from Barr et al. (2013), Random effects structure for confirmatory hypothesis testing: Keep it maximal.

12.4 Multilevel posterior predictions

Producing implied predictions from a fit model, is very helpful for understanding what the model means. Every model is a merger of sense and nonsense. When we understand a model, we can find its sense and control its nonsense. But as models get more complex, it is very difficult to impossible to understand them just by inspecting tables of posterior means and intervals. Exploring implied posterior predictions helps much more…

…The introduction of varying effects does introduce nuance, however.

First, we should no longer expect the model to exactly retrodict the sample, because adaptive regularization has as its goal to trade off poorer fit in sample for better inference and hopefully better fit out of sample. This is what shrinkage does for us…

Second, “prediction” in the context of a multilevel model requires additional choices. If we wish to validate a model against the specific clusters used to fit the model, that is one thing. But if we instead wish to compute predictions for new clusters, other than the one observed in the sample, that is quite another. We’ll consider each of these in turn, continuing to use the chimpanzees model from the previous section. (p. 376)

12.4.1 Posterior prediction for same clusters.

Like McElreath did in the text, we’ll do this two ways. Recall we use brms::fitted() in place of rethinking::link().

chimp <- 2
nd <-
  tibble(prosoc_left = c(0, 1, 0, 1),
         condition   = c(0, 0, 1, 1),
         actor       = chimp)

(
  chimp_2_fitted <-
  fitted(b12.4,
         newdata = nd) %>% 
  as_tibble() %>% 
  mutate(condition = factor(c("0/0", "1/0", "0/1", "1/1"), 
                            levels = c("0/0", "1/0", "0/1", "1/1")))
)
## # A tibble: 4 x 5
##   Estimate Est.Error  Q2.5 Q97.5 condition
##      <dbl>     <dbl> <dbl> <dbl> <fct>    
## 1    0.981   0.0198  0.928  1.00 0/0      
## 2    0.991   0.00956 0.965  1.00 1/0      
## 3    0.981   0.0198  0.928  1.00 0/1      
## 4    0.990   0.0108  0.961  1.00 1/1
(
  chimp_2_d <-
  d %>% 
  filter(actor == chimp) %>% 
  group_by(prosoc_left, condition) %>% 
  summarise(prob = mean(pulled_left)) %>% 
  ungroup() %>% 
  mutate(condition = str_c(prosoc_left, "/", condition)) %>% 
  mutate(condition = factor(condition, levels = c("0/0", "1/0", "0/1", "1/1")))
)
## # A tibble: 4 x 3
##   prosoc_left condition  prob
##         <int> <fct>     <dbl>
## 1           0 0/0           1
## 2           0 0/1           1
## 3           1 1/0           1
## 4           1 1/1           1

McElreath didn’t show the corresponding plot in the text. It might look like this.

chimp_2_fitted %>%
  # if you want to use `geom_line()` or `geom_ribbon()` with a factor on the x axis,
  # you need to code something like `group = 1` in `aes()`
  ggplot(aes(x = condition, y = Estimate, group = 1)) +
  geom_ribbon(aes(ymin = Q2.5, ymax = Q97.5), fill = "orange1") +
  geom_line(color = "blue") +
  geom_point(data = chimp_2_d,
             aes(y = prob),
             color = "grey25") +
  ggtitle("Chimp #2",
          subtitle = "The posterior mean and 95%\nintervals are the blue line\nand orange band, respectively.\nThe empirical means are\nthe charcoal dots.") +
  coord_cartesian(ylim = c(.75, 1)) +
  theme_fivethirtyeight() +
  theme(plot.subtitle = element_text(size = 10))

Do note how severely we’ve restricted the \(y\)-axis range. But okay, now let’s do things by hand. We’ll need to extract the posterior samples and look at the structure of the data.

post <- posterior_samples(b12.4)

glimpse(post)
## Rows: 16,000
## Columns: 12
## $ b_Intercept               <dbl> 1.00295363, 0.31106993, 0.29521392, -0.25970372, 0.20508343, 0.…
## $ b_prosoc_left             <dbl> 0.9754696, 0.7220658, 0.7240192, 0.8146748, 1.0170116, 0.686453…
## $ `b_prosoc_left:condition` <dbl> -0.230580579, 0.032554089, 0.003065856, -0.235881824, -0.387184…
## $ sd_actor__Intercept       <dbl> 1.626829, 1.802634, 1.997876, 1.841644, 2.329360, 2.216993, 1.7…
## $ `r_actor[1,Intercept]`    <dbl> -1.78022727, -0.57940259, -1.04798256, -0.29633218, -1.05559286…
## $ `r_actor[2,Intercept]`    <dbl> 3.073942, 4.490928, 5.321185, 4.815112, 4.769627, 2.325157, 3.4…
## $ `r_actor[3,Intercept]`    <dbl> -1.94997639, -1.09127852, -1.30254479, -0.95552306, -1.33635037…
## $ `r_actor[4,Intercept]`    <dbl> -1.64232918, -1.54879556, -1.76511324, -0.94974999, -0.94615187…
## $ `r_actor[5,Intercept]`    <dbl> -1.93393022, -1.16353954, -1.01552958, -0.62955070, -0.47598387…
## $ `r_actor[6,Intercept]`    <dbl> -1.21562447, -0.10987000, -0.29289480, 0.63880693, -0.32259245,…
## $ `r_actor[7,Intercept]`    <dbl> 0.9219234, 1.2016245, 1.1336787, 1.6140226, 1.6022392, 0.908353…
## $ lp__                      <dbl> -282.9973, -281.6189, -281.3148, -280.8570, -281.3620, -287.530…

McElreath didn’t show what his R code 12.29 dens( post$a_actor[,5] ) would look like. But here’s our analogue.

post %>%
  transmute(actor_5 = `r_actor[5,Intercept]`) %>% 
  
  ggplot(aes(x = actor_5)) +
  geom_density(size = 0, fill = "blue") +
  scale_y_continuous(breaks = NULL) +
  ggtitle("Chimp #5's density") +
  theme_fivethirtyeight()

And because we made the density only using the r_actor[5,Intercept] values (i.e., we didn’t add b_Intercept to them), the density is in a deviance-score metric.

McElreath built his own link() function in R code 12.30. Here we’ll build an alternative to fitted().

# our hand-made `brms::fitted()` alternative
my_fitted <- function(prosoc_left, condition) {
  post %>%
    transmute(fitted = (b_Intercept + 
                          `r_actor[5,Intercept]` + 
                          b_prosoc_left * prosoc_left + 
                          `b_prosoc_left:condition` * prosoc_left * condition) %>% 
                inv_logit_scaled())
}

# the posterior summaries
(
  chimp_5_my_fitted <-
  tibble(prosoc_left = c(0, 1, 0, 1),
         condition   = c(0, 0, 1, 1)) %>% 
  mutate(post = map2(prosoc_left, condition, my_fitted)) %>% 
  unnest(post) %>% 
  mutate(condition = factor(str_c(prosoc_left, "/", condition), 
                            levels = c("0/0", "1/0", "0/1", "1/1"))) %>% 
  group_by(condition) %>% 
  tidybayes::mean_qi(fitted)
)
## # A tibble: 4 x 7
##   condition fitted .lower .upper .width .point .interval
##   <fct>      <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 0/0        0.331  0.222  0.453   0.95 mean   qi       
## 2 1/0        0.527  0.382  0.672   0.95 mean   qi       
## 3 0/1        0.331  0.222  0.453   0.95 mean   qi       
## 4 1/1        0.494  0.349  0.640   0.95 mean   qi
# the empirical summaries
chimp <- 5
(
  chimp_5_d <-
  d %>% 
  filter(actor == chimp) %>% 
  group_by(prosoc_left, condition) %>% 
  summarise(prob = mean(pulled_left)) %>% 
  ungroup() %>% 
  mutate(condition = factor(str_c(prosoc_left, "/", condition), 
                            levels = c("0/0", "1/0", "0/1", "1/1")))
)
## # A tibble: 4 x 3
##   prosoc_left condition  prob
##         <int> <fct>     <dbl>
## 1           0 0/0       0.333
## 2           0 0/1       0.278
## 3           1 1/0       0.556
## 4           1 1/1       0.5

Okay, let’s see how good we are at retrodicting the pulled_left probabilities for actor == 5.

chimp_5_my_fitted %>%
  ggplot(aes(x = condition, y = fitted, group = 1)) +
  geom_ribbon(aes(ymin = .lower, ymax = .upper), fill = "orange1") +
  geom_line(color = "blue") +
  geom_point(data = chimp_5_d,
             aes(y = prob),
             color = "grey25") +
  ggtitle("Chimp #5",
          subtitle = "This plot is like the last except\nwe did more by hand.") +
  coord_cartesian(ylim = c(0, 1)) +
  theme_fivethirtyeight() +
  theme(plot.subtitle = element_text(size = 10))

Not bad.

12.4.2 Posterior prediction for new clusters.

By average actor, McElreath referred to a chimp with an intercept exactly at the population mean \(\alpha\). So this time we’ll only be working with the population parameters, or what are also sometimes called the fixed effects. When using brms::posterior_samples() output, this would mean working with columns beginning with the b_ prefix (i.e., b_Intercept, b_prosoc_left, and b_prosoc_left:condition).

post_average_actor <-
  post %>% 
  # here we use the linear regression formula to get the log_odds for the 4 conditions
  transmute(`0/0` = b_Intercept,
            `1/0` = b_Intercept + b_prosoc_left,
            `0/1` = b_Intercept,
            `1/1` = b_Intercept + b_prosoc_left + `b_prosoc_left:condition`) %>%
  # with `mutate_all()` we can convert the estimates to probabilities in one fell swoop
  mutate_all(inv_logit_scaled) %>% 
  # putting the data in the long format and grouping by condition (i.e., `key`)
  gather() %>%
  mutate(key = factor(key, level = c("0/0", "1/0", "0/1", "1/1"))) %>% 
  group_by(key) %>%
  # here we get the summary values for the plot
  summarise(m  = mean(value),
            # note we're using 80% intervals
            ll = quantile(value, probs = .1),
            ul = quantile(value, probs = .9))

post_average_actor
## # A tibble: 4 x 4
##   key       m    ll    ul
##   <fct> <dbl> <dbl> <dbl>
## 1 0/0   0.591 0.338 0.829
## 2 1/0   0.745 0.532 0.919
## 3 0/1   0.591 0.338 0.829
## 4 1/1   0.723 0.500 0.908

Figure 12.5.a.

p1 <-
  post_average_actor %>%
  ggplot(aes(x = key, y = m, group = 1)) +
  geom_ribbon(aes(ymin = ll, ymax = ul), fill = "orange1") +
  geom_line(color = "blue") +
  ggtitle("Average actor") +
  coord_cartesian(ylim = c(0, 1)) +
  theme_fivethirtyeight() +
  theme(plot.title = element_text(size = 14, hjust = .5))

p1

If we want to depict the variability across the chimps, we need to include sd_actor__Intercept into the calculations. In the first block of code, below, we simulate a bundle of new intercepts defined by

\[\alpha_\text{actor} \sim \operatorname{Normal}(0, \sigma_\text{actor}).\]

# the random effects
set.seed(12.42)
ran_ef <-
  tibble(random_effect = rnorm(n = 1000, mean = 0, sd = post$sd_actor__Intercept)) %>% 
  # with the `., ., ., .` syntax, we quadruple the previous line 
  bind_rows(., ., ., .) 

# the fixed effects (i.e., the population parameters)
fix_ef <-
  post %>% 
  slice(1:1000) %>%
  transmute(`0/0` = b_Intercept,
            `1/0` = b_Intercept + b_prosoc_left,
            `0/1` = b_Intercept,
            `1/1` = b_Intercept + b_prosoc_left + `b_prosoc_left:condition`) %>%
  gather() %>%
  rename(condition    = key, 
         fixed_effect = value) %>% 
  mutate(condition = factor(condition, level = c("0/0", "1/0", "0/1", "1/1")))

# combine them
ran_and_fix_ef <-
  bind_cols(ran_ef, fix_ef) %>%
  mutate(intercept = fixed_effect + random_effect) %>%
  mutate(prob = inv_logit_scaled(intercept))

# to simplify things, we'll reduce them to summaries
(
  marginal_effects <-
  ran_and_fix_ef %>%
  group_by(condition) %>%
  summarise(m  = mean(prob),
            ll = quantile(prob, probs = .1),
            ul = quantile(prob, probs = .9))
)
## # A tibble: 4 x 4
##   condition     m     ll    ul
##   <fct>     <dbl>  <dbl> <dbl>
## 1 0/0       0.555 0.0733 0.965
## 2 1/0       0.669 0.140  0.985
## 3 0/1       0.555 0.0733 0.965
## 4 1/1       0.651 0.140  0.983

Behold Figure 12.5.b.

p2 <-
  marginal_effects %>%
  ggplot(aes(x = condition, y = m, group = 1)) +
  geom_ribbon(aes(ymin = ll, ymax = ul), fill = "orange1") +
  geom_line(color = "blue") +
  ggtitle("Marginal of actor") +
  coord_cartesian(ylim = c(0, 1)) +
  theme_fivethirtyeight() +
  theme(plot.title = element_text(size = 14, hjust = .5))

p2

Figure 12.5.c just takes a tiny bit more wrangling.

p3 <-
  ran_and_fix_ef %>%
  mutate(iter = rep(1:1000, times = 4)) %>%
  filter(iter %in% c(1:50)) %>%
  
  ggplot(aes(x = condition, y = prob, group = iter)) +
  geom_line(alpha = 1/2, color = "orange3") +
  ggtitle("50 simulated actors") +
  coord_cartesian(ylim = c(0, 1)) +
  theme_fivethirtyeight() +
  theme(plot.title = element_text(size = 14, hjust = .5))

p3

For the finale, we’ll stitch the three plots together.

p1 | p2 | p3

12.4.2.1 Bonus: Let’s use fitted() this time.

We just made those plots using various wrangled versions of post, the data frame returned by posterior_samples(b.12.4). If you followed along closely, part of what made that a great exercise is that it forced you to consider what the various vectors in post meant with respect to the model formula. But it’s also handy to see how to do that from a different perspective. So in this section, we’ll repeat that process by relying on the fitted() function, instead. We’ll go in the same order, starting with the average actor.

nd <-
  tibble(prosoc_left = c(0, 1, 0, 1),
         condition   = c(0, 0, 1, 1))

(
  f <-
  fitted(b12.4,
         newdata = nd,
         re_formula = NA,
         probs = c(.1, .9)) %>% 
  as_tibble() %>% 
  bind_cols(nd) %>% 
  mutate(condition = factor(str_c(prosoc_left, "/", condition), 
                            levels = c("0/0", "1/0", "0/1", "1/1")))
)
## # A tibble: 4 x 6
##   Estimate Est.Error   Q10   Q90 prosoc_left condition
##      <dbl>     <dbl> <dbl> <dbl>       <dbl> <fct>    
## 1    0.591     0.188 0.338 0.829           0 0/0      
## 2    0.745     0.159 0.532 0.919           1 1/0      
## 3    0.591     0.188 0.338 0.829           0 0/1      
## 4    0.723     0.165 0.500 0.908           1 1/1

You should notice a few things. Since b12.4 is a multilevel model, it had three predictors: prosoc_left, condition, and actor. However, our nd data only included the first two of those predictors. The reason fitted() permitted that was because we set re_formula = NA. When you do that, you tell fitted() to ignore group-level effects (i.e., focus only on the fixed effects). This was our fitted() version of ignoring the r_ vectors returned by posterior_samples(). Here’s the plot.

p4 <-
  f %>%
  ggplot(aes(x = condition, y = Estimate, group = 1)) +
  geom_ribbon(aes(ymin = Q10, ymax = Q90), fill = "blue") +
  geom_line(color = "orange1") +
  ggtitle("Average actor") +
  coord_cartesian(ylim = c(0, 1)) +
  theme_fivethirtyeight() +
  theme(plot.title = element_text(size = 14, hjust = .5))

p4

For marginal of actor, we can continue using the same nd data. This time we’ll be sticking with the default re_formula setting, which will accommodate the multilevel nature of the model. However, we’ll also be adding allow_new_levels = T and sample_new_levels = "gaussian". The former will allow us to marginalize across the specific actors in our data and the latter will instruct fitted() to use the multivariate normal distribution implied by the random effects. It’ll make more sense why I say multivariate normal by the end of the next chapter. For now, just go with it.

(
  f <-
  fitted(b12.4,
         newdata = nd,
         probs = c(.1, .9),
         allow_new_levels = T,
         sample_new_levels = "gaussian") %>% 
  as_tibble() %>% 
  bind_cols(nd) %>% 
  mutate(condition = factor(str_c(prosoc_left, "/", condition), 
                            levels = c("0/0", "1/0", "0/1", "1/1")))
  )
## # A tibble: 4 x 6
##   Estimate Est.Error    Q10   Q90 prosoc_left condition
##      <dbl>     <dbl>  <dbl> <dbl>       <dbl> <fct>    
## 1    0.560     0.330 0.0703 0.971           0 0/0      
## 2    0.669     0.311 0.150  0.988           1 1/0      
## 3    0.560     0.330 0.0703 0.971           0 0/1      
## 4    0.651     0.315 0.130  0.986           1 1/1

Here’s our fitted()-based marginal of actor plot.

p5 <-
  f %>%
  ggplot(aes(x = condition, y = Estimate, group = 1)) +
  geom_ribbon(aes(ymin = Q10, ymax = Q90), fill = "blue") +
  geom_line(color = "orange1") +
  ggtitle("Marginal of actor") +
  coord_cartesian(ylim = c(0, 1)) +
  theme_fivethirtyeight() +
  theme(plot.title = element_text(size = 14, hjust = .5))

p5

For the simulated actors plot, we’ll just amend our process from the last one. This time we set summary = F to keep the iteration-specific results and set nsamples = n_sim to indicate the number of actors we’d like to simulate (i.e., 50, as in the text).

# how many simulated actors would you like?
n_sim <- 50

(
  f <-
  fitted(b12.4,
         newdata = nd,
         probs = c(.1, .9),
         allow_new_levels = T,
         sample_new_levels = "gaussian",
         summary = F,
         nsamples = n_sim) %>% 
  as_tibble() %>% 
  mutate(iter = 1:n_sim) %>% 
  gather(key, value, -iter) %>% 
  bind_cols(nd %>% 
              transmute(condition = factor(str_c(prosoc_left, "/", condition), 
                                           levels = c("0/0", "1/0", "0/1", "1/1"))) %>% 
              expand(condition, iter = 1:n_sim) %>% 
              select(-iter))
)
## # A tibble: 200 x 4
##     iter key   value condition
##    <int> <chr> <dbl> <fct>    
##  1     1 V1    0.228 0/0      
##  2     2 V1    0.562 0/0      
##  3     3 V1    0.507 0/0      
##  4     4 V1    0.644 0/0      
##  5     5 V1    0.812 0/0      
##  6     6 V1    0.753 0/0      
##  7     7 V1    0.347 0/0      
##  8     8 V1    0.398 0/0      
##  9     9 V1    0.886 0/0      
## 10    10 V1    0.774 0/0      
## # … with 190 more rows
p6 <-
  f %>%
  
  ggplot(aes(x = condition, y = value, group = iter)) +
  geom_line(alpha = 1/2, color = "blue") +
  ggtitle("50 simulated actors") +
  coord_cartesian(ylim = c(0, 1)) +
  theme_fivethirtyeight() +
  theme(plot.title = element_text(size = 14, hjust = .5))

p6

Here they are altogether.

p4 | p5 | p6

12.4.3 Focus and multilevel prediction.

First, let’s load the Kline data.

# prep data
library(rethinking)
data(Kline)
d <- Kline

Switch out the packages, once again.

detach(package:rethinking, unload = T)
library(brms)
rm(Kline)

The statistical formula for our multilevel count model is

\[\begin{align*} \text{total_tools}_i & \sim \operatorname{Poisson}(\mu_i) \\ \log (\mu_i) & = \alpha + \color{blue}{\alpha_{\text{culture}_i}} + \beta \log (\text{population}_i) \\ \alpha & \sim \operatorname{Normal}(0, 10) \\ \beta & \sim \operatorname{Normal}(0, 1) \\ \color{blue}{\alpha_\text{culture}} & \color{blue}\sim \color{blue}{\operatorname{Normal}(0, \sigma_{\text{culture}})} \\ \sigma_\text{culture} & \sim \operatorname{HalfCauchy}(0, 1). \\ \end{align*}\]

With brms, we don’t actually need to make the logpop or society variables. We’re ready to fit the multilevel Kline model with the data in hand.

b12.6 <- 
  brm(data = d, 
      family = poisson,
      total_tools ~ 0 + Intercept + log(population) + 
        (1 | culture),
      prior = c(prior(normal(0, 10), class = b, coef = Intercept),
                prior(normal(0, 1), class = b),
                prior(cauchy(0, 1), class = sd)),
      iter = 4000, warmup = 1000, cores = 3, chains = 3,
      seed = 12,
      file = "fits/b12.06")

Note how we used the special 0 + Intercept syntax rather than using the default Intercept. This is because our predictor variable was not mean centered. For more info, see brms GitHub issue #114. Though we used the 0 + Intercept syntax for the fixed effect, it was not necessary for the random effect. Both ways work.

Here is the data-processing work for our variant of Figure 12.6.

nd <- 
  tibble(population = seq(from = 1000, to = 400000, by = 5000),
         # to "simulate counterfactual societies, using the hyper-parameters" (p. 383), 
         # we'll plug a new island into the `culture` variable
         culture    = "my_island") 

p <-
  predict(b12.6,
          # this allows us to simulate values for our counterfactual island, "my_island"
          allow_new_levels = T,
          # here we explicitly tell brms we want to include the group-level effects
          re_formula = ~ (1 | culture),
          # from the brms manual, this uses the "(multivariate) normal distribution implied by 
          # the group-level standard deviations and correlations", which appears to be 
          # what McElreath did in the text.
          sample_new_levels = "gaussian",
          newdata = nd,
          probs = c(.015, .055, .165, .835, .945, .985)) %>%
  as_tibble() %>%
  bind_cols(nd)

p %>%  
  glimpse()
## Rows: 80
## Columns: 10
## $ Estimate   <dbl> 19.84667, 31.25389, 36.59878, 40.37567, 43.47167, 45.93311, 48.17656, 50.31911…
## $ Est.Error  <dbl> 9.717469, 13.363663, 15.535735, 17.257175, 18.722717, 19.943040, 21.096044, 22…
## $ Q1.5       <dbl> 5.000, 11.000, 12.000, 14.000, 14.985, 16.000, 17.000, 17.000, 18.000, 18.000,…
## $ Q5.5       <dbl> 8, 15, 17, 19, 21, 22, 23, 24, 25, 26, 26, 27, 27, 28, 28, 29, 29, 29, 30, 30,…
## $ Q16.5      <dbl> 12, 20, 24, 26, 28, 30, 31, 32, 33, 34, 35, 36, 37, 37, 38, 39, 39, 40, 40, 41…
## $ Q83.5      <dbl> 27.000, 42.000, 48.000, 53.000, 58.000, 60.000, 64.000, 66.000, 69.000, 71.000…
## $ Q94.5      <dbl> 36, 53, 62, 68, 74, 78, 82, 86, 90, 92, 96, 98, 101, 104, 106, 107, 111, 112, …
## $ Q98.5      <dbl> 48.000, 71.000, 83.000, 90.000, 97.000, 106.015, 109.015, 116.000, 122.000, 12…
## $ population <dbl> 1000, 6000, 11000, 16000, 21000, 26000, 31000, 36000, 41000, 46000, 51000, 560…
## $ culture    <chr> "my_island", "my_island", "my_island", "my_island", "my_island", "my_island", …

For a detailed discussion on this way of using brms::predict(), see Andrew MacDonald’s great blogpost on this very figure. Here’s what we’ve been working for.

p %>%
  ggplot(aes(x = log(population), y = Estimate)) +
  geom_ribbon(aes(ymin = Q1.5,  ymax = Q98.5), fill = "orange2", alpha = 1/3) +
  geom_ribbon(aes(ymin = Q5.5,  ymax = Q94.5), fill = "orange2", alpha = 1/3) +
  geom_ribbon(aes(ymin = Q16.5, ymax = Q83.5), fill = "orange2", alpha = 1/3) +
  geom_line(color = "orange4") +
  geom_text(data = d, 
            aes(y = total_tools, label = culture), 
            size = 2.33, color = "blue") +
  ggtitle("Total tools as a function of log(population)") +
  coord_cartesian(ylim = range(d$total_tools)) +
  theme_fivethirtyeight() +
  theme(plot.title = element_text(size = 12, hjust = .5))

Glorious.

The envelope of predictions is a lot wider here than it was back in Chapter 10. This is a consequene of the varying intercepts, combined with the fact that there is much more variation in the data than a pure-Poisson model anticipates. (p. 384)

12.5 Summary Bonus: Put your random effects to work

A big part of this chapter, both what McElreath focused on in the text and even our plotting digression a bit above, focused on how to combine the fixed effects of a multilevel with the group-level. Given some binomial variable, \(\text{criterion}\), and some group term, \(\text{grouping variable}\), we’ve learned the simple multilevel model follows a form like

\[\begin{align*} \text{criterion}_i & \sim \operatorname{Binomial}(n_i \geq 1, p_i) \\ \operatorname{logit}(p_i) & = \alpha + \alpha_{\text{<grouping variable>}_i}\\ \alpha & \sim \operatorname{Normal}(0, 1) \\ \alpha_{\text{<grouping variable>}} & \sim \operatorname{Normal}(0, \sigma_{\text{<grouping variable>}}) \\ \sigma_{\text{<grouping variable>}} & \sim \operatorname{HalfCauchy}(0, 1), \end{align*}\]

and we’ve been grappling with the relation between the grand mean \(\alpha\) and the group-level deviations \(\alpha_{\text{<grouping variable>}}\). For situations where we have the brms::brm() model fit in hand, we’ve been playing with various ways to use the iterations, particularly with either the posterior_samples() method and the fitted()/predict() method. Both are great. But (a) we have other options, which I’d like to share, and (b) if you’re like me, you probably need more practice than following along with the examples in the text. In this bonus section, we are going to introduce two simplified models and then practice working with combining the grand mean various combinations of the random effects.

For our first step, we’ll introduce the models.

12.5.1 Intercepts-only models with one or two grouping variables.

If you recall, b12.4 was our first multilevel model with the chimps data. We can retrieve the model formula like so.

b12.4$formula
## pulled_left | trials(1) ~ 1 + prosoc_left + prosoc_left:condition + (1 | actor)

In addition to the model intercept and random effects for the individual chimps (i.e., actor), we also included fixed effects for the study conditions. For our bonus section, it’ll be easier if we reduce this to a simple intercepts-only model with the sole actor grouping factor. That model will follow the form

\[\begin{align*} \text{pulled_left}_i & \sim \operatorname{Binomial}(n_i = 1, p_i) \\ \operatorname{logit}(p_i) & = \alpha + \alpha_{\text{actor}_i}\\ \alpha & \sim \operatorname{Normal}(0, 10) \\ \alpha_\text{actor} & \sim \operatorname{Normal}(0, \sigma_{\text{actor}}) \\ \sigma_\text{actor} & \sim \operatorname{HalfCauchy}(0, 1). \end{align*}\]

Before we fit the model, you might recall that (a) we’ve already removed the chimpanzees data after saving the data as d and (b) we subsequently reassigned the Kline data to d. Instead of reloading the rethinking package to retrieve the chimpanzees data, we might also acknowledge that the data has also been saved within our b12.4 fit object. [It’s easy to forget such things.]

b12.4$data %>% 
  glimpse()
## Rows: 504
## Columns: 4
## $ pulled_left <int> 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1,…
## $ prosoc_left <int> 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0,…
## $ condition   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ actor       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…

So there’s no need to reload anything. Everything we need is already at hand. Let’s fit the intercepts-only model.

b12.7 <- 
  brm(data = b12.4$data, 
      family = binomial,
      pulled_left | trials(1) ~ 1 + (1 | actor),
      prior = c(prior(normal(0, 10), class = Intercept),
                prior(cauchy(0, 1), class = sd)),
      iter = 5000, warmup = 1000, chains = 4, cores = 4,
      control = list(adapt_delta = .95),
      seed = 12,
      file = "fits/b12.07")

Here’s the model summary.

print(b12.7)
##  Family: binomial 
##   Links: mu = logit 
## Formula: pulled_left | trials(1) ~ 1 + (1 | actor) 
##    Data: b12.4$data (Number of observations: 504) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~actor (Number of levels: 7) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     2.22      0.94     1.10     4.53 1.00     2494     3255
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.81      0.90    -0.87     2.73 1.00     2351     3004
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Now recall that our competing cross-classified model, b12.5, added random effects for the trial blocks. Here was that formula.

b12.5$formula
## pulled_left | trials(1) ~ prosoc_left + (1 | actor) + (1 | block) + prosoc_left:condition

And, of course, we can retrieve the data from that model, too.

b12.5$data %>% 
  glimpse()
## Rows: 504
## Columns: 5
## $ pulled_left <int> 0, 1, 0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 1, 1, 0, 1, 0, 0, 1, 1, 1, 0, 1, 0, 0, 1,…
## $ prosoc_left <int> 0, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 0, 0, 0, 1, 1, 0, 0,…
## $ condition   <int> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,…
## $ actor       <int> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,…
## $ block       <int> 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 5, 5,…

It’s the same data we used from the b12.4 model, but with the addition of the block index. With those data in hand, we can fit the intercepts-only version of our cross-classified model. This model formula follows the form

\[\begin{align*} \text{pulled_left}_i & \sim \operatorname{Binomial}(n_i = 1, p_i) \\ \operatorname{logit}(p_i) & = \alpha + \alpha_{\text{actor}_i} + \alpha_{\text{block}_i}\\ \alpha & \sim \operatorname{Normal}(0, 10) \\ \alpha_\text{actor} & \sim \operatorname{Normal}(0, \sigma_\text{actor}) \\ \alpha_\text{block} & \sim \operatorname{Normal}(0, \sigma_\text{block}) \\ \sigma_\text{actor} & \sim \operatorname{HalfCauchy}(0, 1) \\ \sigma_\text{block} & \sim \operatorname{HalfCauchy}(0, 1). \end{align*}\]

Fit the model.

b12.8 <- 
  brm(data = b12.5$data, 
      family = binomial,
      pulled_left | trials(1) ~ 1 + (1 | actor) + (1 | block),
      prior = c(prior(normal(0, 10), class = Intercept),
                prior(cauchy(0, 1), class = sd)),
      iter = 5000, warmup = 1000, chains = 4, cores = 4,
      control = list(adapt_delta = 0.95),
      seed = 12,
      file = "fits/b12.08")

Here’s the summary.

print(b12.8)
##  Family: binomial 
##   Links: mu = logit 
## Formula: pulled_left | trials(1) ~ 1 + (1 | actor) + (1 | block) 
##    Data: b12.5$data (Number of observations: 504) 
## Samples: 4 chains, each with iter = 5000; warmup = 1000; thin = 1;
##          total post-warmup samples = 16000
## 
## Group-Level Effects: 
## ~actor (Number of levels: 7) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     2.20      0.88     1.10     4.46 1.00     4563     6707
## 
## ~block (Number of levels: 6) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.22      0.18     0.01     0.65 1.00     5571     6815
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     0.79      0.90    -0.97     2.72 1.00     3406     5410
## 
## Samples were drawn using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

Now we’ve fit our two intercepts-only models, let’s get to the heart of this section. We are going to practice four methods for working with the posterior samples. Each method will revolve around a different primary function. In order, they are

  • brms::posterior_samples(),
  • brms::coef(),
  • brms::fitted(), and
  • tidybayes::spread_draws().

We’ve already had some practice with the first three, but I hope this section will make them even more clear. The tidybayes::spread_draws() method will be new, to us. I think you’ll find it’s a handy alternative.

With each of the four methods, we’ll practice three different model summaries:

  • getting the posterior draws for the actor-level estimates from the b12.7 model;
  • getting the posterior draws for the actor-level estimates from the cross-classified b12.8 model, averaging over the levels of block; and
  • getting the posterior draws for the actor-level estimates from the cross-classified b12.8 model, based on block == 1.

So to be clear, our goal is to accomplish those three tasks with four methods, each of which should yield equivalent results.

12.5.2 brms::posterior_samples().

To warm up, let’s take a look at the structure of the posterior_samples() output for the simple b12.7 model.

posterior_samples(b12.7) %>% str()
## 'data.frame':    16000 obs. of  10 variables:
##  $ b_Intercept         : num  1.3 1.18 1.32 1.19 1.2 ...
##  $ sd_actor__Intercept : num  2.16 2.63 2.59 2.81 2.93 ...
##  $ r_actor[1,Intercept]: num  -1.72 -1.31 -1.36 -1.7 -1.58 ...
##  $ r_actor[2,Intercept]: num  4.08 4.21 3.27 2.85 2.89 ...
##  $ r_actor[3,Intercept]: num  -1.55 -2.06 -1.78 -1.65 -1.5 ...
##  $ r_actor[4,Intercept]: num  -1.59 -1.98 -1.93 -2.09 -1.92 ...
##  $ r_actor[5,Intercept]: num  -1.26 -1.64 -1.64 -1.12 -1.39 ...
##  $ r_actor[6,Intercept]: num  -0.478 -1.021 -0.804 -0.592 -0.518 ...
##  $ r_actor[7,Intercept]: num  0.984 1.036 0.486 0.671 1.182 ...
##  $ lp__                : num  -282 -280 -278 -280 -278 ...

The b_Intercept vector corresponds to the \(\alpha\) term in the statistical model. The second vector, sd_actor__Intercept, corresponds to the \(\sigma_\text{actor}\) term. And the next 7 vectors beginning with the r_actor suffix are the \(\alpha_\text{actor}\) deviations from the grand mean, \(\alpha\). Thus if we wanted to get the model-implied probability for our first chimp, we’d add b_Intercept to r_actor[1,Intercept] and then take the inverse logit.

posterior_samples(b12.7) %>% 
  transmute(`chimp 1's average probability of pulling left` = (b_Intercept + `r_actor[1,Intercept]`) %>% inv_logit_scaled()) %>% 
  head()
##   chimp 1's average probability of pulling left
## 1                                     0.3955172
## 2                                     0.4673756
## 3                                     0.4892391
## 4                                     0.3757748
## 5                                     0.4069309
## 6                                     0.5105511

To complete our first task, then, of getting the posterior draws for the actor-level estimates from the b12.7 model, we can do that in bulk.

p1 <-
  posterior_samples(b12.7) %>% 
  transmute(`chimp 1's average probability of pulling left` = b_Intercept + `r_actor[1,Intercept]`,
            `chimp 2's average probability of pulling left` = b_Intercept + `r_actor[2,Intercept]`,
            `chimp 3's average probability of pulling left` = b_Intercept + `r_actor[3,Intercept]`,
            `chimp 4's average probability of pulling left` = b_Intercept + `r_actor[4,Intercept]`,
            `chimp 5's average probability of pulling left` = b_Intercept + `r_actor[5,Intercept]`,
            `chimp 6's average probability of pulling left` = b_Intercept + `r_actor[6,Intercept]`,
            `chimp 7's average probability of pulling left` = b_Intercept + `r_actor[7,Intercept]`) %>% 
  mutate_all(inv_logit_scaled)

str(p1)
## 'data.frame':    16000 obs. of  7 variables:
##  $ chimp 1's average probability of pulling left: num  0.396 0.467 0.489 0.376 0.407 ...
##  $ chimp 2's average probability of pulling left: num  0.995 0.995 0.99 0.983 0.983 ...
##  $ chimp 3's average probability of pulling left: num  0.437 0.293 0.386 0.386 0.425 ...
##  $ chimp 4's average probability of pulling left: num  0.429 0.309 0.353 0.289 0.327 ...
##  $ chimp 5's average probability of pulling left: num  0.51 0.387 0.422 0.517 0.453 ...
##  $ chimp 6's average probability of pulling left: num  0.695 0.54 0.626 0.645 0.664 ...
##  $ chimp 7's average probability of pulling left: num  0.908 0.902 0.859 0.865 0.915 ...

One of the things I really like about this method is the b_Intercept + r_actor[i,Intercept] part of the code makes it very clear, to me, how the porterior_samples() columns correspond to the statistical model, \(\operatorname{logit}(p_i) = \alpha + \alpha_{\text{actor}_i}\). This method easily extends to our next task, getting the posterior draws for the actor-level estimates from the cross-classified b12.8 model, averaging over the levels of block. In fact, other than switching out b12.7 for b12.8, the method is identical.

p2 <-
  posterior_samples(b12.8) %>% 
  transmute(`chimp 1's average probability of pulling left` = b_Intercept + `r_actor[1,Intercept]`,
            `chimp 2's average probability of pulling left` = b_Intercept + `r_actor[2,Intercept]`,
            `chimp 3's average probability of pulling left` = b_Intercept + `r_actor[3,Intercept]`,
            `chimp 4's average probability of pulling left` = b_Intercept + `r_actor[4,Intercept]`,
            `chimp 5's average probability of pulling left` = b_Intercept + `r_actor[5,Intercept]`,
            `chimp 6's average probability of pulling left` = b_Intercept + `r_actor[6,Intercept]`,
            `chimp 7's average probability of pulling left` = b_Intercept + `r_actor[7,Intercept]`) %>% 
  mutate_all(inv_logit_scaled)

str(p2)
## 'data.frame':    16000 obs. of  7 variables:
##  $ chimp 1's average probability of pulling left: num  0.55 0.407 0.314 0.39 0.334 ...
##  $ chimp 2's average probability of pulling left: num  0.999 0.959 0.982 0.971 0.993 ...
##  $ chimp 3's average probability of pulling left: num  0.389 0.416 0.387 0.42 0.363 ...
##  $ chimp 4's average probability of pulling left: num  0.376 0.417 0.314 0.429 0.35 ...
##  $ chimp 5's average probability of pulling left: num  0.424 0.516 0.391 0.447 0.437 ...
##  $ chimp 6's average probability of pulling left: num  0.704 0.703 0.599 0.633 0.571 ...
##  $ chimp 7's average probability of pulling left: num  0.938 0.846 0.871 0.874 0.768 ...

The reason we can still get away with this is because the grand mean in the b12.8 model is the grand mean across all levels of actor and block. AND it’s the case that the r_actor and r_block vectors returned by posterior_samples(b12.8) are all in deviation metrics–execute posterior_samples(b12.8) %>% glimpse() if it will help you follow along. So if we simply leave out the r_block vectors, we are ignoring the specific block-level deviations, effectively averaging over them.

Now for our third task, we’ve decided we wanted to retrieve the posterior draws for the actor-level estimates from the cross-classified b12.8 model, based on block == 1. To get the chimp-specific estimates for the first block, we simply add + r_block[1,Intercept] to the end of each formula.

p3 <-
  posterior_samples(b12.8) %>% 
  transmute(`chimp 1's average probability of pulling left` = b_Intercept + `r_actor[1,Intercept]` + `r_block[1,Intercept]`,
            `chimp 2's average probability of pulling left` = b_Intercept + `r_actor[2,Intercept]` + `r_block[1,Intercept]`,
            `chimp 3's average probability of pulling left` = b_Intercept + `r_actor[3,Intercept]` + `r_block[1,Intercept]`,
            `chimp 4's average probability of pulling left` = b_Intercept + `r_actor[4,Intercept]` + `r_block[1,Intercept]`,
            `chimp 5's average probability of pulling left` = b_Intercept + `r_actor[5,Intercept]` + `r_block[1,Intercept]`,
            `chimp 6's average probability of pulling left` = b_Intercept + `r_actor[6,Intercept]` + `r_block[1,Intercept]`,
            `chimp 7's average probability of pulling left` = b_Intercept + `r_actor[7,Intercept]` + `r_block[1,Intercept]`) %>% 
  mutate_all(inv_logit_scaled)

str(p3)
## 'data.frame':    16000 obs. of  7 variables:
##  $ chimp 1's average probability of pulling left: num  0.508 0.303 0.175 0.253 0.292 ...
##  $ chimp 2's average probability of pulling left: num  0.998 0.937 0.962 0.947 0.992 ...
##  $ chimp 3's average probability of pulling left: num  0.349 0.31 0.226 0.277 0.32 ...
##  $ chimp 4's average probability of pulling left: num  0.337 0.311 0.174 0.285 0.307 ...
##  $ chimp 5's average probability of pulling left: num  0.383 0.403 0.229 0.3 0.39 ...
##  $ chimp 6's average probability of pulling left: num  0.668 0.6 0.408 0.478 0.523 ...
##  $ chimp 7's average probability of pulling left: num  0.928 0.777 0.757 0.787 0.731 ...

Again, I like this method because of how close the wrangling code within transmute() is to the statistical model formula. I wrote a lot of code like this in my early days of working with these kinds of models, and I think the pedagogical insights were helpful. But this method has its limitations. It works fine if you’re working with some small number of groups. But that’s a lot of repetitious code and it would be utterly un-scalable to situations where you have 50 or 500 levels in your grouping variable. We need alternatives.

12.5.3 brms::coef().

First, let’s review what the coef() function returns.

coef(b12.7)
## $actor
## , , Intercept
## 
##     Estimate Est.Error       Q2.5      Q97.5
## 1 -0.3215827 0.2418201 -0.7977841  0.1504863
## 2  4.8714473 1.5607904  2.8338526  8.7685059
## 3 -0.6147350 0.2465114 -1.1034131 -0.1376989
## 4 -0.6137141 0.2479777 -1.1074224 -0.1382522
## 5 -0.3248400 0.2369545 -0.7907905  0.1345572
## 6  0.5846110 0.2464742  0.1123436  1.0720785
## 7  2.0748482 0.3728561  1.3971763  2.8527766

By default, we get the familiar summaries for mean performances for each of our seven chimps. These, of course, are in the log-odds metric and simply tacking on inv_logit_scaled() isn’t going to fully get the job done if we want posterior standard deviations. So to get things in the probability metric, we’ll want to first set summary = F in order to work directly with un-summarized samples and then wrangle quite a bit. Part of the wrangling challenge is because coef() returns a list, rather than a data frame. With that in mind, the code for our first task of getting the posterior draws for the actor-level estimates from the b12.7 model looks like so.

c1 <-
  coef(b12.7, summary = F)$actor[, , ] %>%
  as_tibble() %>% 
  gather() %>% 
  mutate(key   = str_c("chimp ", key, "'s average probability of pulling left"),
         value = inv_logit_scaled(value),
         # we need an iteration index for `spread()` to work properly
         iter  = rep(1:16000, times = 7)) %>% 
  spread(key = key, value = value)

str(c1)
## tibble [16,000 × 8] (S3: tbl_df/tbl/data.frame)
##  $ iter                                         : int [1:16000] 1 2 3 4 5 6 7 8 9 10 ...
##  $ chimp 1's average probability of pulling left: num [1:16000] 0.396 0.467 0.489 0.376 0.407 ...
##  $ chimp 2's average probability of pulling left: num [1:16000] 0.995 0.995 0.99 0.983 0.983 ...
##  $ chimp 3's average probability of pulling left: num [1:16000] 0.437 0.293 0.386 0.386 0.425 ...
##  $ chimp 4's average probability of pulling left: num [1:16000] 0.429 0.309 0.353 0.289 0.327 ...
##  $ chimp 5's average probability of pulling left: num [1:16000] 0.51 0.387 0.422 0.517 0.453 ...
##  $ chimp 6's average probability of pulling left: num [1:16000] 0.695 0.54 0.626 0.645 0.664 ...
##  $ chimp 7's average probability of pulling left: num [1:16000] 0.908 0.902 0.859 0.865 0.915 ...

So with this method, you get a little practice with three-dimensional indexing, which is a good skill to have. Now let’s extend it to our second task, getting the posterior draws for the actor-level estimates from the cross-classified b12.8 model, averaging over the levels of block.

c2 <-
  coef(b12.8, summary = F)$actor[, , ] %>%
  as_tibble() %>% 
  gather() %>% 
  mutate(key   = str_c("chimp ", key, "'s average probability of pulling left"),
         value = inv_logit_scaled(value),
         iter  = rep(1:16000, times = 7)) %>% 
  spread(key = key, value = value)

str(c2)
## tibble [16,000 × 8] (S3: tbl_df/tbl/data.frame)
##  $ iter                                         : int [1:16000] 1 2 3 4 5 6 7 8 9 10 ...
##  $ chimp 1's average probability of pulling left: num [1:16000] 0.55 0.407 0.314 0.39 0.334 ...
##  $ chimp 2's average probability of pulling left: num [1:16000] 0.999 0.959 0.982 0.971 0.993 ...
##  $ chimp 3's average probability of pulling left: num [1:16000] 0.389 0.416 0.387 0.42 0.363 ...
##  $ chimp 4's average probability of pulling left: num [1:16000] 0.376 0.417 0.314 0.429 0.35 ...
##  $ chimp 5's average probability of pulling left: num [1:16000] 0.424 0.516 0.391 0.447 0.437 ...
##  $ chimp 6's average probability of pulling left: num [1:16000] 0.704 0.703 0.599 0.633 0.571 ...
##  $ chimp 7's average probability of pulling left: num [1:16000] 0.938 0.846 0.871 0.874 0.768 ...

As with our posterior_samples() method, this code was near identical to the block, above. All we did was switch out b12.7 for b12.8. [Okay, we removed a line of annotations. But that doesn’t really count.] We should point something out, though. Consider what coef() yields when working with a cross-classified model.

coef(b12.8)
## $actor
## , , Intercept
## 
##     Estimate Est.Error        Q2.5       Q97.5
## 1 -0.3271100 0.2663264 -0.86187692  0.18911844
## 2  4.8720689 1.5678916  2.85115892  8.83635023
## 3 -0.6214518 0.2735622 -1.17196640 -0.09527775
## 4 -0.6198643 0.2724069 -1.17333082 -0.09702507
## 5 -0.3292036 0.2648897 -0.85112550  0.19236212
## 6  0.5825941 0.2714240  0.05727387  1.11666000
## 7  2.0838278 0.3864791  1.37302452  2.88696225
## 
## 
## $block
## , , Intercept
## 
##    Estimate Est.Error       Q2.5    Q97.5
## 1 0.6090302 0.9149552 -1.1640046 2.563989
## 2 0.8482139 0.9064563 -0.9025111 2.759985
## 3 0.8458464 0.9077131 -0.9156767 2.776990
## 4 0.7760209 0.9052327 -0.9835192 2.693233
## 5 0.7748777 0.9068570 -0.9878018 2.710738
## 6 0.8947740 0.9106181 -0.8647786 2.850250

Now we have a list of two elements, one for actor and one for block. What might not be immediately obvious is that the summaries returned by one grouping level are based off of averaging over the other. Although this made our second task easy, it provides a challenge for our third task, getting the posterior draws for the actor-level estimates from the cross-classified b12.8 model, based on block == 1. To accomplish that, we’ll need to bring in ranef(). Let’s review what that returns.

ranef(b12.8)
## $actor
## , , Intercept
## 
##     Estimate Est.Error       Q2.5     Q97.5
## 1 -1.1170149 0.9190223 -3.0750487 0.6496881
## 2  4.0821641 1.6139466  1.7485079 8.0503052
## 3 -1.4113566 0.9212529 -3.3718865 0.3678686
## 4 -1.4097692 0.9209196 -3.3868397 0.3320319
## 5 -1.1191084 0.9199944 -3.0744108 0.6552198
## 6 -0.2073108 0.9211197 -2.1487763 1.5757838
## 7  1.2939230 0.9431009 -0.6501063 3.1687555
## 
## 
## $block
## , , Intercept
## 
##      Estimate Est.Error       Q2.5     Q97.5
## 1 -0.18087466 0.2246993 -0.7335318 0.1166049
## 2  0.05830901 0.1862766 -0.2811078 0.4946732
## 3  0.05594155 0.1835187 -0.2836957 0.4909388
## 4 -0.01388395 0.1788962 -0.4065447 0.3595021
## 5 -0.01502716 0.1821006 -0.4173334 0.3704001
## 6  0.10486915 0.2003494 -0.2180858 0.5894042

The format of the ranef() output is identical to that from coef(). However, the summaries are in the deviance metric. They’re all centered around zero, which corresponds to the part of the statistical model that specifies how \(\alpha_\text{block} \sim \operatorname{Normal}(0, \sigma_\text{block})\). So then, if we want to continue using our coef() method, we’ll need to augment it with ranef() to accomplish our last task.

c3 <-
  coef(b12.8, summary = F)$actor[, , ] %>%
  as_tibble() %>% 
  gather() %>% 
  # here we add in the `block == 1` deviations from the grand mean
  mutate(value = value + ranef(b12.8, summary = F)$block[, 1, ] %>% rep(., times = 7)) %>% 
  mutate(key   = str_c("chimp ", key, "'s average probability of pulling left"),
         value = inv_logit_scaled(value),
         iter  = rep(1:16000, times = 7)) %>% 
  spread(key = key, value = value)

str(c3)
## tibble [16,000 × 8] (S3: tbl_df/tbl/data.frame)
##  $ iter                                         : int [1:16000] 1 2 3 4 5 6 7 8 9 10 ...
##  $ chimp 1's average probability of pulling left: num [1:16000] 0.508 0.303 0.175 0.253 0.292 ...
##  $ chimp 2's average probability of pulling left: num [1:16000] 0.998 0.937 0.962 0.947 0.992 ...
##  $ chimp 3's average probability of pulling left: num [1:16000] 0.349 0.31 0.226 0.277 0.32 ...
##  $ chimp 4's average probability of pulling left: num [1:16000] 0.337 0.311 0.174 0.285 0.307 ...
##  $ chimp 5's average probability of pulling left: num [1:16000] 0.383 0.403 0.229 0.3 0.39 ...
##  $ chimp 6's average probability of pulling left: num [1:16000] 0.668 0.6 0.408 0.478 0.523 ...
##  $ chimp 7's average probability of pulling left: num [1:16000] 0.928 0.777 0.757 0.787 0.731 ...

One of the nicest things about the coef() method is how is scales well. This code is no more burdensome for 5 group levels than it is for 5,000. It’s also a post-processing version of the distinction McElreath made on page 372 between the two equivalent ways you might define a Gaussian:

\[\operatorname{Normal}(10, 1)\] and

\[10 + \operatorname{Normal}(0, 1).\]

Conversely, it can be a little abstract. Let’s keep expanding our options.

12.5.4 brms::fitted().

As is often the case, we’re going to want to define our predictor values for fitted().

(nd <- b12.7$data %>% distinct(actor))
##   actor
## 1     1
## 2     2
## 3     3
## 4     4
## 5     5
## 6     6
## 7     7

Now we have our new data, nd, here’s how we might use fitted() to accomplish our first task, getting the posterior draws for the actor-level estimates from the b12.7 model.

f1 <-
  fitted(b12.7,
         newdata = nd,
         summary = F,
         # within `fitted()`, this line does the same work that
         # `inv_logit_scaled()` did with the other two methods
         scale = "response") %>% 
  as_tibble() %>% 
  set_names(str_c("chimp ", 1:7, "'s average probability of pulling left"))

str(f1)
## tibble [16,000 × 7] (S3: tbl_df/tbl/data.frame)
##  $ chimp 1's average probability of pulling left: num [1:16000] 0.396 0.467 0.489 0.376 0.407 ...
##  $ chimp 2's average probability of pulling left: num [1:16000] 0.995 0.995 0.99 0.983 0.983 ...
##  $ chimp 3's average probability of pulling left: num [1:16000] 0.437 0.293 0.386 0.386 0.425 ...
##  $ chimp 4's average probability of pulling left: num [1:16000] 0.429 0.309 0.353 0.289 0.327 ...
##  $ chimp 5's average probability of pulling left: num [1:16000] 0.51 0.387 0.422 0.517 0.453 ...
##  $ chimp 6's average probability of pulling left: num [1:16000] 0.695 0.54 0.626 0.645 0.664 ...
##  $ chimp 7's average probability of pulling left: num [1:16000] 0.908 0.902 0.859 0.865 0.915 ...

This scales reasonably well. But might not work well if the vectors you wanted to rename didn’t follow a serial order, like ours. If you’re willing to pay with a few more lines of wrangling code, this method is more general, but still scalable.

f1 <-
  fitted(b12.7,
         newdata = nd,
         summary = F,
         scale = "response") %>% 
  as_tibble() %>% 
  # you'll need this line to make the `spread()` line work properly
  mutate(iter = 1:n()) %>% 
  gather(key, value, -iter) %>% 
  mutate(key = str_replace(key, "V", "chimp ")) %>% 
  mutate(key = str_c(key, "'s average probability of pulling left")) %>% 
  spread(key = key, value = value)

str(f1)
## tibble [16,000 × 8] (S3: tbl_df/tbl/data.frame)
##  $ iter                                         : int [1:16000] 1 2 3 4 5 6 7 8 9 10 ...
##  $ chimp 1's average probability of pulling left: num [1:16000] 0.396 0.467 0.489 0.376 0.407 ...
##  $ chimp 2's average probability of pulling left: num [1:16000] 0.995 0.995 0.99 0.983 0.983 ...
##  $ chimp 3's average probability of pulling left: num [1:16000] 0.437 0.293 0.386 0.386 0.425 ...
##  $ chimp 4's average probability of pulling left: num [1:16000] 0.429 0.309 0.353 0.289 0.327 ...
##  $ chimp 5's average probability of pulling left: num [1:16000] 0.51 0.387 0.422 0.517 0.453 ...
##  $ chimp 6's average probability of pulling left: num [1:16000] 0.695 0.54 0.626 0.645 0.664 ...
##  $ chimp 7's average probability of pulling left: num [1:16000] 0.908 0.902 0.859 0.865 0.915 ...

Now unlike with the previous two methods, our fitted() method will not allow us to simply switch out b12.7 for b12.8 to accomplish our second task of getting the posterior draws for the actor-level estimates from the cross-classified b12.8 model, averaging over the levels of block. This is because when we use fitted() in combination with its newdata argument, the function expects us to define values for all the predictor variables in the formula. Because the b12.8 model has both actor and block grouping variables as predictors, the default requires we include both in our new data. But if we were to specify a value for block in the nd data, we would no longer be averaging over the levels of block anymore; we’d be selecting one of the levels of block in particular, which we don’t yet want to do. Happily, brms::fitted() has a re_formula argument. If we would like to average out block, we simply drop it from the formula. Here’s how to do so.

f2 <-
  fitted(b12.8,
         newdata = nd,
         # this line allows us to average over the levels of `block`
         re_formula = pulled_left ~ 1 + (1 | actor),
         summary = F,
         scale = "response") %>% 
  as_tibble() %>% 
  set_names(str_c("chimp ", 1:7, "'s average probability of pulling left"))

str(f2)
## tibble [16,000 × 7] (S3: tbl_df/tbl/data.frame)
##  $ chimp 1's average probability of pulling left: num [1:16000] 0.55 0.407 0.314 0.39 0.334 ...
##  $ chimp 2's average probability of pulling left: num [1:16000] 0.999 0.959 0.982 0.971 0.993 ...
##  $ chimp 3's average probability of pulling left: num [1:16000] 0.389 0.416 0.387 0.42 0.363 ...
##  $ chimp 4's average probability of pulling left: num [1:16000] 0.376 0.417 0.314 0.429 0.35 ...
##  $ chimp 5's average probability of pulling left: num [1:16000] 0.424 0.516 0.391 0.447 0.437 ...
##  $ chimp 6's average probability of pulling left: num [1:16000] 0.704 0.703 0.599 0.633 0.571 ...
##  $ chimp 7's average probability of pulling left: num [1:16000] 0.938 0.846 0.871 0.874 0.768 ...

If we want to use fitted() for our third task of getting the posterior draws for the actor-level estimates from the cross-classified b12.8 model, based on block == 1, we’ll need to augment our nd data.

(
  nd <-
  b12.8$data %>% 
  distinct(actor) %>% 
  mutate(block = 1)
)
##   actor block
## 1     1     1
## 2     2     1
## 3     3     1
## 4     4     1
## 5     5     1
## 6     6     1
## 7     7     1

This time, we no longer need that re_formula argument.

f3 <-
  fitted(b12.8,
         newdata = nd,
         summary = F,
         scale = "response") %>% 
  as_tibble() %>% 
  set_names(str_c("chimp ", 1:7, "'s average probability of pulling left"))

str(f3)
## tibble [16,000 × 7] (S3: tbl_df/tbl/data.frame)
##  $ chimp 1's average probability of pulling left: num [1:16000] 0.508 0.303 0.175 0.253 0.292 ...
##  $ chimp 2's average probability of pulling left: num [1:16000] 0.998 0.937 0.962 0.947 0.992 ...
##  $ chimp 3's average probability of pulling left: num [1:16000] 0.349 0.31 0.226 0.277 0.32 ...
##  $ chimp 4's average probability of pulling left: num [1:16000] 0.337 0.311 0.174 0.285 0.307 ...
##  $ chimp 5's average probability of pulling left: num [1:16000] 0.383 0.403 0.229 0.3 0.39 ...
##  $ chimp 6's average probability of pulling left: num [1:16000] 0.668 0.6 0.408 0.478 0.523 ...
##  $ chimp 7's average probability of pulling left: num [1:16000] 0.928 0.777 0.757 0.787 0.731 ...

Let’s learn one more option.

12.5.5 tidybayes::spread_draws().

Up till this point, we’ve really only used the tidybayes package for plotting (e.g., with stat_halfeye()) and summarizing (e.g., with median_qi()). But tidybayes is more general; it offers a handful of convenience functions for wrangling posterior draws from a tidyverse perspective. One such function is spread_draws(), which you can learn all about in Matthew Kay’s (2020c) vignette, Extracting and visualizing tidy draws from brms models. Let’s take a look at how we’ll be using it.

b12.7 %>%
  spread_draws(b_Intercept, r_actor[actor,])
## # A tibble: 112,000 x 6
## # Groups:   actor [7]
##    .chain .iteration .draw b_Intercept actor r_actor
##     <int>      <int> <int>       <dbl> <int>   <dbl>
##  1      1          1     1        1.30     1  -1.72 
##  2      1          1     1        1.30     2   4.08 
##  3      1          1     1        1.30     3  -1.55 
##  4      1          1     1        1.30     4  -1.59 
##  5      1          1     1        1.30     5  -1.26 
##  6      1          1     1        1.30     6  -0.478
##  7      1          1     1        1.30     7   0.984
##  8      1          2     2        1.18     1  -1.31 
##  9      1          2     2        1.18     2   4.21 
## 10      1          2     2        1.18     3  -2.06 
## # … with 111,990 more rows

First, notice tidybayes::spread_draws() took the model fit itself, b12.7, as input. No need for posterior_samples(). Now, notice we fed it two additional arguments. By the first argument, we that requested spead_draws() extract the posterior samples for the b_Intercept. By the second argument, r_actor[actor,], we instructed spead_draws() to extract all the random effects for the actor variable. Also notice how within the brackets [] we specified actor, which then became the name of the column in the output that indexed the levels of the grouping variable actor. By default, the code returns the posterior samples for all the levels of actor. However, had we only wanted those from chimps #1 and #3, we might use typical tidyverse-style indexing. E.g.,

b12.7 %>%
  spread_draws(b_Intercept, r_actor[actor,]) %>% 
  filter(actor %in% c(1, 3))
## # A tibble: 32,000 x 6
## # Groups:   actor [2]
##    .chain .iteration .draw b_Intercept actor r_actor
##     <int>      <int> <int>       <dbl> <int>   <dbl>
##  1      1          1     1        1.30     1   -1.72
##  2      1          1     1        1.30     3   -1.55
##  3      1          2     2        1.18     1   -1.31
##  4      1          2     2        1.18     3   -2.06
##  5      1          3     3        1.32     1   -1.36
##  6      1          3     3        1.32     3   -1.78
##  7      1          4     4        1.19     1   -1.70
##  8      1          4     4        1.19     3   -1.65
##  9      1          5     5        1.20     1   -1.58
## 10      1          5     5        1.20     3   -1.50
## # … with 31,990 more rows

Also notice those first three columns. By default, spread_draws() extracted information about which Markov chain a given draw was from, which iteration a given draw was within a given chain, and which draw from an overall standpoint. If it helps to keep track of which vector indexed what, consider this.

b12.7 %>%
  spread_draws(b_Intercept, r_actor[actor,]) %>% 
  ungroup() %>% 
  select(.chain:.draw) %>% 
  gather() %>% 
  group_by(key) %>%
  summarise(min = min(value),
            max = max(value))
## # A tibble: 3 x 3
##   key          min   max
##   <chr>      <int> <int>
## 1 .chain         1     4
## 2 .draw          1 16000
## 3 .iteration     1  4000

Above we simply summarized each of the three variables by their minimum and maximum values. If you recall that we fit b12.7 with four Markov chains, each with 4,000 post-warmup iterations, hopefully it’ll make sense what each of those three variables index.

Now we’ve done a little clarification, let’s use spread_draws() to accomplish our first task, getting the posterior draws for the actor-level estimates from the b12.7 model.

s1 <-
  b12.7 %>%
  spread_draws(b_Intercept, r_actor[actor,]) %>% 
  mutate(p = (b_Intercept + r_actor) %>% inv_logit_scaled()) %>% 
  select(.draw, actor, p) %>% 
  ungroup() %>% 
  mutate(actor = str_c("chimp ", actor, "'s average probability of pulling left")) %>% 
  spread(value = p, key = actor)

str(s1)
## tibble [16,000 × 8] (S3: tbl_df/tbl/data.frame)
##  $ .draw                                        : int [1:16000] 1 2 3 4 5 6 7 8 9 10 ...
##  $ chimp 1's average probability of pulling left: num [1:16000] 0.396 0.467 0.489 0.376 0.407 ...
##  $ chimp 2's average probability of pulling left: num [1:16000] 0.995 0.995 0.99 0.983 0.983 ...
##  $ chimp 3's average probability of pulling left: num [1:16000] 0.437 0.293 0.386 0.386 0.425 ...
##  $ chimp 4's average probability of pulling left: num [1:16000] 0.429 0.309 0.353 0.289 0.327 ...
##  $ chimp 5's average probability of pulling left: num [1:16000] 0.51 0.387 0.422 0.517 0.453 ...
##  $ chimp 6's average probability of pulling left: num [1:16000] 0.695 0.54 0.626 0.645 0.664 ...
##  $ chimp 7's average probability of pulling left: num [1:16000] 0.908 0.902 0.859 0.865 0.915 ...

The method remains essentially the same for accomplishing our second task, getting the posterior draws for the actor-level estimates from the cross-classified b12.8 model, averaging over the levels of block.

s2 <-
  b12.8 %>%
  spread_draws(b_Intercept, r_actor[actor,]) %>% 
  mutate(p = (b_Intercept + r_actor) %>% inv_logit_scaled()) %>% 
  select(.draw, actor, p) %>% 
  ungroup() %>% 
  mutate(actor = str_c("chimp ", actor, "'s average probability of pulling left")) %>% 
  spread(value = p, key = actor)

str(s2)
## tibble [16,000 × 8] (S3: tbl_df/tbl/data.frame)
##  $ .draw                                        : int [1:16000] 1 2 3 4 5 6 7 8 9 10 ...
##  $ chimp 1's average probability of pulling left: num [1:16000] 0.55 0.407 0.314 0.39 0.334 ...
##  $ chimp 2's average probability of pulling left: num [1:16000] 0.999 0.959 0.982 0.971 0.993 ...
##  $ chimp 3's average probability of pulling left: num [1:16000] 0.389 0.416 0.387 0.42 0.363 ...
##  $ chimp 4's average probability of pulling left: num [1:16000] 0.376 0.417 0.314 0.429 0.35 ...
##  $ chimp 5's average probability of pulling left: num [1:16000] 0.424 0.516 0.391 0.447 0.437 ...
##  $ chimp 6's average probability of pulling left: num [1:16000] 0.704 0.703 0.599 0.633 0.571 ...
##  $ chimp 7's average probability of pulling left: num [1:16000] 0.938 0.846 0.871 0.874 0.768 ...

To accomplish our third task, we augment the spread_draws() and first mutate() lines, and add a filter() line between them.

s3 <-
  b12.8 %>%
  spread_draws(b_Intercept, r_actor[actor,], r_block[block,]) %>% 
  filter(block == 1) %>% 
  mutate(p = (b_Intercept + r_actor + r_block) %>% inv_logit_scaled()) %>% 
  select(.draw, actor, p) %>% 
  ungroup() %>% 
  mutate(actor = str_c("chimp ", actor, "'s average probability of pulling left")) %>% 
  spread(value = p, key = actor)

str(s3)
## tibble [16,000 × 9] (S3: tbl_df/tbl/data.frame)
##  $ block                                        : int [1:16000] 1 1 1 1 1 1 1 1 1 1 ...
##  $ .draw                                        : int [1:16000] 1 2 3 4 5 6 7 8 9 10 ...
##  $ chimp 1's average probability of pulling left: num [1:16000] 0.508 0.303 0.175 0.253 0.292 ...
##  $ chimp 2's average probability of pulling left: num [1:16000] 0.998 0.937 0.962 0.947 0.992 ...
##  $ chimp 3's average probability of pulling left: num [1:16000] 0.349 0.31 0.226 0.277 0.32 ...
##  $ chimp 4's average probability of pulling left: num [1:16000] 0.337 0.311 0.174 0.285 0.307 ...
##  $ chimp 5's average probability of pulling left: num [1:16000] 0.383 0.403 0.229 0.3 0.39 ...
##  $ chimp 6's average probability of pulling left: num [1:16000] 0.668 0.6 0.408 0.478 0.523 ...
##  $ chimp 7's average probability of pulling left: num [1:16000] 0.928 0.777 0.757 0.787 0.731 ...

Hopefully working through these examples gave you some insight on the relation between fixed and random effects within multilevel models, and perhaps added to your posterior-iteration-wrangling toolkit.

Session info

sessionInfo()
## R version 3.6.3 (2020-02-29)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS Catalina 10.15.3
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] posterior_0.1.0      bayesplot_1.7.1      tidybayes_2.1.1      patchwork_1.0.1.9000
##  [5] ggthemes_4.2.0       forcats_0.5.0        stringr_1.4.0        dplyr_1.0.1         
##  [9] purrr_0.3.4          readr_1.3.1          tidyr_1.1.1          tibble_3.0.3        
## [13] tidyverse_1.3.0      brms_2.13.5          Rcpp_1.0.5           dagitty_0.2-2       
## [17] rstan_2.19.3         ggplot2_3.3.2        StanHeaders_2.21.0-1
## 
## loaded via a namespace (and not attached):
##   [1] readxl_1.3.1         backports_1.1.9      plyr_1.8.6           igraph_1.2.5        
##   [5] splines_3.6.3        svUnit_1.0.3         crosstalk_1.1.0.1    TH.data_1.0-10      
##   [9] rstantools_2.1.1     inline_0.3.15        digest_0.6.25        htmltools_0.5.0     
##  [13] rsconnect_0.8.16     fansi_0.4.1          magrittr_1.5         checkmate_2.0.0     
##  [17] modelr_0.1.6         matrixStats_0.56.0   xts_0.12-0           sandwich_2.5-1      
##  [21] prettyunits_1.1.1    colorspace_1.4-1     rvest_0.3.5          ggdist_2.1.1        
##  [25] haven_2.2.0          xfun_0.13            callr_3.4.4          crayon_1.3.4        
##  [29] jsonlite_1.7.0       survival_3.1-12      zoo_1.8-7            glue_1.4.2          
##  [33] gtable_0.3.0         emmeans_1.4.5        V8_3.0.2             pkgbuild_1.1.0      
##  [37] shape_1.4.4          abind_1.4-5          scales_1.1.1         mvtnorm_1.1-0       
##  [41] DBI_1.1.0            miniUI_0.1.1.1       xtable_1.8-4         stats4_3.6.3        
##  [45] DT_0.13              htmlwidgets_1.5.1    httr_1.4.1           threejs_0.3.3       
##  [49] arrayhelpers_1.1-0   ellipsis_0.3.1       pkgconfig_2.0.3      loo_2.3.1           
##  [53] farver_2.0.3         dbplyr_1.4.2         utf8_1.1.4           tidyselect_1.1.0    
##  [57] labeling_0.3         rlang_0.4.7          reshape2_1.4.4       later_1.1.0.1       
##  [61] munsell_0.5.0        cellranger_1.1.0     tools_3.6.3          cli_2.0.2           
##  [65] generics_0.0.2       broom_0.5.5          ggridges_0.5.2       evaluate_0.14       
##  [69] fastmap_1.0.1        yaml_2.2.1           processx_3.4.4       knitr_1.28          
##  [73] fs_1.4.1             nlme_3.1-144         mime_0.9             xml2_1.3.1          
##  [77] compiler_3.6.3       shinythemes_1.1.2    rstudioapi_0.11      curl_4.3            
##  [81] reprex_0.3.0         stringi_1.4.6        highr_0.8            ps_1.3.4            
##  [85] Brobdingnag_1.2-6    lattice_0.20-38      Matrix_1.2-18        markdown_1.1        
##  [89] shinyjs_1.1          vctrs_0.3.4          pillar_1.4.6         lifecycle_0.2.0     
##  [93] bridgesampling_1.0-0 estimability_1.3     httpuv_1.5.4         R6_2.4.1            
##  [97] bookdown_0.18        promises_1.1.1       gridExtra_2.3        codetools_0.2-16    
## [101] boot_1.3-24          colourpicker_1.0     MASS_7.3-51.5        gtools_3.8.2        
## [105] assertthat_0.2.1     withr_2.2.0          shinystan_2.5.0      multcomp_1.4-13     
## [109] hms_0.5.3            grid_3.6.3           coda_0.19-3          rmarkdown_2.1       
## [113] shiny_1.5.0          lubridate_1.7.8      base64enc_0.1-3      dygraphs_1.1.1.6

References

Barr, D. J., Levy, R., Scheepers, C., & Tily, H. J. (2013). Random effects structure for confirmatory hypothesis testing: Keep it maximal. Journal of Memory and Language, 68(3), 255–278. https://doi.org/10.1016/j.jml.2012.11.001

Bürkner, P.-C., Gabry, J., Kay, M., & Vehtari, A. (2020). posterior: Tools for working with posterior distributions. https://mc-stan.org/posterior

Efron, B., & Morris, C. (1977). Stein’s paradox in statistics. Scientific American, 236(5), 119–127. https://doi.org/10.1038/scientificamerican0577-119

Kay, M. (2020c). Extracting and visualizing tidy draws from brms models. https://mjskay.github.io/tidybayes/articles/tidy-brms.html

McElreath, R. (2015). Statistical rethinking: A Bayesian course with examples in R and Stan. CRC press. https://xcelab.net/rm/statistical-rethinking/

Vonesh, J. R., & Bolker, B. M. (2005). Compensatory larval responses shift trade-offs associated with predator-induced hatching plasticity. Ecology, 86(6), 1580–1591. https://doi.org/10.1890/04-0535