8.4 The equivalence between moderated regression analysis and a 2 X 2 factorial analysis of variance

I’m just not going to encourage ANOVA \(F\)-testing methodology here. Instead, I’ll show how to fit the Bayesian regression model. First, here are the data.

caskets <- read_csv("data/caskets/caskets.csv")

glimpse(caskets)
## Observations: 541
## Variables: 7
## $ policy   <int> 1, 0, 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 1, 0, 1, 0, 1, ...
## $ interest <dbl> 4.0, 2.0, 3.0, 1.0, 1.0, 2.0, 1.0, 2.5, 3.0, 1.0, 2.0, 3.5, 1.0, 1.0, 1.5, 3.0...
## $ age      <int> 39, 57, 63, 56, 50, 87, 33, 64, 82, 28, 18, 52, 42, 39, 64, 72, 54, 84, 55, 27...
## $ educ     <int> 3, 3, 2, 5, 3, 2, 7, 2, 3, 3, 1, 1, 5, 4, 3, 2, 3, 4, 7, 2, 3, 5, 4, 5, 5, 3, ...
## $ male     <int> 1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 1, 1, 1, 1, ...
## $ conserv  <int> 4, 3, 6, 3, 3, 5, 6, 3, 6, 7, 4, 2, 7, 6, 5, 6, 6, 3, 7, 6, 5, 5, 3, 4, 6, 2, ...
## $ kerry    <int> 1, 1, 1, 1, 1, 0, 0, 1, 0, 0, 0, 1, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0, ...
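
Before fitting, it may help to express the model in statistical notation. Given the formula we’re about to pass to brm(), the model is

\begin{align*}
\text{interest}_i & \sim \operatorname{Normal} (\mu_i, \sigma) \\
\mu_i & = b_{0} + b_{1} \text{policy}_i + b_{2} \text{kerry}_i + b_{3} \text{policy}_i \text{kerry}_i,
\end{align*}

where we leave the priors at their brms defaults.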

Here’s how to fit the model with brm():

model5 <-
  brm(data = caskets, family = gaussian,
      interest ~ 1 + policy + kerry + policy:kerry,
      chains = 4, cores = 4)
print(model5)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: interest ~ 1 + policy + kerry + policy:kerry 
##    Data: caskets (Number of observations: 541) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##              Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## Intercept        1.78      0.09     1.61     1.96       2823 1.00
## policy          -0.38      0.13    -0.64    -0.13       2520 1.00
## kerry            0.60      0.13     0.35     0.86       2626 1.00
## policy:kerry     0.36      0.18     0.02     0.71       2264 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     1.04      0.03     0.98     1.10       4000 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Those results don’t look anything like what Hayes reported in Tables 8.3 or 8.4. However, a little deft manipulation of the posterior samples can yield equivalent results to Hayes’s Table 8.3.

post <- 
  posterior_samples(model5) %>% 
  mutate(Y_bar_1 = b_Intercept + b_policy*0 + b_kerry*0 + `b_policy:kerry`*0*0,
         Y_bar_3 = b_Intercept + b_policy*0 + b_kerry*1 + `b_policy:kerry`*0*1,
         Y_bar_2 = b_Intercept + b_policy*1 + b_kerry*0 + `b_policy:kerry`*1*0,
         Y_bar_4 = b_Intercept + b_policy*1 + b_kerry*1 + `b_policy:kerry`*1*1,
         
         Y_bar_12 = b_Intercept + b_policy*.5 + b_kerry*0  + `b_policy:kerry`*.5*0,
         Y_bar_34 = b_Intercept + b_policy*.5 + b_kerry*1  + `b_policy:kerry`*.5*1,
         Y_bar_13 = b_Intercept + b_policy*0  + b_kerry*.5 + `b_policy:kerry`*0*.5,
         Y_bar_24 = b_Intercept + b_policy*1  + b_kerry*.5 + `b_policy:kerry`*1*.5)

Here are the cell-specific means in Table 8.3.

post %>% 
  select(Y_bar_1:Y_bar_4) %>% 
  gather() %>% 
  group_by(key) %>% 
  summarize(median = median(value),
            ll = quantile(value, probs = .025),
            ul = quantile(value, probs = .975)) %>% 
  mutate_if(is.double, round, digits = 3)
## # A tibble: 4 x 4
##   key     median    ll    ul
##   <chr>    <dbl> <dbl> <dbl>
## 1 Y_bar_1   1.78  1.61  1.96
## 2 Y_bar_2   1.40  1.22  1.58
## 3 Y_bar_3   2.38  2.20  2.56
## 4 Y_bar_4   2.36  2.19  2.53

And here are the marginal means from Table 8.3. Note how we computed them, above, by plugging in .5 for the variable we were averaging over.

post %>% 
  select(Y_bar_12:Y_bar_24) %>% 
  gather() %>% 
  group_by(key) %>% 
  summarize(median = median(value),
            ll = quantile(value, probs = .025),
            ul = quantile(value, probs = .975)) %>% 
  mutate_if(is.double, round, digits = 3)
## # A tibble: 4 x 4
##   key      median    ll    ul
##   <chr>     <dbl> <dbl> <dbl>
## 1 Y_bar_12   1.59  1.47  1.72
## 2 Y_bar_13   2.08  1.96  2.21
## 3 Y_bar_24   1.88  1.76  2.00
## 4 Y_bar_34   2.37  2.25  2.49

For kicks and giggles, here’s what the cell-specific means look like in box plots.

post %>% 
  select(Y_bar_1:Y_bar_4) %>% 
  gather() %>% 
  
  ggplot(aes(x = key, y = value, fill = key)) +
  geom_boxplot(size = 1/3) +
  scale_fill_manual(values = ochre_palettes[["olsen_qual"]][c(5, 6, 4, 3)]) +
  labs(title = "Cell-specific effects",
       x = NULL, 
       y = "interest") +
  theme_08 +
  theme(legend.position = "none")

And here are the same for the marginal means. This time we’ll show the shapes of the posteriors with violin plots, using horizontal lines to depict the medians and interquartile ranges.

post %>% 
  select(Y_bar_12:Y_bar_24) %>% 
  gather() %>% 
  
  ggplot(aes(x = key, y = value, fill = key)) +
  geom_violin(draw_quantiles = c(.25, .5, .75),
              color = ochre_palettes[["olsen_seq"]][8]) +
  scale_fill_manual(values = ochre_palettes[["olsen_qual"]][c(5, 6, 4, 3)]) +
  labs(title = "Marginal means",
       x = NULL, 
       y = "interest") +
  theme_08 +
  theme(legend.position = "none")

On page 294, Hayes used point estimates to compute the simple effect of policy information among Kerry supporters and then the same thing among Bush supporters. Here’s how we’d do that when working with the full vector of posterior iterations:

post %>% 
  transmute(simple_effect_Kerry = Y_bar_4 - Y_bar_3,
            simple_effect_Bush = Y_bar_2 - Y_bar_1) %>% 
  gather() %>% 
  group_by(key) %>% 
  summarize(median = median(value),
            ll = quantile(value, probs = .025),
            ul = quantile(value, probs = .975)) %>% 
  mutate_if(is.double, round, digits = 3)
## # A tibble: 2 x 4
##   key                 median     ll     ul
##   <chr>                <dbl>  <dbl>  <dbl>
## 1 simple_effect_Bush  -0.385 -0.642 -0.134
## 2 simple_effect_Kerry -0.028 -0.271  0.229

So then computing the main effect for policy information using the simple effects is little more than an extension of those steps.

post %>% 
  transmute(main_effect = ((Y_bar_4 - Y_bar_3) + (Y_bar_2 - Y_bar_1))/2) %>% 
  summarize(median = median(main_effect),
            ll = quantile(main_effect, probs = .025),
            ul = quantile(main_effect, probs = .975)) %>% 
  mutate_if(is.double, round, digits = 3)
##   median     ll     ul
## 1 -0.206 -0.379 -0.026

And we get the same results by strategically subtracting the marginal means.

post %>% 
  transmute(main_effect = Y_bar_24 - Y_bar_13) %>% 
  summarize(median = median(main_effect),
            ll = quantile(main_effect, probs = .025),
            ul = quantile(main_effect, probs = .975)) %>% 
  mutate_if(is.double, round, digits = 3)
##   median     ll     ul
## 1 -0.206 -0.379 -0.026

The main effect for candidate is similarly computed using either approach:

post %>% 
  transmute(main_effect = ((Y_bar_4 - Y_bar_2) + (Y_bar_3 - Y_bar_1))/2) %>% 
  summarize(median = median(main_effect),
            ll = quantile(main_effect, probs = .025),
            ul = quantile(main_effect, probs = .975)) %>% 
  mutate_if(is.double, round, digits = 3)
##   median    ll    ul
## 1  0.777 0.601 0.953
post %>% 
  transmute(main_effect = Y_bar_34 - Y_bar_12) %>% 
  summarize(median = median(main_effect),
            ll = quantile(main_effect, probs = .025),
            ul = quantile(main_effect, probs = .975)) %>% 
  mutate_if(is.double, round, digits = 3)
##   median    ll    ul
## 1  0.777 0.601 0.953

We don’t have an \(F\)-test for our Bayesian moderation model. But we do have an interaction term. Here’s its distribution:

post %>% 
  ggplot(aes(x = `b_policy:kerry`)) +
  geom_density(size = 0,
               fill = ochre_palettes[["olsen_qual"]][2]) +
  geom_vline(xintercept = fixef(model5)["policy:kerry", c(1, 3, 4)],
             color = ochre_palettes[["olsen_seq"]][8], linetype = c(1, 2, 2)) +
  scale_x_continuous(breaks = fixef(model5)["policy:kerry", c(1, 3, 4)],
                     labels = fixef(model5)["policy:kerry", c(1, 3, 4)] %>% round(digits = 2)) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(title = "The interaction term, `policy:kerry`",
       subtitle = "The solid vertical line is the posterior mean\nand the dashed lines to either end denote the\npercentile- based 95% intervals.",
       x = NULL) +
  theme_08 +
  theme(legend.position = "none")
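
If you’d like a quick numeric summary to accompany the plot, one option (an aside not in Hayes’s text) is to compute the proportion of posterior draws in which the interaction is greater than zero. Roughly speaking, this is the posterior probability the interaction is positive.

post %>% 
  # proportion of draws with a positive interaction
  summarize(`P(interaction > 0)` = mean(`b_policy:kerry` > 0))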

Following Hayes’s work on the bottom of page 295, here’s how you’d reproduce that by manipulating our \(\overline{Y}\) vectors.

post %>% 
  transmute(reproduced_interaction_term = (Y_bar_4 - Y_bar_3) - (Y_bar_2 - Y_bar_1)) %>% 
  summarize(median = median(reproduced_interaction_term),
            ll = quantile(reproduced_interaction_term, probs = .025),
            ul = quantile(reproduced_interaction_term, probs = .975)) %>% 
  mutate_if(is.double, round, digits = 2)
##   median   ll   ul
## 1   0.36 0.02 0.71

Extending that logic, we also get:

post %>% 
  transmute(reproduced_interaction_term = (Y_bar_4 - Y_bar_2) - (Y_bar_3 - Y_bar_1)) %>% 
  summarize(median = median(reproduced_interaction_term),
            ll = quantile(reproduced_interaction_term, probs = .025),
            ul = quantile(reproduced_interaction_term, probs = .975)) %>% 
  mutate_if(is.double, round, digits = 2)
##   median   ll   ul
## 1   0.36 0.02 0.71

8.4.1 Simple effects parameterization.

We might reacquaint ourselves with the formula from model5.

model5$formula
## interest ~ 1 + policy + kerry + policy:kerry

The results cohere nicely with the “Model 1” results at the top of Table 8.5.

fixef(model5) %>% round(digits = 3)
##              Estimate Est.Error   Q2.5  Q97.5
## Intercept       1.784     0.090  1.607  1.963
## policy         -0.383     0.126 -0.642 -0.134
## kerry           0.600     0.130  0.346  0.857
## policy:kerry    0.357     0.180  0.020  0.710

The Bayesian \(R^2\) portion looks on point, too.

bayes_R2(model5) %>% round(digits = 3)
##    Estimate Est.Error  Q2.5 Q97.5
## R2     0.14     0.025 0.091  0.19
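
In case you’re curious what bayes_R2() is doing under the hood, here’s a rough by-hand sketch in the spirit of the Gelman, Goodrich, Gabry, and Vehtari (2019) estimator: for each posterior draw, take the variance of the fitted values over that variance plus the variance of the residuals. This is a sketch rather than the canonical implementation, but it should land very close to the output above.

# draws-by-observations matrix of fitted values (i.e., mu per draw)
mu <- fitted(model5, summary = FALSE)
# residuals per draw (sign doesn't matter for the variance)
e  <- sweep(mu, 2, caskets$interest)
# per-draw R2: Var(fitted) / (Var(fitted) + Var(residuals))
r2 <- apply(mu, 1, var) / (apply(mu, 1, var) + apply(e, 1, var))
quantile(r2, probs = c(.025, .5, .975)) %>% round(digits = 3)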

Our various Y_bar transformations from before continue to cohere with the coefficients, above, just like in the text. E.g., the policy coefficient may be returned like so:

post %>% 
  transmute(b1 = b_policy,
            `Y_bar_2 - Y_bar_1` = Y_bar_2 - Y_bar_1) %>% 
  gather() %>% 
  group_by(key) %>% 
  summarize(mean = mean(value),
            sd = sd(value)) %>% 
  mutate_if(is.double, round, digits = 3)
## # A tibble: 2 x 3
##   key                 mean    sd
##   <chr>              <dbl> <dbl>
## 1 b1                -0.383 0.126
## 2 Y_bar_2 - Y_bar_1 -0.383 0.126

We can continue to use Hayes’s Y_bar transformations to return the kerry coefficient, too.

post %>% 
  transmute(b2 = b_kerry,
            `Y_bar_3 - Y_bar_1` = Y_bar_3 - Y_bar_1) %>% 
  gather() %>% 
  group_by(key) %>% 
  summarize(mean = mean(value),
            sd = sd(value)) %>% 
  mutate_if(is.double, round, digits = 3)
## # A tibble: 2 x 3
##   key                mean    sd
##   <chr>             <dbl> <dbl>
## 1 b2                  0.6  0.13
## 2 Y_bar_3 - Y_bar_1   0.6  0.13

Here we compute \(b_{3}\) with the difference between the simple effects of \(X\) at levels of \(W\).

post %>% 
  transmute(b3 = `b_policy:kerry`,
            `(Y_bar_4 - Y_bar_3) - (Y_bar_2 - Y_bar_1)` = (Y_bar_4 - Y_bar_3) - (Y_bar_2 - Y_bar_1)) %>% 
  gather() %>% 
  group_by(key) %>% 
  summarize(mean = mean(value),
            sd = sd(value)) %>% 
  mutate_if(is.double, round, digits = 3)
## # A tibble: 2 x 3
##   key                                        mean    sd
##   <chr>                                     <dbl> <dbl>
## 1 (Y_bar_4 - Y_bar_3) - (Y_bar_2 - Y_bar_1) 0.357  0.18
## 2 b3                                        0.357  0.18

And now \(b_{3}\) with the difference between the simple effects of \(W\) at levels of \(X\).

post %>% 
  transmute(b3 = `b_policy:kerry`,
            `(Y_bar_4 - Y_bar_2) - (Y_bar_3 - Y_bar_1)` = (Y_bar_4 - Y_bar_2) - (Y_bar_3 - Y_bar_1)) %>% 
  gather() %>% 
  group_by(key) %>% 
  summarize(mean = mean(value),
            sd = sd(value)) %>% 
  mutate_if(is.double, round, digits = 3)
## # A tibble: 2 x 3
##   key                                        mean    sd
##   <chr>                                     <dbl> <dbl>
## 1 (Y_bar_4 - Y_bar_2) - (Y_bar_3 - Y_bar_1) 0.357  0.18
## 2 b3                                        0.357  0.18

8.4.2 Main effects parameterization.

A nice feature of brms is you can transform your data right within the brm() or update() functions. Here we’ll make our two new main-effects-coded variables, policy_me and kerry_me, with the mutate() function right within update().

model6 <-
  update(model5,
         newdata = caskets %>%
           mutate(policy_me = policy - .5,
                  kerry_me = kerry - .5), 
         family = gaussian,
         interest ~ 1 + policy_me + kerry_me + policy_me:kerry_me,
         chains = 4, cores = 4)

Transforming your data within the brms functions won’t change the original data structure. However, brms will save the data used to fit the model within the brm() object. You can access that data like so:

model6$data %>%
  head()
##   interest policy_me kerry_me
## 1        4       0.5      0.5
## 2        2      -0.5      0.5
## 3        3       0.5      0.5
## 4        1       0.5      0.5
## 5        1       0.5      0.5
## 6        2      -0.5     -0.5

But we digress. Here’s our analogue to the “Model 2” portion of Table 8.5.

fixef(model6) %>% round(digits = 3)
##                    Estimate Est.Error   Q2.5  Q97.5
## Intercept             1.980     0.045  1.894  2.066
## policy_me            -0.206     0.089 -0.380 -0.030
## kerry_me              0.780     0.089  0.604  0.952
## policy_me:kerry_me    0.361     0.174  0.022  0.701
bayes_R2(model6) %>% round(digits = 3)
##    Estimate Est.Error  Q2.5 Q97.5
## R2     0.14     0.025 0.092  0.19

Like with model5, above, we’ll need a bit of algebra to compute our \(\overline{Y}\) vectors.

post <- 
  posterior_samples(model6) %>% 
  mutate(Y_bar_1 = b_Intercept + b_policy_me*-.5 + b_kerry_me*-.5 + `b_policy_me:kerry_me`*-.5*-.5,
         Y_bar_3 = b_Intercept + b_policy_me*-.5 + b_kerry_me*.5 + `b_policy_me:kerry_me`*-.5*.5,
         Y_bar_2 = b_Intercept + b_policy_me*.5 + b_kerry_me*-.5 + `b_policy_me:kerry_me`*.5*-.5,
         Y_bar_4 = b_Intercept + b_policy_me*.5 + b_kerry_me*.5 + `b_policy_me:kerry_me`*.5*.5)

With our post for model6 in hand, we’ll follow the formulas at the top of page 298 to compute our \(b_{1}\) and \(b_{2}\) distributions.

post %>% 
  transmute(b1 = ((Y_bar_4 - Y_bar_3) + (Y_bar_2 - Y_bar_1))/2,
            b2 = ((Y_bar_4 - Y_bar_2) + (Y_bar_3 - Y_bar_1))/2) %>% 
  gather() %>% 
  group_by(key) %>% 
  summarize(mean = mean(value),
            sd = sd(value)) %>% 
  mutate_if(is.double, round, digits = 3)
## # A tibble: 2 x 3
##   key     mean    sd
##   <chr>  <dbl> <dbl>
## 1 b1    -0.206 0.089
## 2 b2     0.78  0.089

Hayes pointed out that the interaction effect, \(b_{3}\), is the same across his OLS Models 1 and 2. This is largely true for our Bayesian HMC model5 and model6, as well:

fixef(model5)[4, ] %>% round(digits = 3)
##  Estimate Est.Error      Q2.5     Q97.5 
##     0.357     0.180     0.020     0.710
fixef(model6)[4, ] %>% round(digits = 3)
##  Estimate Est.Error      Q2.5     Q97.5 
##     0.361     0.174     0.022     0.701

However, the results aren’t exactly the same because of simulation error. If you were working on a project requiring high precision, increase the number of posterior iterations. To demonstrate, here we’ll increase the number of post-warmup iterations per chain from the default 1,000 to 20,000, resulting in 80,000 total post-warmup iterations rather than the default 4,000.

model5 <-
  update(model5,
         chains = 4, cores = 4, warmup = 1000, iter = 21000)

model6 <-
  update(model6,
         chains = 4, cores = 4, warmup = 1000, iter = 21000)
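
If you’d like to confirm the new draw count, a quick check like this (not in the text) will do; printing the models works, too.

# confirm each model now has 80,000 post-warmup draws
posterior_samples(model5) %>% nrow()
posterior_samples(model6) %>% nrow()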

Now they’re quite a bit closer.

fixef(model5)[4, ] %>% round(digits = 3)
##  Estimate Est.Error      Q2.5     Q97.5 
##     0.360     0.179     0.007     0.711
fixef(model6)[4, ] %>% round(digits = 3)
##  Estimate Est.Error      Q2.5     Q97.5 
##     0.360     0.178     0.013     0.707

And before you fixate on how there are still differences after 80,000 iterations each, consider comparing the two density plots:

posterior_samples(model5) %>% 
  as_tibble() %>% 
  select(`b_policy:kerry`) %>% 
  rename(interaction = `b_policy:kerry`) %>% 
  bind_rows(
    posterior_samples(model6) %>% 
      as_tibble() %>% 
      select(`b_policy_me:kerry_me`) %>% 
      rename(interaction = `b_policy_me:kerry_me`)
  ) %>% 
  mutate(model = rep(c("model5", "model6"), each = 80000)) %>% 
  
  ggplot(aes(x = interaction, fill = model)) +
  geom_density(size = 0, alpha = 1/2) +
  scale_fill_manual(values = ochre_palettes[["olsen_qual"]][c(3, 6)]) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(title = "The interaction densities, by model",
       subtitle = "Yes, they are indeed different. And yet that difference is so\ntrivial that we'd expect greater variability from measurement\nerror than we still have from simulation error.",
       x = NULL) +
  theme_08 

8.4.3 Conducting a 2 X 2 between-participants factorial ANOVA using another regression model with brms (rather than PROCESS).

Since we’re square in regression land with brms, there’s no direct analogue for us, here. However, notice the post-ANOVA \(t\)-tests Hayes presented on page 300. If we just want to consider the 2 X 2 structure of our two dummy variables as indicative of four groups, we have one more coding system available for the job. With the handy str_c() function, we’ll concatenate the policy and kerry values into a nominal variable, policy_kerry. Here’s what that looks like:

caskets <-
  caskets %>% 
  mutate(policy_kerry = str_c(policy, kerry))

head(caskets)
## # A tibble: 6 x 8
##   policy interest   age  educ  male conserv kerry policy_kerry
##    <int>    <dbl> <int> <int> <int>   <int> <int> <chr>       
## 1      1        4    39     3     1       4     1 11          
## 2      0        2    57     3     1       3     1 01          
## 3      1        3    63     2     0       6     1 11          
## 4      1        1    56     5     1       3     1 11          
## 5      1        1    50     3     0       3     1 11          
## 6      0        2    87     2     1       5     0 00

Now check out what happens if we reformat our formula to interest ~ 0 + policy_kerry.

model7 <-
  brm(data = caskets, family = gaussian,
      interest ~ 0 + policy_kerry,
      chains = 4, cores = 4)

The brm() function recognized policy_kerry was a character vector and treated it as a nominal variable. The 0 + part of the formula removed the model intercept. Here’s how that affects the output:

print(model7)
##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: interest ~ 0 + policy_kerry 
##    Data: caskets (Number of observations: 541) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## policy_kerry00     1.78      0.09     1.61     1.95       4000 1.00
## policy_kerry01     2.38      0.09     2.20     2.56       4000 1.00
## policy_kerry10     1.40      0.09     1.22     1.58       4000 1.00
## policy_kerry11     2.36      0.09     2.19     2.53       4000 1.00
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma     1.04      0.03     0.98     1.10       4000 1.00
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Without the typical intercept, brm() estimated the means for each of the four policy_kerry groups. It’s kinda like an intercept-only model, but with four intercepts. Note how they match up with the cell-specific means, Y_bar_1 through Y_bar_4, we computed from model5, above. Here’s what their densities look like:

post <- posterior_samples(model7)

post %>% 
  select(b_policy_kerry00:b_policy_kerry11) %>% 
  gather() %>%
  mutate(key = str_remove(key, "b_")) %>% 
  
  ggplot(aes(x = value, fill = key)) +
  geom_density(color = "transparent", alpha = 2/3) +
  scale_fill_manual(NULL,
                    values = ochre_palettes[["olsen_qual"]][c(5, 6, 4, 3)]) +
  scale_y_continuous(NULL, breaks = NULL) +
  labs(title = "Group means",
       x = NULL) +
  theme_08

Since each of the four primary vectors in our post object corresponds to a group mean, it’s trivial to compute difference scores. To compute the difference scores analogous to Hayes’s two \(t\)-tests, we’d do the following.

post %>% 
  transmute(difference_1 = b_policy_kerry10 - b_policy_kerry00,
            difference_2 = b_policy_kerry11 - b_policy_kerry01) %>% 
  gather() %>% 
  group_by(key) %>% 
  summarize(median = median(value),
            ll = quantile(value, probs = .025),
            ul = quantile(value, probs = .975)) %>% 
  mutate_if(is.double, round, digits = 3)
## # A tibble: 2 x 4
##   key          median     ll     ul
##   <chr>         <dbl>  <dbl>  <dbl>
## 1 difference_1 -0.385 -0.638 -0.129
## 2 difference_2 -0.025 -0.267  0.226
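
Note how these difference scores are essentially the same, within simulation error, as the simple_effect_Bush and simple_effect_Kerry distributions we computed from model5, above.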