6 Mediation Analysis with a Multicategorical Antecedent

“Historically, investigators interested in doing a mediation analysis with a multicategorical antecedents \(X\) have resorted to some less optimal strategies than the one [Hayes] discuss[ed] in this chapter (p. 188).” Happily, the approach outlined in this chapter avoids such gaffs. Hayes’s procedure “does not require discarding any data; the entire sample is analyzed simultaneously. Furthermore, the multicategorical nature of \(X\) is respected and retained (p. 189).”

6.1 Relative total, direct, and indirect effects

In review of regression analysis in Chapter 2, we saw that a multicategorical antecedent variable with \(g\) categories can be used as an antecedent variable in a regression model if it is represented by \(g - 1\) variables using some kind of group coding system (see section 2.7). [Hayes] described indicator or dummy coding as one such system, where groups are represented with \(g - 1\) variables set to either zero or one (see Table 2.1). With indicator coding, one of the \(g\) groups is chosen as the reference group. Cases in the reference group receive a zero on all \(g - 1\) variables coding \(X\). Each of the remaining \(g - 1\) groups gets its own indicator variable that is set to 1 for cases in that group, with all other cases set to zero. Using such a system, which of the \(g\) groups a case is in is represented by its pattern of zeros and ones on the \(g - 1\) indicator variables. These \(g - 1\) indicator variables are then used as antecedent variables in a regression model as a stand-in for \(X\). (pp. 189–190, emphasis in the original)

6.1.1 Relative indirect effects.

When our \(X\) is multicategorical, we end up with \(g - 1\) \(a\) coefficients. Presuming the \(M\) variable is continuous or binary, this will yield \(g - 1\) relative indirect effects, \(a_j b\).

6.1.2 Relative direct effects.

Similar to above, when our \(X\) is multicategorical, we end up with \(g - 1\) \(c'\) coefficients, each of which is a relative direct effects.

6.1.3 Relative total effects.

With the two prior subsections in mind, when our \(X\) is multicategorical, we end up with \(g - 1\) \(c\) coefficients, each of which is a relative total effect. These follow the form

\[c_j = c_j' + a_j b,\]

where \(j\) indexes a given group.

6.2 An example: Sex discrimination in the workplace

Here we load a couple necessary packages, load the data, and take a glimpse().

## Observations: 129
## Variables: 6
## $ subnum   <dbl> 209, 44, 124, 232, 30, 140, 27, 64, 67, 182, 85, 109, 122, 69, 45, 28, 170, 66, 168, 97, 7…
## $ protest  <dbl> 2, 0, 2, 2, 2, 1, 2, 0, 0, 0, 2, 2, 0, 1, 1, 0, 1, 2, 2, 1, 2, 1, 1, 2, 2, 0, 1, 1, 0, 1, …
## $ sexism   <dbl> 4.87, 4.25, 5.00, 5.50, 5.62, 5.75, 5.12, 6.62, 5.75, 4.62, 4.75, 6.12, 4.87, 5.87, 4.87, …
## $ angry    <dbl> 2, 1, 3, 1, 1, 1, 2, 1, 6, 1, 2, 5, 2, 1, 1, 1, 2, 1, 3, 4, 1, 1, 1, 5, 1, 5, 1, 1, 2, 1, …
## $ liking   <dbl> 4.83, 4.50, 5.50, 5.66, 6.16, 6.00, 4.66, 6.50, 1.00, 6.83, 5.00, 5.66, 5.83, 6.50, 4.50, …
## $ respappr <dbl> 4.25, 5.75, 4.75, 7.00, 6.75, 5.50, 5.00, 6.25, 3.00, 5.75, 5.25, 7.00, 4.50, 6.25, 5.00, …

Here are the ungrouped means and \(SD\)s for respappr and liking shown at the bottom of Table 6.1.

## # A tibble: 2 x 3
##   name      mean    sd
##   <chr>    <dbl> <dbl>
## 1 liking    5.64  1.05
## 2 respappr  4.87  1.35

We compute the summaries for respappr and liking, grouped by protest, like so.

## # A tibble: 6 x 4
## # Groups:   protest [3]
##   protest name      mean    sd
##     <dbl> <chr>    <dbl> <dbl>
## 1       0 liking    5.31 1.30 
## 2       0 respappr  3.88 1.46 
## 3       1 liking    5.83 0.819
## 4       1 respappr  5.14 1.08 
## 5       2 liking    5.75 0.936
## 6       2 respappr  5.49 0.936

It looks like Hayes has a typo in the \(SD\) for liking when protest == 0. It seems he accidentally entered the value for when protest == 1 in that slot.

You’ll have to wait a minute to see where the adjusted \(Y\) values came from.

With a little if_else(), computing the dummies d1 and d2 is easy enough.

We’re almost ready to fit the model. Let’s load brms.

This is the first time we’ve had a simple univariate regression model in a while–no special mvbind() syntax or multiple bf() formulas, just straight up brms::brm().

Check the coefficient summaries.

##            Estimate Est.Error         Q2.5     Q97.5
## Intercept 5.3065586 0.1644651  4.987521552 5.6321918
## d1        0.5187973 0.2313537  0.066754379 0.9659453
## d2        0.4483320 0.2250177 -0.004960553 0.8860026

Our \(R^2\) differences a bit from the OLS version in the text. This shouldn’t be surprising when it’s near the boundary.

##      Estimate  Est.Error        Q2.5     Q97.5
## R2 0.05882954 0.03580924 0.005361728 0.1404927

Here’s its shape. For the plots in this chapter, we’ll take a few formatting cues from Edward Tufte, curtesy of the ggthemes package. The theme_tufte() function will change the default font and remove some chart junk. We will take our color palette from Pokemon via the palettetown package.

To use the model-implied equations to compute the means for each group on the criterion, we’ll extract the posterior samples.

## # A tibble: 3 x 3
##   name   mean    sd
##   <fct> <dbl> <dbl>
## 1 Y_np   5.31 0.164
## 2 Y_ip   5.83 0.157
## 3 Y_cp   5.75 0.152

What Hayes called the “relative total effects” \(c_1\) and \(c_2\) are the d1 and d2 lines in our fixef() output, above.

Here are the sub-models for the mediation model.

There’s a third way to fit multivariate models in brms. It uses either the mvbrmsformula() function, or its abbreviated version, mvbf(). With these, we first define our submodels in br() statements like before. We then combine them within mvbf(), separated with a comma. If we’d like to avoid estimating a residual correlation, which we do in this project–, we then set rescore = FALSE. Here’s how it looks like for our second model.

##  Family: MV(gaussian, gaussian) 
##   Links: mu = identity; sigma = identity
##          mu = identity; sigma = identity 
## Formula: respappr ~ 1 + d1 + d2 
##          liking ~ 1 + d1 + d2 + respappr 
##    Data: protest (Number of observations: 129) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                    Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## respappr_Intercept     3.89      0.19     3.53     4.25 1.00     5389     2814
## liking_Intercept       3.72      0.31     3.09     4.33 1.00     7012     3330
## respappr_d1            1.26      0.25     0.76     1.75 1.00     4860     3084
## respappr_d2            1.60      0.25     1.10     2.11 1.00     4966     3327
## liking_d1              0.00      0.23    -0.44     0.45 1.00     3736     3297
## liking_d2             -0.22      0.23    -0.66     0.23 1.00     3326     2936
## liking_respappr        0.41      0.07     0.27     0.55 1.00     4364     3188
## 
## Family Specific Parameters: 
##                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_respappr     1.18      0.08     1.04     1.34 1.00     6479     2945
## sigma_liking       0.93      0.06     0.82     1.06 1.01     5449     2523
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

Behold the Bayesian \(R^2\) posteriors.

To get the model summaries as presented in the second two columns in Table 6.2, we use posterior_samples(), rename a bit, and summarize. Like in the last chapter, here we’ll do so with a little help from tidybayes.

## # A tibble: 7 x 7
##   name      value .lower .upper .width .point .interval
##   <chr>     <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 a1        1.26   0.762  1.76    0.95 mean   qi       
## 2 a2        1.60   1.10   2.11    0.95 mean   qi       
## 3 b         0.41   0.27   0.547   0.95 mean   qi       
## 4 c1_prime  0.003 -0.44   0.449   0.95 mean   qi       
## 5 c2_prime -0.215 -0.657  0.23    0.95 mean   qi       
## 6 i_m       3.88   3.53   4.25    0.95 mean   qi       
## 7 i_y       3.72   3.09   4.33    0.95 mean   qi

Working with the \(\overline M_{ij}\) formulas in page 199 is quite similar to what we did above.

## # A tibble: 3 x 3
##   name   mean    sd
##   <fct> <dbl> <dbl>
## 1 M_np   3.89 0.188
## 2 M_ip   5.14 0.177
## 3 M_cp   5.49 0.175

The \(\overline Y^*_{ij}\) formulas are more of the same.

## # A tibble: 3 x 3
##   name   mean    sd
##   <fct> <dbl> <dbl>
## 1 Y_np   5.71 0.163
## 2 Y_ip   5.72 0.143
## 3 Y_cp   5.50 0.142

Note, these are where the adjusted \(Y\) values came from in Table 6.1.

This is as fine a spot as any to introduce coefficient plots. brms, tidybayes, and the bayesplot package all offer convenience functions for coefficient plots. Before we get all lazy using convenience functions, it’s good to know how to make coefficient plots by hand. Here’s ours for those last three \(\overline Y^*_{ij}\)-values.

The points are the posterior medians, the thick inner lines the 50% intervals, and the thinner outer lines the 95% intervals. For kicks, we distinguished the three values by color.

If we want to examine \(R^2\) change for dropping the dummy variables, we’ll first fit a model that omits them.

Here are the competing \(R^2\) distributions.

If you want to compare then with a change score, do something like this.

Now compute the posterior means and 95% intervals for \(a_1 b\) and \(a_2 b\), the conditional indirect effects.

## # A tibble: 2 x 7
##   name  value .lower .upper .width .point .interval
##   <chr> <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 a1b   0.516  0.268  0.812   0.95 mean   qi       
## 2 a2b   0.658  0.38   0.984   0.95 mean   qi

6.3 Using a different group coding system

Here we’ll make our alternative dummies, what we’ll call d_1 and d_2, with orthogonal contrast coding.

Here are the sub-models.

Now we fit using the mvbf() approach.

Here are our intercepts and regression coefficient summaries.

##                      Estimate  Est.Error       Q2.5     Q97.5
## respappr_Intercept  4.8421202 0.10233999  4.6374197 5.0488520
## liking_Intercept    3.6295030 0.35531407  2.9201509 4.3177206
## respappr_d_1        1.4362663 0.21736835  1.0058359 1.8627144
## respappr_d_2        0.3516122 0.25351866 -0.1578317 0.8447912
## liking_d_1         -0.1150324 0.20358287 -0.5087575 0.2845606
## liking_d_2         -0.2177093 0.20456106 -0.6275818 0.1862900
## liking_respappr     0.4132765 0.07107282  0.2747422 0.5561193

It’s important to note that these will not correspond to the “TOTAL EFFECT MODEL” section of the PROCESS output of Figure 6.3. Hayes’s PROCESS has the mcx=3 command which tells the program to reparametrize the orthogonal contrasts. brms doesn’t have such a command.

For now, we’ll have to jump to Equation 6.8 towards the bottom of page 207. Those parameters are evident in our output. For good measure, here we’ll practice with posterior_summary().

##              parameter  Estimate Est.Error       Q2.5     Q97.5
## 1 b_respappr_Intercept 4.8421202 0.1023400  4.6374197 5.0488520
## 2       b_respappr_d_1 1.4362663 0.2173683  1.0058359 1.8627144
## 3       b_respappr_d_2 0.3516122 0.2535187 -0.1578317 0.8447912

Thus it’s easy to get the \(\overline M_{ij}\) means with a little posterior manipulation.

## # A tibble: 3 x 3
##   name   mean    sd
##   <fct> <dbl> <dbl>
## 1 M_np   3.88 0.181
## 2 M_ip   5.15 0.176
## 3 M_cp   5.50 0.177

With these in hand, we can compute \(a_1\) and \(a_2\).

## # A tibble: 2 x 3
##   name   mean    sd
##   <chr> <dbl> <dbl>
## 1 a1    1.44  0.217
## 2 a2    0.352 0.254

Happily, our model output will allow us to work with Hayes’s \(\overline Y^*_{ij}\) equations in the middle of page 210.

## # A tibble: 3 x 3
##   name   mean    sd
##   <fct> <dbl> <dbl>
## 1 Y_np   5.72 0.161
## 2 Y_ip   5.71 0.145
## 3 Y_cp   5.49 0.147

And with these in hand, we can compute \(c'_1\) and \(c'_2\).

## # A tibble: 2 x 3
##   name       mean    sd
##   <chr>     <dbl> <dbl>
## 1 c1_prime -0.115 0.204
## 2 c2_prime -0.218 0.205

It appears Hayes has a typo in the formula for \(c'_2\) on page 211. The value he has down for \(\overline Y^*_{IP}\), 5.145, is incorrect. It’s not the one he displayed at the bottom of the previous page and it also contradicts the analyses herein. So it goes… These things happen.

We haven’t spelled it out, but the \(b\) parameter is currently labeled b_liking_respappr in our post object. Here we’ll make a b column to make things easier. While we’re at it, we’ll compute the indirect effects, too.

## # A tibble: 2 x 7
##   name  value .lower .upper .width .point .interval
##   <chr> <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 a1b   0.593  0.35   0.882   0.95 mean   qi       
## 2 a2b   0.145 -0.065  0.368   0.95 mean   qi

Now we can compute and summarize() our \(c_1\) and \(c_2\).

## # A tibble: 2 x 3
##   name     mean    sd
##   <chr>   <dbl> <dbl>
## 1 c1     0.478  0.197
## 2 c2    -0.0724 0.235

6.4 Some miscellaneous issues [unrelated to those Hayes covered in the text]

Do you recall how way back in Chapter 2 we covered an alternative way to fit models with multicategorical grouping variables? Well, we did. The basic strategy is to save our grouping variable as a factor and then enter it into the model with the special 0 + syntax, which removes the typical intercept. Since this chapter is all about multicategorical variables, it might make sense to explore what happens when we use this approach. For our first step, well prepare the data.

## # A tibble: 129 x 2
##    protest group     
##      <dbl> <fct>     
##  1       2 collective
##  2       0 none      
##  3       2 collective
##  4       2 collective
##  5       2 collective
##  6       1 individual
##  7       2 collective
##  8       0 none      
##  9       0 none      
## 10       0 none      
## # … with 119 more rows

Before we fit a full mediation model, we should warm up. Here we fit a univariable model for liking. This is an alternative to what we did way back with model6.1.

Check the summary.

##  Family: gaussian 
##   Links: mu = identity; sigma = identity 
## Formula: liking ~ 0 + group 
##    Data: protest (Number of observations: 129) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                 Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## groupnone           5.31      0.17     4.99     5.63 1.00     4481     2742
## groupindividual     5.82      0.16     5.51     6.14 1.00     4041     2890
## groupcollective     5.75      0.15     5.46     6.05 1.00     4286     3122
## 
## Family Specific Parameters: 
##       Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma     1.04      0.07     0.92     1.18 1.00     3820     3058
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

There’s no conventional intercept parameter. Rather, each of the each of the levels of group get its own conditional intercept. To get a sense of what this model is, let’s practice our coefficient plotting skills. This time we’ll compute the necessary values before plugging them into ggplot2.

The results from the model are in colored point ranges. The black dots in the foreground are the empirical means. It looks like our model did a good job estimating the group means for liking.

Let’s see how this coding approach works when you fit a full mediation model. First, define the sub-models with two bf() lines.

Now fit model6 using the mvbf() approach.

What will the summary hold?

##  Family: MV(gaussian, gaussian) 
##   Links: mu = identity; sigma = identity
##          mu = identity; sigma = identity 
## Formula: respappr ~ 0 + group 
##          liking ~ 0 + group + respappr 
##    Data: protest (Number of observations: 129) 
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
## 
## Population-Level Effects: 
##                          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## respappr_groupnone           3.88      0.18     3.53     4.25 1.00     4047     2935
## respappr_groupindividual     5.14      0.18     4.80     5.50 1.00     4292     2872
## respappr_groupcollective     5.49      0.18     5.14     5.84 1.00     4163     3112
## liking_groupnone             3.69      0.31     3.07     4.31 1.00     1809     2305
## liking_groupindividual       3.69      0.39     2.91     4.44 1.00     1791     2306
## liking_groupcollective       3.47      0.41     2.65     4.29 1.00     1722     1948
## liking_respappr              0.42      0.07     0.28     0.56 1.00     1667     2197
## 
## Family Specific Parameters: 
##                Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sigma_respappr     1.18      0.07     1.05     1.34 1.00     4186     2680
## sigma_liking       0.93      0.06     0.82     1.05 1.00     3887     2790
## 
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample 
## is a crude measure of effective sample size, and Rhat is the potential 
## scale reduction factor on split chains (at convergence, Rhat = 1).

If you flip back to page 199, you’ll notice the posterior mean in the first three rows (i.e., respappr_groupnone through respappr_groupcollective) correspond to the estimates for \(\overline M_{NP}\), \(\overline M_{IP}\), and \(\overline M_{CP}\), respectively.

Let’s get to the \(a_j b\) estimates.

## # A tibble: 3 x 7
##   name  value .lower .upper .width .point .interval
##   <chr> <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 a1b    1.61   1.06   2.19   0.95 mean   qi       
## 2 a2b    2.14   1.42   2.87   0.95 mean   qi       
## 3 a3b    2.28   1.52   3.07   0.95 mean   qi

With this parameterization, it’s a little difficult to the \(a_j b\) estimates. None of them are in comparison to anything. However, this approach is quite useful once you compute their various difference scores.

## # A tibble: 3 x 7
##   name                             value .lower .upper .width .point .interval
##   <chr>                            <dbl>  <dbl>  <dbl>  <dbl> <chr>  <chr>    
## 1 diff_collective_minus_individual 0.144 -0.055  0.364   0.95 mean   qi       
## 2 diff_collective_minus_none       0.669  0.381  0.992   0.95 mean   qi       
## 3 diff_individual_minus_none       0.524  0.274  0.826   0.95 mean   qi

If you look back to our results from model6.2, you’ll see that diff_individual_minus_none and diff_collective_minus_none correspond to a1b and a2b, respectively. But with our model6.6 approach, we get the additional information of what kind of indirect effect we might have yielded had we used a different coding scheme for our original set of dummy variables. That is, we get diff_collective_minus_individual.

Session info

## R version 3.6.0 (2019-04-26)
## Platform: x86_64-apple-darwin15.6.0 (64-bit)
## Running under: macOS High Sierra 10.13.6
## 
## Matrix products: default
## BLAS:   /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/3.6/Resources/lib/libRlapack.dylib
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## 
## other attached packages:
##  [1] tidybayes_1.1.0   palettetown_0.1.1 ggthemes_4.2.0    brms_2.10.3       Rcpp_1.0.2       
##  [6] forcats_0.4.0     stringr_1.4.0     dplyr_0.8.3       purrr_0.3.3       readr_1.3.1      
## [11] tidyr_1.0.0       tibble_2.1.3      ggplot2_3.2.1     tidyverse_1.2.1  
## 
## loaded via a namespace (and not attached):
##  [1] colorspace_1.4-1          ggridges_0.5.1            rsconnect_0.8.15          ggstance_0.3.2           
##  [5] markdown_1.1              base64enc_0.1-3           rstudioapi_0.10           rstan_2.19.2             
##  [9] svUnit_0.7-12             DT_0.9                    fansi_0.4.0               lubridate_1.7.4          
## [13] xml2_1.2.0                bridgesampling_0.7-2      knitr_1.23                shinythemes_1.1.2        
## [17] zeallot_0.1.0             bayesplot_1.7.0           jsonlite_1.6              broom_0.5.2              
## [21] shiny_1.3.2               compiler_3.6.0            httr_1.4.0                backports_1.1.5          
## [25] assertthat_0.2.1          Matrix_1.2-17             lazyeval_0.2.2            cli_1.1.0                
## [29] later_1.0.0               htmltools_0.4.0           prettyunits_1.0.2         tools_3.6.0              
## [33] igraph_1.2.4.1            coda_0.19-3               gtable_0.3.0              glue_1.3.1.9000          
## [37] reshape2_1.4.3            cellranger_1.1.0          vctrs_0.2.0               nlme_3.1-139             
## [41] crosstalk_1.0.0           xfun_0.10                 ps_1.3.0                  rvest_0.3.4              
## [45] mime_0.7                  miniUI_0.1.1.1            lifecycle_0.1.0           gtools_3.8.1             
## [49] zoo_1.8-6                 scales_1.0.0              colourpicker_1.0          hms_0.4.2                
## [53] promises_1.1.0            Brobdingnag_1.2-6         parallel_3.6.0            inline_0.3.15            
## [57] shinystan_2.5.0           gridExtra_2.3             loo_2.1.0                 StanHeaders_2.19.0       
## [61] stringi_1.4.3             dygraphs_1.1.1.6          pkgbuild_1.0.5            rlang_0.4.1              
## [65] pkgconfig_2.0.3           matrixStats_0.55.0        evaluate_0.14             lattice_0.20-38          
## [69] rstantools_2.0.0          htmlwidgets_1.5           labeling_0.3              tidyselect_0.2.5         
## [73] processx_3.4.1            plyr_1.8.4                magrittr_1.5              R6_2.4.0                 
## [77] generics_0.0.2            pillar_1.4.2              haven_2.1.0               withr_2.1.2              
## [81] xts_0.11-2                abind_1.4-5               modelr_0.1.4              crayon_1.3.4             
## [85] arrayhelpers_1.0-20160527 utf8_1.1.4                rmarkdown_1.13            grid_3.6.0               
## [89] readxl_1.3.1              callr_3.3.2               threejs_0.3.1             digest_0.6.21            
## [93] xtable_1.8-4              httpuv_1.5.2              stats4_3.6.0              munsell_0.5.0            
## [97] shinyjs_1.0