## 3.2 Example with dichotomous \(X\): The influence of presumed media influence

Here we load a couple necessary packages, load the data, and take a peek.

```
library(tidyverse)
pmi <- read_csv("data/pmi/pmi.csv")
glimpse(pmi)
```

```
## Observations: 123
## Variables: 6
## $ cond <int> 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0...
## $ pmi <dbl> 7.0, 6.0, 5.5, 6.5, 6.0, 5.5, 3.5, 6.0, 4.5, 7.0, 1.0, 6.0, 5.0, 7.0, 7.0, 7.0, 4.5, 3.5...
## $ import <int> 6, 1, 6, 6, 5, 1, 1, 6, 6, 6, 3, 3, 4, 7, 1, 6, 3, 3, 2, 4, 4, 6, 7, 4, 5, 4, 6, 5, 5, 7...
## $ reaction <dbl> 5.25, 1.25, 5.00, 2.75, 2.50, 1.25, 1.50, 4.75, 4.25, 6.25, 1.25, 2.75, 3.75, 5.00, 4.00...
## $ gender <int> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1...
## $ age <dbl> 51.0, 40.0, 26.0, 21.0, 27.0, 25.0, 23.0, 25.0, 22.0, 24.0, 22.0, 21.0, 23.0, 21.0, 22.0...
```

You can get the male/female split like so:

```
pmi %>%
group_by(gender) %>%
count()
```

```
## # A tibble: 2 x 2
## # Groups: gender [2]
## gender n
## <int> <int>
## 1 0 80
## 2 1 43
```

Here is the split by `condition`

:

```
pmi %>%
group_by(cond) %>%
count()
```

```
## # A tibble: 2 x 2
## # Groups: cond [2]
## cond n
## <int> <int>
## 1 0 65
## 2 1 58
```

Here is how to get the ungrouped mean and \(SD\) values for `reaction`

and `pmi`

, as presented in Table 3.1,

```
pmi %>%
select(reaction, pmi) %>%
gather() %>%
group_by(key) %>%
summarise(mean = mean(value),
sd = sd(value)) %>%
mutate_if(is.double, round, digits = 3)
```

```
## # A tibble: 2 x 3
## key mean sd
## <chr> <dbl> <dbl>
## 1 pmi 5.60 1.32
## 2 reaction 3.48 1.55
```

You might get the mean and \(SD\) values for `reaction`

and `pmi`

grouped by `cond`

like this:

```
pmi %>%
select(reaction, pmi, cond) %>%
gather(key, value, -cond) %>%
group_by(cond, key) %>%
summarise(mean = mean(value),
sd = sd(value)) %>%
mutate_if(is.double, round, digits = 3)
```

```
## # A tibble: 4 x 4
## # Groups: cond [2]
## cond key mean sd
## <int> <chr> <dbl> <dbl>
## 1 0 pmi 5.38 1.34
## 2 0 reaction 3.25 1.61
## 3 1 pmi 5.85 1.27
## 4 1 reaction 3.75 1.45
```

Let’s load our primary statistical package.

`library(brms)`

Before we begin, I should acknowledge that I greatly benefited by this great blog post on path analysis in brms by Jarrett Byrnes. In brms, we handle mediation models using the multivariate syntax. There are a few ways to do this. Let’s start simple.

If you look at the path model in Figure 3.3, you’ll note that `reaction`

is predicted by `pmi`

and `cond`

. `pmi`

, in turn, is predicted solely by `cond`

. So we have two regression models, which is just the kind of thing the brms multivariate syntax is for. So first let’s specify both models, which we’ll nest in `bf()`

functions and save as objects.

```
y_model <- bf(reaction ~ 1 + pmi + cond)
m_model <- bf(pmi ~ 1 + cond)
```

Now we have our `bf()`

objects in hand, we’ll combine them with the `+`

operator within the `brm()`

function. We’ll also specify `set_rescor(FALSE)`

–we’re not interested in adding a residual correlation between `reaction`

and `pmi`

.

```
model1 <-
brm(data = pmi, family = gaussian,
y_model + m_model + set_rescor(FALSE),
chains = 4, cores = 4)
```

Here are our results.

`print(model1)`

```
## Family: MV(gaussian, gaussian)
## Links: mu = identity; sigma = identity
## mu = identity; sigma = identity
## Formula: reaction ~ 1 + pmi + cond
## pmi ~ 1 + cond
## Data: pmi (Number of observations: 123)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## reaction_Intercept 0.53 0.55 -0.56 1.65 4000 1.00
## pmi_Intercept 5.38 0.17 5.06 5.70 4000 1.00
## reaction_pmi 0.50 0.10 0.31 0.70 4000 1.00
## reaction_cond 0.26 0.25 -0.24 0.75 4000 1.00
## pmi_cond 0.47 0.24 -0.01 0.94 4000 1.00
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma_reaction 1.41 0.09 1.24 1.60 4000 1.00
## sigma_pmi 1.32 0.09 1.16 1.50 4000 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
```

If you compare our model summary with the coefficients in the path model in Figure 3.3, you’ll see our coefficients are the same. The brms summary also includes intercepts and residual variances, which are typically omitted in path diagrams even though they’re still part of the model.

If you’re getting lost in all the model output, try this.

`fixef(model1)[3:5, ] %>% round(digits = 3)`

```
## Estimate Est.Error Q2.5 Q97.5
## reaction_pmi 0.504 0.098 0.309 0.697
## reaction_cond 0.257 0.252 -0.243 0.748
## pmi_cond 0.474 0.240 -0.005 0.936
```

Also note that Hayes tends to refer to the intercepts as constants.

In his Table 3.2, Hayes included the \(R^2\) values. Here are ours.

`bayes_R2(model1) %>% round(digits = 3)`

```
## Estimate Est.Error Q2.5 Q97.5
## R2_reaction 0.208 0.056 0.099 0.315
## R2_pmi 0.039 0.031 0.000 0.112
```

It’s worth it to actually plot the \(R^2\) distributions.

```
# we'll get our color palette from ggthemes
library(ggthemes)
bayes_R2(model1, summary = F) %>%
as_tibble() %>%
gather() %>%
ggplot(aes(x = value, fill = key)) +
geom_density(color = "transparent", alpha = 2/3) +
scale_fill_colorblind() + # we got this color palette from the ggthemes package
coord_cartesian(xlim = 0:1) +
labs(title = expression(paste("The ", italic("R")^{2}, " distributions for fit0")),
x = NULL) +
theme_classic()
```

We went through the trouble of plotting the \(R^2\) distributions because it’s useful to understand that they won’t often be symmetric when they’re near their logical boundaries (i.e., 0 and 1). This is where asymmetric Bayesian credible intervals can really shine.

Let’s get down to business and examine the indirect effect, the \(ab\) pathway. In our model:

- \(a\) =
`pmi_cond`

- \(b\) =
`reaction_pmi`

You can isolate them with `fixef()[i]`

.

`fixef(model1)[5 , ]`

```
## Estimate Est.Error Q2.5 Q97.5
## 0.474456456 0.240391424 -0.005044517 0.936437859
```

`fixef(model1)[3 , ]`

```
## Estimate Est.Error Q2.5 Q97.5
## 0.50449451 0.09774672 0.30939585 0.69683689
```

So the naive approach would be to just multiply them.

`(fixef(model1)[5 , ] * fixef(model1)[3 , ]) %>% round(digits = 3)`

```
## Estimate Est.Error Q2.5 Q97.5
## 0.239 0.023 -0.002 0.653
```

Now, this does get us the correct ‘Estimate’ (i.e., posterior mean). However, the posterior \(SD\) and 95% intervals are off. If you want to do this properly, you need to work with the poster samples themselves. Here they are:

```
post <- posterior_samples(model1)
glimpse(post)
```

```
## Observations: 4,000
## Variables: 8
## $ b_reaction_Intercept <dbl> 1.950176626, -0.642275060, 1.756071145, -0.528590500, 0.965722944, -0.381105...
## $ b_pmi_Intercept <dbl> 5.463846, 5.387659, 5.209249, 5.235493, 5.224044, 5.177939, 5.354357, 5.3499...
## $ b_reaction_pmi <dbl> 0.2303314, 0.7141880, 0.2955409, 0.5939846, 0.4863486, 0.6712376, 0.5585035,...
## $ b_reaction_cond <dbl> 0.568063017, 0.144055063, 0.335846731, 0.938714819, -0.041544801, 0.00300684...
## $ b_pmi_cond <dbl> 0.410513415, 0.341716877, 0.771462188, 0.557731830, 0.479940454, 0.862242164...
## $ sigma_reaction <dbl> 1.518972, 1.323994, 1.485331, 1.345070, 1.484862, 1.509208, 1.210915, 1.6082...
## $ sigma_pmi <dbl> 1.414180, 1.229791, 1.421680, 1.252101, 1.506832, 1.322447, 1.276929, 1.3284...
## $ lp__ <dbl> -436.9322, -434.8615, -435.8168, -438.9417, -436.6032, -435.2722, -435.1702,...
```

Here we compute the indirect effect, `ab`

.

```
post <-
post %>%
mutate(ab = b_pmi_cond*b_reaction_pmi)
```

Now we have `ab`

as a properly computed vector, we can summarize it with the `quantile()`

function.

```
quantile(post$ab, probs = c(.5, .025, .975)) %>%
round(digits = 3)
```

```
## 50% 2.5% 97.5%
## 0.232 -0.003 0.522
```

And we can even visualize it as a density.

```
post %>%
ggplot(aes(x = ab)) +
geom_density(color = "transparent",
fill = colorblind_pal()(3)[3]) +
scale_y_continuous(NULL, breaks = NULL) +
labs(title = expression(paste("Our indirect effect, the ", italic("ab"), " pathway")),
x = NULL) +
theme_classic()
```

It’s also worth pointing out that as the indirect effect isn’t perfectly symmetric, its mean and median aren’t quite the same.

```
post %>%
summarize(mean = mean(ab),
median = median(ab)) %>%
round(digits = 3)
```

```
## mean median
## 1 0.24 0.232
```

Their magnitudes are similar, but this asymmetry will be a source of contrast to our estimates and those in the text. This is also something to consider when reporting on central tendency. When the indirect effect–or any other parameter, for that matter–is quite asymmetric, you might prefer reporting the median rather than the mean.

On page 90, Hayes computed the *adjusted means* for \(Y\). For both `cond == 1`

and `cond == 0`

, he computed the expected values for `reaction`

when `pmi`

was at its mean. A natural way to do that in brms is with `fitted()`

. First we’ll put our input values for `cond`

and `pmi`

in a tibble, which we’ll call `nd`

. Then we’ll feed `nd`

into the `newdata`

argument within the `fitted()`

function.

```
nd <-
tibble(cond = 1:0,
pmi = mean(pmi$pmi))
fitted(model1, newdata = nd)
```

```
## , , reaction
##
## Estimate Est.Error Q2.5 Q97.5
## [1,] 3.615537 0.1873641 3.247219 3.985118
## [2,] 3.358847 0.1752851 3.009573 3.697929
##
## , , pmi
##
## Estimate Est.Error Q2.5 Q97.5
## [1,] 5.852410 0.1736779 5.506716 6.195187
## [2,] 5.377954 0.1681786 5.056328 5.695418
```

Because `model1`

is a multivariate model, `fitted()`

returns the model-implied summaries for both `reaction`

and `pmi`

. If you just want the adjusted means for `reaction`

, you can use the `resp`

argument within `fitted()`

.

`fitted(model1, newdata = nd, resp = "reaction") %>% round(digits = 3)`

```
## Estimate Est.Error Q2.5 Q97.5
## [1,] 3.616 0.187 3.247 3.985
## [2,] 3.359 0.175 3.010 3.698
```

Note how this is where the two values in the \(Y\) adjusted column in Table 3.1 came from.

However, if we want to reproduce how Hayes computed the total effect (i.e., \(c'\) + \(ab\)), we’ll need to work with the posterior itself, `post`

. Recall, we’ve already saved the indirect effect as a vector, `ab`

. The direct effect, \(c'\), is labeled `b_reaction_cond`

within `post`

. in order to get the total effect, \(c\), all we need to is add those vectors together.

```
post <-
post %>%
mutate(total_effect = b_reaction_cond + ab)
```

Here’s the posterior mean with its 95% intervals

```
post %>%
summarize(mean = mean(total_effect),
ll = quantile(total_effect, prob = .025),
ul = quantile(total_effect, prob = .975))
```

```
## mean ll ul
## 1 0.4967976 -0.03600994 1.024548
```

### 3.2.1 ~~Estimation of the model in PROCESS for SPSS and SAS.~~

Nothing new for us, here.