3.2 Example with dichotomous \(X\): The influence of presumed media influence
Here we load a couple of necessary packages, load the data, and take a peek.
library(tidyverse)
pmi <- read_csv("data/pmi/pmi.csv")
glimpse(pmi)
## Observations: 123
## Variables: 6
## $ cond <int> 1, 0, 1, 0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 0, 0, 1, 0, 1, 1, 1, 1, 1, 0, 1, 0, 1, 0...
## $ pmi <dbl> 7.0, 6.0, 5.5, 6.5, 6.0, 5.5, 3.5, 6.0, 4.5, 7.0, 1.0, 6.0, 5.0, 7.0, 7.0, 7.0, 4.5, 3.5...
## $ import <int> 6, 1, 6, 6, 5, 1, 1, 6, 6, 6, 3, 3, 4, 7, 1, 6, 3, 3, 2, 4, 4, 6, 7, 4, 5, 4, 6, 5, 5, 7...
## $ reaction <dbl> 5.25, 1.25, 5.00, 2.75, 2.50, 1.25, 1.50, 4.75, 4.25, 6.25, 1.25, 2.75, 3.75, 5.00, 4.00...
## $ gender <int> 1, 1, 1, 0, 1, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 1, 1, 0, 1, 0, 0, 0, 1...
## $ age <dbl> 51.0, 40.0, 26.0, 21.0, 27.0, 25.0, 23.0, 25.0, 22.0, 24.0, 22.0, 21.0, 23.0, 21.0, 22.0...
You can get the male/female split like so:
pmi %>%
group_by(gender) %>%
count()
## # A tibble: 2 x 2
## # Groups: gender [2]
## gender n
## <int> <int>
## 1 0 80
## 2 1 43
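As an aside, dplyr's count() can take a grouping variable directly, which collapses the group_by() workflow above to a one-liner:

pmi %>%
  count(gender)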
Here is the split by condition:
pmi %>%
group_by(cond) %>%
count()
## # A tibble: 2 x 2
## # Groups: cond [2]
## cond n
## <int> <int>
## 1 0 65
## 2 1 58
Here is how to get the ungrouped mean and \(SD\) values for reaction and pmi, as presented in Table 3.1:
pmi %>%
select(reaction, pmi) %>%
gather() %>%
group_by(key) %>%
summarise(mean = mean(value),
sd = sd(value)) %>%
mutate_if(is.double, round, digits = 3)
## # A tibble: 2 x 3
## key mean sd
## <chr> <dbl> <dbl>
## 1 pmi 5.60 1.32
## 2 reaction 3.48 1.55
You might get the mean and \(SD\) values for reaction and pmi grouped by cond like this:
pmi %>%
select(reaction, pmi, cond) %>%
gather(key, value, -cond) %>%
group_by(cond, key) %>%
summarise(mean = mean(value),
sd = sd(value)) %>%
mutate_if(is.double, round, digits = 3)
## # A tibble: 4 x 4
## # Groups: cond [2]
## cond key mean sd
## <int> <chr> <dbl> <dbl>
## 1 0 pmi 5.38 1.34
## 2 0 reaction 3.25 1.61
## 3 1 pmi 5.85 1.27
## 4 1 reaction 3.75 1.45
Let’s load our primary statistical package.
library(brms)
Before we begin, I should acknowledge that I greatly benefited from this great blog post on path analysis in brms by Jarrett Byrnes. In brms, we handle mediation models using the multivariate syntax. There are a few ways to do this. Let’s start simple.
If you look at the path model in Figure 3.3, you’ll note that reaction is predicted by pmi and cond. pmi, in turn, is predicted solely by cond. So we have two regression models, which is just the kind of thing the brms multivariate syntax is for. So first let’s specify both models, which we’ll nest in bf() functions and save as objects.
y_model <- bf(reaction ~ 1 + pmi + cond)
m_model <- bf(pmi ~ 1 + cond)
Now that we have our bf() objects in hand, we’ll combine them with the + operator within the brm() function. We’ll also specify set_rescor(FALSE) because we’re not interested in adding a residual correlation between reaction and pmi.
model1 <-
brm(data = pmi, family = gaussian,
y_model + m_model + set_rescor(FALSE),
chains = 4, cores = 4)
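As an aside, this isn’t the only way to combine the submodels. Here’s a minimal sketch of what I believe is an equivalent specification using brms’s mvbf() helper, where the rescor argument plays the same role as set_rescor(FALSE). We won’t use this model1_alt object below.

# an alternative way to combine the submodels; equivalent to model1
model1_alt <-
  brm(data = pmi, family = gaussian,
      mvbf(y_model, m_model, rescor = FALSE),
      chains = 4, cores = 4)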
Here are our results.
print(model1)
## Family: MV(gaussian, gaussian)
## Links: mu = identity; sigma = identity
## mu = identity; sigma = identity
## Formula: reaction ~ 1 + pmi + cond
## pmi ~ 1 + cond
## Data: pmi (Number of observations: 123)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## reaction_Intercept 0.53 0.55 -0.56 1.65 4000 1.00
## pmi_Intercept 5.38 0.17 5.06 5.70 4000 1.00
## reaction_pmi 0.50 0.10 0.31 0.70 4000 1.00
## reaction_cond 0.26 0.25 -0.24 0.75 4000 1.00
## pmi_cond 0.47 0.24 -0.01 0.94 4000 1.00
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma_reaction 1.41 0.09 1.24 1.60 4000 1.00
## sigma_pmi 1.32 0.09 1.16 1.50 4000 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
If you compare our model summary with the coefficients in the path model in Figure 3.3, you’ll see our coefficients are the same. The brms summary also includes intercepts and residual variances, which are typically omitted in path diagrams even though they’re still part of the model.
If you’re getting lost in all the model output, try this.
fixef(model1)[3:5, ] %>% round(digits = 3)
## Estimate Est.Error Q2.5 Q97.5
## reaction_pmi 0.504 0.098 0.309 0.697
## reaction_cond 0.257 0.252 -0.243 0.748
## pmi_cond 0.474 0.240 -0.005 0.936
Also note that Hayes tends to refer to the intercepts as constants.
In his Table 3.2, Hayes included the \(R^2\) values. Here are ours.
bayes_R2(model1) %>% round(digits = 3)
## Estimate Est.Error Q2.5 Q97.5
## R2_reaction 0.208 0.056 0.099 0.315
## R2_pmi 0.039 0.031 0.000 0.112
It’s worth the effort to actually plot the \(R^2\) distributions.
# we'll get our color palette from ggthemes
library(ggthemes)
bayes_R2(model1, summary = F) %>%
as_tibble() %>%
gather() %>%
ggplot(aes(x = value, fill = key)) +
geom_density(color = "transparent", alpha = 2/3) +
scale_fill_colorblind() +
coord_cartesian(xlim = 0:1) +
labs(title = expression(paste("The ", italic("R")^{2}, " distributions for model1")),
x = NULL) +
theme_classic()
We went through the trouble of plotting the \(R^2\) distributions because it’s useful to understand that they won’t often be symmetric when they’re near their logical boundaries (i.e., 0 and 1). This is where asymmetric Bayesian credible intervals can really shine.
Let’s get down to business and examine the indirect effect, the \(ab\) pathway. In our model:

- \(a\) = pmi_cond
- \(b\) = reaction_pmi

You can isolate them with fixef()[i].
fixef(model1)[5 , ]
## Estimate Est.Error Q2.5 Q97.5
## 0.474456456 0.240391424 -0.005044517 0.936437859
fixef(model1)[3 , ]
## Estimate Est.Error Q2.5 Q97.5
## 0.50449451 0.09774672 0.30939585 0.69683689
So the naive approach would be to just multiply them.
(fixef(model1)[5 , ] * fixef(model1)[3 , ]) %>% round(digits = 3)
## Estimate Est.Error Q2.5 Q97.5
## 0.239 0.023 -0.002 0.653
Now, this does get us the correct ‘Estimate’ (i.e., posterior mean). However, the posterior \(SD\) and 95% intervals are off; multiplying the summaries of two distributions is not the same as summarizing the products of their draws. If you want to do this properly, you need to work with the posterior samples themselves. Here they are:
post <- posterior_samples(model1)
glimpse(post)
## Observations: 4,000
## Variables: 8
## $ b_reaction_Intercept <dbl> 1.950176626, -0.642275060, 1.756071145, -0.528590500, 0.965722944, -0.381105...
## $ b_pmi_Intercept <dbl> 5.463846, 5.387659, 5.209249, 5.235493, 5.224044, 5.177939, 5.354357, 5.3499...
## $ b_reaction_pmi <dbl> 0.2303314, 0.7141880, 0.2955409, 0.5939846, 0.4863486, 0.6712376, 0.5585035,...
## $ b_reaction_cond <dbl> 0.568063017, 0.144055063, 0.335846731, 0.938714819, -0.041544801, 0.00300684...
## $ b_pmi_cond <dbl> 0.410513415, 0.341716877, 0.771462188, 0.557731830, 0.479940454, 0.862242164...
## $ sigma_reaction <dbl> 1.518972, 1.323994, 1.485331, 1.345070, 1.484862, 1.509208, 1.210915, 1.6082...
## $ sigma_pmi <dbl> 1.414180, 1.229791, 1.421680, 1.252101, 1.506832, 1.322447, 1.276929, 1.3284...
## $ lp__ <dbl> -436.9322, -434.8615, -435.8168, -438.9417, -436.6032, -435.2722, -435.1702,...
Here we compute the indirect effect, ab.
post <-
post %>%
mutate(ab = b_pmi_cond*b_reaction_pmi)
Now that we have ab as a properly computed vector, we can summarize it with the quantile() function.
quantile(post$ab, probs = c(.5, .025, .975)) %>%
round(digits = 3)
## 50% 2.5% 97.5%
## 0.232 -0.003 0.522
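If you’d rather have a mean-based summary in the same format as the fixef() output, brms’s posterior_summary() should also work on our ab vector (a quick sketch):

posterior_summary(post$ab) %>% round(digits = 3)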
And we can even visualize it as a density.
post %>%
ggplot(aes(x = ab)) +
geom_density(color = "transparent",
fill = colorblind_pal()(3)[3]) +
scale_y_continuous(NULL, breaks = NULL) +
labs(title = expression(paste("Our indirect effect, the ", italic("ab"), " pathway")),
x = NULL) +
theme_classic()
It’s also worth pointing out that because the indirect effect’s distribution isn’t perfectly symmetric, its mean and median aren’t quite the same.
post %>%
summarize(mean = mean(ab),
median = median(ab)) %>%
round(digits = 3)
## mean median
## 1 0.24 0.232
Their magnitudes are similar, but this asymmetry will be a source of contrast between our estimates and those in the text. It’s also something to consider when reporting on central tendency. When the indirect effect (or any other parameter, for that matter) is quite asymmetric, you might prefer reporting the median rather than the mean.
On page 90, Hayes computed the adjusted means for \(Y\). For both cond == 1 and cond == 0, he computed the expected values for reaction when pmi was at its mean. A natural way to do that in brms is with fitted(). First we’ll put our input values for cond and pmi in a tibble, which we’ll call nd. Then we’ll feed nd into the newdata argument within the fitted() function.
nd <-
tibble(cond = 1:0,
pmi = mean(pmi$pmi))
fitted(model1, newdata = nd)
## , , reaction
##
## Estimate Est.Error Q2.5 Q97.5
## [1,] 3.615537 0.1873641 3.247219 3.985118
## [2,] 3.358847 0.1752851 3.009573 3.697929
##
## , , pmi
##
## Estimate Est.Error Q2.5 Q97.5
## [1,] 5.852410 0.1736779 5.506716 6.195187
## [2,] 5.377954 0.1681786 5.056328 5.695418
Because model1 is a multivariate model, fitted() returns the model-implied summaries for both reaction and pmi. If you just want the adjusted means for reaction, you can use the resp argument within fitted().
fitted(model1, newdata = nd, resp = "reaction") %>% round(digits = 3)
## Estimate Est.Error Q2.5 Q97.5
## [1,] 3.616 0.187 3.247 3.985
## [2,] 3.359 0.175 3.010 3.698
Note how this is where the two values in the \(Y\) adjusted column in Table 3.1 came from.
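To demystify what fitted() did there, here’s a minimal sketch that computes the same adjusted means by hand from our post object; the cond_0 and cond_1 names are just hypothetical labels for the two conditions.

post %>%
  transmute(cond_0 = b_reaction_Intercept + b_reaction_pmi * mean(pmi$pmi),
            cond_1 = b_reaction_Intercept + b_reaction_pmi * mean(pmi$pmi) + b_reaction_cond) %>%
  gather() %>%
  group_by(key) %>%
  summarize(Estimate = mean(value)) %>%
  mutate_if(is.double, round, digits = 3)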
However, if we want to reproduce how Hayes computed the total effect (i.e., \(c' + ab\)), we’ll need to work with the posterior itself, post. Recall, we’ve already saved the indirect effect as a vector, ab. The direct effect, \(c'\), is labeled b_reaction_cond within post. In order to get the total effect, \(c\), all we need to do is add those vectors together.
post <-
post %>%
mutate(total_effect = b_reaction_cond + ab)
Here’s the posterior mean with its 95% intervals:
post %>%
summarize(mean = mean(total_effect),
ll = quantile(total_effect, prob = .025),
ul = quantile(total_effect, prob = .975))
## mean ll ul
## 1 0.4967976 -0.03600994 1.024548
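If it helps to see the direct, indirect, and total effects side by side, here’s a minimal sketch summarizing all three from our post object:

post %>%
  transmute(direct = b_reaction_cond,
            indirect = ab,
            total = total_effect) %>%
  gather() %>%
  group_by(key) %>%
  summarize(mean = mean(value),
            ll = quantile(value, prob = .025),
            ul = quantile(value, prob = .975)) %>%
  mutate_if(is.double, round, digits = 3)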
3.2.1 Estimation of the model in PROCESS for SPSS and SAS.
Nothing new for us here.