4.5 Multiple \(X\)s or \(Y\)s: Analyze separately or simultaneously?
4.5.1 Multiple \(X\) variables.
The same basic problems with multicollinearity applies to the Bayesian paradigm, too.
4.5.2 Estimation of a model with multiple \(X\) variables in PROCESS brms.
Hayes discussed the limitation that his PROCESS program may only handle a single \(X\) variable in the x=
part of the command line, for which he displayed a workaround. We don’t have such a limitation in brms. Using Hayes’s hypothetical data syntax for a model with three \(X\)s, the brms code would be:
y_model <- bf(dv ~ 1 + iv1 + iv2 + iv3 + med)
m_model <- bf(med ~ 1 + iv1 + iv2 + iv3)
model6 <-
brm(data = data, family = gaussian,
y_model + m_model + set_rescor(FALSE),
chains = 4, cores = 4)
To show it in action, let’s simulate some data.
N <- 1e3
set.seed(4.5)
d <-
tibble(iv1 = rnorm(N, mean = 0, sd = 1),
iv2 = rnorm(N, mean = 0, sd = 1),
iv3 = rnorm(N, mean = 0, sd = 1),
med = rnorm(N, mean = 0 + iv1*-1 + iv2*0 + iv3*1, sd = 1),
dv = rnorm(N, mean = 0 + iv1*0 + iv2*.5 + iv3*1 + med*.5, sd = 1))
head(d)
## # A tibble: 6 x 5
## iv1 iv2 iv3 med dv
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.217 0.177 -1.39 -0.755 -1.77
## 2 -0.542 1.69 0.0513 0.721 0.402
## 3 0.891 -1.35 1.10 0.777 -0.132
## 4 0.596 1.08 -0.203 -0.955 1.02
## 5 1.64 -0.456 -0.428 -2.89 -3.26
## 6 0.689 -0.681 -0.429 -0.462 -2.38
Before we proceed, if data simulation is new to you, you might check out Roger Peng’s helpful tutorial on the subject.
Here are the sub-models.
y_model <- bf(dv ~ 1 + iv1 + iv2 + iv3 + med)
m_model <- bf(med ~ 1 + iv1 + iv2 + iv3)
Now we fit.
model7 <-
brm(data = d, family = gaussian,
y_model + m_model + set_rescor(FALSE),
chains = 4, cores = 4)
And the results:
print(model7)
## Family: MV(gaussian, gaussian)
## Links: mu = identity; sigma = identity
## mu = identity; sigma = identity
## Formula: dv ~ 1 + iv1 + iv2 + iv3 + med
## med ~ 1 + iv1 + iv2 + iv3
## Data: d (Number of observations: 1000)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## dv_Intercept -0.01 0.03 -0.07 0.05 4000 1.00
## med_Intercept 0.00 0.03 -0.06 0.06 4000 1.00
## dv_iv1 0.02 0.04 -0.06 0.11 2731 1.00
## dv_iv2 0.56 0.03 0.50 0.62 4000 1.00
## dv_iv3 1.01 0.05 0.92 1.10 2775 1.00
## dv_med 0.46 0.03 0.40 0.53 2362 1.00
## med_iv1 -0.93 0.03 -0.99 -0.87 4000 1.00
## med_iv2 0.03 0.03 -0.03 0.09 4000 1.00
## med_iv3 0.98 0.03 0.92 1.04 4000 1.00
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma_dv 1.00 0.02 0.96 1.05 4000 1.00
## sigma_med 0.97 0.02 0.93 1.02 4000 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
brms came through just fine. If you wanted to simulate data with a particular correlation structure for the iv
variables, you might use the mvnorm()
function from the MASS package, which you can learn more about here.
4.5.3 Multiple \(Y\) variables.
We’ve already been using the multivariate syntax in brms for our simple mediation models. Fitting a mediation model with multiple \(Y\) variables is a minor extension. Let’s simulate more data.
N <- 1e3
set.seed(4.5)
d <-
tibble(iv = rnorm(N, mean = 0, sd = 1),
med = rnorm(N, mean = 0 + iv*.5, sd = 1),
dv1 = rnorm(N, mean = 0 + iv*-1 + med*0, sd = 1),
dv2 = rnorm(N, mean = 0 + iv*0 + med*.5, sd = 1),
dv3 = rnorm(N, mean = 0 + iv*1 + med*1, sd = 1))
head(d)
## # A tibble: 6 x 5
## iv med dv1 dv2 dv3
## <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 0.217 0.285 -1.61 0.999 0.420
## 2 -0.542 1.42 0.594 0.836 0.0208
## 3 0.891 -0.902 0.206 0.120 -0.954
## 4 0.596 1.37 -0.799 0.530 3.13
## 5 1.64 0.362 -2.06 -0.643 0.840
## 6 0.689 -0.337 -1.12 0.487 -1.03
Here are the sub-models.
y_model_1 <- bf(dv1 ~ 1 + iv + med)
y_model_2 <- bf(dv2 ~ 1 + iv + med)
y_model_3 <- bf(dv3 ~ 1 + iv + med)
m_model <- bf(med ~ 1 + iv)
And we fit.
model8 <-
brm(data = d, family = gaussian,
y_model_1 + y_model_2 + y_model_3 + m_model + set_rescor(FALSE),
chains = 4, cores = 4)
print(model8)
## Family: MV(gaussian, gaussian, gaussian, gaussian)
## Links: mu = identity; sigma = identity
## mu = identity; sigma = identity
## mu = identity; sigma = identity
## mu = identity; sigma = identity
## Formula: dv1 ~ 1 + iv + med
## dv2 ~ 1 + iv + med
## dv3 ~ 1 + iv + med
## med ~ 1 + iv
## Data: d (Number of observations: 1000)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
## total post-warmup samples = 4000
##
## Population-Level Effects:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## dv1_Intercept 0.01 0.03 -0.05 0.08 4000 1.00
## dv2_Intercept 0.00 0.03 -0.06 0.06 4000 1.00
## dv3_Intercept -0.01 0.03 -0.08 0.05 4000 1.00
## med_Intercept 0.02 0.03 -0.04 0.09 4000 1.00
## dv1_iv -1.05 0.04 -1.12 -0.98 4000 1.00
## dv1_med 0.05 0.03 -0.01 0.11 4000 1.00
## dv2_iv 0.05 0.03 -0.01 0.12 4000 1.00
## dv2_med 0.53 0.03 0.47 0.59 4000 1.00
## dv3_iv 1.03 0.04 0.95 1.10 4000 1.00
## dv3_med 1.06 0.03 1.00 1.12 4000 1.00
## med_iv 0.53 0.03 0.46 0.59 4000 1.00
##
## Family Specific Parameters:
## Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma_dv1 0.98 0.02 0.94 1.02 4000 1.00
## sigma_dv2 0.97 0.02 0.93 1.02 4000 1.00
## sigma_dv3 1.00 0.02 0.96 1.05 4000 1.00
## sigma_med 1.00 0.02 0.96 1.04 4000 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
brms to the rescue once again!