## 4.5 Multiple $$X$$s or $$Y$$s: Analyze separately or simultaneously?

### 4.5.1 Multiple $$X$$ variables.

The same basic problems with multicollinearity applies to the Bayesian paradigm, too.

### 4.5.2 Estimation of a model with multiple $$X$$ variables in PROCESS brms.

Hayes discussed the limitation that his PROCESS program may only handle a single $$X$$ variable in the x= part of the command line, for which he displayed a workaround. We don’t have such a limitation in brms. Using Hayes’s hypothetical data syntax for a model with three $$X$$s, the brms code would be:

y_model <- bf(dv ~ 1 + iv1 + iv2 + iv3 + med)
m_model <- bf(med ~ 1 + iv1 + iv2 + iv3)

model6 <-
brm(data = data, family = gaussian,
y_model + m_model + set_rescor(FALSE),
chains = 4, cores = 4)

To show it in action, let’s simulate some data.

N <- 1e3

set.seed(4.5)
d <-
tibble(iv1 = rnorm(N, mean = 0, sd = 1),
iv2 = rnorm(N, mean = 0, sd = 1),
iv3 = rnorm(N, mean = 0, sd = 1),
med = rnorm(N, mean = 0 + iv1*-1 + iv2*0 + iv3*1, sd = 1),
dv  = rnorm(N, mean = 0 + iv1*0 + iv2*.5 + iv3*1 + med*.5, sd = 1))

head(d)
## # A tibble: 6 x 5
##      iv1    iv2     iv3    med     dv
##    <dbl>  <dbl>   <dbl>  <dbl>  <dbl>
## 1  0.217  0.177 -1.39   -0.755 -1.77
## 2 -0.542  1.69   0.0513  0.721  0.402
## 3  0.891 -1.35   1.10    0.777 -0.132
## 4  0.596  1.08  -0.203  -0.955  1.02
## 5  1.64  -0.456 -0.428  -2.89  -3.26
## 6  0.689 -0.681 -0.429  -0.462 -2.38

Before we proceed, if data simulation is new to you, you might check out Roger Peng’s helpful tutorial on the subject.

Here are the sub-models.

y_model <- bf(dv ~ 1 + iv1 + iv2 + iv3 + med)
m_model <- bf(med ~ 1 + iv1 + iv2 + iv3)

Now we fit.

model7 <-
brm(data = d, family = gaussian,
y_model + m_model + set_rescor(FALSE),
chains = 4, cores = 4)

And the results:

print(model7)
##  Family: MV(gaussian, gaussian)
##   Links: mu = identity; sigma = identity
##          mu = identity; sigma = identity
## Formula: dv ~ 1 + iv1 + iv2 + iv3 + med
##          med ~ 1 + iv1 + iv2 + iv3
##    Data: d (Number of observations: 1000)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##
## Population-Level Effects:
##               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## dv_Intercept     -0.01      0.03    -0.07     0.05       4000 1.00
## med_Intercept     0.00      0.03    -0.06     0.06       4000 1.00
## dv_iv1            0.02      0.04    -0.06     0.11       2731 1.00
## dv_iv2            0.56      0.03     0.50     0.62       4000 1.00
## dv_iv3            1.01      0.05     0.92     1.10       2775 1.00
## dv_med            0.46      0.03     0.40     0.53       2362 1.00
## med_iv1          -0.93      0.03    -0.99    -0.87       4000 1.00
## med_iv2           0.03      0.03    -0.03     0.09       4000 1.00
## med_iv3           0.98      0.03     0.92     1.04       4000 1.00
##
## Family Specific Parameters:
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma_dv      1.00      0.02     0.96     1.05       4000 1.00
## sigma_med     0.97      0.02     0.93     1.02       4000 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

brms came through just fine. If you wanted to simulate data with a particular correlation structure for the iv variables, you might use the mvnorm() function from the MASS package, which you can learn more about here.

### 4.5.3 Multiple $$Y$$ variables.

We’ve already been using the multivariate syntax in brms for our simple mediation models. Fitting a mediation model with multiple $$Y$$ variables is a minor extension. Let’s simulate more data.

N <- 1e3

set.seed(4.5)
d <-
tibble(iv  = rnorm(N, mean = 0, sd = 1),
med = rnorm(N, mean = 0 + iv*.5, sd = 1),
dv1 = rnorm(N, mean = 0 + iv*-1 + med*0,  sd = 1),
dv2 = rnorm(N, mean = 0 + iv*0  + med*.5, sd = 1),
dv3 = rnorm(N, mean = 0 + iv*1  + med*1,  sd = 1))

head(d)
## # A tibble: 6 x 5
##       iv    med    dv1    dv2     dv3
##    <dbl>  <dbl>  <dbl>  <dbl>   <dbl>
## 1  0.217  0.285 -1.61   0.999  0.420
## 2 -0.542  1.42   0.594  0.836  0.0208
## 3  0.891 -0.902  0.206  0.120 -0.954
## 4  0.596  1.37  -0.799  0.530  3.13
## 5  1.64   0.362 -2.06  -0.643  0.840
## 6  0.689 -0.337 -1.12   0.487 -1.03

Here are the sub-models.

y_model_1 <- bf(dv1 ~ 1 + iv + med)
y_model_2 <- bf(dv2 ~ 1 + iv + med)
y_model_3 <- bf(dv3 ~ 1 + iv + med)
m_model <- bf(med ~ 1 + iv)

And we fit.

model8 <-
brm(data = d, family = gaussian,
y_model_1 + y_model_2 + y_model_3 + m_model + set_rescor(FALSE),
chains = 4, cores = 4)
print(model8)
##  Family: MV(gaussian, gaussian, gaussian, gaussian)
##   Links: mu = identity; sigma = identity
##          mu = identity; sigma = identity
##          mu = identity; sigma = identity
##          mu = identity; sigma = identity
## Formula: dv1 ~ 1 + iv + med
##          dv2 ~ 1 + iv + med
##          dv3 ~ 1 + iv + med
##          med ~ 1 + iv
##    Data: d (Number of observations: 1000)
## Samples: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
##          total post-warmup samples = 4000
##
## Population-Level Effects:
##               Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## dv1_Intercept     0.01      0.03    -0.05     0.08       4000 1.00
## dv2_Intercept     0.00      0.03    -0.06     0.06       4000 1.00
## dv3_Intercept    -0.01      0.03    -0.08     0.05       4000 1.00
## med_Intercept     0.02      0.03    -0.04     0.09       4000 1.00
## dv1_iv           -1.05      0.04    -1.12    -0.98       4000 1.00
## dv1_med           0.05      0.03    -0.01     0.11       4000 1.00
## dv2_iv            0.05      0.03    -0.01     0.12       4000 1.00
## dv2_med           0.53      0.03     0.47     0.59       4000 1.00
## dv3_iv            1.03      0.04     0.95     1.10       4000 1.00
## dv3_med           1.06      0.03     1.00     1.12       4000 1.00
## med_iv            0.53      0.03     0.46     0.59       4000 1.00
##
## Family Specific Parameters:
##           Estimate Est.Error l-95% CI u-95% CI Eff.Sample Rhat
## sigma_dv1     0.98      0.02     0.94     1.02       4000 1.00
## sigma_dv2     0.97      0.02     0.93     1.02       4000 1.00
## sigma_dv3     1.00      0.02     0.96     1.05       4000 1.00
## sigma_med     1.00      0.02     0.96     1.04       4000 1.00
##
## Samples were drawn using sampling(NUTS). For each parameter, Eff.Sample
## is a crude measure of effective sample size, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

brms to the rescue once again!