14.4 Mediation
In mediation models (Baron and Kenny 1986), we want to examine if a direct effect from one variable to another is mediated by an intervening or mediator variable. If a total mediation exists, we assume that a variable \(X\) has an effect on a variable \(Y\) only because \(X\) influences a mediating variable \(Z\), which itself affects \(Y\).
Mediation is used in many fields, especially when we are interested in the mechanisms behind some relationship between two variables. However, it is important to mention that mediation models should always be based on a theoretically and logically sound rationale why the some variable \(Z\) is mediating the relationship between \(X\) and \(Y\).
Using meta-analytic SEM, we can synthesize evidence from original studies to determine if a proposed mediation is indeed backed by all available evidence. In the following, we will show you an example of how this can be done in R using the metaSEM
package.
14.4.1 Model Specification
For this example, let us assume we want to disentangle why there is a relationship between (low) psychological resilience (see Fletcher and Sarkar (2013) for a definition of this concept) and elevated levels of depressive symptoms. Based on the literature, we assume that there are two mediating variables: emotion regulation capabilities and dysfunctional coping styles. Both mediating variables are influenced by resilience, while low emotion regulation capabilities also influence dysfunctional coping. Both emotion regulation and coping style then influence the level of depressive symptoms a person experiences. One may represent the proposed model graphically like this (again using idiosyncratic notation to facilitate the model specification later on):
For this example, we ill use the fictitious dat.med
dataset, which was adapted from the Hunter83
dataset in metaSEM
. The data can be downloaded here. The dataset is again a list containing (1) 14 correlation matrices for resilience, emotion regulation, dysfunctional coping and depressive symptoms extracted from 14 independent studies and (2) the \(N\) of each study (see previous chapter for more details on the dataset structure required). Let us have a look at the matrix for the fictitious Devegvar et al. (1992) study:
dat.med$data$`Devegvar et al. (1992)`
## Resilience EmotReg Coping Depression
## Resilience NA NA NA NA
## EmotReg NA 1.00 0.72 0.05
## Coping NA 0.72 1.00 0.32
## Depression NA 0.05 0.32 1.00
We see that this study has some missings, because the variable Resilience was not assessed. To see the overall missing data pattern, we can use the pattern.na()
function.
pattern.na(dat.med$data)
## Resilience EmotReg Coping Depression
## Resilience 1 3 3 2
## EmotReg 3 2 4 3
## Coping 3 4 2 3
## Depression 2 3 3 1
We see that the correlation \(r_{EmotReg,Coping}\) has the most missings in our data, with four studies not providing data for it. Given that we have \(k=14\) studies overall, this amount of missing data may be acceptable. However, we have to check if the matrices are positive definite, because this is a requirement for the later processing steps. We can do this with the is.pd()
function.
is.pd(dat.med$data)
## Guttman et al. (2003) McCaffrey et al. (2002) Loescher et al. (1997)
## TRUE TRUE TRUE
## O'Malley et al. (1999) Hay et al. (1999) Twiraga et al. (2014)
## TRUE TRUE TRUE
## Wanzer et al. (1994) Arthur et al. (1991) Frondel et al. (1999)
## TRUE TRUE TRUE
## Mill et al. (2001) Ilan et al. (2002) Severence et al. (1996)
## TRUE TRUE TRUE
## Devegvar et al. (1992) Matloff et al. (2008)
## TRUE TRUE
We get TRUE
for all studies, so everything is fine and we can continue.
14.4.2 Stage 1
Now let us proceed to pooling the correlation matrices in the first step. For a more detailed description of this step, please refer to the previous subchapter. This time, we use a fixed-effects model.
med1 <- tssem1(dat.med$data,
dat.med$n,
method = "FEM")
summary(med1)
##
## Call:
## tssem1FEM(Cov = Cov, n = n, cor.analysis = cor.analysis, model.name = model.name,
## cluster = cluster, suppressWarnings = suppressWarnings, silent = silent,
## run = run)
##
## Coefficients:
## Estimate Std.Error z value Pr(>|z|)
## S[1,2] 0.510487 0.012702 40.188 < 0.00000000000000022 ***
## S[1,3] 0.427086 0.014082 30.329 < 0.00000000000000022 ***
## S[1,4] 0.207713 0.015931 13.038 < 0.00000000000000022 ***
## S[2,3] 0.522965 0.013111 39.888 < 0.00000000000000022 ***
## S[2,4] 0.284562 0.015769 18.046 < 0.00000000000000022 ***
## S[3,4] 0.243256 0.016266 14.954 < 0.00000000000000022 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Goodness-of-fit indices:
## Value
## Sample size 3975.0000
## Chi-square of target model 264.3980
## DF of target model 60.0000
## p value of target model 0.0000
## Chi-square of independence model 2777.2830
## DF of independence model 66.0000
## RMSEA 0.1096
## RMSEA lower 95% CI 0.0964
## RMSEA upper 95% CI 0.1234
## SRMR 0.0918
## TLI 0.9171
## CFI 0.9246
## AIC 144.3980
## BIC -232.8688
## OpenMx status1: 0 ("0" or "1": The optimization is considered fine.
## Other values may indicate problems.)
The optimization status is 0
, so the estimates are trustworthy. If you do not get a status of 0
or 1
, plug the model into the rerun()
function to try fitting it again.
14.4.3 Stage 2
Now that we have the pooled correlation matrix available in med1
, we can proceed by specifying our proposed mediation model. Again, we specify the \(\mathbf{A}\) and \(\mathbf{S}\) matrices. The \(\mathbf{F}\) is not specified, because all of the variables in are model are observed (i.e., there are no latent variables). We will omit the details behind the matrix specification here; for more details please refer to the first subchapter for the general structure of the matrices, and the last subchapter on how the matrices are specified in R.
\[~\]
\(\mathbf{A}\) Matrix
We use starting values of \(0.2\).
# Build matrix
A <- matrix(c(0 , 0 , 0 , 0,
"0.2*Res_EmR", 0 , 0 , 0,
"0.2*Res_Cop", "0.2*EmR_Cop", 0 , 0,
0 , "0.2*EmR_Dep", "0.2*Cop_Dep", 0),
ncol = 4, nrow=4, byrow=TRUE)
# Set column and row labels
dimnames(A)[[1]] <- dimnames(A)[[2]] <- c("Resilience", "EmotReg", "Coping", "Depression")
A
## Resilience EmotReg Coping Depression
## Resilience "0" "0" "0" "0"
## EmotReg "0.2*Res_EmR" "0" "0" "0"
## Coping "0.2*Res_Cop" "0.2*EmR_Cop" "0" "0"
## Depression "0" "0.2*EmR_Dep" "0.2*Cop_Dep" "0"
A <- as.mxMatrix(A)
\[~\]
\(\mathbf{S}\) Matrix
We use starting values of \(0.1\).
# Build matrix
S <- Diag(c(1, "0.1*ErrVarE", "0.1*ErrVarC", "0.1*ErrVarD"))
# Set column and row labels
dimnames(S)[[1]] <- dimnames(S)[[2]] <- c("Resilience", "EmotReg", "Coping", "Depression")
S
## Resilience EmotReg Coping Depression
## Resilience "1" "0" "0" "0"
## EmotReg "0" "0.1*ErrVarE" "0" "0"
## Coping "0" "0" "0.1*ErrVarC" "0"
## Depression "0" "0" "0" "0.1*ErrVarD"
S <- as.mxMatrix(S)
\[~\]
Model Fitting
We can now proceed to fitting the model. In a mediation model, we also want to estimate the indirect effect from resilience to depression, taking all mediation paths into account. To do this, we can simply add all the mediation paths together. In our model, this would look like this:
\[\beta_{indirect_{Res-Dep}} = (\beta_{Res-Cop} \times \beta_{Cop-Dep}) + (\beta_{Res-EmR} \times \beta_{EmR-Cop} \times \beta_{Cop-Dep}) + (\beta_{Res-EmR} \times \beta_{Emr-Dep}) \]
We can define this function in our model so that it provides us with 95% confidence intervals around the indirect effect if we use likelihood-based intervals. We can define this function as a list
containing an mxAlgebra
object in R. Here is the code, using the labels we defined in the \(\mathbf{A}\) matrix above:
list(indirectEffect = mxAlgebra(Res_Cop*Cop_Dep + Res_EmR*EmR_Cop*Cop_Dep + Res_EmR*EmR_Dep,
name="indirectEffect"))
We can use this code as the argument for the mx.algebra
parameter in our call to tssem2()
. Because this is a mediation model, we also have to specify diag.constraints = TRUE
. Here is the code:
med2 <- tssem2(med1,
Amatrix = A,
Smatrix = S,
intervals.type = "LB",
diag.constraints = TRUE,
mx.algebras = list(indirectEffect = mxAlgebra(Res_Cop*Cop_Dep +
Res_EmR*EmR_Cop*Cop_Dep +
Res_EmR*EmR_Dep,
name="indirectEffect")))
# Rerun
med2 <- rerun(med2)
summary(med2)
Call:
wls(Cov = coef.tssem1FEM(tssem1.obj), aCov = vcov.tssem1FEM(tssem1.obj),
n = sum(tssem1.obj$n), Amatrix = Amatrix, Smatrix = Smatrix,
Fmatrix = Fmatrix, diag.constraints = diag.constraints, cor.analysis = tssem1.obj$cor.analysis,
intervals.type = intervals.type, mx.algebras = mx.algebras,
model.name = model.name, suppressWarnings = suppressWarnings,
silent = silent, run = run)
95% confidence intervals: Likelihood-based statistic
Coefficients:
Estimate Std.Error lbound ubound z value Pr(>|z|)
EmR_Cop 0.411161 NA 0.377782 0.444676 NA NA
Res_Cop 0.217913 NA 0.183661 0.252118 NA NA
Cop_Dep 0.131068 NA 0.091942 0.170226 NA NA
EmR_Dep 0.218838 NA 0.180424 0.257309 NA NA
Res_EmR 0.513365 NA 0.488406 0.538309 NA NA
ErrVarC 0.691468 NA 0.664472 0.717400 NA NA
ErrVarD 0.904928 NA 0.885399 0.922669 NA NA
ErrVarE 0.736457 NA 0.710223 0.761460 NA NA
mxAlgebras objects (and their 95% likelihood-based CIs):
lbound Estimate ubound
indirectEffect[1,1] 0.1498335 0.1685701 0.1878938
Goodness-of-fit indices:
Value
Sample size 3975.0000
Chi-square of target model 9.4087
DF of target model 1.0000
p value of target model 0.0022
Number of constraints imposed on "Smatrix" 3.0000
DF manually adjusted 0.0000
Chi-square of independence model 2697.6496
DF of independence model 6.0000
RMSEA 0.0460
RMSEA lower 95% CI 0.0226
RMSEA upper 95% CI 0.0748
SRMR 0.0161
TLI 0.9813
CFI 0.9969
AIC 7.4087
BIC 1.1209
OpenMx status1: 0 ("0" or "1": The optimization is considered fine.
Other values indicate problems.)
Because we told the tssem2()
function to use likelihood-based intervals, the Wald-type \(p\)-values are not displayed. We see that the proposed model fits the data closely, with \(\chi^{2}_{1,3975} = 9.4, p=0.002\)) and the \(RMSEA = 0.046\) being smaller than \(0.05\). Please note however, that we used the fixed-effect model in stage 1 to pool the correlations, which may not be appropriate if the between-study heterogeneity is substantial. The estimate of the indirect effect from resilience to depression assuming our moderation model is \(0.17\), which is significant (\(95\%CI: 0.15-0.19\)).
14.4.4 Plotting the Model
Again, we can plot our model using the semPaths()
function in the semPlot
package.
# Convert to semPlot
sem.plot <- meta2semPlot(med2)
# Create Labels (left to right, bottom to top)
labels <- c("Resilience","Emotion\nRegulation","Coping","Depres-\nsion")
# Plot
semPaths(sem.plot,
whatLabels = "est",
edge.color = "black",
layout="tree2",
rotation=2,
nodeLabels = labels)
References
Baron, Reuben M, and David A Kenny. 1986. “The Moderator–Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations.” Journal of Personality and Social Psychology 51 (6). American Psychological Association: 1173.
Fletcher, David, and Mustafa Sarkar. 2013. “Psychological Resilience: A Review and Critique of Definitions, Concepts, and Theory.” European Psychologist 18 (1). Hogrefe Publishing: 12.