42.2 Causal Inference Approach

42.2.1 Example 1

myData <-
    read.csv('http://static.lib.virginia.edu/statlab/materials/data/mediationData.csv')

# Step 1 (no longer necessary)
model.0 <- lm(Y ~ X, myData)
summary(model.0)
#> 
#> Call:
#> lm(formula = Y ~ X, data = myData)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -5.0262 -1.2340 -0.3282  1.5583  5.1622 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   2.8572     0.6932   4.122 7.88e-05 ***
#> X             0.3961     0.1112   3.564 0.000567 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.929 on 98 degrees of freedom
#> Multiple R-squared:  0.1147, Adjusted R-squared:  0.1057 
#> F-statistic:  12.7 on 1 and 98 DF,  p-value: 0.0005671

# Step 2
model.M <- lm(M ~ X, myData)
summary(model.M)
#> 
#> Call:
#> lm(formula = M ~ X, data = myData)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.3046 -0.8656  0.1344  1.1344  4.6954 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  1.49952    0.58920   2.545   0.0125 *  
#> X            0.56102    0.09448   5.938 4.39e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.639 on 98 degrees of freedom
#> Multiple R-squared:  0.2646, Adjusted R-squared:  0.2571 
#> F-statistic: 35.26 on 1 and 98 DF,  p-value: 4.391e-08

# Step 3
model.Y <- lm(Y ~ X + M, myData)
summary(model.Y)
#> 
#> Call:
#> lm(formula = Y ~ X + M, data = myData)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.7631 -1.2393  0.0308  1.0832  4.0055 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   1.9043     0.6055   3.145   0.0022 ** 
#> X             0.0396     0.1096   0.361   0.7187    
#> M             0.6355     0.1005   6.321 7.92e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.631 on 97 degrees of freedom
#> Multiple R-squared:  0.373,  Adjusted R-squared:  0.3601 
#> F-statistic: 28.85 on 2 and 97 DF,  p-value: 1.471e-10

# Step 4 (boostrapping)
library(mediation)
results <- mediate(
    model.M,
    model.Y,
    treat = 'X',
    mediator = 'M',
    boot = TRUE,
    sims = 500
)
summary(results)
#> 
#> Causal Mediation Analysis 
#> 
#> Nonparametric Bootstrap Confidence Intervals with the Percentile Method
#> 
#>                Estimate 95% CI Lower 95% CI Upper p-value    
#> ACME             0.3565       0.2315         0.51  <2e-16 ***
#> ADE              0.0396      -0.1817         0.28     0.8    
#> Total Effect     0.3961       0.1821         0.64  <2e-16 ***
#> Prop. Mediated   0.9000       0.5226         1.83  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Sample Size Used: 100 
#> 
#> 
#> Simulations: 500

Total Effect = 0.3961 = \(b_1\) (step 1) = total effect of \(X\) on \(Y\) without \(M\)
Direct Effect = ADE = 0.0396 = \(b_4\) (step 3) = direct effect of \(X\) on \(Y\) accounting for the indirect effect of \(M\)
ACME = Average Causal Mediation Effects = \(b_1 - b_4\) = 0.3961 - 0.0396 = 0.3565 = \(b_2 \times b_3\) = 0.56102 * 0.6355 = 0.3565

Using mediation package suggested by Imai, Keele, and Yamamoto (2010). More details of the package can be found here

2 types of Inference in this package:

Model-based inference:
- Assumptions:
  - Treatment is randomized (could use matching methods to achieve this).
  - Sequential Ignorability: conditional on covariates, there is other confounders that affect the relationship between (1) treatment-mediator, (2) treatment-outcome, (3) mediator-outcome. Typically hard to argue in observational data. This assumption is for the identification of ACME (i.e., average causal mediation effects).
Design-based inference

Notations: we stay consistent with package instruction

\(M_i(t)\) = mediator
\(T_i\) = treatment status \((0,1)\)
\(Y_i(t,m)\) = outcome where \(t\) = treatment, and \(m\) = mediating variables.
\(X_i\) = vector of observed pre-treatment confounders
Treatment effect (per unit \(i\)) = \(\tau_i = Y_i(1,M_i(1)) - Y_i (0,M_i(0))\) which has 2 effects
- Causal mediation effects: \(\delta_i (t) \equiv Y_i (t,M_i(1)) - Y_i(t,M_i(0))\)
- Direct effects: \(\zeta (t) \equiv Y_i (1, M_i(1)) - Y_i(0, M_i(0))\)
- summing up to the treatment effect: \(\tau_i = \delta_i (t) + \zeta_i (1-t)\)

References

Imai, Kosuke, Luke Keele, and Teppei Yamamoto. 2010. “Identification, Inference and Sensitivity Analysis for Causal Mediation Effects.”