34.2 Causal Inference Approach

34.2.1 Example 1

from Virginia’s library

myData <-
    read.csv('http://static.lib.virginia.edu/statlab/materials/data/mediationData.csv')

# Step 1 (no longer necessary)
model.0 <- lm(Y ~ X, myData)
summary(model.0)
#> 
#> Call:
#> lm(formula = Y ~ X, data = myData)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -5.0262 -1.2340 -0.3282  1.5583  5.1622 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   2.8572     0.6932   4.122 7.88e-05 ***
#> X             0.3961     0.1112   3.564 0.000567 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.929 on 98 degrees of freedom
#> Multiple R-squared:  0.1147, Adjusted R-squared:  0.1057 
#> F-statistic:  12.7 on 1 and 98 DF,  p-value: 0.0005671

# Step 2
model.M <- lm(M ~ X, myData)
summary(model.M)
#> 
#> Call:
#> lm(formula = M ~ X, data = myData)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -4.3046 -0.8656  0.1344  1.1344  4.6954 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)  1.49952    0.58920   2.545   0.0125 *  
#> X            0.56102    0.09448   5.938 4.39e-08 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.639 on 98 degrees of freedom
#> Multiple R-squared:  0.2646, Adjusted R-squared:  0.2571 
#> F-statistic: 35.26 on 1 and 98 DF,  p-value: 4.391e-08

# Step 3
model.Y <- lm(Y ~ X + M, myData)
summary(model.Y)
#> 
#> Call:
#> lm(formula = Y ~ X + M, data = myData)
#> 
#> Residuals:
#>     Min      1Q  Median      3Q     Max 
#> -3.7631 -1.2393  0.0308  1.0832  4.0055 
#> 
#> Coefficients:
#>             Estimate Std. Error t value Pr(>|t|)    
#> (Intercept)   1.9043     0.6055   3.145   0.0022 ** 
#> X             0.0396     0.1096   0.361   0.7187    
#> M             0.6355     0.1005   6.321 7.92e-09 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 1.631 on 97 degrees of freedom
#> Multiple R-squared:  0.373,  Adjusted R-squared:  0.3601 
#> F-statistic: 28.85 on 2 and 97 DF,  p-value: 1.471e-10

# Step 4 (boostrapping)
library(mediation)
results <- mediate(
    model.M,
    model.Y,
    treat = 'X',
    mediator = 'M',
    boot = TRUE,
    sims = 500
)
summary(results)
#> 
#> Causal Mediation Analysis 
#> 
#> Nonparametric Bootstrap Confidence Intervals with the Percentile Method
#> 
#>                Estimate 95% CI Lower 95% CI Upper p-value    
#> ACME             0.3565       0.2315         0.51  <2e-16 ***
#> ADE              0.0396      -0.1817         0.28     0.8    
#> Total Effect     0.3961       0.1821         0.64  <2e-16 ***
#> Prop. Mediated   0.9000       0.5226         1.83  <2e-16 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Sample Size Used: 100 
#> 
#> 
#> Simulations: 500
  • Total Effect = 0.3961 = \(b_1\) (step 1) = total effect of \(X\) on \(Y\) without \(M\)

  • Direct Effect = ADE = 0.0396 = \(b_4\) (step 3) = direct effect of \(X\) on \(Y\) accounting for the indirect effect of \(M\)

  • ACME = Average Causal Mediation Effects = \(b_1 - b_4\) = 0.3961 - 0.0396 = 0.3565 = \(b_2 \times b_3\) = 0.56102 * 0.6355 = 0.3565

Using mediation package suggested by Imai, Keele, and Yamamoto (2010). More details of the package can be found here

2 types of Inference in this package:

  1. Model-based inference:

    • Assumptions:

      • Treatment is randomized (could use matching methods to achieve this).

      • Sequential Ignorability: conditional on covariates, there is other confounders that affect the relationship between (1) treatment-mediator, (2) treatment-outcome, (3) mediator-outcome. Typically hard to argue in observational data. This assumption is for the identification of ACME (i.e., average causal mediation effects).

  2. Design-based inference

Notations: we stay consistent with package instruction

  • \(M_i(t)\) = mediator

  • \(T_i\) = treatment status \((0,1)\)

  • \(Y_i(t,m)\) = outcome where \(t\) = treatment, and \(m\) = mediating variables.

  • \(X_i\) = vector of observed pre-treatment confounders

  • Treatment effect (per unit \(i\)) = \(\tau_i = Y_i(1,M_i(1)) - Y_i (0,M_i(0))\) which has 2 effects

    • Causal mediation effects: \(\delta_i (t) \equiv Y_i (t,M_i(1)) - Y_i(t,M_i(0))\)

    • Direct effects: \(\zeta (t) \equiv Y_i (1, M_i(1)) - Y_i(0, M_i(0))\)

    • summing up to the treatment effect: \(\tau_i = \delta_i (t) + \zeta_i (1-t)\)

More on sequential ignorability

\[ \{ Y_i (t', m) , M_i (t) \} \perp T_i |X_i = x \]

\[ Y_i(t',m) \perp M_i(t) | T_i = t, X_i = x \]

where

  • \(0 < P(T_i = t | X_i = x)\)

  • \(0 < P(M_i = m | T_i = t , X_i =x)\)

First condition is the standard strong ignorability condition where treatment assignment is random conditional on pre-treatment confounders.

Second condition is stronger where the mediators is also random given the observed treatment and pre-treatment confounders. This condition is satisfied only when there is no unobserved pre-treatment confounders, and post-treatment confounders, and multiple mediators that are correlated.

My understanding is that until the moment I write this note, there is no way to test the sequential ignorability assumption. Hence, researchers can only do sensitivity analysis to argue for their result.

References

Imai, Kosuke, Luke Keele, and Teppei Yamamoto. 2010. “Identification, Inference and Sensitivity Analysis for Causal Mediation Effects.”