34.2 Framework for Instrumental Variables

We consider a binary treatment framework where:

  • \(D_i \sim Bernoulli(p)\) is a dummy treatment variable.

  • \((Y_{0i}, Y_{1i})\) are the potential outcomes under control and treatment.

  • The observed outcome is: \[ Y_i = Y_{0i} + (Y_{1i} - Y_{0i}) D_i. \]

  • We introduce an instrumental variable \(Z_i\) satisfying: \[ Z_i \perp (Y_{0i}, Y_{1i}, D_{0i}, D_{1i}). \]

    • This means \(Z_i\) is independent of potential outcomes and potential treatment status.
    • \(Z_i\) must also be correlated with \(D_i\) to satisfy the relevance condition.

34.2.1 Constant-Treatment-Effect Model

Under the constant treatment effect assumption (i.e., the treatment effect is the same for all individuals),

\[ \begin{aligned} Y_{0i} &= \alpha + \eta_i, \\ Y_{1i} - Y_{0i} &= \rho, \\ Y_i &= Y_{0i} + D_i (Y_{1i} - Y_{0i}) \\ &= \alpha + \eta_i + D_i \rho \\ &= \alpha + \rho D_i + \eta_i. \end{aligned} \]

where:

  • \(\eta_i\) captures individual-level heterogeneity.
  • \(\rho\) is the constant treatment effect.

The problem with OLS estimation is that \(D_i\) may be correlated with \(\eta_i\), leading to endogeneity bias.
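The bias can be illustrated with a small simulation. This is a minimal sketch, not part of the original derivation: the parameter values (`alpha = 1`, `rho = 2`) and the selection rule (units with higher \(\eta_i\) are more likely to take the treatment) are assumptions chosen purely for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
alpha, rho = 1.0, 2.0              # hypothetical intercept and constant effect

eta = rng.normal(size=n)           # unobserved heterogeneity eta_i
# Assumed selection rule: take-up depends on eta, so D_i is endogenous.
d = (eta + rng.normal(size=n) > 0).astype(float)
y = alpha + rho * d + eta          # Y_i = alpha + rho * D_i + eta_i

# OLS slope of Y on D is Cov(Y, D) / Var(D) = rho + Cov(eta, D) / Var(D).
ols_slope = np.cov(y, d)[0, 1] / np.var(d, ddof=1)
bias_term = np.cov(eta, d)[0, 1] / np.var(d, ddof=1)

print("true rho:", rho)
print("OLS slope:", round(ols_slope, 3))
print("Cov(eta, D) / Var(D):", round(bias_term, 3))
```

Because \(\eta_i\) and \(D_i\) are positively correlated in this setup, the OLS slope overstates \(\rho\) by the term \(\text{Cov}(\eta_i, D_i) / V(D_i)\).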

34.2.2 Instrumental Variable Solution

A valid instrument \(Z_i\) allows us to estimate the causal effect \(\rho\) via:

\[ \begin{aligned} \rho &= \frac{\text{Cov}(Y_i, Z_i)}{\text{Cov}(D_i, Z_i)} \\ &= \frac{\text{Cov}(Y_i, Z_i) / V(Z_i) }{\text{Cov}(D_i, Z_i) / V(Z_i)} \\ &= \frac{\text{Reduced form estimate}}{\text{First-stage estimate}} \\ &= \frac{E[Y_i |Z_i = 1] - E[Y_i | Z_i = 0]}{E[D_i |Z_i = 1] - E[D_i | Z_i = 0 ]}. \end{aligned} \]

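The last equality, the Wald estimator, applies when \(Z_i\) is binary. The ratio recovers \(\rho\) only if \(Z_i\) is a valid instrument: independent of \(\eta_i\) and correlated with \(D_i\).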
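As a rough numerical check of the ratio above, the sketch below simulates the constant-effect model with a randomly assigned binary instrument. The reasoning is that \(\text{Cov}(Y_i, Z_i) = \rho \, \text{Cov}(D_i, Z_i) + \text{Cov}(\eta_i, Z_i)\), and the last term is zero when \(Z_i\) is independent of \(\eta_i\). The parameter values and the take-up rule are hypothetical, chosen only to make \(D_i\) endogenous while keeping \(Z_i\) relevant.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
alpha, rho = 1.0, 2.0                      # hypothetical parameter values

z = rng.integers(0, 2, size=n)             # binary instrument, randomly assigned
eta = rng.normal(size=n)                   # unobserved heterogeneity
# Assumed take-up rule: depends on Z (relevance) and on eta (endogeneity).
d = (0.8 * z + eta + rng.normal(size=n) > 0.4).astype(float)
y = alpha + rho * d + eta

# Covariance form of the IV estimand.
cov_ratio = np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]

# Wald form: reduced form divided by first stage.
reduced_form = y[z == 1].mean() - y[z == 0].mean()
first_stage = d[z == 1].mean() - d[z == 0].mean()
wald = reduced_form / first_stage

print("covariance ratio:", round(cov_ratio, 3))
print("Wald estimate:", round(wald, 3))
```

With a binary instrument the two forms coincide in the sample, and both should land close to the assumed \(\rho\), up to sampling noise.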

34.2.3 Heterogeneous Treatment Effects and the LATE Framework

In a more general framework where treatment effects vary across individuals,

  • Define potential outcomes as: \[ Y_i(d,z) = \text{outcome for unit } i \text{ given } D_i = d, Z_i = z. \]

  • Define treatment status based on \(Z_i\): \[ D_i = D_{0i} + Z_i (D_{1i} - D_{0i}). \]

    where:

    • \(D_{1i}\) is the treatment status when \(Z_i = 1\).
    • \(D_{0i}\) is the treatment status when \(Z_i = 0\).
    • \(D_{1i} - D_{0i}\) is the causal effect of \(Z_i\) on \(D_i\).

34.2.4 Assumptions for LATE Identification

34.2.4.1 Independence (Instrument Randomization)

The instrument must be as good as randomly assigned:

\[ [\{Y_i(d,z); \forall d, z \}, D_{1i}, D_{0i} ] \perp Z_i. \]

This ensures that \(Z_i\) is uncorrelated with potential outcomes and potential treatment status.

This assumption lets us interpret the first stage as the average causal effect of \(Z_i\) on \(D_i\):

\[ \begin{aligned} E[D_i |Z_i = 1] - E[D_i | Z_i = 0] &= E[D_{1i} |Z_i = 1] - E[D_{0i} |Z_i = 0] \\ &= E[D_{1i} - D_{0i}] \end{aligned} \]

Independence is also sufficient for a causal interpretation of the reduced form, that is, the effect of the instrument \(Z_i\) on the outcome \(Y_i\):

\[ E[Y_i |Z_i = 1 ] - E[Y_i|Z_i = 0] = E[Y_i (D_{1i}, Z_i = 1) - Y_i (D_{0i} , Z_i = 0)] \]

34.2.4.2 Exclusion Restriction

This is also known as the existence of the instrument assumption (G. W. Imbens and Angrist 1994). The instrument should only affect \(Y_i\) through \(D_i\) (i.e., the treatment \(D_i\) fully mediates the effect of \(Z_i\) on \(Y_i\)):

\[ \begin{aligned} Y_{1i} &= Y_i (1,1) = Y_i (1,0)\\ Y_{0i} &= Y_i (0,1) = Y_i (0,0) \end{aligned} \]

Under this assumption (and assuming \(Y_{1i}\) and \(Y_{0i}\) already satisfy the independence assumption), the observed outcome \(Y_i\) can be rewritten as:

\[ \begin{aligned} Y_i &= Y_i (0, Z_i) + [Y_i (1 , Z_i) - Y_i (0, Z_i)] D_i \\ &= Y_{0i} + (Y_{1i} - Y_{0i}) D_i. \end{aligned} \]

This assumption lets us go from reduced-form causal effects to treatment effects (J. D. Angrist and Imbens 1995).

34.2.4.3 Monotonicity (No Defiers)

We assume that \(Z_i\) affects \(D_i\) in a monotonic way:

\[ D_{1i} \geq D_{0i}, \quad \forall i. \]

  • This assumption gives the first stage a clear interpretation: it measures the proportion of the population whose treatment status \(D_i\) is driven by \(Z_i\). It implies that \(Z_i\) only moves individuals toward treatment, never away from it. This rules out “defiers” (i.e., individuals who would take the treatment when not assigned but refuse it when assigned).
  • This assumption addresses the problem of units shifting from participation back to non-participation when the instrument changes.
    • Alternatively, one can solve the same problem by assuming a constant (homogeneous) treatment effect (G. W. Imbens and Angrist 1994), but this is rather restrictive.

    • A third solution is to assume that there exists a value of the instrument at which the probability of participation is 0 (J. Angrist and Imbens 1991).

Under monotonicity,

\[ \begin{aligned} E[D_{1i} - D_{0i} ] = P[D_{1i} > D_{0i}]. \end{aligned} \]
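
To see why, note that \(D_{1i} - D_{0i}\) can only equal 0 or 1 once monotonicity rules out the value \(-1\), so its expectation is the probability that it equals 1:

\[ \begin{aligned} E[D_{1i} - D_{0i}] &= 1 \cdot P[D_{1i} - D_{0i} = 1] + 0 \cdot P[D_{1i} - D_{0i} = 0] \\ &= P[D_{1i} > D_{0i}]. \end{aligned} \]

The first stage therefore equals the share of units whose treatment status is moved by the instrument.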

34.2.5 Local Average Treatment Effect Theorem

Given Independence, Exclusion, and Monotonicity, we obtain the LATE result (J. D. Angrist and Pischke 2009, 4.4.1):

\[ \begin{aligned} \frac{E[Y_i | Z_i = 1] - E[Y_i | Z_i = 0]}{E[D_i |Z_i = 1] - E[D_i |Z_i = 0]} = E[Y_{1i} - Y_{0i} | D_{1i} > D_{0i}]. \end{aligned} \]

This states that the IV estimator recovers the causal effect only for compliers—units whose treatment status changes due to \(Z_i\).
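
A sketch of the argument, using only the three assumptions above: exclusion lets us write \(Y_i = Y_{0i} + (Y_{1i} - Y_{0i}) D_i\), and independence lets us drop the conditioning on \(Z_i\), so the reduced form is

\[ \begin{aligned} E[Y_i | Z_i = 1] - E[Y_i | Z_i = 0] &= E[Y_{0i} + (Y_{1i} - Y_{0i}) D_{1i}] - E[Y_{0i} + (Y_{1i} - Y_{0i}) D_{0i}] \\ &= E[(Y_{1i} - Y_{0i})(D_{1i} - D_{0i})] \\ &= E[Y_{1i} - Y_{0i} | D_{1i} > D_{0i}] \, P[D_{1i} > D_{0i}]. \end{aligned} \]

The last step uses monotonicity, under which \(D_{1i} - D_{0i}\) is either 0 or 1. Dividing by the first stage, \(E[D_{1i} - D_{0i}] = P[D_{1i} > D_{0i}]\), gives the result.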

IV only identifies treatment effects for switchers (compliers):

| Switcher Type | Compliance Type | Definition |
|---|---|---|
| Switchers | Compliers | \(D_{1i} > D_{0i}\) (take treatment if \(Z_i = 1\), not if \(Z_i = 0\)) |
| Non-switchers | Always-Takers | \(D_{1i} = D_{0i} = 1\) (always take treatment) |
| Non-switchers | Never-Takers | \(D_{1i} = D_{0i} = 0\) (never take treatment) |

  • IV is uninformative about always-takers and never-takers, since their treatment status is unaffected by \(Z_i\) (similar to fixed-effects models, which identify effects only from units whose treatment status changes).
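
The complier interpretation can be checked in a simulation where the compliance types are known by construction. This is a minimal sketch under assumed values: the population shares (50% compliers, 20% always-takers, 30% never-takers) and the type-specific effects are hypothetical, and there are no defiers, so monotonicity holds.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300_000

# Hypothetical population shares of the three compliance types.
types = rng.choice(["complier", "always", "never"], size=n, p=[0.5, 0.2, 0.3])
d0 = (types == "always").astype(int)       # D_0i: treatment status if Z_i = 0
d1 = (types != "never").astype(int)        # D_1i: treatment status if Z_i = 1

# Hypothetical heterogeneous effects Y_1i - Y_0i that differ by type.
effect = np.where(types == "complier", 1.0,
                  np.where(types == "always", 3.0, 0.5))
# Baseline outcomes also differ by type, so take-up is selected.
y0 = rng.normal(size=n) + 1.0 * (types == "always")
y1 = y0 + effect

z = rng.integers(0, 2, size=n)             # randomly assigned instrument
d = d0 + z * (d1 - d0)                     # observed treatment
y = y0 + d * (y1 - y0)                     # observed outcome

first_stage = d[z == 1].mean() - d[z == 0].mean()
wald = (y[z == 1].mean() - y[z == 0].mean()) / first_stage
late = effect[d1 > d0].mean()              # average effect among compliers
ate = effect.mean()                        # population average effect

print("first stage (complier share):", round(first_stage, 3))
print("Wald estimate:", round(wald, 3))
print("complier average effect:", round(late, 3))
print("population average effect:", round(ate, 3))
```

In this setup the Wald ratio should track the complier average effect rather than the population average, and the first stage should be close to the assumed complier share.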

34.2.6 IV in Randomized Trials (Noncompliance)

  • In randomized trials with imperfect (voluntary) compliance, some individuals assigned to treatment do not take it, and take-up may reflect self-selection. Intention-to-treat (ITT) estimates remain valid for the effect of assignment but are diluted by noncompliance.
  • IV estimation using random assignment (\(Z_i\)) as an instrument for actual treatment received (\(D_i\)) recovers the LATE.

\[ \begin{aligned} \frac{E[Y_i |Z_i = 1] - E[Y_i |Z_i = 0]}{E[D_i |Z_i = 1]} = \frac{\text{Intent-to-Treat Effect}}{\text{Compliance Rate}} = E[Y_{1i} - Y_{0i} |D_i = 1]. \end{aligned} \]

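This expression holds under one-sided noncompliance, where no one assigned to control receives the treatment, so \(E[D_i | Z_i = 0] = 0\) and there are no always-takers. In that case, LATE = Treatment Effect on the Treated (TOT).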
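
The ITT-over-compliance ratio can be illustrated with a simulated trial. This is a sketch under stated assumptions: noncompliance is one-sided (nobody assigned to control receives the treatment), and the variable `ability`, the take-up rule, and the effect function are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 300_000

z = rng.integers(0, 2, size=n)             # random assignment
ability = rng.normal(size=n)               # hypothetical unobserved trait
# Assumed take-up rule: higher-ability units are more likely to comply.
would_comply = (ability + rng.normal(size=n) > 0).astype(int)
d = z * would_comply                       # one-sided: D_i = 0 whenever Z_i = 0

y0 = ability + rng.normal(size=n)
effect = 1.0 + 0.5 * ability               # hypothetical heterogeneous effect
y = y0 + d * effect

itt = y[z == 1].mean() - y[z == 0].mean()  # intention-to-treat effect
compliance = d[z == 1].mean()              # compliance rate E[D | Z = 1]
iv_estimate = itt / compliance
tot = effect[d == 1].mean()                # effect on the treated
naive = y[d == 1].mean() - y[d == 0].mean()

print("ITT:", round(itt, 3))
print("ITT / compliance rate:", round(iv_estimate, 3))
print("effect on the treated:", round(tot, 3))
print("naive treated-vs-untreated gap:", round(naive, 3))
```

Here the ITT understates the effect on the treated because only part of the assigned group takes the treatment, while the naive comparison by treatment received is contaminated by selection on `ability`. The ratio should fall close to the effect on the treated.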


References

Angrist, Joshua D, and Guido W Imbens. 1995. “Two-Stage Least Squares Estimation of Average Causal Effects in Models with Variable Treatment Intensity.” Journal of the American Statistical Association 90 (430): 431–42.
Angrist, Joshua D, and Jörn-Steffen Pischke. 2009. Mostly Harmless Econometrics: An Empiricist’s Companion. Princeton University Press.
Angrist, Joshua, and Guido Imbens. 1991. “Sources of Identifying Information in Evaluation Models.” National Bureau of Economic Research Cambridge, Mass., USA.
Imbens, Guido W, and Joshua D Angrist. 1994. “Identification and Estimation of Local Average Treatment Effects.” Econometrica 62 (2): 467–75.