22.3 Selection Problem
A fundamental challenge in causal inference is that we never observe both potential outcomes for the same individual—only one or the other. This creates the selection problem, which we formalize below.
Assume we have:
- A binary treatment variable $D_i \in \{0, 1\}$, where:
  - $D_i = 1$ indicates that individual $i$ receives the treatment.
  - $D_i = 0$ indicates that individual $i$ does not receive the treatment.
- The outcome of interest $Y_i$, which depends on whether the individual is treated:
  - $Y_{0i}$: the outcome if not treated.
  - $Y_{1i}$: the outcome if treated.
Thus, the potential outcomes framework is defined as:
$$
\text{Potential Outcome} =
\begin{cases}
Y_{1i} & \text{if } D_i = 1 \ (\text{Treated}) \\
Y_{0i} & \text{if } D_i = 0 \ (\text{Untreated})
\end{cases}
$$
However, we only observe one outcome per individual:
$$
Y_i = Y_{0i} + (Y_{1i} - Y_{0i}) D_i
$$
This means that for any given person, we observe either $Y_{1i}$ or $Y_{0i}$, but never both. Since we cannot observe counterfactuals (unless we invent a time machine), we must rely on statistical inference to estimate treatment effects.
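The switching equation above can be made concrete with a short simulation. This is a minimal sketch; the distributions and the effect size of 3 are illustrative choices, not values from the text:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

y0 = rng.normal(10, 2, n)   # potential outcome if untreated
y1 = y0 + 3                 # potential outcome if treated (assumed effect of 3)
d = rng.integers(0, 2, n)   # treatment indicator

# Switching equation: Y_i = Y0_i + (Y1_i - Y0_i) * D_i
y = y0 + (y1 - y0) * d

# For each individual we see y1[i] if d[i] == 1, else y0[i] -- never both.
for i in range(n):
    seen = "Y1" if d[i] == 1 else "Y0"
    print(f"i={i}: D={d[i]}, observed {seen} = {y[i]:.2f}")
```

Both potential outcomes exist in the simulation, but the observed `y` reveals only one of them per individual, which is exactly the selection problem.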
22.3.1 The Observed Difference in Outcomes
The goal is to estimate the difference in expected outcomes between treated and untreated individuals:
$$
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
$$
Expanding this equation:
$$
\begin{aligned}
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0]
&= \big( E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 1] \big) + \big( E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \big) \\
&= E[Y_{1i} - Y_{0i} \mid D_i = 1] + \big( E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \big)
\end{aligned}
$$
This equation decomposes the observed difference into two components:
- **Treatment Effect on the Treated (ATT):** $E[Y_{1i} - Y_{0i} \mid D_i = 1]$, the causal impact of the treatment on those who are treated.
- **Selection Bias:** $E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0]$, the systematic difference between treated and untreated groups that would exist even in the absence of treatment.
Thus, the observed difference in outcomes is:
$$
\text{Observed Difference} = \text{ATT} + \text{Selection Bias}
$$
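This decomposition holds as an exact identity in any sample, which a simulation can verify. In the sketch below (all numbers are illustrative assumptions), individuals with high baseline outcomes are more likely to opt into treatment, creating positive selection bias:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

y0 = rng.normal(10, 2, n)
y1 = y0 + 3                                       # constant treatment effect of 3
d = (y0 + rng.normal(0, 2, n) > 10).astype(int)   # selection: high-baseline units opt in

y = y0 + (y1 - y0) * d                            # observed outcome

observed_diff = y[d == 1].mean() - y[d == 0].mean()
att = (y1 - y0)[d == 1].mean()                    # = 3 by construction
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()

print(observed_diff)           # much larger than the true effect of 3
print(att + selection_bias)    # matches the observed difference exactly
```

The naive comparison of means overstates the effect because the treated group would have had higher outcomes even without treatment.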
22.3.2 Eliminating Selection Bias with Random Assignment
With random assignment of treatment, $D_i$ is independent of the potential outcomes. Under true randomization:

$$
E[Y_{0i} \mid D_i = 1] = E[Y_{0i} \mid D_i = 0],
$$

which eliminates selection bias. Consequently, the observed difference directly estimates the true causal effect:

$$
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = E[Y_{1i} - Y_{0i} \mid D_i = 1] = E[Y_{1i} - Y_{0i}]
$$
Thus, randomized controlled trials provide an unbiased estimate of the average treatment effect.
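Rerunning the earlier selection simulation with the treatment assigned by a coin flip shows the bias disappearing. Again, the data-generating values are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000

y0 = rng.normal(10, 2, n)
y1 = y0 + 3                      # constant treatment effect of 3
d = rng.integers(0, 2, n)        # randomized: independent of (y0, y1)

y = y0 + (y1 - y0) * d

observed_diff = y[d == 1].mean() - y[d == 0].mean()
selection_bias = y0[d == 1].mean() - y0[d == 0].mean()

print(round(observed_diff, 2))   # close to the true effect of 3
print(round(selection_bias, 2))  # close to 0
```

Because $D_i$ is now independent of $Y_{0i}$, the treated and untreated groups have the same expected baseline, and the difference in means recovers the causal effect up to sampling noise.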
22.3.3 Another Representation Under Regression
So far, we have framed the selection problem using expectations and potential outcomes. Another way to represent treatment effects is through regression models, which provide a practical framework for estimation.
Suppose the treatment effect is constant across individuals:
$$
Y_{1i} - Y_{0i} = \rho
$$

This implies that each treated individual experiences the same treatment effect $\rho$, though their baseline outcomes $Y_{0i}$ may vary.
Since we only observe one of the potential outcomes, the observed outcome can be expressed as:
$$
Y_i = E(Y_{0i}) + (Y_{1i} - Y_{0i}) D_i + [Y_{0i} - E(Y_{0i})] = \alpha + \rho D_i + \eta_i
$$
where:
- $\alpha = E(Y_{0i})$, the expected outcome for untreated individuals.
- $\rho$, the causal treatment effect.
- $\eta_i = Y_{0i} - E(Y_{0i})$, the individual deviation from the mean untreated outcome.
Thus, the regression model provides an intuitive way to express treatment effects.
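Under random assignment, an ordinary least squares fit of $Y_i$ on a constant and $D_i$ recovers $\alpha$ and $\rho$. A minimal sketch, with $\alpha = 5$ and $\rho = 2.5$ as assumed illustrative values:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

rho = 2.5
y0 = rng.normal(5, 1, n)             # baseline outcomes; alpha = E(Y0) = 5
d = rng.integers(0, 2, n)            # randomized treatment
y = y0 + rho * d                     # Y_i = alpha + rho*D_i + eta_i

# OLS of Y on a constant and D
X = np.column_stack([np.ones(n), d])
alpha_hat, rho_hat = np.linalg.lstsq(X, y, rcond=None)[0]

print(round(alpha_hat, 2), round(rho_hat, 2))   # close to 5 and 2.5
```

With a single binary regressor, `rho_hat` is numerically identical to the difference in group means, so the regression and the expectations-based framing give the same estimate.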
22.3.3.1 Conditional Expectations and Selection Bias
Taking expectations conditional on treatment status:
$$
\begin{aligned}
E[Y_i \mid D_i = 1] &= \alpha + \rho + E[\eta_i \mid D_i = 1] \\
E[Y_i \mid D_i = 0] &= \alpha + E[\eta_i \mid D_i = 0]
\end{aligned}
$$
The observed difference in means between treated and untreated groups is:
$$
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] = \rho + E[\eta_i \mid D_i = 1] - E[\eta_i \mid D_i = 0]
$$

Here, the term $E[\eta_i \mid D_i = 1] - E[\eta_i \mid D_i = 0]$ represents selection bias: the correlation between the regression error term $\eta_i$ and the treatment variable $D_i$.
Under random assignment, we assume that potential outcomes are independent of treatment $D_i$:

$$
E[\eta_i \mid D_i = 1] - E[\eta_i \mid D_i = 0] = E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] = 0
$$
Thus, under true randomization, selection bias disappears, and the observed difference directly estimates the causal effect ρ.
22.3.3.2 Controlling for Additional Variables
In many real-world scenarios, random assignment is imperfect, and selection bias may still exist. To mitigate this, we introduce control variables ($X_i$), such as demographic characteristics, firm size, or prior purchasing behavior.

If $X_i$ is uncorrelated with the treatment $D_i$, including it in our regression model does not bias the estimate of $\rho$ and has two advantages:
- It reduces the residual variance, improving the precision of the estimate of $\rho$.
- It accounts for additional sources of variability, making the model more robust.
Thus, our regression model extends to:
$$
Y_i = \alpha + \rho D_i + X_i' \gamma + \eta_i
$$
where:
- $X_i$ represents a vector of control variables.
- $\gamma$ captures the effect of $X_i$ on the outcome.
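The precision gain from a control uncorrelated with treatment can be seen by comparing the standard error of $\rho$ with and without it. A sketch under assumed values ($\rho = 1$, one control with coefficient $0.5$):

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20_000

rho = 1.0
x = rng.normal(0, 3, n)              # control variable, independent of D
d = rng.integers(0, 2, n)            # randomized treatment
y = 2.0 + rho * d + 0.5 * x + rng.normal(0, 1, n)

def ols(X, y):
    """OLS coefficients and classical standard errors."""
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta
    s2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(s2 * np.diag(np.linalg.inv(X.T @ X)))
    return beta, se

# Without the control: rho_hat is unbiased but noisier
b_short, se_short = ols(np.column_stack([np.ones(n), d]), y)
# With the control: same target rho, smaller standard error
b_long, se_long = ols(np.column_stack([np.ones(n), d, x]), y)

print(round(b_short[1], 2), round(se_short[1], 3))
print(round(b_long[1], 2), round(se_long[1], 3))
```

Both specifications estimate the same $\rho$, but adding $x$ absorbs part of the outcome variance, shrinking the standard error on the treatment coefficient.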
22.3.3.3 Example: Racial Discrimination in Hiring
A famous study by Bertrand and Mullainathan (2004) examined racial discrimination in hiring by randomly assigning Black- and White-sounding names to identical job applications. By ensuring that names were assigned randomly, the authors eliminated confounding factors like education and experience, allowing them to estimate the causal effect of race on callback rates.