29.2 Simple Dif-n-dif
A tool developed intuitively to study “natural experiments”, but its uses are much broader.
The Fixed Effects Estimator is the foundation for DiD.
Why is dif-in-dif attractive? Identification strategy: Inter-temporal variation between groups
Cross-sectional estimator helps avoid omitted (unobserved) common trends
Time-series estimator helps overcome omitted (unobserved) cross-sectional differences
Consider
$D_i = 1$: treatment group
$D_i = 0$: control group
$T = 1$: after the treatment
$T = 0$: before the treatment
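Group | After (T = 1) | Before (T = 0) |
---|---|---|
Treated $D_i = 1$ | $E[Y_{1i}(1) \mid D_i = 1]$ | $E[Y_{0i}(0) \mid D_i = 1]$ |
Control $D_i = 0$ | $E[Y_{0i}(1) \mid D_i = 0]$ | $E[Y_{0i}(0) \mid D_i = 0]$ |
The missing piece is the counterfactual $E[Y_{0i}(1) \mid D_i = 1]$: the outcome the treated group would have had after the treatment date had it not been treated. It is never observed.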
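The Average Treatment Effect on the Treated (ATT), identified under the parallel trends assumption:
$$
E[Y_1(1) - Y_0(1) \mid D = 1] = \{E[Y(1) \mid D = 1] - E[Y(1) \mid D = 0]\} - \{E[Y(0) \mid D = 1] - E[Y(0) \mid D = 0]\}
$$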
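A short derivation shows why the double difference recovers the ATT. It uses the notation of the table above and assumes parallel trends, meaning the untreated outcome of the treated group would have evolved like that of the control group. This is a sketch of the standard argument, with the assumption written out on the first line.

```latex
\begin{aligned}
&\text{Parallel trends: } E[Y_0(1) - Y_0(0) \mid D = 1] = E[Y_0(1) - Y_0(0) \mid D = 0] \\[4pt]
&\{E[Y(1) \mid D = 1] - E[Y(1) \mid D = 0]\} - \{E[Y(0) \mid D = 1] - E[Y(0) \mid D = 0]\} \\
&\quad = \{E[Y_1(1) \mid D = 1] - E[Y_0(0) \mid D = 1]\} - \{E[Y_0(1) \mid D = 0] - E[Y_0(0) \mid D = 0]\} \\
&\quad = E[Y_1(1) \mid D = 1] - E[Y_0(0) \mid D = 1] - E[Y_0(1) - Y_0(0) \mid D = 1] \\
&\quad = E[Y_1(1) - Y_0(1) \mid D = 1]
\end{aligned}
```

The second line only regroups the four observed means, the third substitutes the control group's trend for the treated group's unobserved trend, and the last cancels the pre-period terms.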
More elaboration:
- For the treatment group, we isolate the difference between being treated and not being treated. If the control group would have responded to the treatment differently, the DiD estimate tells us nothing about that response.
- Alternatively, because we never observe treatment variation in the control group, we can’t say anything about the treatment effect on that group.
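The 2x2 logic above can be checked numerically. The following is a minimal sketch on simulated data: the sample size, effect sizes, and variable names (`d`, `post`, `y`) are illustrative assumptions, not values from this text. It computes the DiD from the four cell means and compares it with the interaction coefficient from a saturated regression.

```r
set.seed(1)  # simulated data; all numbers below are illustrative assumptions
n    <- 2000
d    <- rbinom(n, 1, 0.5)   # 1 = treatment group, 0 = control group
post <- rbinom(n, 1, 0.5)   # 1 = after the treatment, 0 = before

# Outcome: group gap (2), common time trend (1), true ATT (3), plus noise
y <- 1 + 2 * d + 1 * post + 3 * d * post + rnorm(n)

# The four cell means of the 2x2 table (rows: group, columns: period)
m <- tapply(y, list(D = d, Post = post), mean)

# DiD from cell means: (treated - control, after) - (treated - control, before)
did_means <- (m["1", "1"] - m["0", "1"]) - (m["1", "0"] - m["0", "0"])

# The same quantity is the interaction coefficient in a saturated regression
did_reg <- coef(lm(y ~ d * post))[["d:post"]]

c(means = did_means, regression = did_reg)
```

Because the regression is saturated in the two binary variables, the two numbers coincide up to floating-point error; both are noisy estimates of the simulated effect.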
Extension
- More than 2 groups (multiple treatments and multiple controls), and more than 2 periods (pre and post)
$$
Y_{igt} = \alpha_g + \gamma_t + \beta I_{gt} + \delta X_{igt} + \epsilon_{igt}
$$
where
$\alpha_g$ = group-specific fixed effect
$\gamma_t$ = time-specific fixed effect
$\beta$ = dif-in-dif effect
$I_{gt}$ = interaction terms (n treatment indicators × n post-treatment dummies), which capture effect heterogeneity over time
This specification is the “two-way fixed effects DiD” - TWFE (i.e., 2 sets of fixed effects: group + time).
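As a hedged illustration of this specification, the sketch below simulates a panel and fits it with `fixest::feols`. The data-generating values, the column names (`g`, `t`, `treat_post`, `x`), and the choice to cluster by group are assumptions made for the example. All treated groups start treatment in the same period, so this is not a staggered design.

```r
library(fixest)

set.seed(42)  # simulated panel; every number here is an illustrative assumption
n_groups  <- 20
n_periods <- 6
n_units   <- 30   # units per group

df <- expand.grid(unit = 1:n_units, g = 1:n_groups, t = 1:n_periods)

# Groups 1-10 are treated, and all of them start treatment at t = 4
df$treat_post <- as.numeric(df$g <= 10 & df$t >= 4)   # plays the role of I_gt
df$x <- rnorm(nrow(df))                               # a covariate X_igt

alpha_g <- rnorm(n_groups)[df$g]          # group fixed effects
gamma_t <- (0.5 * (1:n_periods))[df$t]    # time fixed effects

# True dif-in-dif effect (beta) is set to 2
df$y <- alpha_g + gamma_t + 2 * df$treat_post + 0.5 * df$x + rnorm(nrow(df))

# Two sets of fixed effects (group + time), standard errors clustered by group
twfe <- feols(y ~ treat_post + x | g + t, data = df, cluster = ~g)
coef(twfe)
```

The coefficient on `treat_post` is the estimate of $\beta$; the group and time effects are absorbed by the terms after the `|`.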
- However, with staggered dif-n-dif (i.e., treatment is applied at different times to different groups), TWFE can perform very poorly, particularly when treatment effects differ across groups or change over time.
- Long-term Effects
To examine the dynamic treatment effects (that are not under rollout/staggered design), we can create a centered time variable:
Centered Time Variable | Period |
---|---|
… | … |
t=−1 | 2 periods before treatment period |
t=0 | Last period right before treatment period (remember to use this period as the reference group) |
t=1 | Treatment period |
… | … |
By interacting this factor variable with the treatment indicator, we can examine the dynamic effect of treatment (i.e., whether it’s fading or intensifying)
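$$
Y = \alpha_0 + \alpha_1 \text{Group} + \alpha_2 \text{Time} + \beta_{-T_1} \text{Treatment} + \beta_{-(T_1 - 1)} \text{Treatment} + \dots + \beta_{-1} \text{Treatment} + \beta_1 \text{Treatment} + \dots + \beta_{T_2} \text{Treatment}
$$
where
each Treatment term stands for the treatment indicator interacted with the dummy for the period in the coefficient's subscript
$\beta_0$ (the coefficient for the reference period $t = 0$) is dropped from the model
$T_1$ is the number of pre-treatment periods (not counting the reference period)
$T_2$ is the number of post-treatment periods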
With more variables (i.e., interaction terms), coefficient estimates can be less precise (i.e., higher SE).
- DiD on the relationship, not levels. Technically, we can apply the DiD research design not only to variables, but also to coefficient estimates from other regression models fitted before and after a policy is implemented.
Goal:
- Pre-treatment coefficients should be non-significant, $\beta_{-T_1}, \dots, \beta_{-1} = 0$ (similar to the Placebo Test)
- Post-treatment coefficients are expected to be significant, $\beta_1, \dots, \beta_{T_2} \neq 0$
- You can now examine the trend in post-treatment coefficients (i.e., increasing or decreasing)
library(tidyverse)
library(fixest)
od <- causaldata::organ_donations %>%
    # Treatment variable
    dplyr::mutate(California = State == 'California') %>%
    # Centered time variable
    dplyr::mutate(center_time = as.factor(Quarter_Num - 3))
# Quarter 3 is the reference period: the last period before treatment begins
class(od$California)
#> [1] "logical"
class(od$State)
#> [1] "character"
# Event-study regression: period-by-treatment interactions (period 0 omitted),
# with state and period fixed effects
cali <- feols(Rate ~ i(center_time, California, ref = 0) |
                  State + center_time,
              data = od)
etable(cali)
#> cali
#> Dependent Var.: Rate
#>
#> California x center_time = -2 -0.0029 (0.0051)
#> California x center_time = -1 0.0063** (0.0023)
#> California x center_time = 1 -0.0216*** (0.0050)
#> California x center_time = 2 -0.0203*** (0.0045)
#> California x center_time = 3 -0.0222* (0.0100)
#> Fixed-Effects: -------------------
#> State Yes
#> center_time Yes
#> _____________________________ ___________________
#> S.E.: Clustered by: State
#> Observations 162
#> R2 0.97934
#> Within R2 0.00979
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
iplot(cali, pt.join = TRUE)
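The first goal listed above (pre-treatment coefficients equal to zero) can also be checked jointly rather than one coefficient at a time. The sketch below refits the same model and computes a Wald statistic by hand from `coef()` and `vcov()`. Two things here are assumptions rather than part of the original example: pre-treatment coefficients are picked out by the minus sign in their names, and the p-value uses a chi-squared approximation.

```r
library(fixest)

# Refit the event-study model from above so this block is self-contained
od <- causaldata::organ_donations
od$California  <- od$State == "California"
od$center_time <- as.factor(od$Quarter_Num - 3)

cali <- feols(Rate ~ i(center_time, California, ref = 0) |
                  State + center_time,
              data = od)

# Pre-treatment coefficients are those whose period label is negative.
# Matching on "-" is an assumption about how the coefficients are named.
pre <- grep("-", names(coef(cali)), fixed = TRUE, value = TRUE)

b <- coef(cali)[pre]
V <- vcov(cali)[pre, pre]   # same covariance estimate as the fitted model

# Wald statistic for H0: all pre-treatment coefficients are zero
wald_stat <- drop(t(b) %*% solve(V) %*% b)
p_value   <- pchisq(wald_stat, df = length(pre), lower.tail = FALSE)

c(wald = wald_stat, p = p_value)
```

A small p-value would suggest that the pre-treatment coefficients are jointly different from zero, which would cast doubt on the comparison between California and the other states.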