26.2 Simple Dif-n-dif
A tool developed intuitively to study “natural experiments”, but its uses are much broader.
The fixed effects estimator is the foundation for DiD.
Why is dif-in-dif attractive? Its identification strategy uses inter-temporal variation between groups:
- The cross-sectional estimator helps avoid omitted (unobserved) common trends.
- The time-series estimator helps overcome omitted (unobserved) cross-sectional differences.
Consider:
- \(D_i = 1\): treatment group
- \(D_i = 0\): control group
- \(T = 1\): after the treatment
- \(T = 0\): before the treatment
| | After (T = 1) | Before (T = 0) |
|---|---|---|
| Treated \(D_i = 1\) | \(E[Y_{1i}(1) \mid D_i = 1]\) | \(E[Y_{0i}(0) \mid D_i = 1]\) |
| Control \(D_i = 0\) | \(E[Y_{0i}(1) \mid D_i = 0]\) | \(E[Y_{0i}(0) \mid D_i = 0]\) |
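The missing (unobserved) cell is the counterfactual \(E[Y_{0i}(1) \mid D_i = 1]\): the post-treatment outcome the treated group would have had without treatment.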
The Average Treatment Effect on the Treated (ATT), identified under the parallel trends assumption, is
\[ \begin{aligned} E[Y_1(1) - Y_0(1)|D=1] &= \{E[Y(1)|D=1] - E[Y(1)|D=0] \} \\ &- \{E[Y(0)|D=1] - E[Y(0)|D=0] \} \end{aligned} \]
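The step behind this identity can be spelled out in a short derivation. It assumes parallel trends: without treatment, the treated group's outcome would have changed by the same amount as the control group's. It also uses the fact that every observed cell except the treated group after treatment is an untreated outcome \(Y_0\).

\[
\begin{aligned}
E[Y_0(1) \mid D=1] - E[Y_0(0) \mid D=1] &= E[Y_0(1) \mid D=0] - E[Y_0(0) \mid D=0] \quad \text{(parallel trends)} \\
\Rightarrow E[Y_0(1) \mid D=1] &= E[Y(0) \mid D=1] + \{E[Y(1) \mid D=0] - E[Y(0) \mid D=0]\} \\
E[Y_1(1) - Y_0(1) \mid D=1] &= E[Y(1) \mid D=1] - E[Y(0) \mid D=1] \\
&- \{E[Y(1) \mid D=0] - E[Y(0) \mid D=0]\}
\end{aligned}
\]

Rearranging the four terms in the last line gives the difference of the two between-group differences.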
More elaboration:
- For the treatment group, we isolate the difference between being treated and not being treated. If the untreated group would have responded to the treatment differently, the DiD estimate tells us nothing about that response.
- Equivalently, because we can’t observe treatment variation in the control group, we can’t say anything about the treatment effect on this group.
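As a minimal numeric sketch of the 2x2 calculation, the base R snippet below computes the DiD from four group-by-period means in both orders of differencing, and then recovers the same number as the interaction coefficient of a saturated regression. The means are made-up illustrative numbers, and the names `means`, `cells`, and `post` are hypothetical.

```r
# Hypothetical group-by-period mean outcomes (illustrative numbers only)
means <- matrix(
  c(12, 10,   # treated: after, before
     9,  8),  # control: after, before
  nrow = 2, byrow = TRUE,
  dimnames = list(c("treated", "control"), c("after", "before"))
)

# Difference between groups within each period, then difference over time
did_by_period <- (means["treated", "after"] - means["control", "after"]) -
  (means["treated", "before"] - means["control", "before"])

# Change over time within each group, then difference between groups
did_by_group <- (means["treated", "after"] - means["treated", "before"]) -
  (means["control", "after"] - means["control", "before"])

# The same quantity is the interaction coefficient in a saturated regression
cells <- data.frame(
  y    = c(12, 10, 9, 8),
  D    = c(1, 1, 0, 0),   # 1 = treated group
  post = c(1, 0, 1, 0)    # 1 = after the treatment
)
did_lm <- unname(coef(lm(y ~ D * post, data = cells))["D:post"])

print(c(by_period = did_by_period, by_group = did_by_group, lm = did_lm))
```

All three numbers coincide because the double difference is symmetric in the order of differencing, and the saturated model fits the four cell means exactly.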
Extension
- More than 2 groups (multiple treatments and multiple controls), and more than 2 periods (pre and post)
\[ Y_{igt} = \alpha_g + \gamma_t + \beta I_{gt} + \delta X_{igt} + \epsilon_{igt} \]
where
- \(\alpha_g\) = group-specific fixed effect
- \(\gamma_t\) = time-specific fixed effect
- \(\beta\) = dif-in-dif effect
- \(I_{gt}\) = interaction terms (n treatment indicators x n post-treatment dummies), which capture effect heterogeneity over time
This specification is the “two-way fixed effects DiD” - TWFE (i.e., 2 sets of fixed effects: group + time).
- However, under staggered dif-n-dif (i.e., treatment is applied at different times to different groups), the TWFE estimate can be badly biased, particularly when treatment effects differ across groups or over time.
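For the non-staggered case, the TWFE specification above can be sketched in base R on simulated data. This is only an illustration under stated assumptions: the panel, the variable names (`panel`, `I_gt`, `true_beta`), and the data-generating process are all hypothetical, covariates \(X_{igt}\) are omitted, and plain `lm()` standard errors are not clustered.

```r
set.seed(1)

# Hypothetical panel: 4 groups, 6 periods, 50 units per group-period cell
panel <- expand.grid(unit = 1:50, g = 1:4, t = 1:6)

# Groups 1 and 2 are treated, both starting in period 4 (no staggering)
panel$I_gt <- as.integer(panel$g <= 2 & panel$t >= 4)

# Simulated outcome: group effect + time effect + true effect of 2 + noise
true_beta <- 2
panel$y <- 0.5 * panel$g + 0.3 * panel$t + true_beta * panel$I_gt +
  rnorm(nrow(panel), sd = 1)

# TWFE via dummy variables: factor(g) plays alpha_g, factor(t) plays gamma_t
twfe <- lm(y ~ I_gt + factor(g) + factor(t), data = panel)
beta_hat <- unname(coef(twfe)["I_gt"])
print(beta_hat)
```

Because every treated group switches on in the same period, the estimate should land close to the simulated effect. In applied work, an estimator with clustered standard errors, such as `fixest::feols()` used later in this section, is the more usual choice.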
- Long-term Effects
To examine dynamic treatment effects (outside a rollout/staggered design), we can create a centered time variable:
Centered Time Variable | Period |
---|---|
… | … |
\(t = -1\) | 2 periods before the treatment period |
\(t = 0\) | Last period right before the treatment period (use this period as the reference group) |
\(t = 1\) | Treatment period |
… | … |
By interacting this factor variable with the treatment indicator, we can examine the dynamic effect of treatment (i.e., whether it’s fading or intensifying)
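\[ \begin{aligned} Y &= \alpha_0 + \alpha_1 Group + \alpha_2 Time \\ &+ \beta_{-T_1} Treatment \times \mathbb{1}(t = -T_1) + \beta_{-(T_1 -1)} Treatment \times \mathbb{1}(t = -(T_1 - 1)) + \dots + \beta_{-1} Treatment \times \mathbb{1}(t = -1) \\ &+ \beta_1 Treatment \times \mathbb{1}(t = 1) + \dots + \beta_{T_2} Treatment \times \mathbb{1}(t = T_2) \end{aligned} \]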
where
\(\beta_0\) belongs to the reference period (i.e., it is dropped from the model)
\(T_1\) is the number of pre-treatment periods before the reference period
\(T_2\) is the number of post-treatment periods
With more variables (i.e., interaction terms), coefficient estimates can be less precise (i.e., higher SE).
- DiD on the relationship, not levels. Technically, we can apply the DiD research design not only to variables, but also to coefficient estimates from other regression models, compared before and after a policy is implemented.
Goal:
- Pre-treatment coefficients should be non-significant \(\beta_{-T_1}, \dots, \beta_{-1} = 0\) (similar to the Placebo Test)
- Post-treatment coefficients are expected to be significant \(\beta_1, \dots, \beta_{T_2} \neq0\)
- You can now examine the trend in post-treatment coefficients (i.e., increasing or decreasing)
library(tidyverse)
library(fixest)
od <- causaldata::organ_donations %>%
# Treatment variable
dplyr::mutate(California = State == 'California') %>%
# centered time variable
dplyr::mutate(center_time = as.factor(Quarter_Num - 3))
# Quarter_Num 3 is the last period before treatment, so it becomes
# center_time = 0 and serves as the reference period
class(od$California)
#> [1] "logical"
class(od$State)
#> [1] "character"
cali <- feols(Rate ~ i(center_time, California, ref = 0) |
State + center_time,
data = od)
etable(cali)
#> cali
#> Dependent Var.: Rate
#>
#> California x center_time = -2 -0.0029 (0.0051)
#> California x center_time = -1 0.0063** (0.0023)
#> California x center_time = 1 -0.0216*** (0.0050)
#> California x center_time = 2 -0.0203*** (0.0045)
#> California x center_time = 3 -0.0222* (0.0100)
#> Fixed-Effects: -------------------
#> State Yes
#> center_time Yes
#> _____________________________ ___________________
#> S.E.: Clustered by: State
#> Observations 162
#> R2 0.97934
#> Within R2 0.00979
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Plot the interaction coefficients, joining the point estimates with a line
iplot(cali, pt.join = TRUE)