26.7 Two-way Fixed-effects
A generalization of the dif-n-dif model is the two-way fixed-effects models where you have multiple groups and time effects. But this is not a designed-based, non-parametric causal estimator (Imai and Kim 2021)
When applying TWFE to multiple groups and multiple periods, the supposedly causal coefficient is the weighted average of all two-group/two-period DiD estimators in the data where some of the weights can be negative. More specifically, the weights are proportional to group sizes and treatment indicator’s variation in each pair, where units in the middle of the panel have the highest weight.
The canonical/standard TWFE only works when
Effects are homogeneous across units and across time periods (i.e., no dynamic changes in the effects of treatment). See (Goodman-Bacon 2021; Clément De Chaisemartin and d’Haultfoeuille 2020; L. Sun and Abraham 2021; Borusyak, Jaravel, and Spiess 2021) for details. Similarly, it relies on the assumption of linear additive effects (Imai and Kim 2021)
Have to argue why treatment heterogeneity is not a problem (e.g., plot treatment timing and decompose treatment coefficient using Goodman-Bacon Decomposition) know the percentage of observation are never treated (because as the never-treated group increases, the bias of TWFE decreases, with 80% sample to be never-treated, bias is negligible). The problem is worsen when you have long-run effects.
Need to manually drop two relative time periods if everyone is eventually treated (to avoid multicollinearity). Programs might do this randomly and if it chooses to drop a post-treatment period, it will create biases. The choice usually -1, and -2 periods.
Treatment heterogeneity can come in because (1) it might take some time for a treatment to have measurable changes in outcomes or (2) for each period after treatment, the effect can be different (phase in or increasing effects).
2 time periods.
Within this setting, TWFE works because, using the baseline (e.g., control units where their treatment status is unchanged across time periods), the comparison can be
Good for
Newly treated units vs. control
Newly treated units vs not-yet treated
Bad for
- Newly treated vs. already treated (because already treated cannot serve as the potential outcome for the newly treated).
- Strict exogeneity (i.e., time-varying confounders, feedback from past outcome to treatment) (Imai and Kim 2019)
- Specific functional forms (i.e., treatment effect homogeneity and no carryover effects or anticipation effects) (Imai and Kim 2019)
Note: Notation for this section is consistent with (2020)
\[ Y_{it} = \alpha_i + \lambda_t + \tau W_{it} + \beta X_{it} + \epsilon_{it} \]
where
\(Y_{it}\) is the outcome
\(\alpha_i\) is the unit FE
\(\lambda_t\) is the time FE
\(\tau\) is the causal effect of treatment
\(W_{it}\) is the treatment indicator
\(X_{it}\) are covariates
When \(T = 2\), the TWFE is the traditional DiD model
Under the following assumption, \(\hat{\tau}_{OLS}\) is unbiased:
- homogeneous treatment effect
- parallel trends assumptions
- linear additive effects (Imai and Kim 2021)
Remedies for TWFE’s shortcomings
(Goodman-Bacon 2021): diagnostic robustness tests of the TWFE DiD and identify influential observations to the DiD estimate (Goodman-Bacon Decomposition)
(Callaway and Sant’Anna 2021): 2-step estimation with a bootstrap procedure that can account for autocorrelation and clustering,
the parameters of interest are the group-time average treatment effects, where each group is defined by when it was first treated (Multiple periods and variation in treatment timing)
Comparing post-treatment outcomes fo groups treated in a period against a similar group that is never treated (using matching).
Treatment status cannot switch (once treated, stay treated for the rest of the panel)
Package:
did
(L. Sun and Abraham 2021): a specialization of (Callaway and Sant’Anna 2021) in the event-study context.
They include lags and leads in their design
have cohort-specific estimates (similar to group-time estimates in (Callaway and Sant’Anna 2021)
They propose the “interaction-weighted” estimator.
Package:
fixest
-
Different from (Callaway and Sant’Anna 2021) because they allow units to switch in and out of treatment.
Based on matching methods, to have weighted TWFE
Package:
wfe
andPanelMatch
(Gardner 2022): two-stage DiD
did2s
In cases with an unaffected unit (i.e., never-treated), using the exposure-adjusted difference-in-differences estimators can recover the average treatment effect (Clément De Chaisemartin and d’Haultfoeuille 2020). However, if you want to see the treatment effect heterogeneity (in cases where the true heterogeneous treatment effects vary by the exposure rate), exposure-adjusted did still fails (L. Sun and Shapiro 2022).
(2020): see below
To be robust against
- time- and unit-varying effects
We can use the reshaped inverse probability weighting (RIPW)- TWFE estimator
With the following assumptions:
SUTVA
Binary treatment: \(\mathbf{W}_i = (W_{i1}, \dots, W_{it})\) where \(\mathbf{W}_i \sim \mathbf{\pi}_i\) generalized propensity score (i.e., each person treatment likelihood follow \(\pi\) regardless of the period)
Then, the unit-time specific effect is \(\tau_{it} = Y_{it}(1) - Y_{it}(0)\)
Then the Doubly Average Treatment Effect (DATE) is
\[ \tau(\xi) = \sum_{T=1}^T \xi_t \left(\frac{1}{n} \sum_{i = 1}^n \tau_{it} \right) \]
where
\(\frac{1}{n} \sum_{i = 1}^n \tau_{it}\) is the unweighted effect of treatment across units (i.e., time-specific ATE).
\(\xi = (\xi_1, \dots, \xi_t)\) are user-specific weights for each time period.
This estimand is called DATE because it’s weighted (averaged) across both time and units.
A special case of DATE is when both time and unit-weights are equal
\[ \tau_{eq} = \frac{1}{nT} \sum_{t=1}^T \sum_{i = 1}^n \tau_{it} \]
Borrowing the idea of inverse propensity-weighted least squares estimator in the cross-sectional case that we reweight the objective function via the treatment assignment mechanism:
\[ \hat{\tau} \triangleq \arg \min_{\tau} \sum_{i = 1}^n (Y_i -\mu - W_i \tau)^2 \frac{1}{\pi_i (W_i)} \]
where
the first term is the least squares objective
the second term is the propensity score
In the panel data case, the IPW estimator will be
\[ \hat{\tau}_{IPW} \triangleq \arg \min_{\tau} \sum_{i = 1}^n \sum_{t =1}^T (Y_{i t}-\alpha_i - \lambda_t - W_{it} \tau)^2 \frac{1}{\pi_i (W_i)} \]
Then, to have DATE that users can specify the structure of time weight, we use reshaped IPW estimator (2020)
\[ \hat{\tau}_{RIPW} (\Pi) \triangleq \arg \min_{\tau} \sum_{i = 1}^n \sum_{t =1}^T (Y_{i t}-\alpha_i - \lambda_t - W_{it} \tau)^2 \frac{\Pi(W_i)}{\pi_i (W_i)} \]
where it’s a function of a data-independent distribution \(\Pi\) that depends on the support of the treatment path \(\mathbb{S} = \cup_i Supp(W_i)\)
This generalization can transform to
IPW-TWFE estimator when \(\Pi \sim Unif(\mathbb{S})\)
randomized experiment when \(\Pi = \pi_i\)
To choose \(\Pi\), we don’t need to data, we just need possible assignments in your setting.
For most practical problems (DiD, staggered, transient), we have closed form solutions
For generic solver, we can use nonlinear programming (e..g, BFGS algorithm)
As argued in (Imai and Kim 2021) that TWFE is not a non-parametric approach, it can be subjected to incorrect model assumption (i.e., model dependence).
Hence, they advocate for matching methods for time-series cross-sectional data (Imai and Kim 2021)
Use
wfe
andPanelMatch
to apply their paper.
This package is based on (Somaini and Wolak 2016)
# devtools::install_github("paulosomaini/xtreg2way")
library(xtreg2way)
# output <- xtreg2way(y,
# data.frame(x1, x2),
# iid,
# tid,
# w,
# noise = "1",
# se = "1")
# equilvalently
output <- xtreg2way(l_homicide ~ post,
df,
iid = df$state, # group id
tid = df$year, # time id
# w, # vector of weight
se = "1")
output$betaHat
#> [,1]
#> l_homicide 0.08181162
output$aVarHat
#> [,1]
#> [1,] 0.003396724
# to save time, you can use your structure in the
# last output for a new set of variables
# output2 <- xtreg2way(y, x1, struc=output$struc)
Standard errors estimation options
Set | Estimation |
---|---|
se = "0" |
Assume homoskedasticity and no within group correlation or serial correlation |
se = "1" (default) |
robust to heteroskadasticity and serial correlation (Arellano 1987) |
se = "2" |
robust to heteroskedasticity, but assumes no correlation within group or serial correlation |
se = "11" |
Aerllano SE with df correction performed by Stata xtreg (Somaini and Wolak 2021) |
Alternatively, you can also do it manually or with the plm
package, but you have to be careful with how the SEs are estimated
library(multiwayvcov) # get vcov matrix
library(lmtest) # robust SEs estimation
# manual
output3 <- lm(l_homicide ~ post + factor(state) + factor(year),
data = df)
# get variance-covariance matrix
vcov_tw <- multiwayvcov::cluster.vcov(output3,
cbind(df$state, df$year),
use_white = F,
df_correction = F)
# get coefficients
coeftest(output3, vcov_tw)[2,]
#> Estimate Std. Error t value Pr(>|t|)
#> 0.08181162 0.05671410 1.44252696 0.14979397
# using the plm package
library(plm)
output4 <- plm(l_homicide ~ post,
data = df,
index = c("state", "year"),
model = "within",
effect = "twoways")
# get coefficients
coeftest(output4, vcov = vcovHC, type = "HC1")
#>
#> t test of coefficients:
#>
#> Estimate Std. Error t value Pr(>|t|)
#> post 0.081812 0.057748 1.4167 0.1572
As you can see, differences stem from SE estimation, not the coefficient estimate.