29.7 Two-way Fixed-effects

A generalization of the dif-n-dif model is the two-way fixed-effects models where you have multiple groups and time effects. But this is not a designed-based, non-parametric causal estimator (Imai and Kim 2021)

When applying TWFE to multiple groups and multiple periods, the supposedly causal coefficient is the weighted average of all two-group/two-period DiD estimators in the data where some of the weights can be negative. More specifically, the weights are proportional to group sizes and treatment indicator’s variation in each pair, where units in the middle of the panel have the highest weight.

The canonical/standard TWFE only works when

Effects are homogeneous across units and across time periods (i.e., no dynamic changes in the effects of treatment). See (Goodman-Bacon 2021; Clément De Chaisemartin and d’Haultfoeuille 2020; L. Sun and Abraham 2021; Borusyak, Jaravel, and Spiess 2021) for details. Similarly, it relies on the assumption of linear additive effects (Imai and Kim 2021)
- Have to argue why treatment heterogeneity is not a problem (e.g., plot treatment timing and decompose treatment coefficient using Goodman-Bacon Decomposition) know the percentage of observation are never treated (because as the never-treated group increases, the bias of TWFE decreases, with 80% sample to be never-treated, bias is negligible). The problem is worsen when you have long-run effects.
- Need to manually drop two relative time periods if everyone is eventually treated (to avoid multicollinearity). Programs might do this randomly and if it chooses to drop a post-treatment period, it will create biases. The choice usually -1, and -2 periods.
- Treatment heterogeneity can come in because (1) it might take some time for a treatment to have measurable changes in outcomes or (2) for each period after treatment, the effect can be different (phase in or increasing effects).
2 time periods.

Within this setting, TWFE works because, using the baseline (e.g., control units where their treatment status is unchanged across time periods), the comparison can be

Good for
- Newly treated units vs. control
- Newly treated units vs not-yet treated
Bad for
- Newly treated vs. already treated (because already treated cannot serve as the potential outcome for the newly treated).
- Strict exogeneity (i.e., time-varying confounders, feedback from past outcome to treatment) (Imai and Kim 2019)
- Specific functional forms (i.e., treatment effect homogeneity and no carryover effects or anticipation effects) (Imai and Kim 2019)

Note: Notation for this section is consistent with (2020)

$Y_{it} = \alpha_i + \lambda_t + \tau W_{it} + \beta X_{it} + \epsilon_{it}$

where

$Y_{it}$ is the outcome
$\alpha_i$ is the unit FE
$\lambda_t$ is the time FE
$\tau$ is the causal effect of treatment
$W_{it}$ is the treatment indicator
$X_{it}$ are covariates

When $T = 2$ , the TWFE is the traditional DiD model

Under the following assumption, $\hat{\tau}_{OLS}$ is unbiased:

homogeneous treatment effect
parallel trends assumptions
linear additive effects (Imai and Kim 2021)

Remedies for TWFE’s shortcomings

(Goodman-Bacon 2021): diagnostic robustness tests of the TWFE DiD and identify influential observations to the DiD estimate (Goodman-Bacon Decomposition)
(Callaway and Sant’Anna 2021): 2-step estimation with a bootstrap procedure that can account for autocorrelation and clustering,
- the parameters of interest are the group-time average treatment effects, where each group is defined by when it was first treated (Multiple periods and variation in treatment timing)
- Comparing post-treatment outcomes fo groups treated in a period against a similar group that is never treated (using matching).
- Treatment status cannot switch (once treated, stay treated for the rest of the panel)
- Package: did
(L. Sun and Abraham 2021): a specialization of (Callaway and Sant’Anna 2021) in the event-study context.
- They include lags and leads in their design
- have cohort-specific estimates (similar to group-time estimates in (Callaway and Sant’Anna 2021)
- They propose the “interaction-weighted” estimator.
- Package: fixest
(Imai and Kim 2021)
- Different from (Callaway and Sant’Anna 2021) because they allow units to switch in and out of treatment.
- Based on matching methods, to have weighted TWFE
- Package: wfe and PanelMatch
(Gardner 2022): two-stage DiD
- did2s
In cases with an unaffected unit (i.e., never-treated), using the exposure-adjusted difference-in-differences estimators can recover the average treatment effect (Clément De Chaisemartin and d’Haultfoeuille 2020). However, if you want to see the treatment effect heterogeneity (in cases where the true heterogeneous treatment effects vary by the exposure rate), exposure-adjusted did still fails (L. Sun and Shapiro 2022).
(2020): see below

To be robust against

time- and unit-varying effects

We can use the reshaped inverse probability weighting (RIPW)- TWFE estimator

With the following assumptions:

SUTVA
Binary treatment: $\mathbf{W}_i = (W_{i1}, \dots, W_{it})$ where $\mathbf{W}_i \sim \mathbf{\pi}_i$ generalized propensity score (i.e., each person treatment likelihood follow $\pi$ regardless of the period)

Then, the unit-time specific effect is $\tau_{it} = Y_{it}(1) - Y_{it}(0)$

Then the Doubly Average Treatment Effect (DATE) is

$\tau(\xi) = \sum_{T=1}^T \xi_t \left(\frac{1}{n} \sum_{i = 1}^n \tau_{it} \right)$

where

$\frac{1}{n} \sum_{i = 1}^n \tau_{it}$ is the unweighted effect of treatment across units (i.e., time-specific ATE).
$\xi = (\xi_1, \dots, \xi_t)$ are user-specific weights for each time period.
This estimand is called DATE because it’s weighted (averaged) across both time and units.

A special case of DATE is when both time and unit-weights are equal

$\tau_{eq} = \frac{1}{nT} \sum_{t=1}^T \sum_{i = 1}^n \tau_{it}$

Borrowing the idea of inverse propensity-weighted least squares estimator in the cross-sectional case that we reweight the objective function via the treatment assignment mechanism:

$\hat{\tau} \triangleq \arg \min_{\tau} \sum_{i = 1}^n (Y_i -\mu - W_i \tau)^2 \frac{1}{\pi_i (W_i)}$

where

the first term is the least squares objective
the second term is the propensity score

In the panel data case, the IPW estimator will be

$\hat{\tau}_{IPW} \triangleq \arg \min_{\tau} \sum_{i = 1}^n \sum_{t =1}^T (Y_{i t}-\alpha_i - \lambda_t - W_{it} \tau)^2 \frac{1}{\pi_i (W_i)}$

Then, to have DATE that users can specify the structure of time weight, we use reshaped IPW estimator (2020)

$\hat{\tau}_{RIPW} (\Pi) \triangleq \arg \min_{\tau} \sum_{i = 1}^n \sum_{t =1}^T (Y_{i t}-\alpha_i - \lambda_t - W_{it} \tau)^2 \frac{\Pi(W_i)}{\pi_i (W_i)}$

where it’s a function of a data-independent distribution $\Pi$ that depends on the support of the treatment path $\mathbb{S} = \cup_i Supp(W_i)$

This generalization can transform to

IPW-TWFE estimator when $\Pi \sim Unif(\mathbb{S})$
randomized experiment when $\Pi = \pi_i$

To choose $\Pi$ , we don’t need to data, we just need possible assignments in your setting.

For most practical problems (DiD, staggered, transient), we have closed form solutions
For generic solver, we can use nonlinear programming (e..g, BFGS algorithm)

As argued in (Imai and Kim 2021) that TWFE is not a non-parametric approach, it can be subjected to incorrect model assumption (i.e., model dependence).

Hence, they advocate for matching methods for time-series cross-sectional data (Imai and Kim 2021)
Use wfe and PanelMatch to apply their paper.

This package is based on (Somaini and Wolak 2016)

# dataset
library(bacondecomp)
df <- bacondecomp::castle

# devtools::install_github("paulosomaini/xtreg2way")

library(xtreg2way)
# output <- xtreg2way(y,
#                     data.frame(x1, x2),
#                     iid,
#                     tid,
#                     w,
#                     noise = "1",
#                     se = "1")

# equilvalently
output <- xtreg2way(l_homicide ~ post,
                    df,
                    iid = df$state, # group id
                    tid = df$year, # time id
                    # w, # vector of weight
                    se = "1")
output$betaHat
#>                  [,1]
#> l_homicide 0.08181162
output$aVarHat
#>             [,1]
#> [1,] 0.003396724

# to save time, you can use your structure in the 
# last output for a new set of variables
# output2 <- xtreg2way(y, x1, struc=output$struc)

Standard errors estimation options

Set	Estimation
`se = "0"`	Assume homoskedasticity and no within group correlation or serial correlation
`se = "1"` (default)	robust to heteroskadasticity and serial correlation (Arellano 1987)
`se = "2"`	robust to heteroskedasticity, but assumes no correlation within group or serial correlation
`se = "11"`	Aerllano SE with df correction performed by Stata xtreg (Somaini and Wolak 2021)

Alternatively, you can also do it manually or with the plm package, but you have to be careful with how the SEs are estimated

library(multiwayvcov) # get vcov matrix 
library(lmtest) # robust SEs estimation

# manual
output3 <- lm(l_homicide ~ post + factor(state) + factor(year),
              data = df)

# get variance-covariance matrix
vcov_tw <- multiwayvcov::cluster.vcov(output3,
                        cbind(df$state, df$year),
                        use_white = F,
                        df_correction = F)

# get coefficients
coeftest(output3, vcov_tw)[2,] 
#>   Estimate Std. Error    t value   Pr(>|t|) 
#> 0.08181162 0.05671410 1.44252696 0.14979397

# using the plm package
library(plm)

output4 <- plm(l_homicide ~ post, 
               data = df, 
               index = c("state", "year"), 
               model = "within", 
               effect = "twoways")

# get coefficients
coeftest(output4, vcov = vcovHC, type = "HC1")
#> 
#> t test of coefficients:
#> 
#>      Estimate Std. Error t value Pr(>|t|)
#> post 0.081812   0.057748  1.4167   0.1572

As you can see, differences stem from SE estimation, not the coefficient estimate.

References

2020, July. http://dx.doi.org/10.1038/s41562-020-0912-z.

Arellano, Manuel. 1987. “Computing Robust Standard Errors for Within-Groups Estimators.” Oxford Bulletin of Economics and Statistics 49 (4): 431–34.

Borusyak, Kirill, Xavier Jaravel, and Jann Spiess. 2021. “Revisiting Event Study Designs: Robust and Efficient Estimation.” arXiv Preprint arXiv:2108.12419.

Callaway, Brantly, and Pedro HC Sant’Anna. 2021. “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics 225 (2): 200–230.

De Chaisemartin, Clément, and Xavier d’Haultfoeuille. 2020. “Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects.” American Economic Review 110 (9): 2964–96.

Gardner, John. 2022. “Two-Stage Differences in Differences.” arXiv Preprint arXiv:2207.05943.

Goodman-Bacon, Andrew. 2021. “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics 225 (2): 254–77.

Imai, Kosuke, and In Song Kim. 2019. “When Should We Use Unit Fixed Effects Regression Models for Causal Inference with Longitudinal Data?” American Journal of Political Science 63 (2): 467–90.

———. 2021. “On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data.” Political Analysis 29 (3): 405–15.

Somaini, Paulo, and Frank A Wolak. 2016. “An Algorithm to Estimate the Two-Way Fixed Effects Model.” Journal of Econometric Methods 5 (1): 143–52.

———. 2021. “TWFEM: Stata Module to Efficiently Estimate a Two-Way Fixed Effects Model Based on Somaini and Wolak (2015).”

Sun, Liyang, and Sarah Abraham. 2021. “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics 225 (2): 175–99.

Sun, Liyang, and Jesse M Shapiro. 2022. “A Linear Panel Model with Heterogeneous Coefficients and Variation in Exposure.” Journal of Economic Perspectives 36 (4): 193–204.