26.7 Two-way Fixed-effects

A generalization of the difference-in-differences (DiD) model is the two-way fixed-effects (TWFE) model, which allows for multiple groups and multiple time periods. However, TWFE is not a design-based, nonparametric causal estimator (Imai and Kim 2021).

When TWFE is applied to multiple groups and multiple periods, the supposedly causal coefficient is a weighted average of all two-group/two-period DiD estimators in the data. The weights are proportional to group sizes and to the variance of the treatment indicator in each pair, so units treated in the middle of the panel receive the highest weight. Because some of these comparisons use already-treated units as controls, some unit-period treatment effects can enter the coefficient with negative weights when effects vary over time.
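To see how negative weights arise, here is a minimal R sketch on a hypothetical, noise-free staggered panel (the data, effect sizes, and object names such as `cell_weights` are made up for illustration). Rather than computing the full Goodman-Bacon decomposition, it uses the Frisch-Waugh-Lovell theorem to write the TWFE coefficient as a weighted sum of unit-period effects, the view taken by (Clément De Chaisemartin and d'Haultfoeuille 2020). In this design each cell of the early cohort's second treated period gets weight -0.25, and the TWFE coefficient is -1 even though every true effect is positive.

```r
# Toy staggered panel, no never-treated units (all values hypothetical):
# units 1-2 adopt at t = 2, units 3-4 adopt at t = 3, periods t = 1, 2, 3.
adopt_time <- c(2, 2, 3, 3)
panel <- expand.grid(id = 1:4, t = 1:3)
panel$w <- as.numeric(panel$t >= adopt_time[panel$id])
event_time <- panel$t - adopt_time[panel$id]

# Dynamic but always positive effects: 1 on impact, 5 one period later
panel$tau <- ifelse(panel$w == 1, 1 + 4 * event_time, 0)
# Outcome = unit effect + time effect + treatment effect (no noise)
panel$y <- panel$id + 0.5 * panel$t + panel$tau

# Standard TWFE regression
twfe <- lm(y ~ w + factor(id) + factor(t), data = panel)
beta <- unname(coef(twfe)["w"])

# Frisch-Waugh-Lovell: the TWFE slope is sum(res * y) / sum(res * w), where res
# are residuals from regressing w on both sets of fixed effects. With no noise,
# this is a weighted sum of the treated cells' tau.
res <- resid(lm(w ~ factor(id) + factor(t), data = panel))
treated <- panel$w == 1
cell_weights <- res[treated] / sum(res[treated])

print(cbind(panel[treated, c("id", "t", "tau")], weight = round(cell_weights, 3)))
print(beta)  # -1, although every tau is positive
```

The early cohort's already-treated period serves as a control for the late cohort, so its large effect (5) is subtracted from the estimate.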

The canonical/standard TWFE only works when:

  • Effects are homogeneous across units and across time periods (i.e., there are no dynamic changes in the effects of treatment). See (Goodman-Bacon 2021; Clément De Chaisemartin and d’Haultfoeuille 2020; L. Sun and Abraham 2021; Borusyak, Jaravel, and Spiess 2021) for details. It also relies on the assumption of linear additive effects (Imai and Kim 2021).

    • You have to argue why treatment heterogeneity is not a problem (e.g., plot the treatment timing and decompose the treatment coefficient using the Goodman-Bacon decomposition) and know the percentage of observations that are never treated: as the never-treated group grows, the bias of TWFE shrinks, and with 80% of the sample never treated the bias is negligible. The problem is worse when there are long-run effects.

    • In event-study specifications where everyone is eventually treated, you need to manually drop two relative time periods to avoid multicollinearity. Software may drop one arbitrarily, and if it drops a post-treatment period, the estimates will be biased. The usual choice is to drop periods -1 and -2.

    • Treatment heterogeneity can arise because (1) it may take time for a treatment to produce measurable changes in outcomes, or (2) the effect can differ in each period after treatment (phase-in or increasing effects).

  • There are only 2 time periods.
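The multicollinearity point can be checked directly. The R sketch below builds a hypothetical panel in which every unit is eventually treated (cohorts adopting at t = 2, 3, and 4) and fits event-study regressions with unit and time fixed effects. With only period -1 omitted, `lm()` finds one exactly collinear column and reports its coefficient as `NA`; which column it drops depends on column order, not on any substantive reasoning. Omitting both -1 and -2 leaves a full-rank design. The helpers `rel_dummies` and `fit_event_study` are illustrative, not from any package.

```r
# Hypothetical staggered design where every unit is eventually treated:
# two units adopt at t = 2, two at t = 3, two at t = 4; periods t = 1..5.
set.seed(123)
adopt_time <- rep(c(2, 3, 4), each = 2)
panel <- expand.grid(id = 1:6, t = 1:5)
panel$rel <- panel$t - adopt_time[panel$id]   # relative (event) time
panel$y <- panel$id + 0.3 * panel$t +
  ifelse(panel$rel >= 0, 1 + 0.5 * panel$rel, 0) + rnorm(nrow(panel), sd = 0.1)

# One dummy per relative period, leaving out the reference periods in `omit`
rel_dummies <- function(rel, omit) {
  ks <- setdiff(sort(unique(rel)), omit)
  out <- sapply(ks, function(k) as.numeric(rel == k))
  colnames(out) <- ifelse(ks < 0, paste0("lead", -ks), paste0("lag", ks))
  out
}

# Event-study regression with unit and time fixed effects
fit_event_study <- function(omit) {
  D <- rel_dummies(panel$rel, omit)
  lm(panel$y ~ D + factor(panel$id) + factor(panel$t))
}

drop_one <- fit_event_study(omit = -1)
drop_two <- fit_event_study(omit = c(-1, -2))

# Number of coefficients lm() could not estimate (aliased, reported as NA)
sum(is.na(coef(drop_one)))  # 1
sum(is.na(coef(drop_two)))  # 0
```

The collinearity comes from relative time being t minus the adoption period, so a linear trend in event time is already spanned by the unit and time fixed effects when no unit is never treated.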

Within this setting, TWFE works because, using as the baseline the control units whose treatment status is unchanged across time periods, the comparisons can be

  • Good for

    • Newly treated units vs. control

    • Newly treated units vs not-yet treated

  • Bad for

    • Newly treated vs. already treated (because already-treated units cannot serve as the potential outcome for the newly treated).
    • Violations of strict exogeneity (e.g., time-varying confounders, feedback from past outcomes to treatment) (Imai and Kim 2019)
    • Violations of the required functional form (i.e., treatment effect homogeneity and no carryover or anticipation effects) (Imai and Kim 2019)
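A toy R sketch (hypothetical data, no noise) of how never-treated units dilute the bad "newly treated vs. already treated" comparison: one unit adopts at t = 2 and one at t = 3, with effects of 1 on impact and 5 one period later, and `n_never` never-treated units are added. With no never-treated units the TWFE estimate is -1; with one (a third of the sample) it is 1; with eight (80% never treated) it is about 1.82. In this particular design the estimate approaches a positively weighted average of the cell effects (2) rather than their simple average (7/3), so the numbers only illustrate the direction of the change. The helper `twfe_with_never` is illustrative.

```r
# Hypothetical toy design: one unit adopts at t = 2, one at t = 3, plus
# n_never never-treated units; periods t = 1, 2, 3. Effects are dynamic:
# 1 on impact and 5 one period after adoption (all positive).
twfe_with_never <- function(n_never) {
  adopt_time <- c(2, 3, rep(Inf, n_never))      # Inf = never treated
  panel <- expand.grid(id = seq_along(adopt_time), t = 1:3)
  g <- adopt_time[panel$id]
  panel$w <- as.numeric(panel$t >= g)
  panel$tau <- ifelse(panel$w == 1, 1 + 4 * (panel$t - g), 0)
  panel$y <- panel$id + 0.5 * panel$t + panel$tau   # no noise
  fit <- lm(y ~ w + factor(id) + factor(t), data = panel)
  unname(coef(fit)["w"])
}

n_never <- c(0, 1, 8)
data.frame(
  share_never   = n_never / (n_never + 2),
  twfe_estimate = sapply(n_never, twfe_with_never)
)
```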

Note: Notation for this section is consistent with (2020)

\[ Y_{it} = \alpha_i + \lambda_t + \tau W_{it} + \beta X_{it} + \epsilon_{it} \]


  • \(Y_{it}\) is the outcome

  • \(\alpha_i\) is the unit FE

  • \(\lambda_t\) is the time FE

  • \(\tau\) is the causal effect of treatment

  • \(W_{it}\) is the treatment indicator

  • \(X_{it}\) are covariates

When \(T = 2\), TWFE is the traditional DiD model.
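A quick numerical check in R on simulated data (hypothetical effect sizes; the true effect is 2): with two periods and a balanced panel, the TWFE coefficient on \(W_{it}\) coincides with the classic 2x2 DiD computed from the four group-by-period means, up to floating-point error.

```r
set.seed(42)
n <- 200
panel <- expand.grid(id = 1:n, t = 1:2)
group <- rep(c(0, 1), each = n / 2)                  # hypothetical: units 101-200 treated
panel$d <- group[panel$id]
panel$w <- as.numeric(panel$d == 1 & panel$t == 2)   # treated only in period 2
unit_effect <- rnorm(n)
panel$y <- unit_effect[panel$id] + 0.3 * (panel$t == 2) + 2 * panel$w +
  rnorm(nrow(panel))

# TWFE with unit and time fixed effects
twfe <- lm(y ~ w + factor(id) + factor(t), data = panel)

# Classic 2x2 DiD from the group-by-period means (rows: group d, cols: period t)
m <- with(panel, tapply(y, list(d, t), mean))
did <- (m["1", "2"] - m["1", "1"]) - (m["0", "2"] - m["0", "1"])

c(twfe = unname(coef(twfe)["w"]), did = unname(did))
```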

Under the following assumptions, \(\hat{\tau}_{OLS}\) is unbiased:

  1. homogeneous treatment effect
  2. parallel trends
  3. linear additive effects (Imai and Kim 2021)

Remedies for TWFE’s shortcomings

To be robust against time- and unit-varying effects, we can use the reshaped inverse probability weighting TWFE (RIPW-TWFE) estimator.

It relies on the following assumption:


  • Binary treatment: \(\mathbf{W}_i = (W_{i1}, \dots, W_{iT})\) is unit \(i\)'s binary treatment path, with \(\mathbf{W}_i \sim \mathbf{\pi}_i\), the generalized propensity score (i.e., each unit's assignment probability \(\pi_i\) is defined over its whole treatment path rather than period by period)

Then, the unit-time specific effect is \(\tau_{it} = Y_{it}(1) - Y_{it}(0)\)

Then the Doubly Average Treatment Effect (DATE) is

\[ \tau(\xi) = \sum_{t=1}^T \xi_t \left(\frac{1}{n} \sum_{i = 1}^n \tau_{it} \right) \]


  • \(\frac{1}{n} \sum_{i = 1}^n \tau_{it}\) is the unweighted average of the treatment effects across units in period \(t\) (i.e., the time-specific ATE).

  • \(\xi = (\xi_1, \dots, \xi_T)\) are user-specified weights for each time period.

  • This estimand is called DATE because it is averaged across both time and units.

A special case of DATE arises when both the time and unit weights are equal:

\[ \tau_{eq} = \frac{1}{nT} \sum_{t=1}^T \sum_{i = 1}^n \tau_{it} \]
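A minimal R sketch of the DATE estimand, computed from a hypothetical matrix of unit-time effects \(\tau_{it}\) (in practice these are unobserved; the values here are simulated). It assumes \(\xi\) is a probability vector over periods (non-negative, summing to one); `date_estimand` is an illustrative helper. With equal weights it reproduces \(\tau_{eq}\), the grand mean of the matrix.

```r
# Hypothetical n x T matrix of unit-time effects tau_it (rows = units, cols = periods)
set.seed(1)
n <- 5; n_periods <- 4
tau <- matrix(rnorm(n * n_periods, mean = rep(1:n_periods, times = n), sd = 0.2),
              nrow = n, ncol = n_periods, byrow = TRUE)

# DATE: xi-weighted average over periods of the per-period average over units.
# Assumption: xi is a probability vector over periods.
date_estimand <- function(tau, xi) {
  stopifnot(length(xi) == ncol(tau), all(xi >= 0), abs(sum(xi) - 1) < 1e-12)
  sum(xi * colMeans(tau))
}

tau_eq   <- date_estimand(tau, rep(1 / n_periods, n_periods))  # equal weights
tau_late <- date_estimand(tau, c(0, 0, 0.5, 0.5))              # last two periods only
c(tau_eq = tau_eq, grand_mean = mean(tau), tau_late = tau_late)
```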

Borrowing the idea of the inverse propensity-weighted least squares estimator from the cross-sectional case, we reweight the objective function by the treatment assignment mechanism:

\[ \hat{\tau} \triangleq \arg \min_{\tau} \sum_{i = 1}^n (Y_i -\mu - W_i \tau)^2 \frac{1}{\pi_i (W_i)} \]


  • the first term is the least squares objective

  • the second term is the inverse of the propensity score of the realized assignment, \(\pi_i(W_i)\), which acts as the weight
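A minimal R sketch of the cross-sectional IPW least squares estimator, assuming the propensity score is known (simulated data; names are illustrative). `lm()` with `weights = 1 / pi_obs` minimizes exactly the weighted objective above, and the resulting slope equals the difference between the inverse-propensity-weighted means of treated and control outcomes.

```r
set.seed(7)
n <- 2000
x <- rnorm(n)
e <- plogis(0.8 * x)                 # assumed-known propensity score P(W = 1 | x)
w <- rbinom(n, 1, e)
y <- 1 + 2 * w + x + rnorm(n)        # hypothetical outcome model, true effect 2

pi_obs <- ifelse(w == 1, e, 1 - e)   # pi_i(W_i): probability of the assignment received
ipw_fit <- lm(y ~ w, weights = 1 / pi_obs)  # minimizes sum (y - mu - w * tau)^2 / pi_i(W_i)
tau_ipw <- unname(coef(ipw_fit)["w"])

# The weighted least squares slope equals the difference of weighted group means
tau_hajek <- weighted.mean(y[w == 1], 1 / e[w == 1]) -
  weighted.mean(y[w == 0], 1 / (1 - e[w == 0]))
c(ipw_ls = tau_ipw, weighted_means = tau_hajek)
```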

In the panel data case, the IPW estimator will be

\[ \hat{\tau}_{IPW} \triangleq \arg \min_{\tau} \sum_{i = 1}^n \sum_{t =1}^T (Y_{i t}-\alpha_i - \lambda_t - W_{it} \tau)^2 \frac{1}{\pi_i (W_i)} \]
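The panel version can be sketched in R the same way, on a hypothetical design where each unit either never adopts or adopts at t = 3, with a known adoption probability that depends on a covariate `x` (all of this is assumed for illustration). The weight \(1/\pi_i(W_i)\) is constant within a unit and is passed to `lm()` alongside unit and time dummies. In this two-path design, the IPW-TWFE slope equals a weighted difference in each unit's post-minus-pre change between adopters and non-adopters, which the code checks; the unweighted TWFE is printed for comparison.

```r
set.seed(11)
n <- 500; n_periods <- 4
x <- rnorm(n)
p_adopt <- plogis(x)                                 # assumed known: P(adopt at t = 3 | x)
adopt <- rbinom(n, 1, p_adopt)
pi_obs <- ifelse(adopt == 1, p_adopt, 1 - p_adopt)   # pi_i(W_i) of the realized path

panel <- expand.grid(id = 1:n, t = 1:n_periods)      # id varies fastest
panel$w <- as.numeric(adopt[panel$id] == 1 & panel$t >= 3)
# Effects grow with x, and units with larger x are more likely to adopt
panel$y <- x[panel$id] + 0.2 * panel$t + (1 + x[panel$id]) * panel$w +
  rnorm(nrow(panel))
panel$ipw <- 1 / pi_obs[panel$id]                    # one weight per unit, repeated over t

ipw_twfe   <- lm(y ~ w + factor(id) + factor(t), data = panel, weights = ipw)
plain_twfe <- lm(y ~ w + factor(id) + factor(t), data = panel)

# Check: weighted difference of (post mean - pre mean) between adopters and non-adopters
y_mat <- matrix(panel$y, nrow = n)                   # rows = units, cols = periods
delta <- rowMeans(y_mat[, 3:4]) - rowMeans(y_mat[, 1:2])
ipw_did <- weighted.mean(delta[adopt == 1], 1 / p_adopt[adopt == 1]) -
  weighted.mean(delta[adopt == 0], 1 / (1 - p_adopt[adopt == 0]))

c(ipw_twfe = unname(coef(ipw_twfe)["w"]), ipw_did = ipw_did,
  unweighted = unname(coef(plain_twfe)["w"]))
```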

To target a DATE whose time weights users can specify, we use the reshaped IPW (RIPW) estimator (2020):

\[ \hat{\tau}_{RIPW} (\Pi) \triangleq \arg \min_{\tau} \sum_{i = 1}^n \sum_{t =1}^T (Y_{i t}-\alpha_i - \lambda_t - W_{it} \tau)^2 \frac{\Pi(W_i)}{\pi_i (W_i)} \]

where \(\Pi\) is a data-independent distribution over the support of the treatment paths, \(\mathbb{S} = \cup_i Supp(\mathbf{W}_i)\).

This generalization nests

  • the IPW-TWFE estimator when \(\Pi \sim Unif(\mathbb{S})\)

  • the unweighted estimator of a randomized experiment when \(\Pi = \pi_i\), since every weight \(\Pi(W_i)/\pi_i(W_i)\) then equals one
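Both special cases can be verified numerically. The R sketch below uses a hypothetical two-path design (never treated, or adopt at t = 3, with adoption probabilities assumed known), so \(\mathbb{S}\) has two elements and the uniform \(\Pi\) puts 1/2 on each. Uniform \(\Pi\) only rescales the IPW weights by a constant, so it reproduces IPW-TWFE; \(\Pi = \pi_i\) makes every weight one, which is plain TWFE. `ripw_twfe` is an illustrative helper, not a package function.

```r
set.seed(11)
n <- 500; n_periods <- 4
x <- rnorm(n)
p_adopt <- plogis(x)                                 # assumed-known adoption probability
adopt <- rbinom(n, 1, p_adopt)
pi_obs <- ifelse(adopt == 1, p_adopt, 1 - p_adopt)   # pi_i(W_i) of the realized path

panel <- expand.grid(id = 1:n, t = 1:n_periods)
panel$w <- as.numeric(adopt[panel$id] == 1 & panel$t >= 3)
panel$y <- x[panel$id] + 0.2 * panel$t + (1 + x[panel$id]) * panel$w +
  rnorm(nrow(panel))

# Hypothetical helper: TWFE weighted by Pi(W_i) / pi_i(W_i), where Pi_obs holds
# the reshaping distribution evaluated at each unit's realized path
ripw_twfe <- function(Pi_obs) {
  panel$rw <- (Pi_obs / pi_obs)[panel$id]            # modifies a local copy of panel
  fit <- lm(y ~ w + factor(id) + factor(t), data = panel, weights = rw)
  unname(coef(fit)["w"])
}

panel$ipw <- 1 / pi_obs[panel$id]
ipw_twfe <- unname(coef(lm(y ~ w + factor(id) + factor(t),
                           data = panel, weights = ipw))["w"])
plain_twfe <- unname(coef(lm(y ~ w + factor(id) + factor(t), data = panel))["w"])

c(uniform_Pi = ripw_twfe(rep(1 / 2, n)), ipw_twfe = ipw_twfe,
  Pi_equals_pi = ripw_twfe(pi_obs), plain_twfe = plain_twfe)
```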

Choosing \(\Pi\) does not require the data, only the set of possible assignments in your setting.

  • For most practical designs (DiD, staggered adoption, transient treatments), closed-form solutions exist.

  • For a generic solver, we can use nonlinear programming (e.g., the BFGS algorithm).

As (Imai and Kim 2021) argue, TWFE is not a nonparametric approach, so its estimates can depend on incorrect modeling assumptions (i.e., model dependence).

  • Hence, they advocate matching methods for time-series cross-sectional data (Imai and Kim 2021).

  • Their methods are implemented in the wfe and PanelMatch R packages.

The xtreg2way package is based on (Somaini and Wolak 2016).

# dataset
df <- bacondecomp::castle

# install once: devtools::install_github("paulosomaini/xtreg2way")
library(xtreg2way)

# general form:
# output <- xtreg2way(y,
#                     data.frame(x1, x2),
#                     iid,
#                     tid,
#                     w,
#                     noise = "1",
#                     se = "1")

# equivalently, with a formula
output <- xtreg2way(l_homicide ~ post,
                    iid = df$state, # group id
                    tid = df$year,  # time id
                    # w,            # vector of weights
                    se = "1")
#>                  [,1]
#> l_homicide 0.08181162
#>             [,1]
#> [1,] 0.003396724

# to save time, you can reuse the structure stored in the
# previous output for a new set of variables
# output2 <- xtreg2way(y, x1, struc=output$struc)

Standard error estimation options (argument se):

Setting      Estimation
se = "0"     Assumes homoskedasticity and no within-group or serial correlation
se = "1"     (default) Robust to heteroskedasticity and serial correlation (Arellano 1987)
se = "2"     Robust to heteroskedasticity, but assumes no within-group or serial correlation
se = "11"    Arellano SEs with the degrees-of-freedom correction performed by Stata's xtreg (Somaini and Wolak 2021)

Alternatively, you can estimate the model manually or with the plm package, but you have to be careful with how the SEs are estimated.

library(multiwayvcov) # multiway clustered vcov matrices
library(lmtest)       # coeftest() for inference with a custom vcov
library(plm)          # panel data models

# manual: dummies for state and year fixed effects
output3 <- lm(l_homicide ~ post + factor(state) + factor(year),
              data = df)

# variance-covariance matrix clustered by state and year
vcov_tw <- multiwayvcov::cluster.vcov(output3,
                                      cbind(df$state, df$year),
                                      use_white = FALSE,
                                      df_correction = FALSE)

# get coefficients
coeftest(output3, vcov_tw)[2,] 
#>   Estimate Std. Error    t value   Pr(>|t|) 
#> 0.08181162 0.05671410 1.44252696 0.14979397
# using the plm package

output4 <- plm(l_homicide ~ post, 
               data = df, 
               index = c("state", "year"), 
               model = "within", 
               effect = "twoways")

# get coefficients with plm's robust vcovHC (HC1 adjustment)
coeftest(output4, vcov = vcovHC, type = "HC1")
#> t test of coefficients:
#>      Estimate Std. Error t value Pr(>|t|)
#> post 0.081812   0.057748  1.4167   0.1572

As you can see, differences stem from SE estimation, not the coefficient estimate.


2020, July. http://dx.doi.org/10.1038/s41562-020-0912-z.
Arellano, Manuel. 1987. “Computing Robust Standard Errors for Within-Groups Estimators.” Oxford Bulletin of Economics and Statistics 49 (4): 431–34.
Borusyak, Kirill, Xavier Jaravel, and Jann Spiess. 2021. “Revisiting Event Study Designs: Robust and Efficient Estimation.” arXiv Preprint arXiv:2108.12419.
Callaway, Brantly, and Pedro HC Sant’Anna. 2021. “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics 225 (2): 200–230.
De Chaisemartin, Clément, and Xavier d’Haultfoeuille. 2020. “Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects.” American Economic Review 110 (9): 2964–96.
Gardner, John. 2022. “Two-Stage Differences in Differences.” arXiv Preprint arXiv:2207.05943.
Goodman-Bacon, Andrew. 2021. “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics 225 (2): 254–77.
Imai, Kosuke, and In Song Kim. 2019. “When Should We Use Unit Fixed Effects Regression Models for Causal Inference with Longitudinal Data?” American Journal of Political Science 63 (2): 467–90.
———. 2021. “On the Use of Two-Way Fixed Effects Regression Models for Causal Inference with Panel Data.” Political Analysis 29 (3): 405–15.
Somaini, Paulo, and Frank A Wolak. 2016. “An Algorithm to Estimate the Two-Way Fixed Effects Model.” Journal of Econometric Methods 5 (1): 143–52.
———. 2021. “TWFEM: Stata Module to Efficiently Estimate a Two-Way Fixed Effects Model Based on Somaini and Wolak (2015).”
Sun, Liyang, and Sarah Abraham. 2021. “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics 225 (2): 175–99.
Sun, Liyang, and Jesse M Shapiro. 2022. “A Linear Panel Model with Heterogeneous Coefficients and Variation in Exposure.” Journal of Economic Perspectives 36 (4): 193–204.