26.8 Multiple periods and variation in treatment timing
This section extends the DiD framework to settings with:
- more than 2 time periods
- different treatment timing across units
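When treatment effects are heterogeneous across time or units, the standard two-way fixed-effects (TWFE) estimator is inappropriate, because it implicitly uses already-treated units as comparisons for later-treated units.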
Notation is consistent with the did package (Callaway and Sant’Anna 2021)
\(Y_{it}(0)\) is the untreated potential outcome for unit \(i\) in time period \(t\)
\(Y_{it}(g)\) is the potential outcome for unit \(i\) in time period \(t\) if it’s treated in period \(g\)
\(Y_{it}\) is the observed outcome for unit \(i\) in time period \(t\)
\[ Y_{it} = \begin{cases} Y_{it}(0) & \forall i \in \text{never-treated group} \\ 1\{G_i > t\} Y_{it}(0) + 1\{G_i \le t \}Y_{it}(G_i) & \forall i \in \text{other groups} \end{cases} \]
\(G_i\) is the time period when \(i\) is first treated
\(C_i\) is a dummy equal to 1 when \(i\) belongs to the never-treated group
\(D_{it}\) is a dummy for whether \(i\) is treated in period \(t\)
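To make the notation concrete, the following minimal sketch builds \(D_{it}\), \(C_i\), and the observed outcome from \(G_i\). The encoding of never-treated units as `math.inf` and all function names are assumptions made for illustration; they are not part of the did package.

```python
import math

NEVER = math.inf  # assumed encoding: G_i = infinity for never-treated units


def treatment_dummy(g_i, t):
    """D_it = 1{G_i <= t}: unit i is treated in period t."""
    return int(g_i <= t)


def never_treated_dummy(g_i):
    """C_i = 1 if unit i is never treated."""
    return int(g_i == NEVER)


def observed_outcome(g_i, t, y0, y_g):
    """Y_it = 1{G_i > t} Y_it(0) + 1{G_i <= t} Y_it(G_i).

    y0 is the untreated potential outcome, y_g the potential outcome
    under treatment starting in period G_i.
    """
    return y_g if g_i <= t else y0


# Hypothetical toy panel: three units, three periods.
G = {"a": 2, "b": 3, "c": NEVER}
periods = [1, 2, 3]
D = {i: [treatment_dummy(g_i, t) for t in periods] for i, g_i in G.items()}
C = {i: never_treated_dummy(g_i) for i, g_i in G.items()}
```

Because never-treated units have \(G_i > t\) in every period, the second line of the case definition reduces to \(Y_{it}(0)\) for them, which is why a single function covers both cases.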
Assumptions:
Staggered treatment adoption: once treated, a unit cannot be untreated (revert)
Parallel trends assumptions (conditional on covariates):
Based on never-treated units: \(E[Y_t(0)- Y_{t-1}(0)|G= g] = E[Y_t(0) - Y_{t-1}(0)|C=1]\)
- Without treatment, the average change in untreated potential outcomes for group \(g\) would have equaled that of the never-treated group (i.e., the control group). This version is most credible when (1) there are enough never-treated units and (2) the never-treated group is similar to the eventually treated groups.
Based on not-yet treated units: \(E[Y_t(0) - Y_{t-1}(0)|G = g] = E[Y_t(0) - Y_{t-1}(0)|D_s = 0, G \neq g]\)
Units not yet treated by time \(s\) (\(s \ge t\)) can be used as comparison groups to calculate the average treatment effects for the group first treated in period \(g\)
Additional assumption: this version also restricts pre-treatment trends across groups (Marcus and Sant’Anna 2021)
Random sampling
Irreversibility of treatment (once treated, cannot be untreated)
Overlap (each group has positive probability and the treatment propensity is bounded away from 1, so comparable comparison units exist)
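The staggered adoption (irreversibility) assumption can be checked directly in the data. The sketch below recovers \(G_i\) from a unit's 0/1 treatment path and raises an error if treatment ever switches off. The function name and the `math.inf` encoding for never-treated units are illustrative assumptions.

```python
import math


def first_treatment_period(d_path, periods):
    """Return G_i implied by a 0/1 treatment path.

    Returns math.inf if the unit is never treated. Raises ValueError if
    the unit reverts from treated to untreated, which would violate
    staggered adoption.
    """
    g = math.inf
    for t, d in zip(periods, d_path):
        if d == 1 and g == math.inf:
            g = t  # first period in which the unit is treated
        elif d == 0 and g != math.inf:
            raise ValueError(f"treatment reverts in period {t}")
    return g
```

For example, `first_treatment_period([0, 1, 1], [1, 2, 3])` returns 2, while a path such as `[0, 1, 0]` raises an error because the unit becomes untreated again.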
Group-Time ATE
- This is the multi-period analogue of the average treatment effect on the treated in the standard case (2 groups, 2 periods).
\[ ATT(g,t) = E[Y_t(g) - Y_t(0) |G = g] \]
which is the average treatment effect for group \(g\) in period \(t\)
Identification: when the parallel trends assumption holds unconditionally and is based on
Never-treated units: \(ATT(g,t) = E[Y_t - Y_{g-1} |G = g] - E[Y_t - Y_{g-1}|C=1] \quad \forall t \ge g\)
Not-yet-treated units: \(ATT(g,t) = E[Y_t - Y_{g-1}|G= g] - E[Y_t - Y_{g-1}|D_t = 0, G \neq g] \quad \forall t \ge g\)
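The two identification results translate into simple sample analogs: the mean change from period \(g-1\) to \(t\) among units in group \(g\), minus the same mean change in the comparison group. The sketch below is a minimal illustration on made-up data with a common trend. The helper `att_gt_sample` is hypothetical (it is not the did package's estimator), and never-treated units are encoded as `math.inf` by assumption.

```python
import math
from statistics import mean

NEVER = math.inf  # assumed encoding for never-treated units


def att_gt_sample(y, G, g, t, control="never"):
    """Sample analog of ATT(g, t) for t >= g.

    y: dict unit -> dict period -> observed outcome
    G: dict unit -> first treatment period (NEVER if never treated)
    control: "never" uses never-treated units; "notyet" uses all units
             still untreated in period t (D_t = 0).
    """
    if t < g:
        raise ValueError("only post-treatment periods t >= g are handled")
    base = g - 1  # last pre-treatment period for group g
    treated = [i for i in G if G[i] == g]
    if control == "never":
        comparison = [i for i in G if G[i] == NEVER]
    elif control == "notyet":
        # G_i > t implies D_it = 0 and, since t >= g, also G_i != g
        comparison = [i for i in G if G[i] > t]
    else:
        raise ValueError("control must be 'never' or 'notyet'")
    if not treated or not comparison:
        raise ValueError("empty treated or comparison group")
    d_treated = mean(y[i][t] - y[i][base] for i in treated)
    d_comparison = mean(y[i][t] - y[i][base] for i in comparison)
    return d_treated - d_comparison


# Hypothetical data: common trend of +1 per period plus a treatment effect.
G = {"a1": 2, "a2": 2, "b1": 3, "b2": 3, "c1": NEVER, "c2": NEVER}
y = {
    "a1": {1: 11, 2: 14, 3: 17},  # effect 2 in t=2, 4 in t=3
    "a2": {1: 13, 2: 16, 3: 19},
    "b1": {1: 6, 2: 7, 3: 11},    # effect 3 in t=3
    "b2": {1: 8, 2: 9, 3: 13},
    "c1": {1: 1, 2: 2, 3: 3},
    "c2": {1: 3, 2: 4, 3: 5},
}

att_22_never = att_gt_sample(y, G, g=2, t=2, control="never")
att_22_notyet = att_gt_sample(y, G, g=2, t=2, control="notyet")
att_23_never = att_gt_sample(y, G, g=2, t=3, control="never")
att_33_never = att_gt_sample(y, G, g=3, t=3, control="never")
```

Because the toy data satisfy parallel trends exactly, both comparison groups recover the same effect for group 2 in period 2. The not-yet-treated comparison simply uses more units (groups 3 and never-treated) in that period.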
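Identification: when the parallel trends assumption only holds conditional on covariates and is based on
Never-treated units: \(ATT(g,t) = E\big[ E[Y_t - Y_{g-1} |X, G = g] - E[Y_t - Y_{g-1}|X, C=1] \,\big|\, G = g \big] \quad \forall t \ge g\)
Not-yet-treated units: \(ATT(g,t) = E\big[ E[Y_t - Y_{g-1}|X, G= g] - E[Y_t - Y_{g-1}|X, D_t = 0, G \neq g] \,\big|\, G = g \big] \quad \forall t \ge g\)
The inner difference is a function of \(X\); the outer expectation averages it over the covariate distribution of group \(g\).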
Conditioning is useful when you suspect selection bias that covariates can correct, i.e., when parallel trends is plausible only among units with similar \(X\) (much like matching methods).
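With a discrete covariate, the conditional version can be sketched by stratification: compute the difference in mean outcome changes within each covariate cell, then average the cell-level differences using the covariate distribution of the treated group. This is only an illustrative sketch under that discreteness assumption; the function name, the data, and the covariate labels are all hypothetical.

```python
from collections import defaultdict
from statistics import mean


def conditional_att(rows):
    """Stratified sample analog of ATT(g, t) with a discrete covariate.

    rows: list of (x, in_group_g, delta_y), where delta_y = Y_t - Y_{g-1}
          and in_group_g is True for units with G = g, False for
          comparison units.
    Returns sum over x of P(X = x | G = g) times the within-cell
    difference in mean outcome changes.
    """
    treated = defaultdict(list)
    comparison = defaultdict(list)
    for x, in_group_g, delta in rows:
        (treated if in_group_g else comparison)[x].append(delta)
    n_treated = sum(len(v) for v in treated.values())
    att = 0.0
    for x, deltas in treated.items():
        if x not in comparison:
            raise ValueError(f"no comparison units with X={x!r}: overlap fails")
        weight = len(deltas) / n_treated  # P(X = x | G = g)
        att += weight * (mean(deltas) - mean(comparison[x]))
    return att


# Hypothetical data: the trend is +1 for urban and +3 for rural units, and
# the true effect is 2. Treated units are mostly urban, controls mostly rural.
rows = [
    ("urban", True, 3), ("urban", True, 3), ("urban", True, 3),
    ("rural", True, 5),
    ("urban", False, 1),
    ("rural", False, 3), ("rural", False, 3), ("rural", False, 3),
]

att_conditional = conditional_att(rows)
# Unconditional comparison that ignores X, for contrast.
att_naive = mean(d for _, g, d in rows if g) - mean(d for _, g, d in rows if not g)
```

In this toy example the stratified estimate recovers the true effect of 2, while the unconditional comparison gives 1 because the two groups have different covariate mixes and trends differ by covariate.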
Possible parameters of interest are:
- Average treatment effect per group
\[ \theta_S(g) = \frac{1}{\tau - g + 1} \sum_{t = 2}^\tau \mathbb{1} \{g \le t \} ATT(g,t) \]
- Average treatment effect across groups (that were treated) (similar to average treatment effect on the treated in the canonical case)
\[ \theta_S^O := \sum_{g=2}^\tau \theta_S(g) P(G=g | G \le \tau) \]
- Average treatment effect dynamics (i.e., average treatment effect for groups that have been exposed to the treatment for \(e\) time periods):
\[ \theta_D(e) := \sum_{g=2}^\tau \mathbb{1} \{g + e \le \tau \}ATT(g,g + e) P(G = g|G + e \le \tau) \]
- Average treatment effect in period \(t\) for all groups that have been treated by period \(t\)
\[ \theta_C(t) = \sum_{g=2}^\tau \mathbb{1}\{g \le t\} ATT(g,t) P(G = g|G \le t) \]
- Overall average treatment effect across calendar time periods
\[ \theta_C = \frac{1}{\tau-1}\sum_{t=2}^\tau \theta_C(t) \]
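The aggregation formulas above can be sketched as plain weighted averages of the \(ATT(g,t)\) values. In the sketch below the probabilities are replaced by group shares computed from group sizes among the relevant treated groups, which is an assumption about how the weights are normalized. All function names, the \(ATT(g,t)\) values, and the group sizes are made up for illustration.

```python
def theta_group(att, g, tau):
    """theta_S(g): average of ATT(g, t) over post-treatment periods g..tau."""
    return sum(att[(g, t)] for t in range(g, tau + 1)) / (tau - g + 1)


def theta_overall(att, n, tau):
    """theta_S^O: group averages weighted by shares among treated groups."""
    total = sum(n.values())
    return sum(theta_group(att, g, tau) * n[g] / total for g in n)


def theta_dynamic(att, n, e, tau):
    """theta_D(e): effect after e periods of exposure, over groups with g + e <= tau."""
    groups = [g for g in n if g + e <= tau]
    if not groups:
        raise ValueError("no group is observed at this exposure length")
    total = sum(n[g] for g in groups)
    return sum(att[(g, g + e)] * n[g] / total for g in groups)


def theta_calendar_t(att, n, t):
    """theta_C(t): effect in period t over groups already treated (g <= t)."""
    groups = [g for g in n if g <= t]
    if not groups:
        raise ValueError("no group is treated by this period")
    total = sum(n[g] for g in groups)
    return sum(att[(g, t)] * n[g] / total for g in groups)


def theta_calendar(att, n, tau):
    """theta_C: simple average of theta_C(t) over t = 2..tau."""
    return sum(theta_calendar_t(att, n, t) for t in range(2, tau + 1)) / (tau - 1)


# Hypothetical ATT(g, t) values and group sizes, with tau = 4.
tau = 4
att = {
    (2, 2): 2.0, (2, 3): 4.0, (2, 4): 6.0,
    (3, 3): 3.0, (3, 4): 3.0,
    (4, 4): 1.0,
}
n = {2: 50, 3: 30, 4: 20}  # number of units first treated in each period

by_group = {g: theta_group(att, g, tau) for g in n}
overall = theta_overall(att, n, tau)
event_study = {e: theta_dynamic(att, n, e, tau) for e in range(0, 3)}
by_period = {t: theta_calendar_t(att, n, t) for t in range(2, tau + 1)}
calendar_overall = theta_calendar(att, n, tau)
```

Note that the set of groups entering each average changes with \(e\) and \(t\): only group 2 is observed at exposure length 2, while all three groups contribute in period 4. Differences across \(e\) therefore mix dynamics with changes in group composition.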