## 25.2 Application

SDID Algorithm

1. Compute regularization parameter $$\zeta$$

$\zeta = (N_{t}T_{post})^{1/4} \hat{\sigma}$

where

$\hat{\sigma}^2 = \frac{1}{N_c(T_{pre}- 1)} \sum_{i = 1}^{N_c} \sum_{t = 1}^{T_{pre}-1}(\Delta_{it} - \hat{\Delta})^2$

• $$\Delta_{it} = Y_{i(t + 1)} - Y_{it}$$

• $$\hat{\Delta} = \frac{1}{N_c(T_{pre} - 1)}\sum_{i = 1}^{N_c}\sum_{t = 1}^{T_{pre}-1} \Delta_{it}$$
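To make the notation concrete, here is a minimal pure-Python sketch of the regularization step (an illustration, not the synthdid implementation); the function name `zeta` and the list-of-lists panel layout are assumptions for this example.

```python
# Illustrative sketch (not the synthdid implementation): sigma-hat is the
# standard deviation of first differences of control units over the
# pre-period, scaled by (N_t * T_post)^(1/4).

def zeta(Y_control, T_pre, N_t, T_post):
    # Delta_it = Y_i,t+1 - Y_i,t for control units in the pre-period
    deltas = [row[t + 1] - row[t] for row in Y_control for t in range(T_pre - 1)]
    mean_d = sum(deltas) / len(deltas)  # Delta-hat
    sigma2 = sum((d - mean_d) ** 2 for d in deltas) / len(deltas)
    return (N_t * T_post) ** 0.25 * sigma2 ** 0.5

# toy panel: N_c = 2 control units, T_pre = 3 pre-periods
print(zeta([[1.0, 2.0, 3.0], [1.0, 1.0, 3.0]], T_pre=3, N_t=1, T_post=2))  # ~ 0.8409
```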

2. Compute unit weights $$\hat{w}^{sdid}$$

$(\hat{w}_0, \hat{w}^{sdid}) = \arg \min_{w_0 \in R, w \in \Omega} l_{unit}(w_0, w)$

where

• $$l_{unit} (w_0, w) = \sum_{t = 1}^{T_{pre}}(w_0 + \sum_{i = 1}^{N_c}w_i Y_{it} - \frac{1}{N_t}\sum_{i = N_c + 1}^NY_{it})^2 + \zeta^2 T_{pre}||w||_2^2$$

• $$\Omega = \{w \in R_+^N: \sum_{i = 1}^{N_c} w_i = 1, w_i = N_t^{-1} \forall i = N_c + 1, \dots, N \}$$
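As a rough illustration of how the unit-weight problem above can be solved, here is a Frank-Wolfe sketch in pure Python; this solver choice is an assumption for exposition (the synthdid package uses its own optimizer). The intercept $w_0$ is absorbed by demeaning each series over the pre-period, which is an exact reformulation since $w_0$ is unpenalized.

```python
# Illustrative Frank-Wolfe sketch (not the synthdid solver): minimize
# l_unit over the simplex of control-unit weights.

def unit_weights(Y_control, target, zeta2_Tpre, iters=2000):
    """Y_control[i][t]: control unit i in pre-period t; target[t]: average
    treated outcome in pre-period t; zeta2_Tpre = zeta^2 * T_pre."""
    Nc, Tpre = len(Y_control), len(target)

    def dm(s):  # demean over t, absorbing the free intercept w_0
        m = sum(s) / len(s)
        return [x - m for x in s]

    Yc = [dm(row) for row in Y_control]
    tgt = dm(target)
    w = [1.0 / Nc] * Nc
    for k in range(iters):
        resid = [sum(w[i] * Yc[i][t] for i in range(Nc)) - tgt[t] for t in range(Tpre)]
        # gradient of sum_t resid_t^2 + zeta^2 * T_pre * ||w||^2
        grad = [2 * sum(resid[t] * Yc[i][t] for t in range(Tpre)) + 2 * zeta2_Tpre * w[i]
                for i in range(Nc)]
        j = min(range(Nc), key=lambda i: grad[i])  # best simplex vertex
        gamma = 2.0 / (k + 2)                      # standard FW step size
        w = [(1 - gamma) * wi for wi in w]
        w[j] += gamma
    return w
```

The time weights of the next step solve the analogous problem with the roles of units and periods swapped (and, as written above, no ridge term).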

3. Compute time weights $$\hat{\lambda}^{sdid}$$

$(\hat{\lambda}_0 , \hat{\lambda}^{sdid}) = \arg \min_{\lambda_0 \in R, \lambda \in \Lambda} l_{time}(\lambda_0, \lambda)$

where

• $$l_{time} (\lambda_0, \lambda) = \sum_{i = 1}^{N_c}(\lambda_0 + \sum_{t = 1}^{T_{pre}} \lambda_t Y_{it} - \frac{1}{T_{post}} \sum_{t = T_{pre} + 1}^T Y_{it})^2$$

• $$\Lambda = \{ \lambda \in R_+^T: \sum_{t = 1}^{T_{pre}} \lambda_t = 1, \lambda_t = T_{post}^{-1} \forall t = T_{pre} + 1, \dots, T\}$$

4. Compute the SDID estimator

$(\hat{\tau}^{sdid}, \hat{\mu}, \hat{\alpha}, \hat{\beta}) = \arg \min_{\tau, \mu, \alpha, \beta}\{ \sum_{i = 1}^N \sum_{t = 1}^T (Y_{it} - \mu - \alpha_i - \beta_t - W_{it} \tau)^2 \hat{w}_i^{sdid}\hat{\lambda}_t^{sdid} \}$
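With the weights fixed, the final minimization above reduces to a weighted double difference: pre-periods are averaged with $\hat{\lambda}$, controls with $\hat{w}$. A pure-Python sketch, with the same illustrative layout assumptions as before:

```python
# Illustrative sketch: with weights fixed, the weighted two-way regression
# reduces to a "DID" with lambda-weighted pre-periods and w-weighted controls.

def sdid_tau(Y, N_c, T_pre, w, lam):
    """Y[i][t]; units 0..N_c-1 are controls; periods 0..T_pre-1 are pre."""
    N, T = len(Y), len(Y[0])
    T_post, N_t = T - T_pre, N - N_c

    def post_mean(i):           # simple average over post-periods
        return sum(Y[i][T_pre:]) / T_post

    def pre_mix(i):             # lambda-weighted pre-period outcome
        return sum(lam[t] * Y[i][t] for t in range(T_pre))

    treated = sum(post_mean(i) - pre_mix(i) for i in range(N_c, N)) / N_t
    control = sum(w[i] * (post_mean(i) - pre_mix(i)) for i in range(N_c))
    return treated - control
```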

SE Estimation

• Under certain assumptions on the errors, the sample, and the interaction properties between the time and unit fixed effects, detailed in Arkhangelsky et al. (2021), SDID is asymptotically normal with estimation error centered at zero.

• Using its asymptotic variance, conventional confidence intervals can be applied to SDID.

$\tau \in \hat{\tau}^{sdid} \pm z_{\alpha/2}\sqrt{\hat{V}_\tau}$
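Given a variance estimate, the interval above is the usual normal-based one; a small Python sketch (the function name `sdid_ci` is an assumption for this example):

```python
# Two-sided (1 - alpha) confidence interval under asymptotic normality.
from statistics import NormalDist

def sdid_ci(tau_hat, v_hat, alpha=0.05):
    z = NormalDist().inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    half = z * v_hat ** 0.5
    return (tau_hat - half, tau_hat + half)

lo, hi = sdid_ci(tau_hat=-15.6, v_hat=100.0)  # toy numbers
```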

• There are three approaches to estimating the variance $$\hat{V}_\tau$$ in these confidence intervals:

1. Clustered Bootstrap (Efron 1992):

• Independently resample units.

• Advantages: simple to use; performs robustly in large panels, as it is the natural approach to inference with panel data, where observations within the same unit may be correlated.

2. Jackknife (Miller 1974):

• Applied to weighted SDID regression with fixed weights.

• Generally conservative and precise when treated and control units are sufficiently similar.

• Not recommended for some methods, like the SC estimator, due to potential biases.

• Appropriate for jackknifing DID, whose weights are not data-dependent.

3. Placebo Variance Estimation:

• Can be used in cases with only one treated unit or in large panels.

• Placebo evaluations swap out the treated unit for untreated ones to estimate noise.

• Relies on homoskedasticity across units: the variance estimate is built from the empirical distribution of placebo estimates computed on control units, so its validity hinges on the noise distribution being the same across units.

• With only one treated unit, nonparametric variance estimation is infeasible, which makes the homoskedasticity assumption necessary for feasible inference; see Conley and Taber (2011) for a detailed analysis.

All algorithms are from Arkhangelsky et al. (2021), p. 4109:

Bootstrap Variance Estimation

1. For each $$b$$ from $$1 \to B$$:

• Sample $$N$$ rows with replacement from $$(\mathbf{Y}, \mathbf{W})$$ to get $$(\mathbf{Y}^{(b)}, \mathbf{W}^{(b)})$$.

• If the sample lacks treated or control units, resample.

• Calculate $$\hat{\tau}^{(b)}$$ using $$(\mathbf{Y}^{(b)}, \mathbf{W}^{(b)})$$.

2. Calculate variance: $$\hat{V}_\tau = \frac{1}{B} \sum_{b = 1}^B (\hat{\tau}^{(b)} - \frac{1}{B} \sum_{b = 1}^B \hat{\tau}^{(b)})^2$$
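The loop above can be sketched in a few lines of Python; the `estimator` argument is a placeholder for an SDID routine (an assumption for this example), and resampling whole rows keeps within-unit correlation intact.

```python
# Illustrative clustered bootstrap: resample whole units (rows) with
# replacement and recompute the estimator each time.
import random

def bootstrap_variance(Y, treated, estimator, B=200, seed=0):
    rng = random.Random(seed)
    N = len(Y)
    taus = []
    while len(taus) < B:
        idx = [rng.randrange(N) for _ in range(N)]
        # resample again if the draw lacks treated or control units
        if len({treated[i] for i in idx}) < 2:
            continue
        taus.append(estimator([Y[i] for i in idx], [treated[i] for i in idx]))
    mean = sum(taus) / B
    return sum((t - mean) ** 2 for t in taus) / B
```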

Jackknife Variance Estimation

1. For each $$i$$ from $$1 \to N$$:
   1. Calculate $$\hat{\tau}^{(-i)} = \arg\min_{\tau, \{\alpha_j, \beta_t\}} \sum_{j \neq i, t}(\mathbf{Y}_{jt} - \alpha_j - \beta_t - \tau \mathbf{W}_{jt})^2 \hat{w}_j \hat{\lambda}_t$$
2. Calculate: $$\hat{V}_{\tau} = (N - 1) N^{-1} \sum_{i = 1}^N (\hat{\tau}^{(-i)} - \hat{\tau})^2$$
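A sketch of this jackknife loop, assuming each leave-one-out sample still contains both treated and control units (the `estimator` placeholder stands in for the weighted SDID regression with fixed weights):

```python
# Illustrative jackknife: drop one unit at a time, recompute tau-hat on
# the remaining units, and scale the spread by (N - 1) / N.

def jackknife_variance(Y, treated, estimator):
    N = len(Y)
    tau_full = estimator(Y, treated)
    tau_loo = []
    for i in range(N):
        Y_i = Y[:i] + Y[i + 1:]          # leave unit i out
        tr_i = treated[:i] + treated[i + 1:]
        tau_loo.append(estimator(Y_i, tr_i))
    return (N - 1) / N * sum((t - tau_full) ** 2 for t in tau_loo)
```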

Placebo Variance Estimation

1. For each $$b$$ from $$1 \to B$$:
   1. Sample $$N_t$$ out of the $$N_c$$ control units without replacement to form the “placebo” treated units
   2. Construct a placebo treatment matrix $$\mathbf{W}_c^{(b)}$$ for the controls
   3. Calculate $$\hat{\tau}^{(b)}$$ based on $$(\mathbf{Y}_c, \mathbf{W}_c^{(b)})$$
2. Calculate $$\hat{V}_\tau = \frac{1}{B}\sum_{b = 1}^B (\hat{\tau}^{(b)} - \frac{1}{B} \sum_{b = 1}^B \hat{\tau}^{(b)})^2$$
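A sketch of the placebo procedure, again with an `estimator` placeholder (an assumption for this example); only control units enter, and a random subset of them is flagged as "treated" on each draw:

```python
# Illustrative placebo variance: repeatedly pretend N_t of the control
# units were treated, re-estimate, and take the variance of the draws.
import random

def placebo_variance(Y_controls, N_t, estimator, B=200, seed=0):
    rng = random.Random(seed)
    Nc = len(Y_controls)
    taus = []
    for _ in range(B):
        placebo = set(rng.sample(range(Nc), N_t))  # draw without replacement
        treated = [int(i in placebo) for i in range(Nc)]
        taus.append(estimator(Y_controls, treated))
    mean = sum(taus) / B
    return sum((t - mean) ** 2 for t in taus) / B
```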

### 25.2.1 Block Treatment

Code provided by the synthdid package

library(synthdid)
library(tidyverse)

# Estimate the effect of California Proposition 99 on cigarette consumption
data('california_prop99')

setup = synthdid::panel.matrices(synthdid::california_prop99)

tau.hat = synthdid::synthdid_estimate(setup$Y, setup$N0, setup$T0)
# se = sqrt(vcov(tau.hat, method = 'placebo'))

plot(tau.hat) + causalverse::ama_theme()

setup = synthdid::panel.matrices(synthdid::california_prop99)

# Run for specific estimators
results_selected = causalverse::panel_estimate(setup, selected_estimators = c("synthdid", "did", "sc"))

results_selected
#> $synthdid
#> $synthdid$estimate
#> synthdid: -15.604 +- NA. Effective N0/N0 = 16.4/38~0.4. Effective T0/T0 = 2.8/19~0.1. N1,T1 = 1,12.
#>
#> $synthdid$std.error
#> [1] 10.05324
#>
#>
#> $did
#> $did$estimate
#> synthdid: -27.349 +- NA. Effective N0/N0 = 38.0/38~1.0. Effective T0/T0 = 19.0/19~1.0. N1,T1 = 1,12.
#>
#> $did$std.error
#> [1] 15.81479
#>
#>
#> $sc
#> $sc$estimate
#> synthdid: -19.620 +- NA. Effective N0/N0 = 3.8/38~0.1. Effective T0/T0 = Inf/19~Inf. N1,T1 = 1,12.
#>
#> $sc$std.error
#> [1] 11.16422

# to access more details in the estimate object
summary(results_selected$did$estimate)
#> $estimate
#> [1] -27.34911
#>
#> $se
#>      [,1]
#> [1,]   NA
#>
#> $controls
#>                estimate 1
#> Wyoming             0.026
#> Wisconsin           0.026
#> West Virginia       0.026
#> Virginia            0.026
#> Vermont             0.026
#> Utah                0.026
#> Texas               0.026
#> Tennessee           0.026
#> South Dakota        0.026
#> South Carolina      0.026
#> Rhode Island        0.026
#> Pennsylvania        0.026
#> Oklahoma            0.026
#> Ohio                0.026
#> North Dakota        0.026
#> North Carolina      0.026
#> New Mexico          0.026
#> New Hampshire       0.026
#> Nevada              0.026
#> Nebraska            0.026
#> Montana             0.026
#> Missouri            0.026
#> Mississippi         0.026
#> Minnesota           0.026
#> Maine               0.026
#> Louisiana           0.026
#> Kentucky            0.026
#> Kansas              0.026
#> Iowa                0.026
#> Indiana             0.026
#> Illinois            0.026
#> Idaho               0.026
#> Georgia             0.026
#> Delaware            0.026
#> Connecticut         0.026
#>
#> $periods
#>      estimate 1
#> 1988      0.053
#> 1987      0.053
#> 1986      0.053
#> 1985      0.053
#> 1984      0.053
#> 1983      0.053
#> 1982      0.053
#> 1981      0.053
#> 1980      0.053
#> 1979      0.053
#> 1978      0.053
#> 1977      0.053
#> 1976      0.053
#> 1975      0.053
#> 1974      0.053
#> 1973      0.053
#> 1972      0.053
#> 1971      0.053
#>
#>
#> $dimensions
#>           N1           N0 N0.effective           T1           T0 T0.effective
#>            1           38           38           12           19           19

causalverse::process_panel_estimate(results_selected)
#>     Method Estimate    SE
#> 1 SYNTHDID   -15.60 10.05
#> 2      DID   -27.35 15.81
#> 3       SC   -19.62 11.16

### 25.2.2 Staggered Adoption

To apply the SDID estimator to staggered adoption settings (see examples in Arkhangelsky et al. (2021), p. 4115, similar to Ben-Michael, Feller, and Rothstein (2022)), we can:

1. Apply the SDID estimator repeatedly, once for every adoption date.

2. Following Ben-Michael, Feller, and Rothstein (2022), form matrices for each adoption date, apply SDID to each, and average the estimates, weighting by the fractions of treated units/time periods.

3. Split the data into multiple samples by time period, so that each sample has a single, consistent adoption date.

For a formal note on this special case, see Porreca (2022), which compares SynthDiD with other estimators in a staggered treatment adoption context:

• Two-Way Fixed Effects (TWFE),

• the group-time average treatment effect estimator of Callaway and Sant’Anna (2021),

• the partially pooled synthetic control method estimator of Ben-Michael, Feller, and Rothstein (2021).

The findings reveal that SynthDiD produces a different estimate of the average treatment effect than the other methods. Simulation results suggest that these differences could be due to SynthDiD’s data-generating-process assumption (a latent factor model) aligning more closely with the actual data than the additive fixed effects model assumed by traditional DiD methods.

To explore heterogeneity of the treatment effect, we can do subgroup analysis:

| Method | Advantages | Disadvantages | Procedure |
|--------|------------|---------------|-----------|
| Split data into subsets | Compares treated units to control units within the same subgroup. | Each subset uses a different synthetic control, making it challenging to compare effects across subgroups. | 1. Split the data into separate subsets for each subgroup. 2. Compute synthetic DID effects for each subset. |
| Control group comprising all non-adopters | Control weights match pretrends well for each treated subgroup. | Each control unit receives a different weight for each treatment subgroup, making it difficult to compare results due to varying synthetic controls. | 1. Use a control group consisting of all non-adopters in each balanced panel cohort analysis. 2. Switch treatment units to the subgroup being analyzed. 3. Perform synthdid analysis. |
| Use all data to estimate synthetic control weights (recommended) | All units have the same synthetic control. | Pretrend match may not be as accurate, since it aims to match the average outcome of all treated units, not just a specific subgroup. | 1. Use all the data to estimate the synthetic DID control weights. 2. Compute treatment effects using only the treated subgroup units as the treatment units. |

library(tidyverse)

df <- fixest::base_stagg |>
  dplyr::mutate(treatvar = if_else(time_to_treatment >= 0, 1, 0)) |>
  dplyr::mutate(treatvar = as.integer(if_else(year_treated > (5 + 2), 0, treatvar)))

est <- causalverse::synthdid_est_ate(
  data               = df,
  adoption_cohorts   = 5:7,
  lags               = 2,
  leads              = 2,
  time_var           = "year",
  unit_id_var        = "id",
  treated_period_var = "year_treated",
  treat_stat_var     = "treatvar",
  outcome_var        = "y"
)
#> adoption_cohort: 5
#> Treated units: 5 Control units: 65
#> adoption_cohort: 6
#> Treated units: 5 Control units: 60
#> adoption_cohort: 7
#> Treated units: 5 Control units: 55

data.frame(
  Period = names(est$TE_mean_w),
  ATE    = est$TE_mean_w,
  SE     = est$SE_mean_w
) |>
  causalverse::nice_tab()
#>    Period   ATE   SE
#> 1      -2 -0.05 0.22
#> 2      -1  0.05 0.22
#> 3       0 -5.07 0.80
#> 4       1 -4.68 0.51
#> 5       2 -3.70 0.79
#> 6 cumul.0 -5.07 0.80
#> 7 cumul.1 -4.87 0.55
#> 8 cumul.2 -4.48 0.53

causalverse::synthdid_plot_ate(est)
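The cohort-averaging idea described above (run SDID once per adoption cohort, then average) can be sketched in a few lines. This is an illustrative Python sketch, not the causalverse implementation; weighting each cohort by its count of treated unit/time-period cells is the assumption used here.

```python
# Illustrative sketch: average cohort-level SDID estimates, weighting by
# the number of treated (unit, period) cells in each cohort.

def staggered_ate(cohort_taus, cohort_cells):
    """cohort_taus[c]: SDID estimate for adoption cohort c;
    cohort_cells[c]: its count of treated (unit, period) cells."""
    total = sum(cohort_cells)
    return sum(t * n for t, n in zip(cohort_taus, cohort_cells)) / total

# two cohorts: effects -5.0 and -4.0, with 10 and 30 treated cells
print(staggered_ate([-5.0, -4.0], [10, 30]))  # -4.25
```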

est_sub <- causalverse::synthdid_est_ate(
  data               = df,
  lags               = 2,
  time_var           = "year",
  unit_id_var        = "id",
  treated_period_var = "year_treated",
  treat_stat_var     = "treatvar",
  outcome_var        = "y",
  # a vector of subgroup ids (from unit id)
  subgroup           = c(
    # some are treated
    "11", "30", "49",
    # some are control within this period
    "20", "25", "21")
)
#> Treated units: 3 Control units: 65
#> Treated units: 0 Control units: 60
#> Treated units: 0 Control units: 55

data.frame(
  Period = names(est_sub$TE_mean_w),
  ATE    = est_sub$TE_mean_w,
  SE     = est_sub$SE_mean_w
) |>
  causalverse::nice_tab()
#>    Period   ATE   SE
#> 1      -2  0.32 0.44
#> 2      -1 -0.32 0.44
#> 3       0 -4.29 1.68
#> 4       1 -4.00 1.52
#> 5       2 -3.44 2.90
#> 6 cumul.0 -4.29 1.68
#> 7 cumul.1 -4.14 1.52
#> 8 cumul.2 -3.91 1.82

causalverse::synthdid_plot_ate(est_sub)

Plot different estimators

library(causalverse)
methods <- c("synthdid", "did", "sc", "sc_ridge", "difp", "difp_ridge")

estimates <- lapply(methods, function(method) {
  synthdid_est_ate(
    data               = df,
    lags               = 2,
    time_var           = "year",
    unit_id_var        = "id",
    treated_period_var = "year_treated",
    treat_stat_var     = "treatvar",
    outcome_var        = "y",
    method             = method
  )
})

plots <- lapply(seq_along(estimates), function(i) {
  causalverse::synthdid_plot_ate(estimates[[i]],
    title = methods[i],
    theme = causalverse::ama_theme(base_size = 6))
})

gridExtra::grid.arrange(grobs = plots, ncol = 2)

### References

Arkhangelsky, Dmitry, Susan Athey, David A Hirshberg, Guido W Imbens, and Stefan Wager. 2019. “Synthetic Difference in Differences.” National Bureau of Economic Research.
———. 2021. “Synthetic Difference-in-Differences.” American Economic Review 111 (12): 4088–118.
Ben-Michael, Eli, Avi Feller, and Jesse Rothstein. 2021. “The Augmented Synthetic Control Method.” Journal of the American Statistical Association 116 (536): 1789–1803.
———. 2022. “Synthetic Controls with Staggered Adoption.” Journal of the Royal Statistical Society Series B: Statistical Methodology 84 (2): 351–81.
Berman, Ron, and Ayelet Israeli. 2022. “The Value of Descriptive Analytics: Evidence from Online Retailers.” Marketing Science 41 (6): 1074–96.
Callaway, Brantly, and Pedro HC Sant’Anna. 2021. “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics 225 (2): 200–230.
Conley, Timothy G, and Christopher R Taber. 2011. “Inference with ‘Difference in Differences’ with a Small Number of Policy Changes.” The Review of Economics and Statistics 93 (1): 113–25.
Efron, Bradley. 1992. “Bootstrap Methods: Another Look at the Jackknife.” In Breakthroughs in Statistics: Methodology and Distribution, 569–93. Springer.
Miller, Rupert G. 1974. “The Jackknife-a Review.” Biometrika 61 (1): 1–15.
Porreca, Zachary. 2022. “Synthetic Difference-in-Differences Estimation with Staggered Treatment Timing.” Economics Letters 220: 110874.