32.11 Generalized Synthetic Control
The Generalized Synthetic Control (GSC) method extends the synthetic control method (SCM) to multiple treated units and heterogeneous treatment effects. It also relaxes the parallel trends assumption required by difference-in-differences (DID). Developed by Xu (2017), GSC unifies synthetic control with linear interactive fixed effects models, which improves efficiency and robustness when working with time-series cross-sectional (TSCS) data.
32.11.1 The Problem with Traditional Methods
Traditional causal inference methods such as DID require the parallel trends assumption:
$$E[Y_{it}(0) \mid D_i = 1] - E[Y_{it}(0) \mid D_i = 0] = \text{constant over } t,$$
which states that, in the absence of treatment, the difference in average outcomes between treated and control units would have remained constant over time. However, this assumption often fails, and standard DID estimates can be biased, because of:
- Time-varying unobserved confounders affecting both treatment assignment and outcomes.
- Heterogeneous treatment effects across units and over time.
- Multiple treatment periods where different units adopt the treatment at different times.
To address these limitations, GSC builds on the interactive fixed effects (IFE) model. This model combines unit-specific factor loadings with time-varying latent factors, so it can capture unobserved confounders whose trends differ across units.
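The short simulation below shows how a latent factor structure breaks parallel trends. It is a hypothetical illustration, not from the source. There is one trending factor, and treated units load on it more heavily than controls. The parameter values (5 treated and 45 control units, 30 periods, loadings near 2 versus 1) are arbitrary choices.

```r
# Hypothetical simulation: one trending latent factor f_t, with larger
# loadings for treated units, so the untreated-outcome gap drifts over time.
set.seed(1)
n_tr <- 5; n_co <- 45; T_len <- 30
f_t <- seq(0, 3, length.out = T_len)           # time-varying latent factor
lambda_tr <- rnorm(n_tr, mean = 2, sd = 0.1)   # treated units load more heavily
lambda_co <- rnorm(n_co, mean = 1, sd = 0.1)

# Untreated potential outcomes Y_it(0) = lambda_i * f_t + eps_it (units in rows)
y0_tr <- outer(lambda_tr, f_t) + matrix(rnorm(n_tr * T_len, sd = 0.1), n_tr)
y0_co <- outer(lambda_co, f_t) + matrix(rnorm(n_co * T_len, sd = 0.1), n_co)

# Treated-minus-control gap in untreated outcomes, period by period
gap <- colMeans(y0_tr) - colMeans(y0_co)
round(gap[c(1, 15, 30)], 2)  # the gap grows with f_t instead of staying constant
```

Even without any treatment, the treated-minus-control gap roughly tracks (mean treated loading minus mean control loading) times $f_t$. A DID comparison would attribute this drift to the treatment.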
32.11.2 Generalized Synthetic Control Model
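Let $Y_{it}$ represent the observed outcome of unit $i$ at time $t$, and define the potential outcomes framework:
$$Y_{it}(d) = \mu_{it} + \delta_{it} d + \varepsilon_{it}, \quad d \in \{0, 1\},$$
where:
- $\mu_{it}$ is the systematic component of the untreated outcome, which has a latent factor structure.
- $\delta_{it}$ is the treatment effect for unit $i$ at time $t$.
- $\varepsilon_{it}$ is the idiosyncratic error term.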
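Under the interactive fixed effects model, we assume that the untreated outcome follows:
$$\mu_{it} = X_{it}\beta + \lambda_i' f_t,$$
where:
- $X_{it}$ is a vector of observed covariates.
- $\beta$ is a vector of unknown coefficients.
- $\lambda_i$ is an $r \times 1$ vector of unit-specific factor loadings.
- $f_t$ is an $r \times 1$ vector of time-varying latent factors, where $r$ is the unknown number of factors.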
The interaction term $\lambda_i' f_t$ allows GSC to control for unobserved confounders whose effects vary across both units and time. This is a key advantage over DID, which allows only additive unit and time effects, and over traditional SCM.
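DID is nested in this structure. The additive unit and time fixed effects $\alpha_i + \xi_t$ equal $\lambda_i' f_t$ with $r = 2$, $\lambda_i = (\alpha_i, 1)'$, and $f_t = (1, \xi_t)'$. The R check below confirms this numerically. The random values for `alpha` and `xi` are arbitrary, and the names are hypothetical.

```r
# Hypothetical check: additive two-way fixed effects alpha_i + xi_t equal an
# interactive term lambda_i' f_t with lambda_i = (alpha_i, 1), f_t = (1, xi_t).
set.seed(2)
alpha <- rnorm(4)                     # unit fixed effects
xi <- rnorm(6)                        # time fixed effects
additive <- outer(alpha, xi, "+")     # N x T matrix of alpha_i + xi_t

Lambda <- cbind(alpha, 1)             # N x 2 factor loadings
F_mat <- cbind(1, xi)                 # T x 2 latent factors
interactive <- Lambda %*% t(F_mat)    # N x T matrix of lambda_i' f_t

all.equal(additive, interactive, check.attributes = FALSE)  # [1] TRUE
```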
32.11.3 Identification and Estimation
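To estimate the Average Treatment Effect on the Treated (ATT) in a post-treatment period $t$, we define:
$$ATT_t = \frac{1}{N_T} \sum_{i \in \mathcal{T}} \left[ Y_{it}(1) - Y_{it}(0) \right],$$
where $\mathcal{T}$ is the set of treated units and $N_T$ is the number of treated units. For treated units, $Y_{it}(1)$ is observed after treatment, but $Y_{it}(0)$ is counterfactual and must be estimated.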
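Here is the $ATT_t$ formula computed on toy numbers. The values are hypothetical: 3 treated units in 2 post-treatment periods. In real data, `y0` is unobservable and would be replaced by the imputed $\hat{Y}_{it}(0)$.

```r
# Hypothetical toy values: 3 treated units (rows) in 2 post-treatment periods (columns).
y1 <- matrix(c(5, 6, 7,
               6, 7, 8), nrow = 3)  # observed Y_it(1)
y0 <- matrix(c(4, 4, 5,
               4, 5, 5), nrow = 3)  # Y_it(0); unobservable in practice

# ATT_t: average the unit-level effects over treated units, one value per period
att_t <- colMeans(y1 - y0)
att_t  # [1] 1.666667 2.333333
```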
Step 1: Estimating Factor Loadings and Latent Factors
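Using only the control units $i \in \mathcal{C}$, we estimate $\beta$, the latent factors, and the control units' factor loadings:
$$Y_{it} = X_{it}\beta + \lambda_i' f_t + \varepsilon_{it}, \quad i \in \mathcal{C},$$
which can be written in matrix form as
$$Y_C = X_C\beta + \Lambda_C F' + E_C.$$
The key assumption is that the same latent factors $f_t$ and coefficients $\beta$ govern the untreated outcomes of both treated and control units; only the loadings $\lambda_i$ differ by unit. This is what allows factors estimated from the control group to be used to construct counterfactuals for the treated units.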
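The sketch below illustrates Step 1 on simulated control data using principal components. It assumes no covariates ($\beta = 0$) and a known $r = 2$. With covariates, $\beta$ and the factors would instead be estimated jointly, for example by alternating between the two. The names `Y_co`, `F_hat`, and `Lambda_hat` are illustrative, not gsynth internals.

```r
# Sketch of Step 1 without covariates (beta = 0), an assumption for brevity.
# Simulate control-group outcomes from a 2-factor model (hypothetical data).
set.seed(3)
N_co <- 45; T_len <- 30; r <- 2
F_true <- matrix(rnorm(T_len * r), T_len, r)       # T x r latent factors
Lambda_co <- matrix(rnorm(N_co * r), N_co, r)      # N_co x r loadings
Y_co <- Lambda_co %*% t(F_true) + matrix(rnorm(N_co * T_len, sd = 0.3), N_co)

# Principal components: F_hat spans the top-r right singular vectors of Y_co,
# scaled so that t(F_hat) %*% F_hat / T = I (a common normalization).
sv <- svd(Y_co)
F_hat <- sqrt(T_len) * sv$v[, 1:r, drop = FALSE]
Lambda_hat <- Y_co %*% F_hat / T_len               # least-squares loadings given F_hat

# Fitted common component and residuals for the control units
fit_co <- Lambda_hat %*% t(F_hat)
res_co <- Y_co - fit_co
```

The factors are identified only up to a rotation, so `F_hat` need not match `F_true` column by column. The product $\Lambda F'$ is what the method relies on.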
Step 2: Imputing Counterfactual Outcomes
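For each treated unit $i$, we estimate its factor loadings by regressing its pre-treatment outcomes (net of covariates) on the estimated pre-treatment factors:
$$\hat{\lambda}_i = (\hat{F}_0'\hat{F}_0)^{-1}\hat{F}_0'(Y_{i,0} - X_{i,0}\hat{\beta}),$$
where $\hat{F}_0$, $Y_{i,0}$, and $X_{i,0}$ denote the pre-treatment factors, outcomes, and covariates. The imputed counterfactuals for the post-treatment periods are then
$$\hat{Y}_{it}(0) = X_{it}\hat{\beta} + \hat{\lambda}_i'\hat{f}_t,$$
and the treatment effect is estimated as $\hat{\delta}_{it} = Y_{it} - \hat{Y}_{it}(0)$.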
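Here is a minimal sketch of Step 2 on simulated data, under these assumptions:
- no covariates ($\beta = 0$);
- a single treated unit with $T_0 = 20$ pre-treatment periods;
- a known $r = 2$;
- a constant true effect of 3.

The factors come from principal components of the control outcomes, as in Step 1. All names are illustrative.

```r
# Sketch of Step 2 (hypothetical simulated data, no covariates so beta = 0).
set.seed(4)
N_co <- 45; T_len <- 30; T0 <- 20; r <- 2
F_true <- matrix(rnorm(T_len * r), T_len, r)
Y_co <- matrix(rnorm(N_co * r), N_co, r) %*% t(F_true) +
  matrix(rnorm(N_co * T_len, sd = 0.3), N_co)

# Treated unit: same factors, its own loadings, true effect delta after T0
lambda_tr <- c(1.5, -0.5)
delta <- 3
post <- (T0 + 1):T_len
y_tr <- as.vector(F_true %*% lambda_tr) + rnorm(T_len, sd = 0.3)
y_tr[post] <- y_tr[post] + delta

# Step 1 on controls: principal-component factors
F_hat <- sqrt(T_len) * svd(Y_co)$v[, 1:r, drop = FALSE]

# Step 2: lambda_hat = (F0'F0)^{-1} F0' y_{i,0}, using pre-treatment rows only
F0 <- F_hat[1:T0, , drop = FALSE]
lambda_hat <- solve(crossprod(F0), crossprod(F0, y_tr[1:T0]))

# Impute Y_it(0) for every period and estimate the per-period effect
y0_hat <- as.vector(F_hat %*% lambda_hat)
eff <- y_tr - y0_hat
mean(eff[post])  # estimated ATT for this unit; should be close to delta
```

Because $\hat{\lambda}_i$ is fit on $T_0$ pre-treatment periods with $r$ free parameters, the approach needs $T_0 > r$. More pre-treatment periods yield more precise loadings.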
32.11.4 Bootstrap Procedure for Standard Errors
A key issue in statistical inference with GSC is quantifying uncertainty. The standard nonparametric bootstrap can be biased because of the dependence structure in panel data. We therefore adopt the parametric bootstrap of (K. T. Li and Sonnier 2023) to correct for this bias.
Corrected Bootstrap Algorithm:
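- Estimate the interactive fixed effects (IFE) model using the control units.
- Resample the residuals $\hat{\varepsilon}_{it}$ from the fitted model.
- Generate new synthetic datasets using $Y^*_{it} = X_{it}\hat{\beta} + \hat{\lambda}_i'\hat{f}_t + \varepsilon^*_{it}$, where $\varepsilon^*_{it}$ are the resampled residuals.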
- Re-estimate the model on resampled data and compute bootstrap confidence intervals.
This approach is designed to give more accurate coverage probabilities and less biased standard errors than the naive nonparametric bootstrap.
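Below is a simplified base-R sketch of the residual bootstrap steps listed above, for one treated unit with no covariates. It is not the exact procedure used by gsynth or by Li and Sonnier (2023). It makes three simplifications:
- it pools control-unit residuals;
- it adds the point estimate of the effect back to the treated unit's post-treatment periods;
- it reports a percentile interval.

The helper `estimate_att()` and the simulated data are hypothetical.

```r
# Hypothetical helper: two-step estimator (no covariates) for one treated unit.
estimate_att <- function(Y_co, y_tr, T0, r) {
  T_len <- ncol(Y_co)
  F_hat <- sqrt(T_len) * svd(Y_co)$v[, 1:r, drop = FALSE]        # Step 1: factors
  F0 <- F_hat[1:T0, , drop = FALSE]
  lambda_hat <- solve(crossprod(F0), crossprod(F0, y_tr[1:T0]))  # Step 2: loadings
  y0_hat <- as.vector(F_hat %*% lambda_hat)
  post <- (T0 + 1):T_len
  list(att = mean(y_tr[post] - y0_hat[post]),
       y0_hat = y0_hat,
       fit_co = Y_co %*% F_hat %*% t(F_hat) / T_len)             # fitted controls
}

# Simulated data (hypothetical): 45 controls, 1 treated unit, 2 factors, effect 3
set.seed(6)
N_co <- 45; T_len <- 30; T0 <- 20; r <- 2; delta <- 3
F_true <- matrix(rnorm(T_len * r), T_len, r)
Y_co <- matrix(rnorm(N_co * r), N_co, r) %*% t(F_true) +
  matrix(rnorm(N_co * T_len, sd = 0.3), N_co)
is_post <- seq_len(T_len) > T0
y_tr <- as.vector(F_true %*% c(1.5, -0.5)) + delta * is_post + rnorm(T_len, sd = 0.3)

# 1. Fit the model and collect control-unit residuals
est <- estimate_att(Y_co, y_tr, T0, r)
res_pool <- as.vector(Y_co - est$fit_co)

# 2-4. Resample residuals, rebuild outcomes from the fit, re-estimate the ATT
B <- 200
att_star <- replicate(B, {
  Y_co_star <- est$fit_co +
    matrix(sample(res_pool, length(res_pool), replace = TRUE), nrow = N_co)
  y_tr_star <- est$y0_hat + est$att * is_post +
    sample(res_pool, T_len, replace = TRUE)
  estimate_att(Y_co_star, y_tr_star, T0, r)$att
})

# Point estimate, bootstrap standard error, and 95% percentile interval
c(att = est$att, se = sd(att_star), quantile(att_star, c(0.025, 0.975)))
```

Residuals from a principal-components fit are slightly shrunk because the fit also absorbs some noise. A production implementation would correct for this, which is one reason to rely on gsynth's built-in inference rather than this sketch.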
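# Load required package
library(gsynth)

# Example data: data("gsynth") loads the package's example datasets, including simdata
data("gsynth")

# Fit the Generalized Synthetic Control model
gsc_model <- gsynth(
  Y ~ D + X1 + X2,          # outcome ~ treatment indicator + covariates
  data = simdata,
  index = c("id", "time"),  # unit and time identifiers
  force = "two-way",        # include additive unit and time fixed effects
  CV = TRUE,                # choose the number of factors by cross-validation
  r = c(0, 5),              # candidate range for the number of factors
  se = TRUE,                # compute bootstrap standard errors
  parallel = FALSE
)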
#> Cross-validating ...
#> r = 0; sigma2 = 1.84865; IC = 1.02023; PC = 1.74458; MSPE = 2.37280
#> r = 1; sigma2 = 1.51541; IC = 1.20588; PC = 1.99818; MSPE = 1.71743
#> r = 2; sigma2 = 0.99737; IC = 1.16130; PC = 1.69046; MSPE = 1.14540*
#> r = 3; sigma2 = 0.94664; IC = 1.47216; PC = 1.96215; MSPE = 1.15032
#> r = 4; sigma2 = 0.89411; IC = 1.76745; PC = 2.19241; MSPE = 1.21397
#> r = 5; sigma2 = 0.85060; IC = 2.05928; PC = 2.40964; MSPE = 1.23876
#>
#> r* = 2
#>
#>
#> Bootstrapping ...
#> ..
# summary() on a gsynth object lists the components of the returned list
summary(gsc_model)
#> Length Class Mode
#> Y.dat 1500 -none- numeric
#> Y 1 -none- character
#> D 1 -none- character
#> X 2 -none- character
#> W 0 -none- NULL
#> index 2 -none- character
#> id 50 -none- numeric
#> time 30 -none- numeric
#> obs.missing 1500 -none- numeric
#> id.tr 5 -none- numeric
#> id.co 45 -none- numeric
#> D.tr 150 -none- numeric
#> I.tr 150 -none- numeric
#> Y.tr 150 -none- numeric
#> Y.ct 150 -none- numeric
#> Y.co 1350 -none- numeric
#> eff 150 -none- numeric
#> Y.bar 90 -none- numeric
#> att 30 -none- numeric
#> att.avg 1 -none- numeric
#> force 1 -none- numeric
#> sameT0 1 -none- logical
#> T 1 -none- numeric
#> N 1 -none- numeric
#> p 1 -none- numeric
#> Ntr 1 -none- numeric
#> Nco 1 -none- numeric
#> T0 5 -none- numeric
#> tr 50 -none- logical
#> pre 150 -none- logical
#> post 150 -none- logical
#> r.cv 1 -none- numeric
#> IC 1 -none- numeric
#> PC 1 -none- numeric
#> beta 2 -none- numeric
#> est.co 13 -none- list
#> mu 1 -none- numeric
#> validX 1 -none- numeric
#> sigma2 1 -none- numeric
#> res.co 1350 -none- numeric
#> MSPE 1 -none- numeric
#> CV.out 30 -none- numeric
#> niter 1 -none- numeric
#> factor 60 -none- numeric
#> lambda.co 90 -none- numeric
#> lambda.tr 10 -none- numeric
#> wgt.implied 225 -none- numeric
#> alpha.tr 5 -none- numeric
#> alpha.co 45 -none- numeric
#> xi 30 -none- numeric
#> inference 1 -none- character
#> est.att 180 -none- numeric
#> est.avg 5 -none- numeric
#> att.avg.boot 200 -none- numeric
#> att.boot 6000 -none- numeric
#> eff.boot 30000 -none- numeric
#> Dtr.boot 30000 -none- numeric
#> Itr.boot 30000 -none- numeric
#> beta.boot 400 -none- numeric
#> est.beta 10 -none- numeric
#> call 9 -none- call
#> formula 3 formula call
# Visualization: by default, plots the estimated treatment effect (gap) over time
plot(gsc_model)
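The cross-validation log printed by `gsynth()` reports an MSPE for each candidate number of factors and selects the minimum (r* = 2). The base-R sketch below illustrates a simplified version of this idea:
1. For each candidate $r$, hold out one pre-treatment period of a treated unit at a time.
2. Predict the held-out period from loadings fitted on the remaining pre-treatment periods.
3. Average the squared prediction errors.

This is an illustration on simulated data, with no covariates, no fixed effects, and one treated unit. It is not gsynth's exact algorithm, and `cv_mspe()` is a hypothetical helper.

```r
# Hypothetical simulation: 45 controls, 1 treated unit, 2 true factors.
set.seed(5)
N_co <- 45; T_len <- 30; T0 <- 20; r_true <- 2
F_true <- matrix(rnorm(T_len * r_true), T_len, r_true)
Y_co <- matrix(rnorm(N_co * r_true), N_co, r_true) %*% t(F_true) +
  matrix(rnorm(N_co * T_len, sd = 0.3), N_co)
y_pre <- as.vector(F_true[1:T0, ] %*% c(1.5, -0.5)) + rnorm(T0, sd = 0.3)

# Leave-one-period-out MSPE on the treated unit's pre-treatment outcomes
cv_mspe <- function(r) {
  if (r == 0) return(mean(y_pre^2))  # r = 0: no factors, so the prediction is 0
  F_hat <- sqrt(T_len) * svd(Y_co)$v[, 1:r, drop = FALSE]
  errs <- sapply(seq_len(T0), function(s) {
    keep <- setdiff(seq_len(T0), s)
    F_k <- F_hat[keep, , drop = FALSE]
    lam <- solve(crossprod(F_k), crossprod(F_k, y_pre[keep]))
    y_pre[s] - sum(F_hat[s, ] * as.vector(lam))  # held-out prediction error
  })
  mean(errs^2)
}

mspe <- sapply(0:5, cv_mspe)
r_star <- (0:5)[which.min(mspe)]
r_star
```

Too few factors leave systematic error in the held-out periods. Too many factors overfit the loadings. The MSPE curve therefore typically falls and then flattens or rises, as in the gsynth log.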