26.3 Notes
Match treatment and control based on pre-treatment observables
Modify SEs appropriately (James J. Heckman, Ichimura, and Todd 1997). It might be easier to use the Doubly Robust DiD (Sant’Anna and Zhao 2020), where only one of the two components (matching or regression) needs to work in order to identify your treatment effect.
Whereas group fixed effects control for time-invariant group effects, they do not control for selection bias (i.e., certain groups are more likely to be treated than others). Hence, with this backdoor path open (i.e., selection bias) between (1) the propensity to be treated and (2) the dynamic evolution of the outcome post-treatment, matching can potentially close it.
Be careful when matching on time-varying covariates because you might encounter the “regression to the mean” problem: if the pre-treatment periods happen to be unusually bad or good (i.e., out of the ordinary), the post-treatment outcome can be just an artifact of regression to the mean (Daw and Hatfield 2018). This problem does not arise for time-invariant variables.
Matching and DiD can both use pre-treatment outcomes to correct for selection bias. Using real-world data and simulations, Chabé-Ferret (2015) found that matching generally underestimates the average causal effect and gets closer to the true effect as the number of pre-treatment outcomes increases. When selection bias is symmetric around the treatment date, DiD is still consistent when implemented symmetrically (i.e., with the same number of periods before and after treatment). In cases where selection bias is asymmetric, the Monte Carlo simulations show that symmetric DiD still performs better than matching.
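The regression-to-the-mean problem can be illustrated with a minimal base-R sketch. The setup is an assumption for illustration only, not the design used by Daw and Hatfield: two groups differ in their mean level, each outcome is that level plus independent noise, and the true treatment effect is zero. Matching controls to treated units on the noisy pre-period outcome selects controls with unusually high pre-period draws. Those controls drift back toward their own group mean afterward, which shows up as a spurious positive DiD estimate. All variable names are hypothetical.

```r
set.seed(1)
n <- 2000
treated <- rep(c(1, 0), each = n)

# Groups differ in level (1 vs. 0), but there is NO treatment effect
mu     <- ifelse(treated == 1, 1, 0)
y_pre  <- mu + rnorm(2 * n)
y_post <- mu + rnorm(2 * n)

t_idx <- which(treated == 1)
c_idx <- which(treated == 0)

# Plain DiD on the unmatched sample: close to the true effect of zero
did_unmatched <-
  (mean(y_post[t_idx]) - mean(y_pre[t_idx])) -
  (mean(y_post[c_idx]) - mean(y_pre[c_idx]))

# Nearest-neighbor matching (with replacement) on the pre-period outcome
match_idx <- vapply(
  y_pre[t_idx],
  function(v) c_idx[which.min(abs(y_pre[c_idx] - v))],
  integer(1)
)

# DiD on the matched sample: matched controls regress back to their mean
did_matched <-
  (mean(y_post[t_idx]) - mean(y_pre[t_idx])) -
  (mean(y_post[match_idx]) - mean(y_pre[match_idx]))

c(unmatched = did_unmatched, matched = did_matched)
```

In this setup the unmatched estimate stays near zero, while the matched estimate is pushed toward the gap in group means, even though nothing was treated.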
It’s always good to show results with and without controls because:
If the controls are fixed within group or within time, then they should be absorbed by the corresponding fixed effects.
If the controls vary across both group and time, then your parallel trends assumption is not plausible.
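The first point above can be checked directly. The sketch below uses base R `lm()` with unit dummies rather than `fixest`, on a made-up panel where the control `z` is constant within unit. Because `z` is a linear combination of the unit dummies, its coefficient cannot be estimated separately and is reported as `NA`. The data and names are hypothetical.

```r
set.seed(42)
n_id <- 50
panel <- expand.grid(id = 1:n_id, t = 1:4)

# A control that is fixed within unit (time-invariant)
z_unit  <- rnorm(n_id)
panel$z <- z_unit[panel$id]
panel$y <- 1 + 0.5 * panel$z + rnorm(nrow(panel))

# Unit fixed effects entered as dummies; z is perfectly collinear with them
fit <- lm(y ~ factor(id) + z, data = panel)

coef(fit)[["z"]]  # NA: the unit fixed effects absorb z
```

Reporting the specification with and without such a control therefore changes nothing, which is a quick check that the fixed effects are doing what you expect.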
Under causal inference, \(R^2\) is not so important.
For count data, one can use the fixed-effects Poisson pseudo-maximum likelihood estimator (PPML) (Puhani 2012). For applied papers, see Burtch, Carnahan, and Greenwood (2018) in management and C. He et al. (2021) in marketing. This estimator also allows for robust standard errors under over-dispersion (Wooldridge 1999).
This estimator outperforms log-OLS when the data have many 0s (Silva and Tenreyro 2011), since log-OLS can produce biased estimates (O’Hara and Kotze 2010) under heteroskedasticity (Silva and Tenreyro 2006).
For those thinking of using a negative binomial model with fixed effects, there is currently no true fixed-effects estimator for it (Allison and Waterman 2002).
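The claim that log-OLS can be biased under heteroskedasticity while Poisson pseudo-maximum likelihood is not can be seen in a small simulation. This is a sketch under assumed conditions: a multiplicative error with conditional mean 1 whose variance depends on the regressor, and strictly positive outcomes so that the log is defined (the zero-outcome problem is separate). It uses base R `glm()` with a quasi-Poisson family instead of `fepois()`, and all names and parameter values are illustrative.

```r
set.seed(2024)
n <- 20000
x <- rbinom(n, 1, 0.5)

# Multiplicative error with E[eta | x] = 1 but variance increasing in x
sigma2  <- 0.5 + x
log_eta <- rnorm(n, mean = -sigma2 / 2, sd = sqrt(sigma2))

# True conditional mean: E[y | x] = exp(1 + 0.5 * x)
y <- exp(1 + 0.5 * x + log_eta)

# Log-OLS: E[log(y) | x] = 1 + 0.5 * x - sigma2 / 2, so the slope is
# shifted by the x-dependent variance term (here it ends up near 0)
b_logols <- coef(lm(log(y) ~ x))[["x"]]

# Poisson pseudo-ML only needs the conditional mean to be correctly specified
b_pois <- coef(glm(y ~ x, family = quasipoisson(link = "log")))[["x"]]

c(log_ols = b_logols, poisson = b_pois, truth = 0.5)
```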
For [Zero-valued Outcomes], we have to distinguish the treatment effect on the intensive margin (outcome: 10 to 11) from that on the extensive margin (outcome: 0 to 1), and we can’t readily interpret the treatment coefficient of a log-transformed outcome regression as a percentage change (J. Chen and Roth 2023). Instead, we can focus on either of the following:
Proportional treatment effects: \(\theta_{ATT\%} = \frac{E(Y_{it}(1) | D_i = 1, Post_t = 1) - E(Y_{it}(0) | D_i = 1, Post_t = 1)}{E(Y_{it}(0) | D_i = 1, Post_t = 1)}\) (i.e., the percentage change in the treated group’s average post-treatment outcome). Instead of relying on the parallel trends assumption in levels, we could also rely on the parallel trends assumption in ratios (Wooldridge 2023).
We can use Poisson QMLE to estimate the treatment effect: \(Y_{it} = \exp(\beta_0 + \beta_1 D_i \times Post_t + \beta_2 D_i + \beta_3 Post_t + X_{it}) \epsilon_{it}\) and \(\hat{\theta}_{ATT \%} = \exp(\hat{\beta}_1) - 1\).
To examine whether the parallel trends assumption in ratios holds, we can also estimate a dynamic version of the Poisson QMLE: \(Y_{it} = \exp(\lambda_t + \beta_2 D_i + \sum_{r \neq -1} \beta_r D_i \times (RelativeTime_t = r)) \epsilon_{it}\). We would expect \(\exp(\hat{\beta_r}) - 1 = 0\) for \(r < 0\).
Even if the plot shows that these pre-treatment coefficients are close to 0, we should still run a sensitivity analysis (Rambachan and Roth 2023) to examine violations of this assumption (see Prior Parallel Trends Test).
Log Effects with Calibrated Extensive-margin value: because the mean-based interpretation of proportional treatment effects is problematic when outcomes are heavy-tailed, we might also be interested in the extensive-margin effect. We can then explicitly model how much weight we put on the intensive vs. extensive margin (J. Chen and Roth 2023, 39).
- Proportional treatment effects
set.seed(123) # For reproducibility
n <- 500 # Number of observations per group (treated and control)
# Generating IDs for a panel setup
ID <- rep(1:n, times = 2)
# Defining groups and periods
Group <- rep(c("Control", "Treated"), each = n)
Time <- rep(c("Before", "After"), times = n)
Treatment <- ifelse(Group == "Treated", 1, 0)
Post <- ifelse(Time == "After", 1, 0)
# Step 1: Generate baseline outcomes with a zero-inflated model
lambda <- 20 # Average rate of occurrence
zero_inflation <- 0.5 # Proportion of zeros
Y_baseline <-
ifelse(runif(2 * n) < zero_inflation, 0, rpois(2 * n, lambda))
# Step 2: Apply DiD treatment effect on the treated group in the post-treatment period
Treatment_Effect <- Treatment * Post
Y_treatment <-
ifelse(Treatment_Effect == 1, rpois(n, lambda = 2), 0)
# Incorporating a simple time trend, ensuring outcomes are non-negative
Time_Trend <- ifelse(Time == "After", rpois(2 * n, lambda = 1), 0)
# Step 3: Combine to get the observed outcomes
Y_observed <- Y_baseline + Y_treatment + Time_Trend
# Ensure no negative outcomes after the time trend
Y_observed <- ifelse(Y_observed < 0, 0, Y_observed)
# Create the final dataset
data <-
data.frame(
ID = ID,
Treatment = Treatment,
Period = Post,
Outcome = Y_observed
)
# Viewing the first few rows of the dataset
head(data)
#> ID Treatment Period Outcome
#> 1 1 0 0 0
#> 2 2 0 1 25
#> 3 3 0 0 0
#> 4 4 0 1 20
#> 5 5 0 0 19
#> 6 6 0 1 0
library(fixest)
res_pois <-
fepois(Outcome ~ Treatment + Period + Treatment * Period,
data = data,
vcov = "hetero")
etable(res_pois)
#> res_pois
#> Dependent Var.: Outcome
#>
#> Constant 2.249*** (0.0717)
#> Treatment 0.1743. (0.0932)
#> Period 0.0662 (0.0960)
#> Treatment x Period 0.0314 (0.1249)
#> __________________ _________________
#> S.E. type Heteroskeda.-rob.
#> Observations 1,000
#> Squared Cor. 0.01148
#> Pseudo R2 0.00746
#> BIC 15,636.8
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Average percentage change
exp(coefficients(res_pois)["Treatment:Period"]) - 1
#> Treatment:Period
#> 0.03191643
# SE using delta method
exp(coefficients(res_pois)["Treatment:Period"]) *
sqrt(res_pois$cov.scaled["Treatment:Period", "Treatment:Period"])
#> Treatment:Period
#> 0.1288596
In this example, the DiD coefficient is not significant. If it were significant, we could interpret it as a roughly 3 percent increase in the treated group’s average post-treatment outcome due to the treatment.
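The delta-method step used for the standard error can be written out explicitly. For \(g(\beta) = \exp(\beta) - 1\) we have \(g'(\beta) = \exp(\beta)\), so \(se(g(\hat{\beta})) \approx \exp(\hat{\beta}) \times se(\hat{\beta})\). The sketch below plugs in the rounded interaction estimate and standard error from the table above (0.0314 and 0.1249). The second interval, obtained by exponentiating the endpoints of the interval on the log scale, is an addition for comparison and not part of the original analysis; by construction it cannot fall below -100%.

```r
# Rounded values copied from the etable() output above
beta_hat <- 0.0314
se_beta  <- 0.1249

# Proportional effect and its delta-method standard error
att_pct  <- exp(beta_hat) - 1
se_delta <- exp(beta_hat) * se_beta

# Symmetric 95% interval based on the delta-method SE
z <- qnorm(0.975)
ci_delta <- att_pct + c(-1, 1) * z * se_delta

# Alternative: build the interval for beta, then transform the endpoints
ci_log <- exp(beta_hat + c(-1, 1) * z * se_beta) - 1

round(c(att_pct = att_pct, se_delta = se_delta), 4)
round(rbind(delta = ci_delta, log_scale = ci_log), 3)
```

Both intervals contain zero here, in line with the insignificant coefficient.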
library(fixest)
library(dplyr) # provides mutate() and if_else() used below
base_did_log0 <- base_did |>
mutate(y = if_else(y > 0, y, 0))
res_pois_es <-
fepois(y ~ x1 + i(period, treat, 5) | id + period,
data = base_did_log0,
vcov = "hetero")
etable(res_pois_es)
#> res_pois_es
#> Dependent Var.: y
#>
#> x1 0.1895*** (0.0108)
#> treat x period = 1 -0.2769 (0.3545)
#> treat x period = 2 -0.2699 (0.3533)
#> treat x period = 3 0.1737 (0.3520)
#> treat x period = 4 -0.2381 (0.3249)
#> treat x period = 6 0.3724 (0.3086)
#> treat x period = 7 0.7739* (0.3117)
#> treat x period = 8 0.5028. (0.2962)
#> treat x period = 9 0.9746** (0.3092)
#> treat x period = 10 1.310*** (0.3193)
#> Fixed-Effects: ------------------
#> id Yes
#> period Yes
#> ___________________ __________________
#> S.E. type Heteroskedas.-rob.
#> Observations 1,080
#> Squared Cor. 0.51131
#> Pseudo R2 0.34836
#> BIC 5,868.8
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
iplot(res_pois_es)
This parallel trends assumption is the “ratio” version, as in Wooldridge (2023):
\[ \frac{E(Y_{it}(0) |D_i = 1, Post_t = 1)}{E(Y_{it}(0) |D_i = 1, Post_t = 0)} = \frac{E(Y_{it}(0) |D_i = 0, Post_t = 1)}{E(Y_{it}(0) |D_i =0, Post_t = 0)} \]
which means that, without treatment, the average percentage change in the mean outcome for the treated group would be identical to that of the control group.
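To see why this assumption turns the Poisson interaction coefficient into a proportional effect, consider a short derivation. It is a sketch for the two-group, two-period model without covariates, and it additionally assumes no anticipation, so that pre-treatment outcomes of the treated group are untreated outcomes. Write the conditional mean as \(E(Y_{it} | D_i, Post_t) = \exp(\beta_0 + \beta_1 D_i \times Post_t + \beta_2 D_i + \beta_3 Post_t)\). Rearranging the ratio condition gives the counterfactual mean of the treated group:

\[ E(Y_{it}(0) | D_i = 1, Post_t = 1) = E(Y_{it}(0) | D_i = 1, Post_t = 0) \times \frac{E(Y_{it}(0) | D_i = 0, Post_t = 1)}{E(Y_{it}(0) | D_i = 0, Post_t = 0)} = \exp(\beta_0 + \beta_2) \times \exp(\beta_3) \]

The observed mean of the treated group after treatment is \(E(Y_{it}(1) | D_i = 1, Post_t = 1) = \exp(\beta_0 + \beta_1 + \beta_2 + \beta_3)\), so

\[ \theta_{ATT\%} = \frac{\exp(\beta_0 + \beta_1 + \beta_2 + \beta_3) - \exp(\beta_0 + \beta_2 + \beta_3)}{\exp(\beta_0 + \beta_2 + \beta_3)} = \exp(\beta_1) - 1 \]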
- Log Effects with Calibrated Extensive-margin value
If we want to study the treatment effect on a concave transformation of the outcome that is less influenced by those in the distribution’s tail, then we can perform this analysis.
Steps:
- Normalize the outcomes such that 1 represents the minimum positive value (i.e., divide the outcome by its minimum positive value).
- Estimate the treatment effects for the new outcome
\[ m(y) = \begin{cases} \log(y) & \text{for } y >0 \\ -x & \text{for } y = 0 \end{cases} \]
The choice of \(x\) depends on what the researcher is interested in:
Value of \(x\) | Interest |
---|---|
\(x = 0\) | The treatment effect in logs where all zero-valued outcomes are set to equal the minimum non-zero value (i.e., we exclude the extensive-margin change between 0 and \(y_{min}\) ) |
\(x>0\) | Setting the change between 0 and \(y_{min}\) to be valued as the equivalent of an \(x\) log-unit (i.e., \(100 \times x\) log point) change along the intensive margin. |
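A toy illustration of the two steps above may help before applying them to real data. The helper `calibrated_log()` and the vector `y_toy` are hypothetical names introduced here for illustration. The function normalizes by the minimum positive value and then applies \(m(y)\) for a chosen \(x\).

```r
# Hypothetical helper: normalize by the minimum positive outcome, then apply m(y)
calibrated_log <- function(y, x) {
  stopifnot(all(y >= 0), any(y > 0), x >= 0)
  y_norm <- y / min(y[y > 0])          # the minimum positive value becomes 1
  ifelse(y_norm == 0, -x, log(y_norm)) # zeros are assigned the value -x
}

y_toy <- c(0, 2, 4, 10)

# x = 0: zeros are treated like the minimum positive value (both map to 0)
m0 <- calibrated_log(y_toy, x = 0)

# x = 0.1: moving from 0 to y_min counts as a change of 0.1 on the log scale
m01 <- calibrated_log(y_toy, x = 0.1)

rbind(y = y_toy, m_x0 = m0, m_x01 = m01)
```

The positive outcomes are transformed identically for every \(x\); only the value assigned to zeros changes, which is why the choice of \(x\) matters most when many units move between zero and positive outcomes.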
library(fixest)
library(dplyr) # provides mutate() and %>% used below
base_did_log0_cali <- base_did_log0 |>
# get min
mutate(min_y = min(y[y > 0])) |>
# normalized the outcome
mutate(y_norm = y / min_y)
my_regression <-
function(x) {
base_did_log0_cali <-
base_did_log0_cali %>% mutate(my = ifelse(y_norm == 0,-x,
log(y_norm)))
my_reg <-
feols(
fml = my ~ x1 + i(period, treat, 5) | id + period,
data = base_did_log0_cali,
vcov = "hetero"
)
return(my_reg)
}
xvec <- c(0, .1, .5, 1, 3)
reg_list <- purrr::map(.x = xvec, .f = my_regression)
iplot(reg_list,
pt.col = 1:length(xvec),
pt.pch = 1:length(xvec))
legend("topleft",
col = 1:length(xvec),
pch = 1:length(xvec),
legend = as.character(xvec))
etable(
reg_list,
headers = list("Extensive-margin value (x)" = as.character(xvec)),
digits = 2,
digits.stats = 2
)
#> model 1 model 2 model 3
#> Extensive-margin value (x) 0 0.1 0.5
#> Dependent Var.: my my my
#>
#> x1 0.43*** (0.02) 0.44*** (0.02) 0.46*** (0.03)
#> treat x period = 1 -0.92 (0.67) -0.94 (0.69) -1.0 (0.73)
#> treat x period = 2 -0.41 (0.66) -0.42 (0.67) -0.43 (0.71)
#> treat x period = 3 -0.34 (0.67) -0.35 (0.68) -0.38 (0.73)
#> treat x period = 4 -1.0 (0.67) -1.0 (0.68) -1.1 (0.73)
#> treat x period = 6 0.44 (0.66) 0.44 (0.67) 0.45 (0.72)
#> treat x period = 7 1.1. (0.64) 1.1. (0.65) 1.2. (0.70)
#> treat x period = 8 1.1. (0.64) 1.1. (0.65) 1.1 (0.69)
#> treat x period = 9 1.7** (0.65) 1.7** (0.66) 1.8* (0.70)
#> treat x period = 10 2.4*** (0.62) 2.4*** (0.63) 2.5*** (0.68)
#> Fixed-Effects: -------------- -------------- --------------
#> id Yes Yes Yes
#> period Yes Yes Yes
#> __________________________ ______________ ______________ ______________
#> S.E. type Heterosk.-rob. Heterosk.-rob. Heterosk.-rob.
#> Observations 1,080 1,080 1,080
#> R2 0.43 0.43 0.43
#> Within R2 0.26 0.26 0.25
#>
#> model 4 model 5
#> Extensive-margin value (x) 1 3
#> Dependent Var.: my my
#>
#> x1 0.49*** (0.03) 0.62*** (0.04)
#> treat x period = 1 -1.1 (0.79) -1.5 (1.0)
#> treat x period = 2 -0.44 (0.77) -0.51 (0.99)
#> treat x period = 3 -0.43 (0.78) -0.60 (1.0)
#> treat x period = 4 -1.2 (0.78) -1.5 (1.0)
#> treat x period = 6 0.45 (0.77) 0.46 (1.0)
#> treat x period = 7 1.2 (0.75) 1.3 (0.97)
#> treat x period = 8 1.2 (0.74) 1.3 (0.96)
#> treat x period = 9 1.8* (0.75) 2.1* (0.97)
#> treat x period = 10 2.7*** (0.73) 3.2*** (0.94)
#> Fixed-Effects: -------------- --------------
#> id Yes Yes
#> period Yes Yes
#> __________________________ ______________ ______________
#> S.E. type Heterosk.-rob. Heterosk.-rob.
#> Observations 1,080 1,080
#> R2 0.42 0.41
#> Within R2 0.25 0.24
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We have the dynamic treatment effects for different hypothesized extensive-margin values \(x \in \{0, 0.1, 0.5, 1, 3\}\).
The first column is when the zero-valued outcomes are set equal to \(y_{min, y>0}\) (i.e., there is no difference between the minimum positive outcome and a zero outcome: \(x = 0\)).
The second column assumes that an extensive-margin change from 0 to \(y_{min, y >0}\) is equivalent to a 10 (i.e., \(0.1 \times 100\)) log point change along the intensive margin. For this particular example, as the extensive-margin value increases, the magnitude of the estimated effects increases.