26.5 Examples
- Example by Philipp Leppert replicating Card and Krueger (1994)
- Example by Anthony Schmidt
26.5.1 Example by Doleac and Hansen (2020)
"Ban the box" policies prohibit employers from asking about criminal history on a job application. The intent is to give people with criminal records better access to employment.
But banning the box does not change employers' underlying preferences. The unintended consequence is that, without the box, employers statistically discriminate based on race.
Three types of ban-the-box policies:
- Public employer only
- Private employer with government contract
- All employers
Main identification strategy
- If any county in the Metropolitan Statistical Area (MSA) adopts ban the box, the whole MSA is treated. And if the state adopts ban the box, every county in the state is treated.
Under Simple Dif-n-dif
\[ Y_{it} = \beta_0 + \beta_1 Post_t + \beta_2 Treat_i + \beta_3 (Post_t \times Treat_i) + \epsilon_{it} \]
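A minimal sketch of this simple specification in R, on simulated data (all numbers are invented for illustration):

```r
set.seed(1)

# simulated two-period panel: 100 control and 100 treated units
n  <- 200
df <- data.frame(
  id    = rep(1:n, each = 2),
  post  = rep(c(0, 1), times = n),
  treat = rep(c(0, 1), each = n)  # first 100 ids control, last 100 treated
)

# outcome with a true treatment effect of 2
df$y <- 1 + 0.5 * df$post + 1 * df$treat +
  2 * df$post * df$treat + rnorm(nrow(df))

# the coefficient on post:treat is the DiD estimate (close to 2)
did <- lm(y ~ post * treat, data = df)
coef(did)["post:treat"]
```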
But if there is no common post time, then we should use Staggered Dif-n-dif
\[ \begin{aligned} E_{imrt} &= \alpha + \beta_1 BTB_{mt} W_{imt} + \beta_2 BTB_{mt} + \beta_3 BTB_{mt} H_{imt}\\ &+ \delta_m + D_{imt} \beta_5 + \lambda_{rt} + \delta_m\times f(t) \beta_7 + e_{imrt} \end{aligned} \]
where
\(i\) = person; \(m\) = MSA; \(r\) = region (US region, e.g., Midwest); \(t\) = year
\(W\) = White; \(B\) = Black; \(H\) = Hispanic
\(\beta_1 BTB_{mt} W_{imt} + \beta_2 BTB_{mt} + \beta_3 BTB_{mt} H_{imt}\) are the 3 dif-n-dif variables (\(BTB\) = “ban the box”)
\(\delta_m\) = MSA fixed effect (a dummy for each MSA)
\(D_{imt}\) = individual-level controls
\(\lambda_{rt}\) = region by time fixed effect
\(\delta_m \times f(t)\) = linear time trend within MSA (but we should not need this if we have good pre-trend)
If we put \(\lambda_r\) and \(\lambda_t\) in separately, we get broader fixed effects, while \(\lambda_{rt}\) gives us a deeper and narrower (region-by-year) fixed effect.
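The trade-off can be seen by counting dummy columns; a toy sketch in R with a hypothetical 4-region, 10-year grid:

```r
# toy region-by-year grid (4 hypothetical regions, 10 years)
d <- expand.grid(region = factor(1:4), year = factor(2000:2009))

# separate effects lambda_r and lambda_t: 1 + 3 + 9 = 13 columns
ncol(model.matrix(~ region + year, d))

# region-by-year effects lambda_rt: 4 * 10 = 40 columns (saturated)
ncol(model.matrix(~ region * year, d))
```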
Before running this model, we have to drop all other races. And \(\beta_1, \beta_2, \beta_3\) are not collinear because \(\beta_1\) and \(\beta_3\) are interactions of \(BTB_{mt}\) with different race dummies.
If we just want to estimate the model for black men, we will modify it to be
\[ E_{imrt} = \alpha + BTB_{mt} \beta_1 + \delta_m + D_{imt} \beta_5 + \lambda_{rt} + (\delta_m \times f(t)) \beta_7 + e_{imrt} \]
\[ \begin{aligned} E_{imrt} &= \alpha + BTB_{m(t-3)} \theta_1 + BTB_{m(t-2)} \theta_2 + BTB_{mt} \theta_4 \\ &+ BTB_{m(t+1)}\theta_5 + BTB_{m(t+2)}\theta_6 + BTB_{m(t+3)}\theta_7 \\ &+ \delta_m + D_{imt}\beta_5 + \lambda_{rt} + (\delta_m \times f(t))\beta_7 + e_{imrt} \end{aligned} \]
We leave \(BTB_{m(t-1)}\theta_3\) (the year before adoption) out as the baseline category to avoid perfect collinearity.
So the pre-adoption coefficients (\(\theta_1, \theta_2\)) should be close to the omitted baseline \(\theta_3\) (i.e., no differential pre-trend). Remember, we only run this for places that eventually adopt BTB.
If \(\theta_2\) is statistically different from \(\theta_3\) (the omitted baseline), there could be a problem, but it could also make sense if the policy was announced before adoption (anticipation effects).
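A sketch of such an event-study regression in R, on simulated staggered-adoption data (structure and magnitudes are invented; this is not the authors' data):

```r
set.seed(2)

# simulated staggered adoption: 60 MSAs adopt BTB between 2008 and 2011
n_msa <- 60
adopt <- sample(2008:2011, n_msa, replace = TRUE)

df <- expand.grid(msa = 1:n_msa, year = 2004:2014)
df$rel <- pmax(pmin(df$year - adopt[df$msa], 3), -3)  # event time, binned at +/-3

# outcome: common year trend, no pre-trend, effect of -1 from adoption onward
df$y <- 0.05 * df$year +
  ifelse(df$year >= adopt[df$msa], -1, 0) +
  rnorm(nrow(df), sd = 0.3)

# omit t-1 as the baseline category (theta_3 in the equation above)
df$rel <- relevel(factor(df$rel), ref = "-1")

es <- lm(y ~ rel + factor(msa) + factor(year), data = df)

# pre-adoption coefficients ~ 0, post-adoption ~ -1
round(coef(es)[grep("^rel", names(coef(es)))], 2)
```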
26.5.2 Example from Princeton
library(foreign)
library(dplyr) # for the pipe %>%
mydata = read.dta("http://dss.princeton.edu/training/Panel101.dta") %>%
# create a dummy variable to indicate the time when the treatment started
dplyr::mutate(time = ifelse(year >= 1994, 1, 0)) %>%
# create a dummy variable to identify the treatment group
dplyr::mutate(treated = ifelse(country == "E" |
country == "F" | country == "G" ,
1,
0)) %>%
# create an interaction between time and treated
dplyr::mutate(did = time * treated)
# estimate the DiD estimator
didreg = lm(y ~ treated + time + did, data = mydata)
summary(didreg)
#>
#> Call:
#> lm(formula = y ~ treated + time + did, data = mydata)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -9.768e+09 -1.623e+09 1.167e+08 1.393e+09 6.807e+09
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.581e+08 7.382e+08 0.485 0.6292
#> treated 1.776e+09 1.128e+09 1.575 0.1200
#> time 2.289e+09 9.530e+08 2.402 0.0191 *
#> did -2.520e+09 1.456e+09 -1.731 0.0882 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.953e+09 on 66 degrees of freedom
#> Multiple R-squared: 0.08273, Adjusted R-squared: 0.04104
#> F-statistic: 1.984 on 3 and 66 DF, p-value: 0.1249
The did coefficient is the difference-in-differences estimator. The treatment has a negative effect, though it is only significant at the 10% level.
26.5.3 Example by Card and Krueger (1993)
They found that an increase in the minimum wage increased employment.
Experimental Setting:
New Jersey (treatment) increased minimum wage
Penn (control) did not increase minimum wage
|  |  | After | Before |  |
|---|---|---|---|---|
| Treatment | NJ | A | B | A - B |
| Control | PA | C | D | C - D |
|  |  | A - C | B - D | (A - B) - (C - D) |
where
A - B = treatment effect + effect of time (additive)
C - D = effect of time
(A - B) - (C - D) = dif-n-dif
The identifying assumptions:
- Can’t have switchers
- PA as the control group is a good counterfactual: it is what NJ would look like if it hadn’t received the treatment
\[ Y_{jt} = \beta_0 + NJ_j \beta_1 + POST_t \beta_2 + (NJ_j \times POST_t)\beta_3+ X_{jt}\beta_4 + \epsilon_{jt} \]
where
\(j\) = restaurant
\(NJ\) = dummy where \(1 = NJ\), and \(0 = PA\)
\(POST\) = dummy where \(1 = post\), and \(0 = pre\)
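The 2x2 table and this regression give the same number: the saturated regression reproduces the four cell means exactly. A sketch on simulated data (all magnitudes invented):

```r
set.seed(3)

# simulated restaurant panel in the spirit of the NJ/PA design
n  <- 300
df <- data.frame(
  id   = rep(1:n, each = 2),
  nj   = rep(rbinom(n, 1, 0.5), each = 2),
  post = rep(c(0, 1), times = n)
)
df$fte <- 20 + 2 * df$nj - 1 * df$post +
  1.5 * df$nj * df$post + rnorm(nrow(df), sd = 3)

# the four cells of the table: A, B, C, D
A <- mean(df$fte[df$nj == 1 & df$post == 1])
B <- mean(df$fte[df$nj == 1 & df$post == 0])
C <- mean(df$fte[df$nj == 0 & df$post == 1])
D <- mean(df$fte[df$nj == 0 & df$post == 0])

# the regression interaction equals (A - B) - (C - D) exactly
ck <- lm(fte ~ nj * post, data = df)
c(table = (A - B) - (C - D), regression = unname(coef(ck)["nj:post"]))
```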
Notes:
We don’t need \(\beta_4\) (the controls) in our model for \(\beta_3\) to be unbiased, but including controls improves efficiency
If we use \(\Delta Y_{jt}\) as the dependent variable, we don’t need \(POST_t \beta_2\) anymore
An alternative specification: the authors use NJ high-wage restaurants as the control group (still choosing those close to the border)
The reason they can’t control for everything (PA + NJ high-wage restaurants at once) is that the causal treatment effect would then be hard to interpret
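The \(\Delta Y_{jt}\) note above can be checked numerically: in a balanced two-period panel, regressing the first difference on the treatment dummy reproduces the interaction coefficient exactly, with no \(POST_t\) term needed (simulated data, numbers invented):

```r
set.seed(4)

# balanced two-period panel: true treatment effect is 1.5
n  <- 150
nj <- rbinom(n, 1, 0.5)
y0 <- 20 + 2 * nj + rnorm(n, sd = 3)        # pre-period outcome
y1 <- 19 + 3.5 * nj + rnorm(n, sd = 3)      # post: 2*nj group gap + 1.5 effect

df <- data.frame(
  y     = c(y0, y1),
  treat = rep(nj, 2),
  post  = rep(c(0, 1), each = n)
)

# levels regression with the POST term
lev <- lm(y ~ treat * post, data = df)

# first-difference regression: POST drops out (it is the intercept)
fd <- lm(I(y1 - y0) ~ nj)

# the two estimates of the treatment effect coincide exactly
c(levels = unname(coef(lev)["treat:post"]), diff = unname(coef(fd)["nj"]))
```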
Dif-n-dif relies on similarity in the pre-trends of the dependent variable. However, parallel pre-trends are neither necessary nor sufficient for the identifying assumption.
They are not sufficient because there can be multiple treatments (technically, you could include more controls, but your treatment can’t interact with them)
They are not necessary because what matters is that trends would have been parallel after treatment (in the counterfactual)
However, we can never be certain; we just try to find evidence consistent with our theory so that dif-n-dif can work.
Notice that we don’t need the pre-treatment levels of the dependent variable to be the same (e.g., the same average wage in both NJ and PA); dif-n-dif only needs the pre-trends (i.e., slopes) to be the same for the two groups.
26.5.4 Example by Butcher, McEwan, and Weerapana (2014)
Theory:
The highest-achieving students are usually in the hard sciences. Why?
- Hard sciences rarely give students the benefit of the doubt in grading
- Major choice depends on how unpleasant the coursework is and how easy it is to get a job; degrees with lower market value typically compensate by being more pleasant (e.g., more lenient grading)
Under OLS
\[ E_{ij} = \beta_0 + X_i \beta_1 + G_j \beta_2 + \epsilon_{ij} \]
where
\(X_i\) = student attributes
\(\beta_2\) = causal estimate (from grade change)
\(E_{ij}\) = Did you choose to enroll in major \(j\)
\(G_j\) = grade given in major \(j\)
Examine \(\hat{\beta}_2\)
Negative bias: endogenous response, because departments with lower enrollment rates give better grades
Positive bias: hard sciences already have the best students (i.e., highest ability), so without this selection their grades would be even lower
Under dif-n-dif
\[ Y_{idt} = \beta_0 + POST_t \beta_1 + Treat_d \beta_2 + (POST_t \times Treat_d)\beta_3 + X_{idt}\beta_4 + \epsilon_{idt} \]
where
- \(Y_{idt}\) = grade average
|  | Intercept | Treat | Post | Treat*Post |
|---|---|---|---|---|
| Treat Pre | 1 | 1 | 0 | 0 |
| Treat Post | 1 | 1 | 1 | 1 |
| Control Pre | 1 | 0 | 0 | 0 |
| Control Post | 1 | 0 | 1 | 0 |

The average for the pre-period control group is \(\beta_0\).
A more general specification of the dif-n-dif is
\[ Y_{idt} = \alpha_0 + (POST_t \times Treat_d) \alpha_1 + \theta_d + \delta_t + X_{idt} + u_{idt} \]
where
\((\theta_d + \delta_t)\) give richer fixed effects and use more degrees of freedom than \(Treat_d \beta_2 + POST_t \beta_1\), because the fixed effects subsume the Post and Treat dummies
\(\alpha_1\) should be equivalent to \(\beta_3\) (if your model assumptions are correct)
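The claimed equivalence can be verified on simulated data: with a balanced panel and a common treatment date, \(\alpha_1\) from the fixed-effects specification equals \(\beta_3\) from the interaction specification exactly (all numbers invented):

```r
set.seed(5)

# simulated department-by-term panel:
# 5 of 10 departments treated, treatment starting in term 4 of 6
df <- expand.grid(dept = 1:10, term = 1:6)
df$treat <- as.integer(df$dept <= 5)
df$post  <- as.integer(df$term >= 4)
df$did   <- df$treat * df$post
df$gpa   <- 3 + 0.1 * df$treat - 0.05 * df$post +
  0.2 * df$did + rnorm(nrow(df), sd = 0.1)

# beta_3: simple interaction specification
b3 <- coef(lm(gpa ~ treat * post, data = df))["treat:post"]

# alpha_1: fixed-effects specification (dept and term dummies)
a1 <- coef(lm(gpa ~ did + factor(dept) + factor(term), data = df))["did"]

# with a balanced panel and common timing, the two coincide exactly
c(beta3 = unname(b3), alpha1 = unname(a1))
```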