29.5 Examples
Example by Philipp Leppert replicating Card and Krueger (1994)
Example by Anthony Schmidt
29.5.1 Example by Doleac and Hansen (2020)
The check box asking about criminal records on job applications was banned ("ban the box") because it was thought that removing it would give people with criminal records better access to jobs.
Even if we ban the box, employers do not simply change their behavior. The unintended consequence is that employers statistically discriminate based on race.
Three types of ban the box
- Public employer only
- Private employer with government contract
- All employers
Main identification strategy
- If any county in the Metropolitan Statistical Area (MSA) adopts ban the box, the whole MSA is treated. If the state adopts ban the box, every county in the state is treated.
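The assignment rule above can be sketched in R. This is only an illustration: the data frame `counties`, the vector `state_btb`, and every column name and value are made up for the sketch and are not taken from Doleac and Hansen (2020).

```r
# Hypothetical county-level adoption data (illustrative values only)
counties <- data.frame(
  county     = c("c1", "c2", "c3", "c4", "c5"),
  msa        = c("M1", "M1", "M2", "M2", "M3"),
  state      = c("S1", "S1", "S1", "S1", "S2"),
  county_btb = c(1, 0, 0, 0, 0)   # 1 = the county itself adopted ban the box
)

# Hypothetical state-level adoption (1 = the state adopted ban the box)
state_btb <- c(S1 = 0, S2 = 1)

# An MSA counts as treated if any of its counties adopted
msa_any <- tapply(counties$county_btb, counties$msa, max)

# A county is treated if its MSA is treated or its state adopted
counties$btb <- as.integer(
  msa_any[as.character(counties$msa)] == 1 |
    state_btb[as.character(counties$state)] == 1
)
counties
```

Here `c2` is treated only because `c1` shares its MSA, and `c5` is treated only through its state.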
Under Simple Dif-n-dif
$$
Y_{it} = \beta_0 + \beta_1 Post_t + \beta_2 Treat_i + \beta_3 (Post_t \times Treat_i) + \epsilon_{it}
$$
But if there is no common post time, then we should use Staggered Dif-n-dif
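$$
E_{imrt} = \alpha + \beta_1 BTB_{imt} W_{imt} + \beta_2 BTB_{mt} + \beta_3 BTB_{mt} H_{imt} + \delta_m + D_{imt}\beta_5 + \lambda_{rt} + (\delta_m \times f(t))\beta_7 + e_{imrt}
$$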
where
- $i$ = person; $m$ = MSA; $r$ = region (US regions, e.g., Midwest); $t$ = year
- $W$ = White; $B$ = Black; $H$ = Hispanic
- $\beta_1 BTB_{imt} W_{imt} + \beta_2 BTB_{mt} + \beta_3 BTB_{mt} H_{imt}$ are the 3 dif-n-dif terms (BTB = "ban the box")
- $\delta_m$ = dummy for MSA
- $D_{imt}$ = individual-level controls
- $\lambda_{rt}$ = region-by-time fixed effect
- $\delta_m \times f(t)$ = linear time trend within MSA (but we should not need this if we have a good pre-trend)
If we include $\lambda_r$ and $\lambda_t$ separately, we get broader fixed effects, while $\lambda_{rt}$ gives deeper and narrower fixed effects.
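To make the difference concrete, the sketch below counts the parameters each choice implies. The 4 regions and 5 years are hypothetical values chosen for illustration.

```r
# Hypothetical panel grid: 4 regions observed over 5 years
grid <- expand.grid(region = c("Midwest", "Northeast", "South", "West"),
                    year   = 2005:2009)

# Separate region and year effects (lambda_r and lambda_t)
X_separate <- model.matrix(~ factor(region) + factor(year), data = grid)

# Region-by-year effects (lambda_rt): one parameter per region-year cell
X_joint <- model.matrix(~ interaction(region, year), data = grid)

ncol(X_separate)  # 8  = 1 intercept + 3 region dummies + 4 year dummies
ncol(X_joint)     # 20 = 1 intercept + 19 region-year dummies
```

The region-by-year version lets every region have its own shock in every year, at the cost of many more parameters.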
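Before running this model, we have to drop all other races, so the sample contains only White, Black, and Hispanic individuals. $\beta_1, \beta_2, \beta_3$ are not collinear: Black is the omitted group, so $\beta_2$ is the BTB effect for that group, while $\beta_1$ and $\beta_3$ are interactions of BTB with the White and Hispanic indicators.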
If we just want to estimate the model for black men, we will modify it to be
$$
E_{imrt} = \alpha + BTB_{mt}\beta_1 + \delta_m + D_{imt}\beta_5 + \lambda_{rt} + (\delta_m \times f(t))\beta_7 + e_{imrt}
$$
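To check for pre-trends, we replace the single BTB indicator with leads and lags around adoption:
$$
\begin{aligned}
E_{imrt} = \alpha &+ BTB_{m(t-3)}\theta_1 + BTB_{m(t-2)}\theta_2 + BTB_{mt}\theta_4 \\
&+ BTB_{m(t+1)}\theta_5 + BTB_{m(t+2)}\theta_6 + BTB_{m(t+3)}\theta_7 \\
&+ [\delta_m + D_{imt}\beta_5 + \lambda_{rt} + (\delta_m \times f(t))\beta_7 + e_{imrt}]
\end{aligned}
$$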
We have to leave $BTB_{m(t-1)}\theta_3$ out as the reference category; otherwise the indicators would be perfectly collinear.
So the coefficients for the years before BTB ($\theta_1$, $\theta_2$, and the omitted baseline $\theta_3$) should be similar to each other (i.e., the same pre-trend). Remember, we only run this for places with BTB.
If $\theta_2$ is statistically different from $\theta_3$ (baseline), then there could be a problem, but it could also make sense if the policy was announced before it took effect (an anticipation effect).
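A rough sketch of such a lead-and-lag regression on simulated data follows. Everything here is an assumption made for illustration: the panel, the adoption years, the binning of event time at -3 and +3, and the use of plain MSA and year fixed effects in place of the full set of controls. It is not the authors' code or data.

```r
set.seed(1)

# Hypothetical balanced panel: 40 MSAs, years 2000-2012, all eventually treated
panel <- expand.grid(msa = 1:40, year = 2000:2012)
adopt <- sample(2004:2008, 40, replace = TRUE)  # made-up staggered adoption years
panel$rel <- panel$year - adopt[panel$msa]      # event time: years since adoption

# Bin event time at -3 and +3 (an assumption of this sketch)
panel$rel_bin <- pmax(pmin(panel$rel, 3), -3)

# Simulated outcome: no pre-trend, effect of -2 from adoption onward
panel$y <- 10 - 2 * (panel$rel >= 0) + rnorm(nrow(panel))

# Make t-1 the omitted reference category
panel$rel_f <- relevel(factor(panel$rel_bin), ref = "-1")

fit <- lm(y ~ rel_f + factor(msa) + factor(year), data = panel)

# Lead and lag coefficients, each measured relative to t-1
event_coefs <- coef(fit)[grep("^rel_f", names(coef(fit)))]
round(event_coefs, 2)
```

With this setup, the coefficients on the leads (`rel_f-3`, `rel_f-2`) should be close to zero, and the coefficients from adoption onward should be close to the simulated effect.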
29.5.2 Example from Princeton
library(foreign)
library(dplyr) # provides %>% and mutate()
mydata = read.dta("http://dss.princeton.edu/training/Panel101.dta") %>%
# create a dummy variable to indicate the time when the treatment started
dplyr::mutate(time = ifelse(year >= 1994, 1, 0)) %>%
# create a dummy variable to identify the treatment group
dplyr::mutate(treated = ifelse(country == "E" |
country == "F" | country == "G" ,
1,
0)) %>%
# create an interaction between time and treated
dplyr::mutate(did = time * treated)
# estimate the DID estimator
didreg = lm(y ~ treated + time + did, data = mydata)
summary(didreg)
#>
#> Call:
#> lm(formula = y ~ treated + time + did, data = mydata)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -9.768e+09 -1.623e+09 1.167e+08 1.393e+09 6.807e+09
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 3.581e+08 7.382e+08 0.485 0.6292
#> treated 1.776e+09 1.128e+09 1.575 0.1200
#> time 2.289e+09 9.530e+08 2.402 0.0191 *
#> did -2.520e+09 1.456e+09 -1.731 0.0882 .
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 2.953e+09 on 66 degrees of freedom
#> Multiple R-squared: 0.08273, Adjusted R-squared: 0.04104
#> F-statistic: 1.984 on 3 and 66 DF, p-value: 0.1249
The did coefficient is the difference-in-differences estimator. The treatment has a negative estimated effect (-2.520e+09), which is significant only at the 10% level (p = 0.0882).
29.5.3 Example by Card and Krueger (1993)
The authors found that an increase in the minimum wage increased employment
Experimental Setting:
New Jersey (treatment) increased its minimum wage
Pennsylvania (control) did not increase its minimum wage
Group | State | After | Before | Difference |
---|---|---|---|---|
Treatment | NJ | A | B | A - B |
Control | PA | C | D | C - D |
Difference | | A - C | B - D | (A - B) - (C - D) |
where
A - B = treatment effect + effect of time (additive)
C - D = effect of time
(A - B) - (C - D) = dif-n-dif
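The arithmetic in the table can be checked on simulated data. The sketch below uses made-up restaurant-level outcomes (not the Card and Krueger sample) and hypothetical variable names `nj`, `post`, and `y`. It computes the four cell means and compares the result with the interaction coefficient from a regression.

```r
set.seed(42)

# Hypothetical restaurant-level data (simulated for illustration)
n  <- 200
df <- data.frame(
  nj   = rep(c(1, 0), each = n),   # 1 = NJ (treatment), 0 = PA (control)
  post = rep(c(0, 1), times = n)   # 0 = before, 1 = after
)
df$y <- 20 + 1.5 * df$nj - 2 * df$post + 2.5 * df$nj * df$post + rnorm(2 * n)

# Cell means, labelled as in the table
A <- mean(df$y[df$nj == 1 & df$post == 1])  # NJ after
B <- mean(df$y[df$nj == 1 & df$post == 0])  # NJ before
C <- mean(df$y[df$nj == 0 & df$post == 1])  # PA after
D <- mean(df$y[df$nj == 0 & df$post == 0])  # PA before

did_means <- (A - B) - (C - D)

# Interaction coefficient from the regression with both main effects
did_lm <- unname(coef(lm(y ~ nj * post, data = df))["nj:post"])

all.equal(did_means, did_lm)
```

Without covariates the regression is saturated in the four cells, so the two numbers agree up to floating-point error.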
The identifying assumptions:
- Can’t have switchers
- PA is the control group
  - is a good counterfactual
  - is what NJ would look like if they hadn’t had the treatment
$$
Y_{jt} = \beta_0 + NJ_j\beta_1 + POST_t\beta_2 + (NJ_j \times POST_t)\beta_3 + X_{jt}\beta_4 + \epsilon_{jt}
$$
where
j = restaurant
NJ = dummy where 1=NJ, and 0=PA
POST = dummy where 1=post, and 0=pre
Notes:
We don’t need $X_{jt}\beta_4$ in our model for $\beta_3$ to be unbiased, but including it makes our estimates more efficient
If we use $\Delta Y_{jt}$ as the dependent variable, we don’t need $POST_t\beta_2$ anymore
An alternative specification uses high-wage NJ restaurants as the control group (still choosing those that are close to the border)
The authors can’t use everything as a control at once (PA + high-wage NJ restaurants) because the causal effect would then be hard to interpret
Dif-n-dif relies on similar pre-trends in the dependent variable. However, this is neither necessary nor sufficient for the identifying assumption.
It’s not sufficient because there can be multiple treatments (technically, you could include more controls, but the treatments can’t interact)
It’s not necessary because trends can be parallel after treatment
However, we can never be certain; we just try to find evidence consistent with our theory so that dif-n-dif can work.
Notice that the pre-treatment levels of the dependent variable do not need to be the same (e.g., the same average wage in both NJ and PA); dif-n-dif only needs the pre-trend (i.e., slope) to be the same for the two groups.
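The note on using $\Delta Y_{jt}$ as the dependent variable can be spelled out for the two-period case. Assuming one pre period ($t = 0$) and one post period ($t = 1$), and subtracting the pre-period equation from the post-period equation:

```latex
$$
\begin{aligned}
Y_{j1} &= \beta_0 + NJ_j\beta_1 + \beta_2 + NJ_j\beta_3 + X_{j1}\beta_4 + \epsilon_{j1} \\
Y_{j0} &= \beta_0 + NJ_j\beta_1 + X_{j0}\beta_4 + \epsilon_{j0} \\
\Delta Y_j = Y_{j1} - Y_{j0} &= \beta_2 + NJ_j\beta_3 + \Delta X_j\beta_4 + \Delta\epsilon_j
\end{aligned}
$$
```

The time-invariant terms $\beta_0$ and $NJ_j\beta_1$ difference out, $\beta_2$ becomes the intercept rather than the coefficient on a separate $POST_t$ regressor, and the coefficient on $NJ_j$ is the dif-n-dif estimate $\beta_3$.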
29.5.4 Example by Butcher, McEwan, and Weerapana (2014)
Theory:
Highest achieving students are usually in hard science. Why?
It is hard to give students the benefit of the doubt in hard science
Majors differ in how unpleasant they are and in how easy it is to get a job afterwards. Degrees with lower market value typically try to make the experience more pleasant
Under OLS
$$
E_{ij} = \beta_0 + X_i\beta_1 + G_j\beta_2 + \epsilon_{ij}
$$
where
$X_i$ = student attributes
$\beta_2$ = causal estimate (from grade change)
$E_{ij}$ = Did you choose to enroll in major $j$
$G_j$ = grade given in major $j$
Examine $\hat{\beta}_2$
Negative bias: endogenous response, because departments with lower enrollment rates will give better grades
Positive bias: hard science already has the best students (i.e., highest ability); if it did not, its grades could be even lower
Under dif-n-dif
$$
Y_{idt} = \beta_0 + POST_t\beta_1 + Treat_d\beta_2 + (POST_t \times Treat_d)\beta_3 + X_{idt} + \epsilon_{idt}
$$
where
- $Y_{idt}$ = grade average
Group | Intercept | Treat | Post | Treat*Post |
---|---|---|---|---|
Treat Pre | 1 | 1 | 0 | 0 |
Treat Post | 1 | 1 | 1 | 1 |
Control Pre | 1 | 0 | 0 | 0 |
Control Post | 1 | 0 | 1 | 0 |

Average for pre-control: $\beta_0$
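Reading the rows of the design table against the dif-n-dif equation gives the mean of each cell. Covariates $X_{idt}$ are ignored here for simplicity.

```latex
$$
\begin{aligned}
E[Y \mid \text{Control, Pre}] &= \beta_0 \\
E[Y \mid \text{Control, Post}] &= \beta_0 + \beta_1 \\
E[Y \mid \text{Treat, Pre}] &= \beta_0 + \beta_2 \\
E[Y \mid \text{Treat, Post}] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\
(\text{Treat Post} - \text{Treat Pre}) - (\text{Control Post} - \text{Control Pre}) &= (\beta_1 + \beta_3) - \beta_1 = \beta_3
\end{aligned}
$$
```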
A more general specification of the dif-n-dif is that
$$
Y_{idt} = \alpha_0 + (POST_t \times Treat_d)\alpha_1 + \theta_d + \delta_t + X_{idt} + u_{idt}
$$
where
$(\theta_d + \delta_t)$ is richer and uses more degrees of freedom than $Treat_d\beta_2 + POST_t\beta_1$ (because the fixed effects subsume Post and Treat)
$\alpha_1$ should be equivalent to $\beta_3$ (if your model assumptions are correct)
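The equivalence of $\alpha_1$ and $\beta_3$ can be illustrated on a simulated balanced panel. The data, the variable names (`dept`, `term`, `treat`, `post`, `did`), and the treatment timing are all hypothetical. With a balanced panel, a common treatment date, and no covariates, the two specifications return the same interaction coefficient.

```r
set.seed(7)

# Hypothetical balanced panel: 10 departments x 6 terms (simulated data)
panel <- expand.grid(dept = 1:10, term = 1:6)
panel$treat <- as.integer(panel$dept <= 4)  # departments 1-4 treated (made up)
panel$post  <- as.integer(panel$term >= 4)  # policy starts in term 4 (made up)
panel$did   <- panel$treat * panel$post
panel$y <- 3 + 0.1 * panel$dept + 0.05 * panel$term -
  0.3 * panel$did + rnorm(nrow(panel), sd = 0.2)

# Simple specification: Post, Treat, and their interaction
fit_simple <- lm(y ~ post + treat + did, data = panel)

# General specification: department and term fixed effects absorb Treat and Post
fit_fe <- lm(y ~ did + factor(dept) + factor(term), data = panel)

beta3  <- unname(coef(fit_simple)["did"])
alpha1 <- unname(coef(fit_fe)["did"])
all.equal(beta3, alpha1)
```

The point estimates coincide here, but the standard errors generally differ because the fixed-effects model absorbs more of the variation in the outcome. With unbalanced data, staggered timing, or covariates, the two estimates need not be identical.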