26.5 Examples

Example by Philipp Leppert replicating Card and Krueger (1994)

Example by Anthony Schmidt

26.5.1 Example by Doleac and Hansen (2020)

  • “Ban the box” policies removed the checkbox asking about criminal history from job applications, on the expectation that this would give people with criminal records better access to jobs

  • Banning the box does not by itself change what employers care about. The unintended consequence is that employers who can no longer see criminal history statistically discriminate based on race

3 types of ban the box

  1. Public employer only
  2. Private employer with government contract
  3. All employers

Main identification strategy

  • If any county in a Metropolitan Statistical Area (MSA) adopts ban the box, the whole MSA is treated. If the state adopts ban the box, every county in the state is treated

Under Simple Dif-n-dif

\[ Y_{it} = \beta_0 + \beta_1 Post_t + \beta_2 Treat_i + \beta_3 (Post_t \times Treat_i) + \epsilon_{it} \]
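As a rough check on the simple specification above, the sketch below simulates a two-group, two-period data set and compares the interaction coefficient from `lm()` with the double difference of cell means. The data, the variable names (`treat`, `post`, `y`), and the effect size are all hypothetical and only serve to illustrate the mechanics.

```r
# Simulated 2x2 example; all names and numbers below are hypothetical.
set.seed(1)
n <- 400
dat <- data.frame(
  treat = rep(c(0, 1), each = n / 2),   # first half control, second half treated
  post  = rep(c(0, 1), times = n / 2)   # alternate pre / post observations
)
true_effect <- 2
dat$y <- 1 + 0.5 * dat$post + 1.5 * dat$treat +
  true_effect * dat$post * dat$treat + rnorm(n)

# Regression version: the interaction coefficient is the dif-n-dif estimate
fit <- lm(y ~ post + treat + post:treat, data = dat)
did_reg <- unname(coef(fit)["post:treat"])

# Manual version: difference of the two before-after differences in means
m <- tapply(dat$y, list(dat$treat, dat$post), mean)
did_manual <- (m["1", "1"] - m["1", "0"]) - (m["0", "1"] - m["0", "0"])

all.equal(did_reg, did_manual)
```

Because the model is saturated in the four group-by-period cells, the two numbers should agree up to floating-point error.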

But if there is no common post time, then we should use Staggered Dif-n-dif

\[ \begin{aligned} E_{imrt} &= \alpha + \beta_1 BTB_{mt} W_{imt} + \beta_2 BTB_{mt} + \beta_3 BTB_{mt} H_{imt}\\ &+ \delta_m + D_{imt} \beta_5 + \lambda_{rt} + (\delta_m\times f(t)) \beta_7 + e_{imrt} \end{aligned} \]

where

  • \(i\) = person; \(m\) = MSA; \(r\) = region (US regions, e.g., Midwest); \(t\) = year

  • \(W\) = White; \(B\) = Black; \(H\) = Hispanic

  • \(\beta_1 BTB_{mt} W_{imt} + \beta_2 BTB_{mt} + \beta_3 BTB_{mt} H_{imt}\) are the 3 dif-n-dif variables (\(BTB\) = “ban the box”)

  • \(\delta_m\) = dummy for MSA

  • \(D_{imt}\) = individual-level controls

  • \(\lambda_{rt}\) = region by time fixed effect

  • \(\delta_m \times f(t)\) = linear time trend within MSA (but we should not need this if we have good pre-trend)

If we include \(\lambda_r\) and \(\lambda_t\) separately, we get broader fixed effects, while \(\lambda_{rt}\) gives a deeper and narrower fixed effect (one for each region-year cell).

Before running this model, we have to drop all other races. \(\beta_1, \beta_2, \beta_3\) are not collinear because \(BTB_{mt}\) enters once on its own (the effect for Black men, the omitted group) and otherwise only interacted with \(W\) and \(H\)

If we just want to estimate the model for Black men, we modify it to be

\[ E_{imrt} = \alpha + BTB_{mt} \beta_1 + \delta_m + D_{imt} \beta_5 + \lambda_{rt} + (\delta_m \times f(t)) \beta_7 + e_{imrt} \]
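A minimal sketch of the single-group specification above, using simulated data rather than the authors' data. It assumes a continuous outcome for simplicity, omits the individual controls \(D_{imt}\) and the MSA-specific trend \(\delta_m \times f(t)\), and uses base R `lm()` with dummies. Every variable name, adoption year, and effect size is hypothetical.

```r
# Simulated staggered adoption; every name and number here is hypothetical.
set.seed(42)
dat <- expand.grid(person = 1:30, msa = 1:20, year = 2004:2014)
dat$region <- (dat$msa - 1) %% 4 + 1

# MSAs 1-4 adopt in 2008, 5-8 in 2010, 9-12 in 2012; 13-20 never adopt
adopt <- rep(c(2008, 2010, 2012, NA, NA), each = 4)
dat$btb <- as.integer(!is.na(adopt[dat$msa]) & dat$year >= adopt[dat$msa])

true_effect <- -0.03
msa_fe <- rnorm(20, sd = 0.05)                         # delta_m
ry_fe  <- matrix(rnorm(4 * 11, sd = 0.05), nrow = 4)   # region x year shocks
dat$e <- 0.6 + msa_fe[dat$msa] +
  ry_fe[cbind(dat$region, dat$year - 2003)] +
  true_effect * dat$btb + rnorm(nrow(dat), sd = 0.05)

# delta_m = MSA dummies; lambda_rt = one dummy per region-year cell
dat$region_year <- interaction(dat$region, dat$year)
fit <- lm(e ~ btb + factor(msa) + region_year, data = dat)
beta1 <- unname(coef(fit)["btb"])
beta1
```

Because each MSA sits inside one region, some fixed-effect dummies are redundant and `lm()` reports them as `NA`. The coefficient on `btb` is unaffected, since within each region-year cell some MSAs are treated and others are not.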

To check for pre-trends, replace the single \(BTB_{mt}\) indicator with leads and lags of adoption:

\[ \begin{aligned} E_{imrt} &= \alpha + BTB_{m(t-3)} \theta_1 + BTB_{m(t-2)} \theta_2 + BTB_{mt} \theta_4 \\ &+ BTB_{m(t+1)}\theta_5 + BTB_{m(t+2)}\theta_6 + BTB_{m(t+3)}\theta_7 \\ &+ [\delta_m + D_{imt}\beta_5 + \lambda_{rt} + (\delta_m \times f(t))\beta_7] + e_{imrt} \end{aligned} \]

We have to leave \(BTB_{m(t-1)}\theta_3\) out as the reference category to avoid perfect collinearity

So the coefficients for the years before BTB (\(\theta_1, \theta_2, \theta_3\)) should be similar to each other (i.e., same pre-trend). Since \(\theta_3\) is the omitted baseline (normalized to 0), \(\theta_1\) and \(\theta_2\) should be close to 0. Remember, we only run this for places with BTB.

If \(\theta_2\) is statistically different from \(\theta_3\) (baseline), then there could be a problem, but it could also make sense if the policy was announced before it took effect, so that employers responded in anticipation.
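The lead-and-lag logic can be sketched on simulated data as follows. This is an illustration under stated assumptions, not the authors' code: it keeps only adopting MSAs, bins event time at -3 and +3 (an assumption of this sketch), uses plain year fixed effects instead of \(\lambda_{rt}\), and sets \(t-1\) as the omitted baseline. All names and numbers are hypothetical.

```r
# Simulated event study on adopting MSAs only; names and numbers are hypothetical.
set.seed(7)
dat <- expand.grid(person = 1:40, msa = 1:12, year = 2004:2014)
adopt <- rep(c(2008, 2010, 2012), each = 4)   # three adoption cohorts

# Event time relative to adoption, binned at -3 and +3
k <- dat$year - adopt[dat$msa]
dat$rel <- factor(pmax(pmin(k, 3), -3))
dat$rel <- relevel(dat$rel, ref = "-1")       # t-1 is the omitted baseline

true_effect <- -0.05                          # effect only from adoption onward
msa_fe  <- rnorm(12, sd = 0.05)
year_fe <- rnorm(11, sd = 0.05)
dat$e <- 0.6 + msa_fe[dat$msa] + year_fe[dat$year - 2003] +
  true_effect * (k >= 0) + rnorm(nrow(dat), sd = 0.01)

fit <- lm(e ~ rel + factor(msa) + factor(year), data = dat)
theta <- coef(fit)[paste0("rel", c(-3, -2, 0, 1, 2, 3))]
round(theta, 3)
```

In this simulation there is no anticipation, so the lead coefficients (`rel-3`, `rel-2`) should sit near zero while the coefficients from `rel0` onward should sit near the simulated effect.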

26.5.2 Example from Princeton

library(foreign)
library(dplyr)  # provides %>% and mutate()
mydata = read.dta("http://dss.princeton.edu/training/Panel101.dta") %>%
    # create a dummy variable to indicate the time when the treatment started
    dplyr::mutate(time = ifelse(year >= 1994, 1, 0)) %>%
    # create a dummy variable to identify the treatment group
    dplyr::mutate(treated = ifelse(country == "E" |
                                country == "F" | country == "G" ,
                            1,
                            0)) %>%
    # create an interaction between time and treated
    dplyr::mutate(did = time * treated)

Estimate the DID estimator:

didreg = lm(y ~ treated + time + did, data = mydata)
summary(didreg)
#> 
#> Call:
#> lm(formula = y ~ treated + time + did, data = mydata)
#> 
#> Residuals:
#>        Min         1Q     Median         3Q        Max 
#> -9.768e+09 -1.623e+09  1.167e+08  1.393e+09  6.807e+09 
#> 
#> Coefficients:
#>               Estimate Std. Error t value Pr(>|t|)  
#> (Intercept)  3.581e+08  7.382e+08   0.485   0.6292  
#> treated      1.776e+09  1.128e+09   1.575   0.1200  
#> time         2.289e+09  9.530e+08   2.402   0.0191 *
#> did         -2.520e+09  1.456e+09  -1.731   0.0882 .
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#> 
#> Residual standard error: 2.953e+09 on 66 degrees of freedom
#> Multiple R-squared:  0.08273,    Adjusted R-squared:  0.04104 
#> F-statistic: 1.984 on 3 and 66 DF,  p-value: 0.1249

The did coefficient is the difference-in-differences estimator. The estimated treatment effect is negative (about -2.52e+09) but significant only at the 10% level (p = 0.088).

26.5.3 Example by Card and Krueger (1993)

Card and Krueger found that an increase in the minimum wage did not reduce employment and, if anything, increased it

Experimental Setting:

  • New Jersey (treatment) increased minimum wage

  • Pennsylvania (control) did not increase minimum wage

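|                | After | Before | Difference        |
|----------------|-------|--------|-------------------|
| Treatment (NJ) | A     | B      | A - B             |
| Control (PA)   | C     | D      | C - D             |
| Difference     | A - C | B - D  | (A - B) - (C - D) |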

where

  • A - B = treatment effect + effect of time (additive)

  • C - D = effect of time

  • (A - B) - (C - D) = dif-n-dif

The identifying assumptions:

  • Can’t have switchers

  • PA is the control group

    • is a good counterfactual

    • is what NJ would look like if they hadn’t had the treatment

\[ Y_{jt} = \beta_0 + NJ_j \beta_1 + POST_t \beta_2 + (NJ_j \times POST_t)\beta_3+ X_{jt}\beta_4 + \epsilon_{jt} \]

where

  • \(j\) = restaurant

  • \(NJ\) = dummy where \(1 = NJ\), and \(0 = PA\)

  • \(POST\) = dummy where \(1 = post\), and \(0 = pre\)

Notes:

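  • We don’t need \(X_{jt}\beta_4\) in our model for \(\beta_3\) to be unbiased, but including it makes our estimates more efficient

  • If we use \(\Delta Y_{jt}\) (post minus pre) as the dependent variable, we don’t need \(POST_t \beta_2\) anymore: it is absorbed into the intercept, and the coefficient on \(NJ_j\) becomes the dif-n-dif estimate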

  • An alternative model specification is that the authors use NJ high-wage restaurants as the control group (still choosing those that are close to the border)

  • The reason why they can’t control for everything (PA + NJ high wage) is that it makes the causal treatment hard to interpret

  • Dif-n-dif utilizes similarity in the pre-trend of the dependent variable. However, this is neither necessary nor sufficient for the identifying assumption.

    • It’s not sufficient because they can have multiple treatments (technically, you could include more control, but your treatment can’t interact)

    • It’s not necessary because trends can be parallel after treatment

  • However, we can never be certain; we just try to find evidence consistent with our theory so that dif-n-dif can work.

  • Notice that we don’t need the pre-treatment levels of the dependent variable to be the same (e.g., same average wage in both NJ and PA); dif-n-dif only needs the pre-trend (i.e., slope) to be the same for the two groups.
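The notes on first differences and on levels versus trends can be illustrated with a small simulation. The sketch below builds a hypothetical balanced two-period panel of restaurants in which NJ stores start at a higher level than PA stores, then compares the levels regression with the first-difference regression. All variable names and numbers are made up for illustration.

```r
# Simulated balanced two-period panel of restaurants; all values are hypothetical.
set.seed(3)
n <- 200
nj <- rep(c(0, 1), each = n / 2)          # 1 = NJ, 0 = PA
store_fe <- rnorm(n)                      # store-level differences in levels
y_pre  <- 20 + store_fe + 2 * nj + rnorm(n)
y_post <- 20 + store_fe + 2 * nj - 1 + 0.5 * nj + rnorm(n)

# Levels regression on the stacked panel
long <- data.frame(
  y    = c(y_pre, y_post),
  nj   = rep(nj, 2),
  post = rep(c(0, 1), each = n)
)
b3_levels <- unname(coef(lm(y ~ nj * post, data = long))["nj:post"])

# First-difference regression: POST is absorbed into the intercept
b3_diff <- unname(coef(lm(I(y_post - y_pre) ~ nj))["nj"])

all.equal(b3_levels, b3_diff)
```

With a balanced panel and no covariates, both regressions compute the same double difference of means, so the two estimates should coincide even though the two groups differ in levels.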

26.5.4 Example by Butcher, McEwan, and Weerapana (2014)

Theory:

  • Highest achieving students are usually in hard science. Why?

    • Hard to give students the benefit of the doubt in hard science

    • How unpleasant a major is trades off against how easy it is to get a job afterward. Degrees with lower market value typically try to make the experience more pleasant

Under OLS

\[ E_{ij} = \beta_0 + X_i \beta_1 + G_j \beta_2 + \epsilon_{ij} \]

where

  • \(X_i\) = student attributes

  • \(\beta_2\) = causal estimate (from grade change)

  • \(E_{ij}\) = Did you choose to enroll in major \(j\)

  • \(G_j\) = grade given in major \(j\)

Examine \(\hat{\beta}_2\)

  • Negative bias: endogenous response, because departments with lower enrollment rates will give better grades

  • Positive bias: hard sciences already have the best students (i.e., highest ability), so without those students their grades would be even lower

Under dif-n-dif

\[ Y_{idt} = \beta_0 + POST_t \beta_1 + Treat_d \beta_2 + (POST_t \times Treat_d)\beta_3 + X_{idt} \beta_4 + \epsilon_{idt} \]

where

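  • \(Y_{idt}\) = grade average

|              | Intercept | Treat | Post | Treat*Post |
|--------------|-----------|-------|------|------------|
| Treat Pre    | 1         | 1     | 0    | 0          |
| Treat Post   | 1         | 1     | 1    | 1          |
| Control Pre  | 1         | 0     | 0    | 0          |
| Control Post | 1         | 0     | 1    | 0          |

Average for pre-control: \(\beta_0\)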
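Reading the remaining rows of the design matrix the same way (and ignoring the controls \(X_{idt}\)), the four cell averages and their double difference work out as:

\[ \begin{aligned} E[Y \mid \text{Control, Pre}] &= \beta_0 \\ E[Y \mid \text{Control, Post}] &= \beta_0 + \beta_1 \\ E[Y \mid \text{Treat, Pre}] &= \beta_0 + \beta_2 \\ E[Y \mid \text{Treat, Post}] &= \beta_0 + \beta_1 + \beta_2 + \beta_3 \\ (\text{Treat Post} - \text{Treat Pre}) - (\text{Control Post} - \text{Control Pre}) &= (\beta_1 + \beta_3) - \beta_1 = \beta_3 \end{aligned} \]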

A more general specification of the dif-n-dif is that

\[ Y_{idt} = \alpha_0 + (POST_t \times Treat_d) \alpha_1 + \theta_d + \delta_t + X_{idt} \alpha_2 + u_{idt} \]

where

  • \((\theta_d + \delta_t)\) is richer and uses more degrees of freedom than \(Treat_d \beta_2 + POST_t \beta_1\) (because the fixed effects subsume Post and Treat)

  • \(\alpha_1\) should be equivalent to \(\beta_3\) (if your model assumptions are correct)
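A quick way to see the equivalence of \(\alpha_1\) and \(\beta_3\) is to fit both specifications on the same data. The sketch below does this on a simulated balanced department-by-year panel with a common policy start and no covariates. The panel, the variable names, and the effect size are hypothetical and are not taken from the paper.

```r
# Simulated balanced panel of departments by year; all values are hypothetical.
set.seed(11)
dat <- expand.grid(dept = 1:10, year = 2000:2007)
dat$treat <- as.integer(dat$dept <= 4)        # departments 1-4 are treated
dat$post  <- as.integer(dat$year >= 2004)     # common policy start
dat$y <- 3 + 0.1 * dat$dept + 0.02 * (dat$year - 2000) -
  0.2 * dat$treat * dat$post + rnorm(nrow(dat), sd = 0.05)

# Two-group / two-period specification
b3 <- unname(coef(lm(y ~ post + treat + post:treat, data = dat))["post:treat"])

# Fixed-effects specification: theta_d and delta_t absorb treat and post
fe_fit <- lm(y ~ I(post * treat) + factor(dept) + factor(year), data = dat)
a1 <- unname(coef(fe_fit)["I(post * treat)"])

all.equal(b3, a1)
```

In this balanced setting the two point estimates should be identical. The fixed-effects version spends more degrees of freedom, and the two can diverge once the panel is unbalanced, covariates are added, or treatment timing differs across units.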

References

Butcher, Kristin F, Patrick J McEwan, and Akila Weerapana. 2014. “The Effects of an Anti-Grade Inflation Policy at Wellesley College.” Journal of Economic Perspectives 28 (3): 189–204.
Card, David, and Alan B Krueger. 1993. “Minimum Wages and Employment: A Case Study of the Fast Food Industry in New Jersey and Pennsylvania.” National Bureau of Economic Research Cambridge, Mass., USA.
Doleac, Jennifer L, and Benjamin Hansen. 2020. “The Unintended Consequences of ‘Ban the Box’: Statistical Discrimination and Employment Outcomes When Criminal Histories Are Hidden.” Journal of Labor Economics 38 (2): 321–74.