30.11 Assumptions

Parallel Trends Assumption

The Difference-in-Differences estimator relies on a key identifying assumption: the parallel trends assumption. This assumption states that, in the absence of treatment, the average outcome for the treated group would have evolved over time in the same way as for the control group.

Let:

$Y_{it}(0)$ denote the potential outcome without treatment for unit $i$ at time $t$
$D_i = 1$ if unit $i$ is in the treatment group, and $D_i = 0$ if in the control group

Then, the parallel trends assumption can be written as:

$E[Y_{it}(0) \mid D_i = 1] - E[Y_{it}(0) \mid D_i = 0] = \Delta_0 \quad \text{for all } t,$

where $\Delta_0$ is a constant difference over time in the untreated potential outcomes between the two groups. This assumption does not require the levels of outcomes to be the same between groups—only that the difference remains constant over time.

In other words, the gap between treatment and control groups in the absence of treatment must remain stable. If this holds, any deviation from that stable difference after the treatment is attributed to the causal effect of the treatment.

It is important to understand how the parallel trends assumption compares to other, stronger assumptions:

Same untreated levels across groups:
A stronger assumption would require that the treated and control groups have identical untreated outcomes at all times:

$E[Y_{it}(0) \mid D_i = 1] = E[Y_{it}(0) \mid D_i = 0] \quad \text{for all } t$

This is often unrealistic in observational settings, where baseline characteristics typically differ between groups.
No change in untreated outcomes over time:
Another strong assumption is that untreated outcomes remain constant over time, for both groups:

$E[Y_{it}(0)] = E[Y_{i,t'}(0)] \quad \text{for all } i \text{ and times } t, t'$

This implies no secular trends, which is rarely plausible in real-world applications where outcomes (e.g., sales, earnings, health metrics) naturally evolve over time.

The parallel trends assumption is weaker than both of these and is generally more defensible, especially when supported by pre-treatment data.

DiD is appropriate when:

You have pre-treatment and post-treatment outcome data
You have clearly defined treatment and control groups
The parallel trends assumption is plausible

Avoid using DiD when:

Treatment assignment is not random or quasi-random
Unobserved confounders may cause the groups to evolve differently over time

Testing Parallel Trends: Prior Parallel Trends Test.

No Anticipation Effect (Pre-Treatment Exogeneity)

Individuals or groups should not change their behavior before the treatment is implemented in expectation of the treatment.
If units anticipate the treatment and adjust their behavior beforehand, it can introduce bias in the estimates.

Exogenous Treatment Assignment

Treatment should not be assigned based on potential outcomes.
Ideally, assignment should be as good as random, conditional on observables.

Stable Composition of Groups (No Attrition or Spillover)

Treatment and control groups should remain stable over time.
There should be no selective attrition (where individuals enter/leave due to treatment).
No spillover effects: Control units should not be indirectly affected by treatment.

No Simultaneous Confounding Events (Exogeneity of Shocks)

There should be no other major shocks that affect treatment/control groups differently at the same time as treatment implementation.

Limitations and Common Issues

Functional Form Dependence

If the response to treatment is nonlinear, compare high- vs. low-intensity groups.

Selection on (Time-Varying) Unobservables

Use Rosenbaum Bounds to check the sensitivity of estimates to unobserved confounders.

Long-Term Effects

Parallel trends are more reliable in short time windows.
Over long periods, other confounding factors may emerge.

Heterogeneous Effects

Treatment intensity (e.g., different doses) may vary across groups, leading to different effects.

Ashenfelter’s Dip (Ashenfelter 1978)

Participants in job training programs often experience earnings drops before enrolling, making them systematically different from nonparticipants.
Fix: Compute long-run differences, excluding periods around treatment, to test for sustained impact (Proserpio and Zervas 2017; J. J. Heckman, LaLonde, and Smith 1999; Jepsen, Troske, and Coomes 2014).

Lagged Treatment Effects

If effects are not immediate, using a lagged dependent variable $Y_{it-1}$ may be more appropriate (Blundell and Bond 1998).

Bias from Unobserved Factors Affecting Trends

If external shocks influence treatment and control groups differently, this biases DiD estimates.

Correlated Observations

Standard errors should be clustered appropriately.

Incidental Parameters Problem (Lancaster 2000)

Always prefer individual and time fixed effects to reduce bias.

Treatment Timing and Negative Weights

If treatment timing varies across units, negative weights can arise in standard DiD estimators when treatment effects are heterogeneous (Athey and Imbens 2022; Borusyak, Jaravel, and Spiess 2024; Goodman-Bacon 2021).
Fix: Use estimators from Callaway and Sant’Anna (2021) and Clément De Chaisemartin and d’Haultfoeuille (2020) (did package).
If expecting lags and leads, see L. Sun and Abraham (2021).

Treatment Effect Heterogeneity Across Groups

If treatment effects vary across groups and interact with treatment variance, standard estimators may be invalid (Gibbons, Suárez Serrato, and Urbancic 2018).

Endogenous Timing

If the timing of units can be influenced by strategic decisions in a DID analysis, an instrumental variable approach with a control function can be used to control for endogeneity in timing.

Questionable Counterfactuals

In situations where the control units may not serve as a reliable counterfactual for the treated units, matching methods such as propensity score matching or generalized random forest can be utilized. Additional methods can be found in Matching Methods.

30.11.1 Prior Parallel Trends Test

The parallel trends assumption ensures that, absent treatment, the treated and control groups would have followed similar outcome trajectories. Testing this assumption involves visualization and statistical analysis.

Marcus and Sant’Anna (2021) discuss pre-trend testing in staggered DiD.

Visual Inspection: Outcome Trends and Treatment Rollout

Plot raw outcome trends for both groups before and after treatment.
Use event-study plots to check for pre-trend violations and anticipation effects.
Visualization tools like ggplot2 or panelView help illustrate treatment timing and trends.

Event-Study Regressions

A formal test for pre-trends uses the event-study model:

$Y_{it} = \alpha + \sum_{k=-K}^{K} \beta_k 1(T = k) + X_{it} \gamma + \lambda_i + \delta_t + \epsilon_{it}$

where:

$1(T = k)$ are time dummies for periods before and after treatment.
$\beta_k$ captures deviations in outcomes before treatment; these should be statistically indistinguishable from zero if parallel trends hold.
$\lambda_i$ and $\delta_t$ are unit and time fixed effects.
$X_{it}$ are optional covariates.

Violation of parallel trends occurs if pre-treatment coefficients ( $\beta_k$ for $k < 0$ ) are statistically significant.

Statistical Test for Pre-Treatment Trend Differences

Using only pre-treatment data, estimate:

$Y = \alpha_g + \beta_1 T + \beta_2 (T \times G) + \epsilon$

where:

$\beta_2$ measures differences in time trends between groups.
If $\beta_2 = 0$ , trends are parallel before treatment.

Considerations:

Alternative functional forms (e.g., polynomials or nonlinear trends) can be tested.
If $\beta_2 \neq 0$ , potential explanations include:
- Large sample size driving statistical significance.
- Small deviations in one period disrupting an otherwise stable trend.

While time fixed effects can partially address violations of parallel trends (and are commonly used in modern research), they may also absorb part of the treatment effect, especially when treatment effects vary over time (Wolfers 2003).

Debate on Parallel Trends

Levels vs. Trends: Kahn-Lang and Lang (2020) argue that similarity in levels is also crucial. If treatment and control groups start at different levels, why assume their trends will be the same?
- Solution:
  - Plot time series for the treated and control groups.
  - Use matched samples to improve comparability (Ryan et al. 2019) (useful when parallel trends assumption is questionable).
- If levels differ significantly, functional form assumptions become more critical and must be justified.
Power of Pre-Trend Tests:
- Pre-trend tests often lack statistical power, making false negatives common (Roth 2022).
- See: PretrendsPower and pretrends (for adjustments).
Outcome Transformations Matter:
- The parallel trends assumption is specific to both the transformation and units of the outcome variable (Roth and Sant’Anna 2023).
- Conduct falsification tests to check whether the assumption holds under different functional forms.

library(tidyverse)
library(fixest)
od <- causaldata::organ_donations %>%
    # Use only pre-treatment data
    filter(Quarter_Num <= 3) %>% 
    # Treatment variable
    dplyr::mutate(California = State == 'California')

# use my package
causalverse::plot_par_trends(
    data = od,
    metrics_and_names = list("Rate" = "Rate"),
    treatment_status_var = "California",
    time_var = list(Quarter_Num = "Time"),
    display_CI = F
)
#> [[1]]


# do it manually
# always good but plot the dependent out
od |>
    # group by treatment status and time
    dplyr::group_by(California, Quarter) |>
    dplyr::summarize_all(mean) |>
    dplyr::ungroup() |>
    # view()
    
    ggplot2::ggplot(aes(x = Quarter_Num, y = Rate, color = California)) +
    ggplot2::geom_line() +
    causalverse::ama_theme()



# but it's also important to use statistical test
prior_trend <- fixest::feols(Rate ~ i(Quarter_Num, California) |
                                 State + Quarter,
                             data = od)

fixest::coefplot(prior_trend, grid = F)

fixest::iplot(prior_trend, grid = F)

This is alarming since one of the periods is significantly different from 0, which means that our parallel trends assumption is not plausible.

In cases where the parallel trends assumption is questionable, researchers should consider methods for assessing and addressing potential violations. Some key approaches are discussed in Rambachan and Roth (2023):

Imposing Restrictions: Constrain how different the post-treatment violations of parallel trends can be relative to pre-treatment deviations.
Partial Identification: Rather than assuming a single causal effect, derive bounds on the ATT.
Sensitivity Analysis: Evaluate how sensitive the results are to potential deviations from parallel trends.

To implement these approaches, the HonestDiD package by Rambachan and Roth (2023) provides robust statistical tools:

# https://github.com/asheshrambachan/HonestDiD
# remotes::install_github("asheshrambachan/HonestDiD")
# library(HonestDiD)

Alternatively, Ban and Kedagni (2022) propose a method that incorporates pre-treatment covariates as an information set and makes an assumption about the selection bias in the post-treatment period. Specifically, they assume that the selection bias lies within the convex hull of all pre-treatment selection biases. Under this assumption:

They identify a set of possible ATT values.
With a stronger assumption on selection bias—grounded in policymakers’ perspectives—they can estimate a point estimate of ATT.

Another useful tool for assessing parallel trends is the pretrends package by Roth (2022), which provides formal pre-trend tests:

# Install and load the pretrends package
# install.packages("pretrends")
# library(pretrends)

30.11.2 Placebo Test

A placebo test is a diagnostic tool used in Difference-in-Differences analysis to assess whether the estimated treatment effect is driven by pre-existing trends rather than the treatment itself. The idea is to estimate a treatment effect in a scenario where no actual treatment occurred. If a significant effect is found, it suggests that the parallel trends assumption may not hold, casting doubt on the validity of the causal inference.

Types of Placebo DiD Tests

Group-Based Placebo Test

Assign treatment to a group that was never actually treated and rerun the DiD model.
If the estimated treatment effect is statistically significant, this suggests that differences between groups—not the treatment—are driving results.
This test helps rule out the possibility that the estimated effect is an artifact of unobserved systematic differences.

A valid treatment effect should be consistent across different reasonable control groups. To assess this:

Rerun the DiD model using an alternative but comparable control group.
Compare the estimated treatment effects across multiple control groups.
If results vary significantly, this suggests that the choice of control group may be influencing the estimated effect, indicating potential selection bias or unobserved confounding.

Time-Based Placebo Test

Conduct DiD using only pre-treatment data, pretending that treatment occurred at an earlier period.
A significant estimated treatment effect implies that differences in pre-existing trends—not treatment—are responsible for observed post-treatment effects.
This test is particularly useful when concerns exist about unobserved shocks or anticipatory effects.

Random Reassignment of Treatment

Keep the same treatment and control periods but randomly assign treatment to units that were not actually treated.
If a significant DiD effect still emerges, it suggests the presence of biases, unobserved confounding, or systematic differences between groups that violate the parallel trends assumption.

Procedure for a Placebo Test

Using Pre-Treatment Data Only

A robust placebo test often involves analyzing only pre-treatment periods to check whether spurious treatment effects appear. The procedure includes:

Restricting the sample to pre-treatment periods only.
Assigning a fake treatment period before the actual intervention.
Testing a sequence of placebo cutoffs over time to examine whether different assumed treatment timings yield significant effects.
Generating random treatment periods and using randomization inference to assess the sampling distribution of the placebo effect.
Estimating the DiD model using the fake post-treatment period (post_time = 1).
Interpretation: If the estimated treatment effect is statistically significant, this indicates that pre-existing trends (not treatment) might be influencing results, violating the parallel trends assumption.

Using Control Groups for a Placebo Test

If multiple control groups are available, a placebo test can also be conducted by:

Dropping the actual treated group from the analysis.
Assigning one of the control groups as a fake treated group.
Estimating the DiD model and checking whether a significant effect is detected.
Interpretation:
- If a placebo effect appears (i.e., the estimated treatment effect is significant), it suggests that even among control groups, systematic differences exist over time.
- However, this result is not necessarily disqualifying. Some methods, such as Synthetic Control, explicitly model such differences while maintaining credibility.

# Load necessary libraries
library(tidyverse)
library(fixest)
library(ggplot2)
library(causaldata)

# Load the dataset
od <- causaldata::organ_donations %>%
    # Use only pre-treatment data
    dplyr::filter(Quarter_Num <= 3) %>%
    
    # Create fake (placebo) treatment variables
    dplyr::mutate(
        FakeTreat1 = as.integer(State == 'California' &
                                    Quarter %in% c('Q12011', 'Q22011')),
        FakeTreat2 = as.integer(State == 'California' &
                                    Quarter == 'Q22011')
    )

# Estimate the placebo effects using fixed effects regression
clfe1 <- fixest::feols(Rate ~ FakeTreat1 | State + Quarter, data = od)
clfe2 <- fixest::feols(Rate ~ FakeTreat2 | State + Quarter, data = od)

# Display the regression results
fixest::etable(clfe1, clfe2)
#>                           clfe1            clfe2
#> Dependent Var.:            Rate             Rate
#>                                                 
#> FakeTreat1      0.0061 (0.0051)                 
#> FakeTreat2                      -0.0017 (0.0028)
#> Fixed-Effects:  --------------- ----------------
#> State                       Yes              Yes
#> Quarter                     Yes              Yes
#> _______________ _______________ ________________
#> S.E.: Clustered       by: State        by: State
#> Observations                 81               81
#> R2                      0.99377          0.99376
#> Within R2               0.00192          0.00015
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# Extract coefficients and confidence intervals
coef_df <- tibble(
    Model = c("FakeTreat1", "FakeTreat2"),
    Estimate = c(coef(clfe1)["FakeTreat1"], coef(clfe2)["FakeTreat2"]),
    SE = c(summary(clfe1)$coeftable["FakeTreat1", "Std. Error"], 
           summary(clfe2)$coeftable["FakeTreat2", "Std. Error"]),
    Lower = Estimate - 1.96 * SE,
    Upper = Estimate + 1.96 * SE
)

# Plot the placebo effects
ggplot(coef_df, aes(x = Model, y = Estimate)) +
    geom_point(size = 3, color = "blue") +
    geom_errorbar(aes(ymin = Lower, ymax = Upper), width = 0.2, color = "blue") +
    geom_hline(yintercept = 0, linetype = "dashed", color = "red") +
    theme_minimal() +
    labs(
        title = "Placebo Treatment Effects",
        y = "Estimated Effect on Organ Donation Rate",
        x = "Placebo Treatment"
    )

We would like the “supposed” DiD to be insignificant.

References

Ashenfelter, Orley. 1978. “Estimating the Effect of Training Programs on Earnings.” The Review of Economics and Statistics, 47–57.

———. 2022. “Design-Based Analysis in Difference-in-Differences Settings with Staggered Adoption.” Journal of Econometrics 226 (1): 62–79.

Ban, Kyunghoon, and Desire Kedagni. 2022. “Generalized Difference-in-Differences Models: Robust Bounds.” arXiv Preprint arXiv:2211.06710.

Blundell, Richard, and Stephen Bond. 1998. “Initial Conditions and Moment Restrictions in Dynamic Panel Data Models.” Journal of Econometrics 87 (1): 115–43.

———. 2024. “Revisiting Event-Study Designs: Robust and Efficient Estimation.” Review of Economic Studies 91 (6): 3253–85.

Callaway, Brantly, and Pedro HC Sant’Anna. 2021. “Difference-in-Differences with Multiple Time Periods.” Journal of Econometrics 225 (2): 200–230.

De Chaisemartin, Clément, and Xavier d’Haultfoeuille. 2020. “Two-Way Fixed Effects Estimators with Heterogeneous Treatment Effects.” American Economic Review 110 (9): 2964–96.

Gibbons, Charles E, Juan Carlos Suárez Serrato, and Michael B Urbancic. 2018. “Broken or Fixed Effects?” Journal of Econometric Methods 8 (1): 20170002.

Goodman-Bacon, Andrew. 2021. “Difference-in-Differences with Variation in Treatment Timing.” Journal of Econometrics 225 (2): 254–77.

Heckman, James J, Robert J LaLonde, and Jeffrey A Smith. 1999. “The Economics and Econometrics of Active Labor Market Programs.” In Handbook of Labor Economics, 3:1865–2097. Elsevier.

Jepsen, Christopher, Kenneth Troske, and Paul Coomes. 2014. “The Labor-Market Returns to Community College Degrees, Diplomas, and Certificates.” Journal of Labor Economics 32 (1): 95–121.

Kahn-Lang, Ariella, and Kevin Lang. 2020. “The Promise and Pitfalls of Differences-in-Differences: Reflections on 16 and Pregnant and Other Applications.” Journal of Business & Economic Statistics 38 (3): 613–20.

Lancaster, Tony. 2000. “The Incidental Parameter Problem Since 1948.” Journal of Econometrics 95 (2): 391–413.

Marcus, Michelle, and Pedro HC Sant’Anna. 2021. “The Role of Parallel Trends in Event Study Settings: An Application to Environmental Economics.” Journal of the Association of Environmental and Resource Economists 8 (2): 235–75.

Proserpio, Davide, and Georgios Zervas. 2017. “Online Reputation Management: Estimating the Impact of Management Responses on Consumer Reviews.” Marketing Science 36 (5): 645–65.

Rambachan, Ashesh, and Jonathan Roth. 2023. “A More Credible Approach to Parallel Trends.” Review of Economic Studies, rdad018.

Roth, Jonathan. 2022. “Pretest with Caution: Event-Study Estimates After Testing for Parallel Trends.” American Economic Review 4 (3): 305–22.

Roth, Jonathan, and Pedro HC Sant’Anna. 2023. “When Is Parallel Trends Sensitive to Functional Form?” Econometrica 91 (2): 737–47.

Ryan, Andrew M, Evangelos Kontopantelis, Ariel Linden, and James F Burgess Jr. 2019. “Now Trending: Coping with Non-Parallel Trends in Difference-in-Differences Analysis.” Statistical Methods in Medical Research 28 (12): 3697–3711.

Sun, Liyang, and Sarah Abraham. 2021. “Estimating Dynamic Treatment Effects in Event Studies with Heterogeneous Treatment Effects.” Journal of Econometrics 225 (2): 175–99.

Wolfers, Justin. 2003. “Is Business Cycle Volatility Costly? Evidence from Surveys of Subjective Well-Being.” International Finance 6 (1): 1–26.