21.3 Experimental vs. Quasi-Experimental Designs

Determining whether a particular intervention or treatment causes an observed outcome requires more than observing associations: it demands a framework for causal inference.

To address this, researchers rely on two broad classes of research designs: experimental and quasi-experimental. Both aim to estimate causal effects, but they differ significantly in their level of control over the assignment mechanism and the assumptions required for valid inference.

  • Experimental designs, particularly randomized controlled trials (RCTs), are considered the gold standard for causal inference. By randomly assigning units (e.g., customers, users, regions) to treatment or control groups, these designs eliminate systematic confounding and allow for straightforward interpretation of treatment effects. However, in many business settings, true randomization is costly, impractical, or ethically constrained.
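A minimal simulation illustrates why randomization works: when assignment is a coin flip, a simple difference in group means is an unbiased estimate of the treatment effect. All quantities below (the baseline outcome, the true effect of 2.0) are hypothetical values chosen for the sketch.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000

# Simulated RCT: random assignment makes treatment independent of
# potential outcomes, so a difference in means recovers the true effect.
treated = rng.integers(0, 2, size=n)      # coin-flip assignment
baseline = rng.normal(50, 10, size=n)     # outcome absent any treatment
true_effect = 2.0
outcome = baseline + true_effect * treated + rng.normal(0, 5, size=n)

# Difference-in-means estimate of the average treatment effect (ATE).
ate_hat = outcome[treated == 1].mean() - outcome[treated == 0].mean()
```

With 10,000 units, `ate_hat` lands close to the true effect; no covariate adjustment is needed because randomization balances everything, observed and unobserved, in expectation.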

  • Quasi-experimental designs provide an alternative when random assignment is not feasible. These designs rely on observational data and statistical techniques to approximate the conditions of an experiment. While more flexible in application, they typically require stronger assumptions and careful methodological implementation to yield credible causal insights.

The table below summarizes the key differences between these two approaches, highlighting their respective strengths, limitations, and use cases in applied research:

Experimental Design vs. Quasi-Experimental Design

| Feature | Experimental Design | Quasi-Experimental Design |
|---|---|---|
| Assignment to treatment | Randomized | Non-randomized (observational) |
| Control over confounding | High | Limited (requires statistical control) |
| Causal inference validity | Strong (if properly implemented) | Weaker (depends on assumptions) |
| Feasibility in field studies | Often difficult or costly | More flexible and practical |
| Examples | A/B testing, clinical trials | Difference-in-differences, matching |
| Principal investigator | Typically an experimentalist | Typically an observationalist |
| Type of data | Experimental data | Observational data |
| Role of randomness | Random assignment reduces treatment imbalance | Random sampling reduces sample selection error |
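To make the quasi-experimental column concrete, the following sketch implements difference-in-differences (one of the examples in the table) on a hypothetical two-period panel. The group-level baseline gap, common trend, and true effect are all made-up values; under the parallel-trends assumption, DiD nets out both the baseline gap and the shared trend.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Hypothetical two-period panel: a treated group and a control group,
# each observed before and after the intervention.
group = rng.integers(0, 2, size=n)                 # 1 = eventually treated
pre = 10 + 3 * group + rng.normal(0, 1, size=n)    # baseline gap of 3
common_trend = 1.5                                 # trend shared by both groups
true_effect = 2.0
post = pre + common_trend + true_effect * group + rng.normal(0, 1, size=n)

# Difference-in-differences: (change for treated) - (change for controls).
did = ((post[group == 1] - pre[group == 1]).mean()
       - (post[group == 0] - pre[group == 0]).mean())
```

Note that neither the baseline gap (3) nor the common trend (1.5) contaminates `did`; only a violation of parallel trends would.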

21.3.1 Criticisms of Quasi-Experimental Designs

Quasi-experimental methods do not always approximate experimental results accurately. LaLonde (1986) demonstrated that commonly used nonexperimental estimators often fail to replicate experimental benchmark estimates reliably. This finding cast serious doubt on the credibility of observational studies for estimating causal effects, igniting an ongoing debate in econometrics and statistics about the reliability of nonexperimental evaluations.

LaLonde’s critical assessment served as a catalyst for significant methodological and practical advancements in causal inference. In the decades since this publication, the field has evolved considerably, introducing both theoretical innovations and empirical practices aimed at addressing the limitations that were exposed (G. Imbens and Xu 2024). Among these advances are:

  • Emphasis on estimators based on unconfoundedness (selection on observables): Modern causal inference frameworks frequently adopt the unconfoundedness or conditional independence assumption. Under this premise, treatment assignment is assumed to be independent of potential outcomes, conditional on observed covariates. This theoretical foundation underpins many widely used estimation techniques, such as matching methods, inverse probability weighting, and regression adjustment.
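A compact sketch of one such estimator, inverse probability weighting, under a simulated selection-on-observables design. A single observed confounder `x` drives both treatment uptake and the outcome (all parameter values are hypothetical); the true propensity score is used directly here, whereas in practice it would be estimated from `x`.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

# One observed confounder drives both treatment and outcome.
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))            # true propensity score
treated = rng.binomial(1, p)
true_effect = 2.0
y = 1.0 * x + true_effect * treated + rng.normal(0, 1, size=n)

# Naive difference in means is biased upward by confounding ...
naive = y[treated == 1].mean() - y[treated == 0].mean()

# ... but weighting each unit by the inverse probability of its observed
# treatment recovers the ATE under unconfoundedness.
ipw = (np.mean(treated * y / p)
       - np.mean((1 - treated) * y / (1 - p)))
```

Running this shows `naive` overstating the effect substantially while `ipw` lands near 2.0, which is exactly the gap that unconfoundedness-based adjustment is meant to close.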

  • Focus on covariate overlap (common support): Researchers now recognize the critical importance of overlap, also referred to as common support, in the distributions of covariates across treatment and control groups. Without sufficient overlap, comparisons between treated and untreated units rely on extrapolation, which weakens causal claims. Modern methods explicitly assess and often impose restrictions to ensure overlap before proceeding with estimation.
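A simple way to operationalize the overlap check is to trim units whose estimated propensity scores fall outside the range common to both groups. The Beta-distributed scores below are hypothetical stand-ins for fitted propensity scores; the min/max trimming rule is one common (though crude) common-support restriction.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical estimated propensity scores for each group.
ps_treated = rng.beta(5, 2, size=1000)   # mass concentrated near 1
ps_control = rng.beta(2, 5, size=1000)   # mass concentrated near 0

# Common-support rule: keep only units whose propensity score lies
# inside the overlap of the two groups' observed ranges.
lo = max(ps_treated.min(), ps_control.min())
hi = min(ps_treated.max(), ps_control.max())

keep_treated = (ps_treated >= lo) & (ps_treated <= hi)
keep_control = (ps_control >= lo) & (ps_control <= hi)
```

Units dropped by this rule are those for which the data contain no comparable counterparts in the other group, so any estimate for them would rest on pure extrapolation.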

  • Introduction of propensity score-based methods and doubly robust estimators: The introduction of propensity score methods (Rosenbaum and Rubin 1983) was a breakthrough, offering a way to reduce the dimensionality of the covariate space while balancing observed characteristics across groups. More recently, doubly robust estimators have emerged, combining propensity score weighting with outcome regression. These estimators provide consistent treatment effect estimates if either the propensity score model or the outcome model is correctly specified, offering greater robustness in practice.
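The augmented IPW (AIPW) form of the doubly robust estimator can be sketched in a few lines. The simulated design is hypothetical; here both the propensity score and the within-arm linear outcome models happen to be correctly specified, but the estimator would remain consistent if only one of the two were.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Single observed confounder; linear outcome model.
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))        # true propensity score
t = rng.binomial(1, p)
y = x + 2.0 * t + rng.normal(0, 1, size=n)

def fit_linear(xa, ya):
    """Least-squares fit of ya on [1, xa]; returns (intercept, slope)."""
    X = np.column_stack([np.ones_like(xa), xa])
    beta, *_ = np.linalg.lstsq(X, ya, rcond=None)
    return beta

# Outcome regressions fitted separately within each treatment arm.
b1 = fit_linear(x[t == 1], y[t == 1])
b0 = fit_linear(x[t == 0], y[t == 0])
mu1 = b1[0] + b1[1] * x          # predicted outcome if treated
mu0 = b0[0] + b0[1] * x          # predicted outcome if untreated

# AIPW: outcome-model prediction plus a propensity-weighted residual
# correction; consistent if either model is correct.
aipw = np.mean(mu1 - mu0
               + t * (y - mu1) / p
               - (1 - t) * (y - mu0) / (1 - p))
```

The residual-correction terms vanish in expectation when the outcome models are right, and the weighting terms repair any outcome-model bias when the propensity model is right, which is the source of the "doubly robust" label.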

  • Greater emphasis on validation exercises to bolster credibility: Modern studies increasingly incorporate validation techniques to evaluate the credibility of their findings. Placebo tests, falsification exercises, and sensitivity analyses are commonly employed to assess whether estimated effects may be driven by unobserved confounding or model misspecification. Such practices go beyond traditional goodness-of-fit statistics, directly interrogating the assumptions underlying causal inference.
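One concrete validation exercise is a placebo test on a pre-treatment variable: since the treatment cannot causally affect something determined before it, any nonzero "effect" signals confounding or misspecification. The simulated setup below is hypothetical and reuses an IPW adjustment for the estimate.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000

# Confounded assignment, as before.
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-x))
t = rng.binomial(1, p)

# Placebo outcome: determined before treatment, so the true effect of
# treatment on it is exactly zero by construction.
placebo_y = x + rng.normal(0, 1, size=n)

# IPW estimate of the treatment "effect" on the placebo outcome;
# an estimate far from zero would flag a broken identification strategy.
placebo_effect = (np.mean(t * placebo_y / p)
                  - np.mean((1 - t) * placebo_y / (1 - p)))
```

Here the adjustment is correct, so the placebo estimate hovers near zero; running the same test with the naive unweighted difference would expose the confounding immediately.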

  • Methods for estimating and exploiting treatment effect heterogeneity: Beyond estimating average treatment effects, contemporary research frequently explores heterogeneous treatment effects. These methods identify subgroups that may experience different causal impacts, which is of particular relevance in fields like personalized marketing, targeted interventions, and policy design.
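A minimal illustration of treatment effect heterogeneity: in a simulated RCT where a hypothetical subgroup indicator `z` moderates the effect, subgroup-specific differences in means recover conditional effects that the overall average conceals.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000

# Simulated RCT where the treatment helps only the z = 1 subgroup.
z = rng.integers(0, 2, size=n)    # hypothetical subgroup indicator
t = rng.integers(0, 2, size=n)    # randomized treatment
y = 1.0 + 3.0 * t * z + rng.normal(0, 1, size=n)

def diff_in_means(mask):
    """Treatment-control difference in means within a subgroup."""
    return y[mask & (t == 1)].mean() - y[mask & (t == 0)].mean()

cate_z1 = diff_in_means(z == 1)               # effect within z = 1
cate_z0 = diff_in_means(z == 0)               # effect within z = 0
ate = y[t == 1].mean() - y[t == 0].mean()     # averages over subgroups
```

The overall ATE (about 1.5 here) is a blend of a strong effect for one subgroup and none for the other, which is precisely why targeting decisions in marketing or policy call for conditional, not just average, effects.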

To illustrate the practical lessons from these methodological advances, G. Imbens and Xu (2024) reexamine two canonical datasets:

  1. LaLonde’s National Supported Work Demonstration data
  2. The Imbens-Rubin-Sacerdote draft lottery data

Applying modern causal inference methods to these datasets demonstrates that, when sufficient covariate overlap exists, robust estimates of the adjusted differences between treatment and control groups can be achieved. However, it is critical to underscore that robustness in estimation does not equate to validity. Without direct validation exercises, such as placebo tests, even well-behaved estimates may be misleading.

G. Imbens and Xu (2024) also highlight several key lessons for practitioners working with nonexperimental data to estimate causal effects:

  • Careful examination of the assignment process is essential.
    Understanding the mechanisms by which units are assigned to treatment or control conditions informs the plausibility of the unconfoundedness assumption.

  • Inspection of covariate overlap is non-negotiable.
    Without sufficient overlap, causal effect estimation may rely heavily on model extrapolation, undermining credibility.

  • Validation exercises are indispensable.
    Placebo tests and falsification strategies help ensure that estimated treatment effects are not artifacts of modeling choices or unobserved confounding.

While methodological advances have substantially improved the tools available for causal inference with observational data, their effective application requires rigorous attention to the underlying assumptions and diligent validation to support credible causal claims.


References

Imbens, Guido, and Yiqing Xu. 2024. “LaLonde (1986) After Nearly Four Decades: Lessons Learned.” arXiv preprint arXiv:2406.00827.
LaLonde, Robert J. 1986. “Evaluating the Econometric Evaluations of Training Programs with Experimental Data.” The American Economic Review 76 (4): 604–20.
Rosenbaum, Paul R., and Donald B. Rubin. 1983. “The Central Role of the Propensity Score in Observational Studies for Causal Effects.” Biometrika 70 (1): 41–55.