27 Regression Discontinuity
Regression Discontinuity (RD) is a quasi-experimental design that exploits a discrete jump in treatment assignment based on a continuous (or approximately continuous) variable, often called the running variable, forcing variable, or assignment variable.
- A common application occurs when an intervention is assigned based on whether the running variable exceeds a predefined cutoff.
- The approach provides localized causal inference around the cutoff, akin to a randomized experiment.
- Historical Background: First introduced by (Thistlethwaite and Campbell 1960) in the context of merit awards and their effect on future academic outcomes.
Key References:
Foundational papers: (G. Imbens and Lemieux 2008; Lee and Lemieux 2010)
-
Practical guidelines:
27.1 Conceptual Framework
Regression discontinuity is best understood as a localized experiment at the threshold:
Internal Validity: Strong within a narrow bandwidth around the cutoff.
External Validity: Limited—results may not generalize beyond the bandwidth.
Comparison to Randomized Experiments: Empirical evidence suggests that RD and randomized controlled trials yield similar estimates ((Chaplin et al. 2018), Mathematica).
RD is connected to other causal inference methods:
Randomized Experiment: It’s local randomization.
Instrumental Variables: RD can be viewed as a structural IV model (J. D. Angrist and Lavy 1999).
Matching Methods: RD is a special case where matching occurs at a single threshold (J. J. Heckman, LaLonde, and Smith 1999).
Interrupted Time Series (ITS): RD is preferable when the running variable is finely measured. However, with highly discrete time data (e.g., quarterly or annual), ITS may be more appropriate. Hence, RD is always better than ITS if data are infinite (or substantially large).
27.1.1 Types of Regression Discontinuity Designs
- Sharp RD: Treatment probability jumps from 0 to 1 at the cutoff.
- Fuzzy RD: Treatment probability changes discontinuously but does not reach 1.
- Kink RD: Discontinuity occurs in the slope rather than the level of the running variable (Nielsen, Sørensen, and Taber 2010) (see applications in Böckerman, Kanninen, and Suoniemi (2018) and theoretical foundations in Card et al. (2015)).
- Regression Discontinuity in Time (i.e., Interrupted Time Series): The running variable is time.
Additional variations:
Multiple Cutoffs: Different thresholds across subgroups.
Multiple Scores: More than one running variable.
Geographic RD: Cutoff is spatially defined.
Dynamic Treatments: Treatment effects evolve over time.
Continuous Treatments: Instead of binary treatment, intensity varies.
27.1.2 Assumptions for RD Validity
Independent Assignment: The treatment is assigned solely based on the running variable.
-
Continuity of Conditional Expectations: The expected outcomes without treatment are continuous at the cutoff:
E[Y(0)|X=x] and E[Y(1)|X=x] are continuous at x=c.
Exogeneity of the Cutoff: The cutoff should not be manipulable.
No Discontinuity in Confounding Variables: Other covariates should be smooth at the threshold. A common test is to check for jumps in covariates unrelated to treatment.
27.1.4 Violation of Continuity in Covariates
If other variables besides treatment exhibit a discontinuity at the cutoff, the estimated effect may be biased.
Solution: Conduct balance tests on pre-treatment covariates.
27.2 Model Estimation Strategies
Under regression discontinuity framework, researchers can choose between parametric and nonparametric models. The choice depends on assumptions about functional form, data availability, and the trade-off between flexibility and interpretability.
27.2.1 Parametric Models: Polynomial Regression
Parametric models, such as linear regression, assume a specific functional form for the relationship between the dependent variable and predictors. One way to relax the strict linearity assumption is by incorporating polynomial functions of the forcing variable (Lee and Lemieux 2010). The choice of polynomial degree should be determined based on data characteristics.
However, using high-order polynomials comes with challenges. Gelman and Imbens (2019) highlight three key issues associated with global high-degree polynomials:
- Imprecise Estimates Due to Noise: Higher-degree polynomials can overfit, capturing noise instead of meaningful patterns.
- Sensitivity to Polynomial Degree: Estimates can vary significantly depending on the chosen degree, making the model less stable.
- Inadequate Confidence Interval Coverage: Confidence intervals tend to be misleading when using high-degree polynomials, leading to incorrect inference.
For these reasons, researchers are often advised to avoid global high-order polynomials and instead rely on alternative approaches (i.e., nonparametric version).
27.2.2 Nonparametric Models: Local Regression
Nonparametric models, such as local polynomial regression, offer greater flexibility by avoiding strong assumptions about functional form. Instead of fitting a single equation to the entire dataset, these methods estimate relationships locally—often using linear or quadratic polynomials within a neighborhood of each data point.
- Uses weighted observations near the cutoff.
- More flexible than global polynomial regression.
Local regression methods, such as local linear regression, address the shortcomings of high-order global polynomials by:
- Reducing overfitting, as they adapt to local patterns without excessive complexity.
- Providing stable estimates, since results are less sensitive to arbitrary choices of polynomial degree.
- Improving inference, ensuring more reliable confidence intervals.
By balancing flexibility and robustness, local regression techniques are often preferred over global polynomial models in applied research.
Best Practice: Use multiple model specifications to check for consistency in results.
27.3 Formal Definition
Let Xi be the running variable, c the cutoff, and Di the treatment indicator:
Di=1Xi>c
or equivalently,
Di={1,Xi>c0,Xi<c
where:
Di: Treatment assignment
Xi: Running variable (continuous)
c: Cutoff value
27.3.1 Identification Assumptions
27.3.1.1 Continuity-Based Identification
RD estimates the [Local Average Treatment Effect] at the cutoff:
αSRDD=E[Y1i−Y0i|Xi=c]=E[Y1i|Xi=c]−E[Y0i|Xi=c]=lim
This relies on the assumption that, in the absence of treatment, the conditional expectation of potential outcomes is continuous at the threshold c.
27.3.1.2 Local Randomization-Based Identification
Alternatively, identification can be achieved using local randomization within a small bandwidth W (i.e., a neighborhood around the cutoff). The [Local Average Treatment Effect] in this case is:
\begin{aligned} \alpha_{LR} &= E[Y_{1i} - Y_{0i}|X_i \in W] \\ &= \frac{1}{N_1} \sum_{X_i \in W, D_i = 1} Y_i - \frac{1}{N_0} \sum_{X_i \in W, D_i = 0} Y_i \end{aligned}
Since RD estimates are local, they may not generalize to the entire population. However, for many applications, internal validity is of primary concern (rather than external validity).
27.4 Estimation and Inference
27.4.1 Local Randomization-Based Approach
The local randomization approach additionally assumes that within the chosen window W = [c - w, c + w], treatment assignment is as good as random. This requires:
- The joint probability distribution of running variable values inside W to be known.
- Potential outcomes to be independent of the running variable within W.
This is a stronger assumption than continuity-based identification, as it requires that regression functions are smooth at c and remain unaffected by X_i within W.
Since researchers can choose the window W (where random assignment plausibly holds), the sample size can often be small.
The selection of W can be based on:
- Pre-treatment covariate balance: Ensure covariates are similar across the threshold.
- Independent tests: Check for independence between the outcome and the running variable.
- Domain knowledge: Use theoretical or empirical justification for the window choice.
For inference, researchers can use:
- (Fisher) randomization inference
- (Neyman) design-based methods
27.4.2 Continuity-Based Approach
Also known as the local polynomial regression method, this approach estimates treatment effects by fitting a polynomial model locally around the cutoff. Global polynomial regression is not recommended due to issues such as:
- Lack of robustness
- Overfitting
- Runge’s phenomenon (oscillatory behavior at boundaries)
Steps for Local Polynomial Estimation
- Choose the polynomial order and weighting scheme
- Select an optimal bandwidth (minimizing MSE or coverage error)
- Estimate the parameter of interest
- Perform robust bias-corrected inference
This method ensures that estimation remains local, capturing the treatment effect precisely at the cutoff.
27.5 Specification Checks
To validate the credibility of an RD design, researchers perform several specification checks:
- Balance Checks
- Sorting, Bunching, and Manipulation
- Placebo Tests
- Sensitivity to Bandwidth Choice
- Manipulation-Robust Regression Discontinuity Bounds
27.5.1 Balance Checks
Also known as checking for discontinuities in average covariates, this test examines whether covariates that should not be affected by treatment exhibit a discontinuity at the cutoff.
- Null Hypothesis (H_0): The average effect of covariates on pseudo-outcomes (i.e., those that should not be influenced by treatment) is zero.
- If rejected, this raises serious doubts about the RD design, necessitating a strong justification.
27.5.2 Sorting, Bunching, and Manipulation
This test, also known as checking for discontinuities in the distribution of the forcing variable, detects whether subjects manipulate the running variable to sort into or out of treatment.
If individuals can manipulate the running variable—especially when the cutoff is known in advance—this can lead to bunching behavior (i.e., clustering just above or below the cutoff).
- If treatment is desirable, individuals will try to sort into treatment, leading to a gap just below the cutoff.
- If treatment is undesirable, individuals will try to avoid it, leading to a gap just above the cutoff.
Under RD, we assume that there is no manipulation in the running variable. However, bunching behavior, where firms or individuals strategically manipulate their position, violates this assumption.
To address this issue, the bunching approach estimates the counterfactual distribution—what the density of individuals would have been in the absence of manipulation.
The fraction of individuals who engaged in manipulation is then calculated by comparing the observed distribution to this counterfactual distribution.
In a standard RD framework, this step is unnecessary because it assumes that the observed distribution and the counterfactual (manipulation-free) distribution are the same, implying no manipulation
27.5.2.1 McCrary Sorting Test
A widely used formal test is the McCrary density test (McCrary 2008), later refined by Cattaneo, Idrobo, and Titiunik (2019).
- Null Hypothesis (H_0): The density of the running variable is continuous at the cutoff.
- Alternative Hypothesis (H_a): A discontinuity (jump) in the density function at the cutoff, suggesting manipulation.
- Interpretation:
- A significant discontinuity suggests manipulation, violating RD assumptions.
- A failure to reject H_0 does not necessarily confirm validity, as some forms of manipulation may remain undetected.
- If there is two-sided manipulation, this test will fail to detect it.
27.5.2.2 Guidelines for Assessing Manipulation
- J. L. Zhang and Rubin (2003), Lee (2009), and Aronow, Baron, and Pinson (2019) provide criteria for evaluating manipulation risks.
- Knowing your research design inside out is crucial to anticipating possible manipulation attempts.
- Manipulation is often one-sided, meaning subjects shift only in one direction relative to the cutoff. In rare cases, two-sided manipulation may occur but often cancels out.
- We could also observe partial manipulation in reality (e.g., when subjects can only imperfectly manipulate). However, since we typically treat it like a fuzzy RD, we would not encounter identification problems. In contrast, complete manipulation would lead to serious identification issues.
Bunching Methodology
- Bunching occurs when individuals self-select into specific values of the running variable (e.g., policy thresholds). See Kleven (2016) for a review.
- The method helps estimate the counterfactual distribution—what the density would have been without manipulation.
- The fraction of individuals who manipulated can be estimated by comparing observed densities to the counterfactual.
If the running variable and outcome are simultaneously determined, a modified RD estimator can be used for consistent estimation (Bajari et al. 2011):
- One-sided manipulation: Individuals shift only in one direction relative to the cutoff (similar to the monotonicity assumption in instrumental variables).
- Bounded manipulation (regularity assumption): The density of individuals far from the threshold remains unaffected (Blomquist et al. 2021; Bertanha, McCallum, and Seegert 2021).
27.5.2.3 Steps for Bunching Analysis
- Identify the window where bunching occurs (based on Bosch, Dekker, and Strohmaier (2020)). Perform robustness checks by varying the manipulation window.
- Estimate the manipulation-free counterfactual distribution.
- Standard errors for inference can be calculated (following Chetty, Hendren, and Katz (2016)), where bootstrap resampling of residuals is used in estimating the counts of individuals within bins. However, this step may be unnecessary for large datasets.
If the bunching test fails to detect manipulation, we proceed to the Placebo Test.
McCrary Density Test (Discontinuity in Forcing Variable)
library(rdd)
set.seed(1)
# Simulated data without discontinuity
x <- runif(100, -1, 1)
DCdensity(x, 0) # No discontinuity

#> [1] 0.06355195
# Simulated data with discontinuity
x <- runif(1000, -1, 1)
x <- x + 2 * (runif(1000, -1, 1) > 0 & x < 0)
DCdensity(x, 0) # Discontinuity detected

#> [1] 0.001936782
Cattaneo Density Test (Improved Version)
library(rddensity)
# Simulated continuous density
set.seed(1)
x <- rnorm(100, mean = -0.5)
rdd <- rddensity(X = x, vce = "jackknife")
summary(rdd)
#>
#> Manipulation testing using local polynomial density estimation.
#>
#> Number of obs = 100
#> Model = unrestricted
#> Kernel = triangular
#> BW method = estimated
#> VCE method = jackknife
#>
#> c = 0 Left of c Right of c
#> Number of obs 66 34
#> Eff. Number of obs 38 25
#> Order est. (p) 2 2
#> Order bias (q) 3 3
#> BW est. (h) 0.922 0.713
#>
#> Method T P > |T|
#> Robust 0.6715 0.5019
#>
#>
#> P-values of binomial tests (H0: p=0.5).
#>
#> Window Length <c >=c P>|T|
#> 0.563 + 0.563 26 20 0.4614
#> 0.603 + 0.580 27 20 0.3817
#> 0.643 + 0.596 30 20 0.2026
#> 0.683 + 0.613 32 21 0.1690
#> 0.722 + 0.630 32 22 0.2203
#> 0.762 + 0.646 33 22 0.1770
#> 0.802 + 0.663 33 23 0.2288
#> 0.842 + 0.680 35 24 0.1925
#> 0.882 + 0.696 36 24 0.1550
#> 0.922 + 0.713 38 25 0.1299
# Plot requires customization (refer to package documentation)
27.5.3 Placebo Tests
Placebo tests, also known as falsification checks, assess whether discontinuities appear at points other than the treatment cutoff. This helps verify that observed effects are causal rather than artifacts of the method or data.
- There should be no jumps in the outcome at values other than the cutoff (X_i < c or X_i \geq c).
- The test involves shifting the cutoff along the running variable while using the same bandwidth to check for discontinuities in the conditional mean of the outcome.
- This approach is similar to balance checks in experimental design, ensuring no pre-existing differences. Remember, we can only test on observables, not unobservables.
Under a valid RD design, matching methods are unnecessary. Just as with randomized experiments, balance should naturally occur across the threshold. If adjustments are required, it suggests the RD assumptions may be invalid.
27.5.3.1 Applications of Placebo Tests
- Testing No Discontinuity in Predetermined Covariates: Covariates that should not be affected by treatment should not exhibit a jump at the cutoff.
- Testing Other Discontinuities: Checking for discontinuities at other arbitrary points along the running variable.
- Using Placebo Outcomes: If an outcome variable that should not be affected by treatment shows a significant discontinuity, this raises concerns about RD validity.
- Assessing Sensitivity to Covariates: RD estimates should not be highly sensitive to the inclusion or exclusion of covariates.
27.5.3.2 Mathematical Specification
The balance of observable characteristics on both sides of the threshold can be tested using:
Z_i = \alpha_0 + \alpha_1 f(X_i) + I(X_i \geq c) \alpha_2 + [f(X_i) \times I(X_i \geq c)]\alpha_3 + u_i
where:
X_i = running variable
Z_i = predetermined characteristics (e.g., age, education, etc.)
\alpha_2 should be zero if Z_i is unaffected by treatment.
If multiple covariates Z_i are tested simultaneously, simulating their joint distribution avoids false positives due to multiple comparisons. This step is unnecessary if covariates are independent, but such independence is unlikely in practice.
27.5.4 Sensitivity to Bandwidth Choice
The choice of bandwidth is crucial in RD estimation. Different bandwidth selection methods exist:
- Ad-hoc or Substantively Driven: Based on theoretical or empirical reasoning.
- Data-Driven Selection (Cross-Validation): Optimizes bandwidth to minimize prediction error.
- Conservative Approach: Uses robust optimal bandwidth selection methods (e.g., (Calonico, Cattaneo, and Farrell 2020)).
The objective is to minimize mean squared error (MSE) between estimated and actual treatment effects.
27.5.5 Assessing Sensitivity
- Results should be consistent across reasonable bandwidth choices.
- The optimal bandwidth for estimating treatment effects may differ from the optimal bandwidth for testing covariates but should be fairly close.
# Load required package
library(rdd)
# Simulate some data
set.seed(123)
n <- 100 # Sample size
# Running variable centered around 0
running_var <- runif(n, -1, 1)
# Treatment assigned at cutpoint 0
treatment <- ifelse(running_var >= 0, 1, 0)
# Outcome variable
outcome_var <- 2 * running_var + treatment * 1.5 + rnorm(n)
# Compute the optimal Imbens-Kalyanaraman bandwidth
bandwidth <-
IKbandwidth(running_var,
outcome_var,
cutpoint = 0,
kernel = "triangular")
# Print the bandwidth
print(bandwidth)
#> [1] 0.4374142
27.5.6 Manipulation-Robust Regression Discontinuity Bounds
Regression Discontinuity designs rely on the assumption that the running variable X_i is not manipulable by agents in the study. However, McCrary (2008) showed that a discontinuity in the density of X_i at the cutoff may indicate manipulation, potentially invalidating RD estimates. The common approach to handling detected manipulation is:
- If no manipulation is detected, proceed with RD analysis.
- If manipulation is detected, use the “doughnut-hole” method (i.e., excluding near-cutoff observations), but this contradicts the RD principles.
However, strict adherence to this rule can lead to two problems:
- False Negatives: A small sample size might fail to detect manipulation, leading to biased estimates if manipulation still affects the running variable.
- Loss of Informative Data: Even when manipulation is detected, the data may still contain valuable information for causal inference.
To address these challenges, Gerard, Rokkanen, and Rothe (2020) introduce a framework that accounts for manipulated observations rather than discarding them. This approach:
- Identifies the extent of manipulation.
- Computes worst-case bounds on treatment effects.
- Provides a systematic way to incorporate manipulated observations while maintaining the credibility of RD analysis.
If manipulation is believed to be unlikely, an alternative approach is to conduct sensitivity analysis by:
Testing how different hypothetical values of \tau affect the bounds.
Comparing results across various \tau assumptions to assess robustness.
Consider independent observations (X_i, Y_i, D_i):
- X_i: Running variable.
- Y_i: Outcome variable.
- D_i: Treatment indicator (D_i = 1 if X_i \geq c, and D_i = 0 otherwise).
A sharp RD design satisfies D_i = I(X_i \geq c), while a fuzzy RD design allows probabilistic treatment assignment.
The population consists of two types of units:
- Potentially-Assigned Units (M_i = 0): These units follow the standard RD framework. They have potential outcomes Y_i(d) and potential treatment states D_i(x).
- Always-Assigned Units (M_i = 1): These units always appear on one side of the cutoff and do not require potential outcomes.
If no always-assigned units exist (M_i = 1 for no units), the standard RD model holds.
27.5.6.1 Key Assumptions
-
Local Independence and Continuity:
- Treatment probability jumps at c among potentially-assigned units: P(D = 1|X = c^+, M = 0) > P(D = 1|X = c^-, M = 0).
- No defiers: P(D^+ \geq D^- | X = c, M = 0) = 1.
- Potential outcomes and treatment states are continuous at c.
- The density of the running variable among potentially-assigned units, F_{X|M=0}(x), is differentiable at c.
-
Smoothness of the Running Variable among Potentially-Assigned Units:
- The derivative of F_{X|M=0}(x) is continuous at c.
-
Restrictions on Always-Assigned Units:
- Always-assigned units satisfy P(X \geq c|M = 1) = 1.
- The density of the running variable among always-assigned units, F_{X|M=1}(x), is right-differentiable at c.
- This one-sided manipulation assumption allows identification of the proportion of always-assigned units.
When always-assigned units exist, the RD design effectively becomes fuzzy, since: 1. Some potentially-assigned units receive treatment while others do not. 2. Always-assigned units are always treated (or always untreated).
27.5.6.2 Estimating Treatment Effects
For potentially-assigned units, the causal effect of interest is:
\Gamma = E[Y(1) - Y(0) | X = c, D^+ > D^-, M = 0].
This parameter represents the local average treatment effect for potentially-assigned compliers, i.e., those whose treatment status is affected by their running variable crossing the cutoff.
27.5.6.3 Bounding Treatment Effects
The approach to bounding treatment effects consists of two key steps:
-
Estimating the Proportion of Always-Assigned Units:
- This is done by measuring the discontinuity in the density of X at c.
- The larger the discontinuity, the greater the fraction of always-assigned units.
-
Computing Worst-Case Bounds on Treatment Effects:
- If manipulation exists, treatment effects must be inferred using extreme-case scenarios.
- For sharp RD designs, bounds are estimated by trimming extreme outcomes near the cutoff.
- For fuzzy RD designs, additional adjustments are required to account for the presence of always-assigned units.
Extensions of this approach use covariates and economic behavior assumptions to refine bounds further.
Manipulation-Robust RD | Doughnut-Hole RD |
---|---|
Uses actual observed data at the cutoff. | Excludes observations near the cutoff. |
Provides a direct estimate of causal effects. | Relies on extrapolation from other regions. |
Accounts for manipulation explicitly. | Assumes a hypothetical counterfactual world. |
Less sensitive to assumptions about manipulation. | Requires strong assumptions about bias. |
27.5.6.4 Identification Challenges
A central challenge in manipulation-robust RD designs is the inability to directly distinguish always-assigned from potentially-assigned units. As a result, the LATE \Gamma is not point identified. Instead, we establish sharp bounds on \Gamma.
These bounds leverage the stochastic dominance of potential outcome cumulative distribution functions over observed distributions. This allows us to infer treatment effects without making strong parametric assumptions.
To formalize the population structure, we define five types of units:
-
Potentially-Assigned Units:
- C_0 (Compliers): Receive treatment if and only if X \geq c.
- A_0 (Always-Takers): Always receive treatment, regardless of X.
- N_0 (Never-Takers): Never receive treatment, regardless of X.
-
Always-Assigned Units:
- T_1 (Treated Always-Assigned Units): Always appear above the cutoff and receive treatment.
- U_1 (Untreated Always-Assigned Units): Always appear below the cutoff and do not receive treatment.
The measure \tau, representing the proportion of always-assigned units near the cutoff, is point-identified using the discontinuity in the density of the running variable f_X at c.
27.5.6.4.1 Identification in Sharp RD
In a sharp RD design:
- Units to the left of the cutoff are potentially-assigned units.
- The observed distribution of untreated outcomes, Y(0), among these units corresponds to the outcomes of potentially-assigned compliers (C_0) at the cutoff.
- To estimate sharp bounds on \Gamma, we need to assess the distribution of treated outcomes (Y(1)) for compliers.
However, information on treated outcomes (Y(1)) at the cutoff is only available from the treated subpopulation, which includes:
Potentially-assigned compliers (C_0)
Always-assigned treated units (T_1)
Since \tau is point-identified, we can construct sharp bounds on \Gamma by adjusting for the presence of T_1.
27.5.6.4.2 Identification in Fuzzy RD
In fuzzy RD, treatment assignment is not deterministic. The subpopulations observed in each treatment-status group at the cutoff are:
Subpopulation | Unit Types Present |
---|---|
X = c^+, D = 1 | C_0, A_0, T_1 |
X = c^-, D = 1 | A_0 |
X= c^+, D = 0 | N_0, U_1 |
X = c^-, D = 0 | C_0, N_0 |
Source: Table on page 848 of Gerard, Rokkanen, and Rothe (2020)
Key Takeaways:
- Unit Types and Combinations: There are five distinct unit types and four combinations of treatment assignments and decisions relevant to the analysis. These distinctions are important because they affect how potential outcomes are analyzed and bounded.
- Outcome Distributions: The analysis involves estimating the distribution of potential outcomes (both treated and untreated) among potentially-assigned compliers at the cutoff.
- The goal is to estimate the distribution of potential outcomes (both treated and untreated) for potentially-assigned compliers at the cutoff.
27.5.6.5 Three-Step Process for Bounding Treatment Effects
The method to obtain sharp bounds on \Gamma follows three steps:
- Bounding Potential Outcomes Under Treatment:
- Use observed treated outcomes to estimate the upper and lower bounds on F_{Y(1)}(y | X = c, M = 0).
- Bounding Potential Outcomes Under Non-Treatment:
- Use observed untreated outcomes to estimate the upper and lower bounds on F_{Y(0)}(y | X = c, M = 0).
- Deriving Bounds on \Gamma:
- Using the bounds from Steps 1 and 2, compute sharp upper and lower bounds on the local average treatment effect.
Extreme Value Consideration:
- The bounds account for worst-case scenarios by considering extreme assumptions about the distribution of potential outcomes.
- These bounds are sharp (i.e., they cannot be tightened further without additional assumptions) but remain empirically relevant.
27.5.6.6 Extensions
- Quantile Treatment Effects (QTEs)
An alternative to average treatment effects is the quantile treatment effect (QTE), which focuses on different percentiles of the outcome distribution. Advantages of QTE bounds include:
- Less Sensitivity to Extreme Values: Unlike ATE bounds, QTE bounds are less affected by outliers in the outcome distribution.
- More Informative for Policy Analysis: Helps determine whether effects are concentrated in certain segments of the population.
QTE vs. ATE Under Manipulation:
- ATE inference is highly sensitive to manipulation, with confidence intervals widening significantly as assumed manipulation increases.
- QTE inference remains meaningful even under substantial manipulation.
- Discrete Outcome Extensions: The framework applies not only to continuous outcomes but also to discrete outcome variables.
- Role of Behavioral Assumptions
- Making behavioral assumptions about high treatment likelihood among always-assigned units can refine the bounds.
- For example, assuming that most always-assigned units are treated allows for narrower bounds on treatment effects.
- Incorporation of Covariates
- Including pre-treatment covariates can refine treatment effect bounds.
- Covariates help:
- Distinguish between potentially-assigned and always-assigned units.
- Improve inference on treatment effect heterogeneity.
- Guide policy targeting by identifying unit types based on observed characteristics.
devtools::install_github("francoisgerard/rdbounds/R")
library(formattable)
library(data.table)
library(rdbounds)
set.seed(123)
df <- rdbounds_sampledata(1000, covs = FALSE)
#> [1] "True tau: 0.117999815082062"
#> [1] "True treatment effect on potentially-assigned: 2"
#> [1] "True treatment effect on right side of cutoff: 2.35399944524618"
head(df)
#> x y treatment
#> 1 -1.2532616 2.684827 0
#> 2 -0.5146925 5.845219 0
#> 3 3.4853777 6.166070 0
#> 4 0.1576616 3.227139 0
#> 5 0.2890962 7.031685 1
#> 6 3.8350019 10.238570 1
rdbounds_est <-
rdbounds(
y = df$y,
x = df$x,
# covs = as.factor(df$cov),
treatment = df$treatment,
c = 0,
discrete_x = FALSE,
discrete_y = FALSE,
bwsx = c(.2, .5),
bwy = 1,
# for median effect use
# type = "qte",
# percentiles = .5,
kernel = "epanechnikov",
orders = 1,
evaluation_ys = seq(from = 0, to = 15, by = 1),
refinement_A = TRUE,
refinement_B = TRUE,
right_effects = TRUE,
yextremes = c(0, 15),
num_bootstraps = 5
)
#> [1] "The proportion of always-assigned units just to the right of the cutoff is estimated to be 0.38047"
#> [1] "2025-03-18 11:06:09.865241 Estimating CDFs for point estimates"
#> [1] "2025-03-18 11:06:10.167795 .....Estimating CDFs for units just to the right of the cutoff"
#> [1] "2025-03-18 11:06:12.38372 Estimating CDFs with nudged tau (tau_star)"
#> [1] "2025-03-18 11:06:12.429188 .....Estimating CDFs for units just to the right of the cutoff"
#> [1] "2025-03-18 11:06:16.48029 Beginning parallelized output by bootstrap.."
#> [1] "2025-03-18 11:06:20.764061 Computing Confidence Intervals"
#> [1] "2025-03-18 11:06:33.707395 Time taken:0.4 minutes"
rdbounds_summary(rdbounds_est, title_prefix = "Sample Data Results")
#> [1] "Time taken: 0.4 minutes"
#> [1] "Sample size: 1000"
#> [1] "Local Average Treatment Effect:"
#> $tau_hat
#> [1] 0.3804665
#>
#> $tau_hat_CI
#> [1] 0.4180576 1.4333618
#>
#> $takeup_increase
#> [1] 0.6106973
#>
#> $takeup_increase_CI
#> [1] 0.4332132 0.7881814
#>
#> $TE_SRD_naive
#> [1] 1.589962
#>
#> $TE_SRD_naive_CI
#> [1] 1.083977 2.095948
#>
#> $TE_SRD_bounds
#> [1] 0.7165801 2.3454868
#>
#> $TE_SRD_CI
#> [1] -1.200992 7.733214
#>
#> $TE_SRD_covs_bounds
#> [1] NA NA
#>
#> $TE_SRD_covs_CI
#> [1] NA NA
#>
#> $TE_FRD_naive
#> [1] 2.583271
#>
#> $TE_FRD_naive_CI
#> [1] 1.506573 3.659969
#>
#> $TE_FRD_bounds
#> [1] 1.273896 3.684860
#>
#> $TE_FRD_CI
#> [1] -1.266341 10.306116
#>
#> $TE_FRD_bounds_refinementA
#> [1] 1.273896 3.684860
#>
#> $TE_FRD_refinementA_CI
#> [1] -1.266341 10.306116
#>
#> $TE_FRD_bounds_refinementB
#> [1] 1.420594 3.677963
#>
#> $TE_FRD_refinementB_CI
#> [1] NA NA
#>
#> $TE_FRD_covs_bounds
#> [1] NA NA
#>
#> $TE_FRD_covs_CI
#> [1] NA NA
#>
#> $TE_SRD_CIs_manipulation
#> [1] NA NA
#>
#> $TE_FRD_CIs_manipulation
#> [1] NA NA
#>
#> $TE_SRD_right_bounds
#> [1] -2.056178 3.650820
#>
#> $TE_SRD_right_CI
#> [1] -11.044937 9.282217
#>
#> $TE_FRD_right_bounds
#> [1] -2.900703 5.272370
#>
#> $TE_FRD_right_CI
#> [1] -14.75421 15.58640
rdbounds_est_tau <-
rdbounds(
y = df$y,
x = df$x,
# covs = as.factor(df$cov),
treatment = df$treatment,
c = 0,
discrete_x = FALSE,
discrete_y = FALSE,
bwsx = c(.2, .5),
bwy = 1,
kernel = "epanechnikov",
orders = 1,
evaluation_ys = seq(from = 0, to = 15, by = 1),
refinement_A = TRUE,
refinement_B = TRUE,
right_effects = TRUE,
potential_taus = c(.025, .05, .1, .2),
yextremes = c(0, 15),
num_bootstraps = 5
)
#> [1] "The proportion of always-assigned units just to the right of the cutoff is estimated to be 0.38047"
#> [1] "2025-03-18 11:06:35.36151 Estimating CDFs for point estimates"
#> [1] "2025-03-18 11:06:35.578808 .....Estimating CDFs for units just to the right of the cutoff"
#> [1] "2025-03-18 11:06:37.519459 Estimating CDFs with nudged tau (tau_star)"
#> [1] "2025-03-18 11:06:37.590958 .....Estimating CDFs for units just to the right of the cutoff"
#> [1] "2025-03-18 11:06:40.652474 Beginning parallelized output by bootstrap.."
#> [1] "2025-03-18 11:06:45.388902 Estimating CDFs with fixed tau value of: 0.025"
#> [1] "2025-03-18 11:06:45.460836 Estimating CDFs with fixed tau value of: 0.05"
#> [1] "2025-03-18 11:06:45.531103 Estimating CDFs with fixed tau value of: 0.1"
#> [1] "2025-03-18 11:06:45.590547 Estimating CDFs with fixed tau value of: 0.2"
#> [1] "2025-03-18 11:06:46.638509 Beginning parallelized output by bootstrap x fixed tau.."
#> [1] "2025-03-18 11:06:50.52083 Computing Confidence Intervals"
#> [1] "2025-03-18 11:07:03.385809 Time taken:0.47 minutes"
causalverse::plot_rd_aa_share(rdbounds_est_tau) # For SRD (default)

# causalverse::plot_rd_aa_share(rdbounds_est_tau, rd_type = "FRD") # For FRD
27.6 Fuzzy Regression Discontinuity Design
A Fuzzy Regression Discontinuity Design occurs when the assignment rule at the cutoff does not perfectly determine treatment status but instead causes a discontinuity in the probability of treatment. Unlike a Sharp RD Design, where crossing the threshold fully determines treatment, in a fuzzy RD, some individuals on both sides of the threshold may or may not receive the treatment.
If treatment is not strictly assigned at the cutoff, the usual RD estimator (which assumes deterministic assignment) is not valid. Instead, we use the cutoff as an instrumental variable to estimate the treatment effect for compliers, i.e., individuals whose treatment status depends on whether they cross the threshold.
Define an indicator variable Z_i (i.e., instrument for treatment assignment) that captures whether an individual is above or below the cutoff:
Z_i = \begin{cases} 1 & \text{if } X_i \geq c \\ 0 & \text{if } X_i < c \end{cases}
This variable Z_i serves as an instrument for the treatment variable D_i because:
It strongly correlates with treatment (D_i).
It is exogenous, meaning it affects the outcome only through its effect on treatment.
27.6.1 Compliance Types
Since treatment assignment is no longer deterministic, individuals can be classified into four groups based on how they respond to the cutoff:
- Compliers (C_0): Individuals who receive treatment if and only if X_i \geq c.
- Always-Takers (A_0): Individuals who always receive treatment, regardless of whether X_i \geq c.
- Never-Takers (N_0): Individuals who never receive treatment, even if X_i \geq c.
- Defiers (violating monotonicity, assumed to be zero): Individuals who receive treatment if X_i < c but not if X_i \geq c.
The Fuzzy RD estimator identifies the treatment effect only for compliers, because their treatment status depends on Z_i.
27.6.2 Estimating the Local Average Treatment Effect
We estimate the LATE using a ratio of discontinuities:
\text{LATE} = \frac{\lim\limits_{x \downarrow c}E[Y|X = x] - \lim\limits_{x \uparrow c}E[Y|X = x]}{\lim\limits_{x \downarrow c } E[D |X = x] - \lim\limits_{x \uparrow c}E[D |X=x]}
Intuitively, this formula represents:
\text{LATE} = \frac{\text{Discontinuity in } E[Y|X]}{\text{Discontinuity in } E[D|X]}
where:
The numerator captures the jump in the expected outcome at the cutoff.
The denominator captures the jump in the probability of treatment at the cutoff.
This ratio is valid under three key assumptions:
- Continuity in potential outcomes: E[Y(d)|X] is continuous at X = c for both d \in \{0,1\}.
- Monotonicity: There are no defiers (P(D^+ \geq D^- | X = c) = 1).
- First-stage relevance: There is a discontinuity in P(D = 1 | X) at X = c.
If these conditions hold, the fuzzy RD estimator gives a valid estimate of the causal effect of treatment for compliers.
27.6.3 Equivalent Representation Using Expectations
We can also define LATE in terms of conditional expectations of treatment and outcome:
\lim_{\epsilon \to 0} \frac{E[Y |Z = 1] - E[Y |Z=0]}{E[D|Z = 1] - E[D|Z = 0]}
where Z is the instrument (indicator for being above the cutoff). This approach highlights the IV nature of fuzzy RD.
27.6.4 Estimation Strategies
There are two equivalent ways to estimate the LATE in practice:
Approach 1: Two-Step Estimation
-
Estimate Sharp RD for the Outcome Y:
- Regress Y on X using a local linear regression on either side of c.
- Estimate the discontinuity in E[Y|X] at c.
-
Estimate Sharp RD for the Treatment D:
- Regress D on X using a local linear regression on either side of c.
- Estimate the discontinuity in E[D|X] at c.
-
Compute the Ratio:
- Divide the estimated discontinuity in E[Y|X] by the estimated discontinuity in E[D|X].
Mathematically:
\widehat{\text{LATE}} = \frac{\widehat{E[Y | X = c^+]} - \widehat{E[Y | X = c^-]}}{\widehat{E[D | X = c^+]} - \widehat{E[D | X = c^-]}}.
Approach 2: Instrumental Variables Regression
Subset the data to observations close to c.
-
Use Z_i (above/below cutoff indicator) as an instrument for D_i in a two-stage least squares regression:
-
First-stage regression (predicting treatment using the cutoff indicator):
D_i = \alpha + \beta Z_i + \gamma X_i + \epsilon_i
- This captures the effect of the cutoff on treatment assignment.
-
Second-stage regression (estimating treatment effect using predicted D_i):
Y_i = \delta + \tau \widehat{D}_i + \lambda X_i + \nu_i
- The coefficient \tau gives the LATE estimate.
-
27.6.5 Practical Considerations
- Bandwidth Selection: Only observations near the cutoff should be used. Methods like cross-validation or Calonico, Cattaneo, and Farrell (2020) optimal bandwidth selection can help.
- Polynomial Order: A local linear model is typically preferred, but higher-order polynomials may be used cautiously.
- Robust Inference: Standard errors should be computed using heteroskedasticity-robust and clustered standard errors if necessary.
27.6.6 Steps for Fuzzy RD
1. Visualization
- Graph the Outcome Variable:
- Compute the average outcome within bins of the running variable X_i.
- Choose bins large enough to display smooth trends but small enough to reveal discontinuities at the cutoff.
- Overlay a smoothed regression line on either side of the cutoff to visualize any jumps.
- Graph the Probability of Treatment:
- Compute the average treatment probability within the same bins.
- Plot E[D|X] to check for a discontinuity at X = c, confirming the first-stage relevance of the instrument.
2. Estimation of Treatment Effect
Use Two-Stage Least Squares to estimate the Local Average Treatment Effect:
-
First Stage (Predict Treatment Using Cutoff Indicator Z_i):
D_i = \alpha + \beta Z_i + \gamma X_i + \epsilon_i
- This regression captures how treatment probability changes at the cutoff.
- The coefficient \beta measures the jump in treatment probability at X = c.
-
Second Stage (Estimate Outcome Using Predicted Treatment):
Y_i = \delta + \tau \widehat{D}_i + \lambda X_i + \nu_i
- The coefficient \tau gives the LATE, which estimates the treatment effect for compliers.
3. Robustness Checks
-
Assess Possible Jumps in Other Covariates:
- Check whether other pre-determined covariates (e.g., age, income) exhibit discontinuities at the cutoff.
- If covariates jump, this may indicate endogenous sorting or omitted variable bias.
-
Hypothesis Testing for Bunching (McCrary Test):
- Test for manipulation of the running variable by examining whether the density of X_i changes discontinuously at c.
- A significant density jump suggests sorting behavior, which could invalidate RD assumptions.
-
Placebo Tests:
- Repeat the analysis at fake cutoffs (values of X where no intervention occurs).
- If a treatment effect appears at a placebo cutoff, this suggests a spurious RD effect.
-
Varying Bandwidth Sensitivity:
- Re-run the analysis using different bandwidths around the cutoff.
- Check whether estimates remain stable as the window narrows or expands.
27.7 Sharp Regression Discontinuity Design
A Sharp Regression Discontinuity Design occurs when treatment assignment follows a strict rule at a known cutoff. That is, units receive treatment if and only if their running variable X_i crosses a threshold c:
D_i = \begin{cases} 1 & \text{if } X_i \geq c \\ 0 & \text{if } X_i < c \end{cases}
Unlike Fuzzy RD, where treatment probability changes discontinuously but is not deterministic, Sharp RD ensures perfect compliance with the cutoff rule.
The key idea is that units just below and just above the cutoff are nearly identical in expectation, except for their treatment status. This mimics randomized experiments in a local neighborhood around X = c.
27.7.1 Assumptions for Identification
For a valid Sharp RD design, we assume:
-
Continuity of the Conditional Expectation of Potential Outcomes
- The expected outcome given X is smooth at c in the absence of treatment: \lim_{x \uparrow c} E[Y(0) | X = x] = \lim_{x \downarrow c} E[Y(0) | X = x].
- This ensures that any observed discontinuity in E[Y | X] at X = c is due to treatment, not pre-existing differences.
-
No Manipulation of the Running Variable
- Agents cannot perfectly sort themselves around the cutoff (e.g., students manipulating test scores to qualify for a scholarship).
- This is typically checked using the McCrary density test to detect discontinuities in the density of X at c.
-
Local Randomization
- Near the cutoff, individuals are as good as randomly assigned to treatment or control.
If these conditions hold, the Sharp RD estimator provides an unbiased estimate of the causal effect of treatment.
27.7.2 Estimating the Local Average Treatment Effect
The treatment effect at the cutoff is given by:
\tau = \lim_{x \downarrow c}E[Y | X = x] - \lim_{x \uparrow c} E[Y | X = x].
This represents the jump in the expected outcome at the cutoff.
27.7.3 Estimation Methods
A common approach is to estimate separate linear regressions on each side of the cutoff:
For observations below the cutoff (X < c):
Y_i = \alpha + \beta (X_i - c) + \epsilon_i.
For observations above the cutoff (X \geq c):
Y_i = \gamma + \delta (X_i - c) + \tau D_i + \nu_i.
Here, the coefficient \tau captures the treatment effect at X = c.
In practice, we estimate:
\hat{\tau} = \hat{E}[Y | X = c^+] - \hat{E}[Y | X = c^-].
This can be implemented using Weighted Least Squares with observations near the cutoff receiving higher weights.
- Global Polynomial Regression
An alternative approach is to use a polynomial regression:
Y_i = \alpha + \sum_{k=1}^{K} \beta_k (X_i - c)^k + \tau D_i + \epsilon_i.
where:
Higher-order terms (X_i - c)^k capture nonlinear relationships.
Typical choices for K are 2 or 3, but higher orders may lead to overfitting.
- Nonparametric Local Regression
Instead of assuming a linear or polynomial relationship, a local regression (kernel-based) method estimates:
E[Y | X = x] = \sum_{i=1}^{n} K_h (X_i - x) Y_i.
where K_h is a kernel function (e.g., Epanechnikov), and h is the bandwidth.
- A smaller h captures local variation but increases variance.
- A larger h smooths noise but risks bias.
27.7.4 Steps for Sharp RD
- Visualization
-
Graph the outcome variable:
- Compute binned averages of Y_i over intervals of X.
- Choose bin sizes that balance smoothness and clarity.
- Overlay a smoothed regression line on each side of c.
-
Graph the running variable’s density:
- Use histograms to check for manipulation.
- Conduct a McCrary density test to detect discontinuities.
- Estimation of the Treatment Effect
- Run local linear regression separately on both sides of the cutoff.
- Use nonparametric methods (e.g., kernel regression) for robustness.
- Estimate treatment effect using: \hat{\tau} = \hat{E}[Y | X = c^+] - \hat{E}[Y | X = c^-].
- Robustness Checks
- Check for Jumps in Other Covariates
If any pre-determined covariate jumps at the cutoff, it suggests sorting or omitted variable bias.
-
Run RD regressions for each covariate:
W_i = \alpha + \beta (X_i - c) + \gamma D_i + \epsilon_i.
A significant \gamma suggests a violation of continuity assumptions.
- McCrary Density Test (Checking for Manipulation)
- Run the McCrary test to examine whether the density of X_i exhibits a discontinuity at c.
- A significant jump indicates sorting behavior, which can invalidate the RD design.
- Placebo Tests
- Perform fake cutoff tests by estimating RD effects at arbitrary points c^*.
- If significant effects appear at non-cutoff points, the RD design may be picking up spurious trends.
- Varying Bandwidth
- Re-run RD analysis using different bandwidths h.
- If results change dramatically, treatment effects may be highly sensitive to bandwidth choice.
- Use data-driven bandwidth selection methods (e.g., G. Imbens and Kalyanaraman (2012)).
27.8 Regression Kink Design
The Regression Kink Design (RKD) extends the logic of RD by exploiting changes in the slope of the treatment intensity function at a known threshold rather than a discontinuous jump in treatment assignment.
Instead of an RD jump in treatment probability at X = c, the treatment function b(X) exhibits a kink at the cutoff:
- In Sharp RKD, the kink is deterministic, meaning the treatment function b(X) changes its slope exactly at X = c.
- In Fuzzy RKD, treatment assignment remains probabilistic, requiring an instrumental variable approach similar to Fuzzy RD.
Example: Unemployment Benefits
Consider an unemployment insurance program where benefits increase at a diminishing rate as prior earnings increase. The function governing benefits, b(X), exhibits a kink at a threshold X = c. The RKD framework allows us to estimate the marginal causal effect of additional benefits on employment duration.
27.8.1 Identification in Sharp Regression Kink Design
In a Sharp RKD, the treatment intensity function b(X) exhibits a known change in slope at X = c, formally:
D_i = b(X_i), \quad \text{where } b(X) \text{ has a kink at } X = c.
The key identification assumption is that the potential outcome function E[Y(d) | X] is smooth in X. Thus, any observed change in the slope of E[Y | X] at X = c can be attributed to the change in b(X).
The causal effect of interest is:
\alpha_{KRD} = \frac{\lim\limits_{x \downarrow c} \frac{d}{dx}E[Y |X = x]- \lim\limits_{x \uparrow c} \frac{d}{dx}E[Y |X = x]}{\lim\limits_{x \downarrow c} \frac{d}{dx}b(x) - \lim\limits_{x \uparrow c} \frac{d}{dx}b(x)}.
where:
b(X) is a known function determining treatment intensity.
The numerator captures the discontinuous change in the slope of the expected outcome at X = c.
The denominator captures the change in the slope of the treatment function at X = c.
If b(X) is known and deterministic, the denominator is non-random, allowing for precise estimation of \alpha_{KRD}.
Assumptions for Identification
-
Continuity of Potential Outcomes
- The expected potential outcomes E[Y(d)|X] are smooth in X (no jumps).
-
No Manipulation of the Running Variable
- The density of X is continuous at X = c, implying that agents cannot sort themselves based on the kink.
-
First-Stage Validity
- The slope of b(X) must change at X = c (i.e., the kink must exist).
If these assumptions hold, \alpha_{KRD} represents the marginal causal effect of treatment intensity on the outcome.
27.8.2 Identification in Fuzzy Regression Kink Design
In Fuzzy RKD, the treatment function D_i does not directly follow a deterministic function b(X) but instead exhibits a kink in its probability distribution:
E[D | X] \text{ has a kink at } X = c.
The treatment intensity function is unknown, requiring an instrumental variable (IV) strategy, analogous to Fuzzy RD.
The causal effect is given by:
\alpha_{KRD} = \frac{\lim\limits_{x \downarrow c} \frac{d}{dx}E[Y |X = x]- \lim\limits_{x \uparrow c} \frac{d}{dx}E[Y |X = x]}{\lim\limits_{x \downarrow c} \frac{d}{dx}E[D |X = x]- \lim\limits_{x \uparrow c} \frac{d}{dx}E[D |X = x]}.
where:
The numerator measures the kink in the expected outcome.
The denominator measures the kink in the probability of treatment.
The ratio provides a local instrumental variable estimate of the causal effect for compliers.
Identification Assumptions
In addition to the Sharp RKD assumptions, Fuzzy RKD requires:
-
Monotonicity
- No individuals decrease their treatment intensity while others increase at the kink (analogous to Fuzzy RD monotonicity).
-
Relevance of the Kink
- There must be a statistically significant slope change in E[D | X] at X = c.
If these assumptions hold, the Fuzzy RKD estimator identifies a local treatment effect.
27.8.3 Estimation of RKD Effects
RKD estimation involves three main steps:
Step 1: Estimating the Kink in the Outcome Function
Estimate the left- and right-hand derivatives of E[Y | X]:
\frac{d}{dx}E[Y | X] = \lim_{h \to 0} \frac{E[Y | X = c + h] - E[Y | X = c - h]}{h}.
This can be done using:
Local linear regression on either side of the kink.
Higher-order polynomial regression for improved flexibility.
Step 2: Estimating the Kink in the Treatment Function
For Sharp RKD, the kink in b(X) is known.
For Fuzzy RKD, estimate the kink in E[D | X]:
\frac{d}{dx}E[D | X] = \lim_{h \to 0} \frac{E[D | X = c + h] - E[D | X = c - h]}{h}.
Use local regression or piecewise polynomials to estimate this slope.
Step 3: Compute RKD Estimator
For Sharp RKD:
\hat{\alpha}_{KRD} = \frac{\hat{\tau}_Y}{\tau_b},
where:
\hat{\tau}_Y is the estimated kink in E[Y | X].
\tau_b is the known slope change in b(X).
For Fuzzy RKD:
\hat{\alpha}_{KRD} = \frac{\hat{\tau}_Y}{\hat{\tau}_D}.
where:
- \hat{\tau}_D is the estimated kink in E[D | X].
27.8.4 Robustness Checks
-
Assess Covariate Smoothness
- Verify that pre-determined covariates (e.g., age, education) do not exhibit kinks at X = c.
-
Check for Manipulation of the Running Variable
- Perform a McCrary test to ensure the density of X is continuous at X = c.
-
Placebo Kinks
- Test for spurious kinks at other arbitrary values of X.
-
Bandwidth Sensitivity
- Estimate RKD effects with varying bandwidths to check for consistency.
27.9 Multi-Cutoff Regression Discontinuity Design
The Multi-Cutoff Regression Discontinuity Design extends the standard RD framework by allowing for multiple cutoff points across different groups or geographic regions. Instead of a single threshold c, different subgroups are assigned different cutoffs C_i. This framework allows for a heterogeneous treatment effect function:
\tau (x,c)= E[Y_{1i} - Y_{0i}|X_i = x, C_i = c].
Why Use Multi-Cutoff RD?
- Policy Variation: Policies often implement different cutoffs across regions or institutions (e.g., different states setting different minimum test scores for scholarship eligibility).
- Generalizability: Allows estimation of treatment effects across multiple populations instead of relying on a single threshold.
- Improved Precision: Leveraging multiple thresholds can enhance statistical power compared to a single-cutoff RD.
The multi-cutoff RD framework provides several advantages:
-
Estimation of Local Heterogeneous Effects
- Unlike standard RD, which estimates a single treatment effect, multi-cutoff RD allows heterogeneity in effects across groups.
-
Improved Precision
- More observations across different thresholds can increase statistical power.
-
Policy Implications
- Useful in settings where policy thresholds vary (e.g., different states setting different income eligibility limits for welfare programs).
27.9.1 Identification
Under the potential outcomes framework, each unit i has:
A running variable X_i.
A cutoff specific to their group C_i.
A binary treatment indicator:
D_i = I(X_i \geq C_i).
The observed outcome is:
Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}.
The treatment effect is the expected difference in potential outcomes:
\tau(x, c) = E[Y_{1i} - Y_{0i} | X_i = x, C_i = c].
27.9.2 Key Assumptions
To ensure causal identification, we extend the standard RD assumptions:
-
Continuity of Potential Outcomes
- The expected potential outcomes E[Y(0)|X] and E[Y(1)|X] are smooth functions of X at each cutoff C_i.
- Formally: \lim_{x \uparrow C_i} E[Y(0)|X=x, C_i=c] = \lim_{x \downarrow C_i} E[Y(0)|X=x, C_i=c].
-
No Manipulation of the Running Variable
- The density of X_i must be continuous at each C_i, ensuring that individuals cannot selectively sort above or below their assigned cutoff.
-
Local Randomization
- Near each cutoff, units are as-good-as-randomly assigned to treatment or control.
-
Independence Across Cutoffs
- The cutoff assignment rule should be exogenous and not correlated with unobserved determinants of Y.
If these assumptions hold, each cutoff provides a valid local treatment effect estimate.
27.9.3 Estimation Approaches
27.9.3.1 Pooling Cutoffs with Fixed Effects
A straightforward way to estimate multi-cutoff RD is to include cutoff fixed effects:
Y_i = \alpha + \beta (X_i - C_i) + \tau D_i + \gamma C_i + \epsilon_i.
where:
\tau captures the average treatment effect across all cutoffs.
C_i is included as a fixed effect to account for different intercepts across groups.
27.9.3.2 Separate RD Estimation for Each Cutoff
Instead of pooling, we can estimate separate RD effects for each C_i:
\tau_c = \lim_{x \downarrow C_i}E[Y|X = x, C_i = c] - \lim_{x \uparrow C_i} E[Y|X = x, C_i = c].
This approach allows for heterogeneous treatment effects.
27.9.3.3 Interaction Model for Heterogeneous Effects
To estimate how treatment effects vary with C_i, we interact D_i with C_i:
Y_i = \alpha + \beta (X_i - C_i) + \tau D_i + \lambda D_i C_i + \epsilon_i.
where:
\lambda captures how the treatment effect varies with the cutoff.
A significant \lambda implies that \tau(x, c) is not constant across cutoffs.
27.9.3.4 Nonparametric Local Estimation
A fully flexible approach estimates \tau(x, c) separately at each cutoff using kernel-based methods:
\hat{\tau}(c) = \frac{\sum_{i=1}^{n} K_h (X_i - C_i) D_i Y_i}{\sum_{i=1}^{n} K_h (X_i - C_i) D_i} - \frac{\sum_{i=1}^{n} K_h (X_i - C_i) (1 - D_i) Y_i}{\sum_{i=1}^{n} K_h (X_i - C_i) (1 - D_i)}.
where:
K_h(\cdot) is a kernel function (e.g., Epanechnikov).
h is the bandwidth, chosen via cross-validation.
27.9.4 Robustness Checks
- Covariate Balance at Each Cutoff
Test whether pre-treatment covariates show jumps at each C_i.
-
Run placebo RD regressions on covariates:
W_i = \alpha + \beta (X_i - C_i) + \gamma D_i + \epsilon_i.
A significant \gamma suggests that RD assumptions are violated.
- McCrary Density Test
-
Perform a McCrary test separately at each cutoff to check for manipulation:
f(X) \text{ should be continuous at } X = C_i.
If discontinuities exist, individuals may be sorting around cutoffs.
- Placebo Cutoffs
- Implement fake cutoffs and re-estimate \tau(x, c).
- If significant effects appear, the RD estimates may be biased.
- Varying Bandwidths
- Re-estimate treatment effects using different bandwidths.
- If \hat{\tau}(x,c) changes drastically, it suggests sensitivity to bandwidth choice.
27.10 Multi-Score Regression Discontinuity Design
The Multi-Score Regression Discontinuity Design extends the standard single-score RD and the multi-cutoff RD by introducing multiple running variables that simultaneously determine treatment eligibility. Instead of relying on a single threshold for assignment, treatment now depends on a combination of multiple continuous scores crossing predetermined cutoffs.
Multi-score RD is relevant when policy eligibility is based on multiple criteria, such as:
Education: Honors program admission based on both math and English scores.
Healthcare: Medical trial eligibility based on both BMI and blood pressure levels.
Taxation: Tax incentives based on income level and household size.
27.10.1 General Framework
Each individual i has:
Two running variables, X_{1i} and X_{2i}.
Two predetermined cutoffs, C_1 and C_2.
A binary treatment indicator D_i, assigned based on whether the individual’s scores exceed both thresholds.
The treatment effect is defined as:
\tau (x_1, x_2) = E[Y_{1i} - Y_{0i} | X_{1i} = x_1, X_{2i} = x_2].
This represents the local average treatment effect in a two-dimensional RD setting.
27.10.2 Identification
Under the potential outcomes framework, for each individual i, we define:
Y_{1i}: Potential outcome under treatment.
Y_{0i}: Potential outcome under control.
D_i: Treatment assignment rule.
The observed outcome is:
Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}.
The treatment assignment mechanism follows:
D_i = \begin{cases} 1 & \text{if } X_{1i} \geq C_1 \text{ and } X_{2i} \geq C_2, \\ 0 & \text{otherwise}. \end{cases}
27.10.3 Key Assumptions
To ensure valid causal inference, the multi-score RD framework extends the standard RD assumptions:
-
Continuity of Potential Outcomes in Both Running Variables
- The expected potential outcomes E[Y(0) | X_1, X_2] and E[Y(1) | X_1, X_2] are smooth in both X_1 and X_2.
- Formally: \lim_{(x_1, x_2) \to (C_1, C_2)^-} E[Y(0) | X_1 = x_1, X_2 = x_2] = \lim_{(x_1, x_2) \to (C_1, C_2)^+} E[Y(0) | X_1 = x_1, X_2 = x_2].
- Ensures that any observed discontinuity in E[Y | X_1, X_2] is attributable to treatment.
-
No Manipulation of Running Variables
- The density of (X_1, X_2) must be continuous at (C_1, C_2).
- No agents should be able to precisely manipulate both scores to cross the threshold.
-
Local Randomization
- Near (C_1, C_2), units are as good as randomly assigned to treatment or control.
-
No Interaction Effects in Running Variables (optional)
- In some models, we assume that the effect of crossing C_1 does not depend on C_2 and vice versa.
If these assumptions hold, the treatment effect is identified as the discontinuity in E[Y | X_1, X_2] at (C_1, C_2).
27.10.4 Estimation Approaches
27.10.4.1 Local Linear Regression in Two Dimensions
The simplest approach is to estimate separate regressions on each side of the cutoff in both dimensions:
For observations below the threshold (C_1, C_2):
Y_i = \alpha + \beta_1 (X_{1i} - C_1) + \beta_2 (X_{2i} - C_2) + \epsilon_i.
For observations above the threshold (C_1, C_2):
Y_i = \gamma + \delta_1 (X_{1i} - C_1) + \delta_2 (X_{2i} - C_2) + \tau D_i + \nu_i.
The treatment effect \tau is estimated as:
\hat{\tau} = \hat{E}[Y | X_1 = C_1^+, X_2 = C_2^+] - \hat{E}[Y | X_1 = C_1^-, X_2 = C_2^-].
This approach assumes local linearity, but higher-order polynomials can be used:
Y_i = \alpha + \sum_{k=1}^{K} \beta_k (X_{1i} - C_1)^k + \sum_{k=1}^{K} \gamma_k (X_{2i} - C_2)^k + \tau D_i + \epsilon_i.
27.10.4.2 Kernel-Weighted Estimation
A more flexible approach estimates \tau(x_1, x_2) using nonparametric local regression:
\hat{\tau}(x_1, x_2) = \frac{\sum_{i=1}^{n} K_h (X_{1i} - x_1) K_h (X_{2i} - x_2) D_i Y_i}{\sum_{i=1}^{n} K_h (X_{1i} - x_1) K_h (X_{2i} - x_2) D_i} - \frac{\sum_{i=1}^{n} K_h (X_{1i} - x_1) K_h (X_{2i} - x_2) (1 - D_i) Y_i}{\sum_{i=1}^{n} K_h (X_{1i} - x_1) K_h (X_{2i} - x_2) (1 - D_i)}.
where:
K_h(\cdot) is a kernel function (e.g., Epanechnikov).
h is the bandwidth, selected via cross-validation.
27.10.4.3 Interaction Model for Heterogeneous Effects
To assess interaction effects between running variables, estimate:
Y_i = \alpha + \beta_1 (X_{1i} - C_1) + \beta_2 (X_{2i} - C_2) + \tau D_i + \lambda D_i (X_{1i} - C_1)(X_{2i} - C_2) + \epsilon_i.
- \lambda captures whether the treatment effect depends on both X_1 and X_2.
27.10.5 Robustness Checks
-
Covariate Balance in Both Dimensions
- Test whether pre-treatment covariates jump at (C_1, C_2).
-
McCrary Density Test in Two Dimensions
- Verify that density of (X_1, X_2) is smooth at (C_1, C_2).
-
Placebo Cutoffs
- Implement fake cutoffs and re-estimate \tau(x_1, x_2).
-
Varying Bandwidths
- Re-estimate using different bandwidths for robustness.
27.11 Evaluation of a Regression Discontinuity Design
After estimating an RD model, it is crucial to evaluate whether the assumptions hold and whether the results are robust to different specifications. The key aspects of RD evaluation include:
- Graphical and formal evidence for a discontinuity in treatment and outcome variables.
-
Validation of RD assumptions, including:
- The absence of discontinuities in pre-treatment covariates.
- No manipulation of the assignment variable.
- Robustness checks for functional form and bandwidth choice.
- External validity: assessing whether results generalize beyond the cutoff.
A well-implemented RD should demonstrate a clear treatment effect at the cutoff while ensuring that no other discontinuous changes confound the effect.
27.11.1 Graphical and Formal Evidence
27.11.1.1 Visual Inspection of the RD Effect
A fundamental step in RD analysis is to plot the outcome variable against the running variable:
- Compute binned averages of Y_i for small intervals of X_i.
- Overlay a smoothed polynomial regression separately for X < c and X \geq c.
- A visible jump at X = c provides initial evidence of a treatment effect.
Additionally, plotting treatment probability P(D = 1 | X) ensures that assignment follows the expected RD rule.
27.11.1.2 No Discontinuity in Pre-Treatment Covariates
To rule out omitted variable bias, we check whether other covariates (age, education, prior test scores, etc.) exhibit jumps at the cutoff.
For each covariate W_i, estimate:
W_i = \alpha + f(X_i) \beta + \tau D_i + \epsilon_i.
- If \tau is statistically significant, it suggests a violation of RD assumptions.
- Covariate jumps imply that factors other than treatment may be driving the observed outcome change.
27.11.1.3 Manipulation Test (McCrary Density Test)
A critical assumption in RD is that units cannot precisely manipulate their values of X_i. If individuals can selectively sort around c (e.g., students altering test scores to qualify for a scholarship), RD estimates become invalid.
To test for manipulation, we estimate the density function of X and test for a discontinuity at c:
\hat{f}(X) = \lim_{x \uparrow c} f(X) - \lim_{x \downarrow c} f(X).
A significant difference suggests sorting behavior, which violates RD assumptions.
27.11.2 Functional Form of the Running Variable
27.11.2.1 General RD Model
The most flexible RD specification includes:
A functional form f(X_i) to account for trends.
An indicator for treatment D_i. - An interaction term D_i f(X_i) allowing for different slopes on each side.
Y_i = \alpha_0 + f(X_i) \alpha_1 + I(X_i \geq c) \alpha_2 + f(X_i) I(X_i \geq c) \alpha_3 + u_i.
where:
\alpha_2 captures the treatment effect at X = c.
\alpha_3 tests for differences in slopes across the threshold.
27.11.2.2 Simple Case: Linear RD
If f(X_i) is a linear function, we estimate:
Y_i = \beta_0 + \beta_1 X_i + I(X_i \geq c) \beta_2 + \epsilon_i.
RD gives you \beta_2 (causal effect) of X on Y at the cutoff point
In practice, everyone does
Y_i = \alpha_0 + f(x) \alpha _1 + [I(x_i \ge c)]\alpha_2 + [f(x_i)\times [I(x_i \ge c)]\alpha_3 + u_i
where we estimate different slope on different sides of the line. And if you estimate \alpha_3 to be no different from 0 then we return to the simple case.
27.11.2.3 Higher-Order Polynomials
A more flexible specification allows for nonlinear relationships:
Y_i = \beta_0 + \beta_1 X_i + \beta_2 X_i^2 + \beta_3 X_i^3 + \tau D_i + \epsilon_i.
- Higher-order polynomials reduce bias but increase variance.
- Overfitting is a risk, especially with limited data near c.
27.11.2.4 Nonparametric Estimation
When polynomial models are too restrictive, we use nonparametric local linear regression:
\hat{E}[Y | X] = \sum_{i=1}^{n} K_h(X_i - c) Y_i.
where:
K_h(X) is a kernel function (e.g., Epanechnikov).
h is the bandwidth, chosen to optimize bias-variance tradeoff.
27.11.3 Bandwidth Selection
27.11.3.1 Tradeoff Between Bias and Efficiency
Choosing an appropriate bandwidth h is crucial:
Narrow h (close to c): Lower bias, but high variance.
Wider h: More efficient estimates but potential bias.
27.11.4 Addressing Potential Confounders
27.11.4.1 Multiple Running Variables
If multiple forcing variables influence treatment (e.g., both math and English scores for honors eligibility), failing to account for them may introduce confounding.
A solution is to extend RD to a multi-score framework:
Y_i = \alpha_0 + f(X_{1i}, X_{2i}) \alpha_1 + I(X_{1i} \geq c_1, X_{2i} \geq c_2) \alpha_2 + \epsilon_i.
- Controls for both scores simultaneously.
- Allows for interactions between assignment variables.
27.11.4.2 Bundling of Institutions
In cases where policies change at institutional levels (e.g., different states implementing varying minimum wages), treatment effects may reflect institutional bundling rather than individual cutoff effects.
One solution is to include institution fixed effects:
Y_i = \alpha + f(X_i) \beta + \tau D_i + \gamma C_j + \epsilon_i.
where C_j is a categorical variable for institution j.
27.11.5 External Validity in RD
While RD provides strong internal validity, its generalizability is often limited:
-
Local Nature of RD Estimates
- RD estimates apply only at the cutoff.
- Effects may not extrapolate to other values of X.
-
Heterogeneous Treatment Effects
- If \tau varies across subgroups, RD estimates may not generalize.
-
Spillover Effects
- If treatment effects extend beyond the threshold (e.g., peer effects), RD assumptions may be violated.
27.12 Applications of RD Designs
Regression Discontinuity (RD) designs have widespread applications in empirical research across economics, political science, marketing, and public policy. These applications leverage threshold-based decision rules to identify causal effects in real-world settings where randomized experiments are infeasible.
Key applications include:
Marketing: Estimating the causal effects of promotions and advertising intensity.
Education: Evaluating the impact of financial aid or merit scholarships.
Healthcare: Assessing the effect of medical interventions assigned based on eligibility thresholds.
Labor Economics: Studying unemployment benefits and wage policies.
27.12.1 Applications in Marketing
RD has been widely applied in marketing research to estimate causal effects of pricing, advertising, and product positioning.
- Position Effects in Advertising
(Narayanan and Kalyanam 2015) uses an RD approach to estimate the causal impact of advertisement placement in search engines. Since ad placement follows an auction-based system, a discontinuity occurs where the top-ranked ad receives disproportionately more clicks than lower-ranked ads.
Key insights:
Higher-ranked ads generate more click-throughs, but this does not always translate into higher conversion rates.
The RD framework helps distinguish correlation from causation by leveraging the rank cutoff.
- Identifying Causal Marketing Mix Effects
(Hartmann, Nair, and Narayanan 2011) presents a nonparametric RD estimation approach to identify causal effects of marketing mix variables, such as:
Advertising budgets
Price changes
Promotional campaigns
By comparing firms just above and below an expenditure threshold, the RD framework isolates the true causal impact of marketing spending on consumer behavior.
27.12.2 R Packages for RD Estimation
Key RD Packages in R
Feature | rdd | rdrobust | rddtools |
---|---|---|---|
Estimator | Local linear regression | Local polynomial regression | Local polynomial regression |
Bandwidth Selection | (G. Imbens and Kalyanaraman 2012) | (Calonico, Cattaneo, and Titiunik 2014; G. Imbens and Kalyanaraman 2012 ; Calonico, Cattaneo, and Farrell 2020) | (G. Imbens and Kalyanaraman 2012) |
Kernel Functions | Epanechnikov, Gaussian | Epanechnikov | Gaussian |
Bias Correction | No | Yes | No |
Covariate Inclusion | Yes | Yes | Yes |
Assumption Testing | McCrary Sorting Test | No | McCrary Sorting, Covariate Distribution Tests |
For a detailed comparison, see (Thoemmes, Liao, and Jin 2017) (Table 1, p. 347).
27.12.2.1 Specialized RD Packages
-
rddensity
: Tests for discontinuities in the density of the running variable (useful for manipulation/bunching detection). -
rdlocrand
: Implements randomization-based inference for RD. -
rdmulti
: Extends RD to multiple cutoffs and multiple scores. -
rdpower
: Conducts power calculations and sample selection for RD designs.
Why Use rdrobust
?
- Implements local polynomial regression, providing bias correction and robust standard errors.
- Uses state-of-the-art bandwidth selection methods.
- Produces automatic RD plots for visualization.
27.12.3 Example of Regression Discontinuity in Education
We illustrate a Sharp RD using a simulated example of college GPA and future career success.
Setup:
Students qualify for a prestigious internship if their GPA ≥ 3.5.
The outcome variable is future career success (e.g., salary).
RD estimates the causal impact of the internship program on earnings.
Y_i = \beta_0 + \beta_1 X_i + \beta_2 W_i + u_i
X_i = \begin{cases} 1, W_i \ge c \\ 0, W_i < c \end{cases}
We simulate 100 observations, where GPA is the forcing variable and career success depends on GPA and the treatment effect.
# Set seed for reproducibility
set.seed(42)
n = 100
# Simulate GPA scores (0-4 scale)
GPA <- runif(n, 0, 4)
# Generate future success with treatment effect at GPA >= 3.5
future_success <- 10 + 2 * GPA + 10 * (GPA >= 3.5) + rnorm(n)
# Load RD package
library(rddtools)
# Format data for RD analysis
data <- rdd_data(future_success, GPA, cutpoint = 3.5)
# Plot RD scatterplot
plot(data, col = "red", cex = 0.5, xlab = "GPA", ylab = "Future Success")

We estimate the Sharp RD treatment effect using local linear regression:
# Estimate the sharp RDD model
rdd_mod <- rdd_reg_lm(rdd_object = data, slope = "same")
# Display results
summary(rdd_mod)
#>
#> Call:
#> lm(formula = y ~ ., data = dat_step1, weights = weights)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.08072 -0.54182 0.05352 0.54135 2.51267
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 17.21703 0.20060 85.83 <2e-16 ***
#> D 9.79856 0.31716 30.89 <2e-16 ***
#> x 2.14839 0.09914 21.67 <2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.9284 on 97 degrees of freedom
#> Multiple R-squared: 0.9746, Adjusted R-squared: 0.9741
#> F-statistic: 1863 on 2 and 97 DF, p-value: < 2.2e-16
Plot the RD regression line with binned observations:
# Plot RD regression
plot(rdd_mod, cex = 0.5, col = "red", xlab = "GPA", ylab = "Future Success")

We verify whether results hold under different bandwidths and functional forms:
- Varying the Bandwidth
# Using rdrobust for robust estimation
library(rdrobust)
# Estimate RD with optimal bandwidth
rd_out <- rdrobust(y = future_success, x = GPA, c = 3.5)
summary(rd_out)
#> Sharp RD estimates using local polynomial regression.
#>
#> Number of Obs. 100
#> BW type mserd
#> Kernel Triangular
#> VCE method NN
#>
#> Number of Obs. 83 17
#> Eff. Number of Obs. 7 12
#> Order est. (p) 1 1
#> Order bias (q) 2 2
#> BW est. (h) 0.361 0.361
#> BW bias (b) 0.670 0.670
#> rho (h/b) 0.540 0.540
#> Unique Obs. 83 17
#>
#> =============================================================================
#> Method Coef. Std. Err. z P>|z| [ 95% C.I. ]
#> =============================================================================
#> Conventional 12.836 2.495 5.144 0.000 [7.945 , 17.727]
#> Robust - - 4.473 0.000 [7.677 , 19.653]
#> =============================================================================
- McCrary Test for Manipulation
library(rddensity)
# Check for discontinuities in GPA distribution
rddensity(GPA, c = 3.5)
#> Call:
#> rddensity.
#> Sample size: 100. Cutoff: 3.5.
#> Model: unrestricted. Kernel: triangular. VCE: jackknife
- Polynomial Functional Forms
We compare linear and quadratic specifications:
# Estimate RD with a quadratic polynomial
rdd_mod_quad <- rdd_reg_lm(rdd_object = data, slope = "separate", order = 2)
summary(rdd_mod_quad)
#>
#> Call:
#> lm(formula = y ~ ., data = dat_step1, weights = weights)
#>
#> Residuals:
#> Min 1Q Median 3Q Max
#> -3.06894 -0.52799 0.01642 0.55260 2.45318
#>
#> Coefficients:
#> Estimate Std. Error t value Pr(>|t|)
#> (Intercept) 17.30296 0.32241 53.667 < 2e-16 ***
#> D 9.14439 1.13752 8.039 2.65e-12 ***
#> x 2.29753 0.41955 5.476 3.61e-07 ***
#> `x^2` 0.04253 0.11228 0.379 0.706
#> x_right 2.51976 9.23566 0.273 0.786
#> `x^2_right` -1.61694 17.10035 -0.095 0.925
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> Residual standard error: 0.9376 on 94 degrees of freedom
#> Multiple R-squared: 0.9749, Adjusted R-squared: 0.9736
#> F-statistic: 731 on 5 and 94 DF, p-value: < 2.2e-16
27.12.4 Example of Occupational Licensing and Market Efficiency
Occupational licensing is a form of labor market regulation that requires workers to obtain licenses or certifications before being allowed to work in specific professions. The debate around occupational licensing revolves around two competing effects:
-
Efficiency-enhancing effect:
- Licensing ensures that workers meet minimum quality standards, reducing information asymmetries between firms and consumers.
- Higher-quality service providers are selected into the labor market.
-
Barrier-to-entry effect:
- Licensing requirements impose costs on potential entrants, reducing labor supply.
- This creates frictions in the market, potentially leading to inefficiencies.
Bowblis and Smith (2021) investigates this tradeoff using a Fuzzy RD based on a threshold rule:
Nursing homes with 120 or more beds are required to hire a certain fraction of licensed/certified workers.
The study examines whether this policy improves quality of care or merely restricts labor market competition.
27.12.4.1 OLS Estimation: Naïve Approach and Its Bias
A simple Ordinary Least Squares regression can be used to estimate the effect of licensed workers on quality of service:
Y_i = \alpha_0 + X_i \alpha_1 + LW_i \alpha_2 + \epsilon_i
where:
Y_i = Quality of service at facility i.
LW_i = Proportion of licensed workers at facility i.
X_i = Facility characteristics (e.g., size, staffing levels).
27.12.4.2 Potential Bias in \alpha_2
- Mitigation-Based Bias:
- Facilities with poor quality outcomes may hire more licensed workers in response to bad performance.
- This induces a negative correlation between LW_i and Y_i, biasing \alpha_2 downward.
- Preference-Based Bias:
- Higher-quality facilities may have greater incentives to hire certified workers.
- This induces a positive correlation between LW_i and Y_i, biasing \alpha_2 upward.
27.12.4.3 Fuzzy RD Framework
The OLS model is endogenous, so we implement a Fuzzy RD Design that exploits a discontinuity in licensing requirements at 120 beds.
Y_{ist} = \beta_0 + [I(Bed \geq 121)_{ist}]\beta_1 + f(Size_{ist}) \beta_2 + [f(Size_{ist}) \times I(Bed \geq 121)_{ist}] \beta_3 + X_{it} \delta + \gamma_s + \theta_t + \epsilon_{ist}
where:
I(Bed \geq 121) = Indicator for treatment eligibility (having at least 120 beds).
f(Size_{ist}) = Flexible functional form of facility size.
X_{it} = Other control variables.
\gamma_s = State fixed effects (accounts for state-level regulations).
\theta_t = Time fixed effects (accounts for temporal shocks).
\beta_1 = Intent-to-treat (ITT) effect.
This model estimates discontinuities in quality outcomes at the 120-bed threshold.
27.12.4.4 Why This RD is Fuzzy
Unlike a Sharp RD, treatment is not fully determined at the cutoff:
Some facilities with fewer than 120 beds voluntarily hire licensed workers.
Some facilities with more than 120 beds may not comply fully.
Thus, the RD framework must be modified:
- Fixed-Effect Considerations:
- If states sort differently near the threshold, state fixed effects (\gamma_s) may be needed.
- However, if sorting is non-random, the RD assumption fails.
- Panel Data Implications:
- Fixed-effects models are not preferred in RD, as they are lower in the causal inference hierarchy.
- RD typically excludes pre-treatment periods, whereas fixed-effects require before-and-after comparisons.
- Variation in Facility Size:
- Hospitals rarely change bed capacity, so including hospital-specific fixed effects removes variation necessary for RD estimation.
27.12.4.5 Instrumental Variable Approach
To correct for endogeneity, we use a two-stage least squares IV approach, where the running variable serves as an instrument for treatment intensity.
Stage 1: First-Stage Regression
Estimate the probability of hiring licensed workers based on the 120-bed threshold:
QSW_{ist} = \alpha_0 + [I(Bed \geq 121)_{ist}]\alpha_1 + f(Size_{ist}) \alpha_2 + [f(Size_{ist}) \times I(Bed \geq 121)_{ist}] \alpha_3 + X_{it} \delta + \gamma_s + \theta_t + \epsilon_{ist}
where:
- QSW_{ist} = Predicted proportion of licensed workers at facility i in state s at time t.
Stage 2: Second-Stage Regression
Estimate the causal effect of hiring licensed workers on quality of service:
Y_{ist} = \gamma_0 + \gamma_1 \hat{QWS}_{ist} + f(Size_{ist}) \delta_2 + [f(Size_{ist}) \times I(Bed \geq 121)] \delta_3 + X_{it} \lambda + \eta_s + \tau_t + u_{ist}
where:
\hat{QWS}_{ist} = Instrumented proportion of licensed workers.
\gamma_1 = Causal effect of certified staffing on quality outcomes.
Key Observations
- The larger the discontinuity at 120 beds, the closer \gamma_1 \approx \beta_1.
- The ITT effect (\beta_1) will always be weaker than the local average treatment effect (\gamma_1).
27.12.4.6 Empirical Challenges
-
Bunching at Round Numbers:
- Figure 1 shows facilities clustering at every 5-bed increment.
- If manipulation occurs, we expect underrepresentation at 130 beds.
-
Clustering Standard Errors:
- Due to limited unique mass points (facilities at specific sizes), errors should be clustered by mass point instead of individual facilities.
-
Noncompliance and Selection Bias:
- Fuzzy RD requires that we do not drop non-compliers (as it would introduce selection bias).
- However, we can drop manipulators if there is clear evidence of behavioral bias (e.g., preference for round numbers).
27.12.5 Replicating (Carpenter and Dobkin 2009)
A well-known RD study by (Carpenter and Dobkin 2009) investigates the causal effect of legal drinking age on mortality rates.
Replication available at: Philipp Leppert.
Data available at: OpenICPSR.
27.12.6 Additional RD Applications
For a comprehensive empirical RD application, see (Thoemmes, Liao, and Jin 2017), where:
Multiple RD packages (
rdd
,rdrobust
,rddtools
) are tested.Robustness checks are performed across bandwidth selection and polynomial orders.