35.7 Matching vs. Regression

Matching and regression are two core strategies used in observational studies to adjust for differences in covariates $X$ and estimate causal effects. While both aim to remove bias due to confounding, they approach the problem differently, particularly in how they weight observations, handle functional form assumptions, and address covariate balance.

Neither method can resolve the issue of unobserved confounding, but each can be a powerful tool when used with care and supported by appropriate diagnostics.

Matching emphasizes covariate balance by pruning the dataset to retain only comparable units. It is largely nonparametric and most often targets the ATT.
Regression (typically OLS) emphasizes functional form and allows model-based adjustment. With an appropriate specification (e.g., treatment-covariate interactions) it can target the ATE, and it extends naturally to continuous or interacted treatments.

Both matching and regression assign implicit or explicit weights to observations during estimation:

Matching: Weights observations more heavily in strata with more treated units, aligning with the ATT estimand.
OLS Regression: Places more weight on strata where the conditional variance of treatment, $P(D_i = 1 \mid X_i = x)(1 - P(D_i = 1 \mid X_i = x))$, is highest, i.e., where treated and control units are roughly balanced (near 50/50).
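To make the two weighting rules concrete, here is a minimal Python sketch that takes stratum-level population quantities and returns the normalized weight each approach places on each stratum. The function name `stratum_weights` and its input format (lists of $P(X_i = x)$ and $P(D_i = 1 \mid X_i = x)$) are hypothetical, chosen only for illustration.

```python
def stratum_weights(p_x, p_treat):
    """Return (matching_weights, regression_weights) for discrete strata.

    Hypothetical helper for illustration.
    p_x[k]     -- P(X_i = x_k), the population share of stratum k
    p_treat[k] -- P(D_i = 1 | X_i = x_k), the propensity score in stratum k
    """
    if len(p_x) != len(p_treat):
        raise ValueError("p_x and p_treat must have the same length")

    # Matching (ATT): weight proportional to P(D=1 | X=x) * P(X=x)
    raw_match = [p * q for p, q in zip(p_treat, p_x)]
    # OLS with stratum dummies: weight proportional to the conditional
    # variance of treatment, P(D=1 | X=x) * (1 - P(D=1 | X=x)), times P(X=x)
    raw_reg = [p * (1 - p) * q for p, q in zip(p_treat, p_x)]

    match_total, reg_total = sum(raw_match), sum(raw_reg)
    return ([w / match_total for w in raw_match],
            [w / reg_total for w in raw_reg])


# Two equally sized strata: one mostly treated (90%), one balanced (50/50)
match_w, reg_w = stratum_weights([0.5, 0.5], [0.9, 0.5])
print([round(w, 3) for w in match_w])  # [0.643, 0.357]: favors the mostly-treated stratum
print([round(w, 3) for w in reg_w])    # [0.265, 0.735]: favors the 50/50 stratum
```

A stratum with $P(D_i = 1 \mid X_i = x) = 0$ gets zero weight under both rules, while a fully treated stratum ($P = 1$) gets positive matching weight but zero regression weight.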

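As a result, the two methods target different estimands whenever treatment effects vary across covariate strata, and they are sensitive to different things (overlap and match quality for matching, functional form for regression).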

Important Caveat: If your OLS estimate is biased due to unobserved confounding, your matching estimate is likely biased too. Both depend on the selection on observables assumption.

We explore the difference in estimands between matching and regression, especially for estimating the ATT.

35.7.1 Matching Estimand

Suppose we want the treatment effect on the treated:

$\delta_{\text{TOT}} = E[Y_{1i} - Y_{0i} \mid D_i = 1]$

Using the law of iterated expectations:

$\delta_{\text{TOT}} = E\left[ E[Y_{1i} \mid X_i, D_i = 1] - E[Y_{0i} \mid X_i, D_i = 1] \mid D_i = 1 \right]$

Assuming conditional independence of $Y_{0i}$ and $D_i$ given $X_i$ (selection on observables):

$E[Y_{0i} \mid X_i, D_i = 0] = E[Y_{0i} \mid X_i, D_i = 1]$

Then,

$\begin{aligned} \delta_{\text{TOT}} &= E\left[ E[Y_{1i} \mid X_i, D_i = 1] - E[Y_{0i} \mid X_i, D_i = 0] \mid D_i = 1 \right] \\ &= E\left[ E[Y_i \mid X_i, D_i = 1] - E[Y_i \mid X_i, D_i = 0] \mid D_i = 1 \right] \\ &= E[\delta_X \mid D_i = 1] \end{aligned}$

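where the second line uses the observed outcome $Y_i = D_i Y_{1i} + (1 - D_i) Y_{0i}$, and $\delta_X = E[Y_i \mid X_i, D_i = 1] - E[Y_i \mid X_i, D_i = 0]$ is the $X$-specific difference in mean outcomes at covariate value $X_i$.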

If $X_i$ is discrete, the matching estimand becomes:

$\delta_M = \sum_x \delta_x P(X_i = x \mid D_i = 1)$

where $P(X_i = x \mid D_i = 1)$ is the probability mass function of $X_i$ among treated units ($D_i = 1$).
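With a discrete $X_i$, the sample analogue of $\delta_M$ is exact matching on cells: compute the treated-minus-control difference in mean outcomes within each cell, then average those differences using the treated units' distribution over cells. Below is a minimal Python sketch. The function name `matching_att` and the `(x, d, y)` tuple layout are hypothetical. Here, a cell with treated units but no controls raises an error rather than being silently dropped, since dropping it would change the estimand.

```python
from collections import defaultdict


def matching_att(data):
    """Exact-matching ATT for a discrete covariate (hypothetical helper).

    data: iterable of (x, d, y) tuples, where x is the covariate cell,
    d is the treatment indicator (0 or 1), and y is the observed outcome.
    Returns sum_x delta_x * P_hat(X = x | D = 1).
    """
    sums = defaultdict(lambda: [0.0, 0, 0.0, 0])  # [sum_y1, n1, sum_y0, n0]
    for x, d, y in data:
        cell = sums[x]
        if d == 1:
            cell[0] += y
            cell[1] += 1
        else:
            cell[2] += y
            cell[3] += 1

    n_treated = sum(cell[1] for cell in sums.values())
    if n_treated == 0:
        raise ValueError("no treated units")

    att = 0.0
    for x, (sum_y1, n1, sum_y0, n0) in sums.items():
        if n1 == 0:
            continue  # no treated units: zero weight under the ATT
        if n0 == 0:
            raise ValueError(f"cell {x!r} has treated units but no controls")
        delta_x = sum_y1 / n1 - sum_y0 / n0  # X-specific difference in means
        att += delta_x * n1 / n_treated      # weight = P_hat(X = x | D = 1)
    return att


sample = [
    ("a", 1, 4.0), ("a", 1, 6.0), ("a", 0, 2.0),  # delta_a = 5 - 2 = 3
    ("b", 1, 2.0), ("b", 0, 1.0), ("b", 0, 3.0),  # delta_b = 2 - 2 = 0
]
print(matching_att(sample))  # 2.0: 2/3 of treated units are in cell "a"
```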

By Bayes’ rule:

$P(X_i = x \mid D_i = 1) = \frac{P(D_i = 1 \mid X_i = x) P(X_i = x)}{P(D_i = 1)}$

So, since $P(D_i = 1) = \sum_x P(D_i = 1 \mid X_i = x) P(X_i = x)$ by the law of total probability,

$\begin{aligned} \delta_M &= \frac{\sum_x \delta_x P (D_i = 1 | X_i = x) P (X_i = x)}{\sum_x P(D_i = 1 |X_i = x)P(X_i = x)} \\ &= \sum_x \delta_x \frac{ P (D_i = 1 | X_i = x) P (X_i = x)}{\sum_x P(D_i = 1 |X_i = x)P(X_i = x)} \end{aligned}$

35.7.2 Regression Estimand

In regression:

$Y_i = \sum_x d_{ix} \beta_x + \delta_R D_i + \varepsilon_i$

$d_{ix}$ = indicator that $X_i = x$
$\beta_x$ = intercept (fixed effect) for covariate cell $X_i = x$
$\delta_R$ = coefficient on $D_i$ (the regression estimand)

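Because the model includes a full set of cell dummies, the Frisch-Waugh-Lovell (regression anatomy) result gives $\delta_R = E[(D_i - E[D_i \mid X_i]) Y_i] / E[(D_i - E[D_i \mid X_i])^2]$. Evaluating both expectations cell by cell, where $E[(D_i - E[D_i \mid X_i])^2 \mid X_i = x] = P(D_i = 1 \mid X_i = x)(1 - P(D_i = 1 \mid X_i = x))$, yields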

$\begin{aligned} \delta_R &= \frac{\sum_x \delta_x [P(D_i = 1 | X_i = x) (1 - P(D_i = 1 | X_i = x))]P(X_i = x)}{\sum_x [P(D_i = 1| X_i = x)(1 - P(D_i = 1 | X_i = x))]P(X_i = x)} \\ &= \sum_x \delta_x \frac{[P(D_i = 1 | X_i = x) (1 - P(D_i = 1 | X_i = x))]P(X_i = x)}{\sum_x [P(D_i = 1| X_i = x)(1 - P(D_i = 1 | X_i = x))]P(X_i = x)} \end{aligned}$
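This weighting can be checked in a sample through the Frisch-Waugh-Lovell theorem. With a full set of cell dummies, the OLS coefficient on $D_i$ equals $\sum_i r_i Y_i / \sum_i r_i^2$, where $r_i$ is $D_i$ minus its cell mean. The minimal Python sketch below computes the coefficient that way and compares it with the variance-weighted average of cell differences. The function names and the `(x, d, y)` tuple layout are hypothetical, assumed here for illustration.

```python
from collections import defaultdict


def ols_dummy_coef(data):
    """OLS coefficient on d in y ~ d + cell dummies, via Frisch-Waugh-Lovell.

    data: list of (x, d, y) tuples. Residualizing d on the cell dummies
    just subtracts the cell mean of d (the within-cell treatment share).
    """
    n, n1 = defaultdict(int), defaultdict(int)
    for x, d, _ in data:
        n[x] += 1
        n1[x] += d
    num = den = 0.0
    for x, d, y in data:
        resid = d - n1[x] / n[x]  # d minus within-cell treatment share
        num += resid * y
        den += resid * resid
    return num / den


def variance_weighted_delta(data):
    """Cell differences delta_x weighted by n_x * p_x * (1 - p_x), normalized."""
    cells = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in data:
        cells[x][d].append(y)
    num = den = 0.0
    for groups in cells.values():
        n1, n0 = len(groups[1]), len(groups[0])
        if n1 == 0 or n0 == 0:
            continue  # p_x is 0 or 1: zero weight, cell drops out of OLS too
        p = n1 / (n1 + n0)
        delta_x = sum(groups[1]) / n1 - sum(groups[0]) / n0
        weight = (n1 + n0) * p * (1 - p)
        num += weight * delta_x
        den += weight
    return num / den


sample = [
    ("a", 1, 4.0), ("a", 1, 6.0), ("a", 0, 2.0),  # p_a = 2/3, delta_a = 3
    ("b", 1, 2.0), ("b", 0, 1.0), ("b", 0, 3.0),  # p_b = 1/3, delta_b = 0
]
print(ols_dummy_coef(sample), variance_weighted_delta(sample))  # both approximately 1.5
```

On this toy sample both cells have the same size and the same $p_x(1 - p_x) = 2/9$, so regression weights them equally and returns 1.5. Weighting the same cell differences by the treated units' distribution across cells (2/3 and 1/3), as the matching estimand does, would give 2.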

35.7.3 Interpretation: Weighting Differences

The distinction between matching and regression comes down to how covariate-specific treatment effects $\delta_x$ are weighted:

Type	Weighting Function (multiplied by $P(X_i = x)$, then normalized)	Interpretation	Makes Sense Because…
Matching	$P(D_i = 1 \mid X_i = x)$	Weights more heavily where more treated units exist (ATT-focused)	We’re interested in the effect on the treated, so more weight is placed where treated units are observed
Regression	$P(D_i = 1 \mid X_i = x)\,(1 - P(D_i = 1 \mid X_i = x))$	Weights more where treatment assignment has high variance (i.e., near 50/50 treated/control)	These cells yield the most precise estimates of $\delta_x$, which is a sensible weighting if the treatment effect is homogeneous across $X$
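A two-cell worked example shows how far the two estimands can diverge when $\delta_x$ is heterogeneous. The numbers are chosen purely for illustration. Suppose $P(X_i = a) = P(X_i = b) = 0.5$, $P(D_i = 1 \mid X_i = a) = 0.9$, $P(D_i = 1 \mid X_i = b) = 0.5$, $\delta_a = 2$, and $\delta_b = 0$:

$\begin{aligned} \delta_M &= \frac{0.9 \cdot 0.5 \cdot 2 + 0.5 \cdot 0.5 \cdot 0}{0.9 \cdot 0.5 + 0.5 \cdot 0.5} = \frac{0.9}{0.7} \approx 1.29 \\ \delta_R &= \frac{0.9 \cdot 0.1 \cdot 0.5 \cdot 2 + 0.5 \cdot 0.5 \cdot 0.5 \cdot 0}{0.9 \cdot 0.1 \cdot 0.5 + 0.5 \cdot 0.5 \cdot 0.5} = \frac{0.09}{0.17} \approx 0.53 \end{aligned}$

Matching tilts toward cell $a$, where most treated units sit, while regression tilts toward the balanced cell $b$. With a homogeneous effect ($\delta_a = \delta_b$), both would return the same value.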

Summary Table: Matching vs. Regression

Feature	Matching	Regression (OLS)
Functional Form	Less parametric; no assumption of linearity	Parametric; usually assumes linearity
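Primary Estimand	ATT (effect on the treated)	Variance-weighted average of $\delta_x$ by default; ATE under homogeneous effects or with treatment-covariate interactions; also continuous/interacted treatments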
Balance	Enforces balance via matched samples	Does not guarantee balance
Diagnostics	Covariate standardized mean differences (SMDs), QQ plots, empirical distributions	Residual plots, R-squared, heteroskedasticity tests
Unobserved Confounding	Cannot be resolved; assumes ignorability	Same limitation
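Standard Errors	Often larger, since pruning discards data; the naive bootstrap can be invalid for nearest-neighbor matching, so analytic estimators (e.g., Abadie-Imbens) are common	Often smaller; closed-form under model assumptions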
Best Used When	High control-to-treated ratio; misspecification concerns	Model is correctly specified; sufficient overlap
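The standardized mean difference (SMD) listed under matching diagnostics is commonly computed as the difference in covariate means divided by a pooled standard deviation. The sketch below assumes the $\sqrt{(s_1^2 + s_0^2)/2}$ pooling convention, which is one common choice among several. The function name and the age values are hypothetical.

```python
import statistics


def standardized_mean_difference(x_treated, x_control):
    """SMD = (mean_t - mean_c) / sqrt((var_t + var_c) / 2). Hypothetical helper.

    Uses sample variances (n - 1 denominator); other pooling conventions exist.
    """
    mean_diff = statistics.mean(x_treated) - statistics.mean(x_control)
    pooled_sd = ((statistics.variance(x_treated)
                  + statistics.variance(x_control)) / 2) ** 0.5
    return mean_diff / pooled_sd


# Illustrative covariate (age) before matching: treated units are older on average
age_treated = [40, 45, 50, 55]
age_control = [30, 35, 40, 45]
print(round(standardized_mean_difference(age_treated, age_control), 3))  # 1.549
```

A frequently cited rule of thumb treats an absolute SMD below about 0.1 as adequate balance. One would recompute the SMD on the matched sample to check whether pruning achieved it.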

Qualitative Comparisons

Matching	Regression
Less sensitive to the functional form of the covariate-outcome relationship	Can estimate continuous or interacted treatment effects
Easier to assess balance and interpret diagnostics	Easier to estimate the effects of all covariates, not just treatment
Facilitates clear visual evaluation of overlap and balance	Diagnostics are less intuitive and rely on model checks
Helps when treatment is rare (prunes clearly incomparable controls)	Performs better with balanced treatment assignment
Explicitly enforces common support	May extrapolate outside the support of the covariate distribution
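With a discrete $X$, the common-support comparison can be checked directly. Cells containing only treated or only control units have no within-cell comparison, so exact matching must drop them. A regression that is linear in $X$ could still produce a prediction for them by extrapolation. The minimal Python sketch below assumes a hypothetical function name, an `(x, d)` pair layout, and made-up age-group labels.

```python
from collections import defaultdict


def split_by_support(data):
    """Partition covariate cells by which treatment arms are present.

    Hypothetical helper. data: iterable of (x, d) pairs.
    Returns (on_support, treated_only, control_only) as sorted lists of cells.
    """
    arms = defaultdict(set)
    for x, d in data:
        arms[x].add(d)
    on_support = sorted(x for x, seen in arms.items() if seen == {0, 1})
    treated_only = sorted(x for x, seen in arms.items() if seen == {1})
    control_only = sorted(x for x, seen in arms.items() if seen == {0})
    return on_support, treated_only, control_only


units = [("young", 0), ("young", 1), ("middle", 0), ("middle", 1),
         ("old", 1), ("old", 1), ("very_old", 0)]
print(split_by_support(units))  # (['middle', 'young'], ['old'], ['very_old'])
```

For the ATT, the treated-only cells are the problem, because their treated units have no comparable controls. Control-only cells already receive zero weight, since $P(X_i = x \mid D_i = 1) = 0$ there.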