35.7 Matching vs. Regression

Matching and regression are two core strategies used in observational studies to adjust for differences in covariates X and estimate causal effects. While both aim to remove bias due to confounding, they approach the problem differently, particularly in how they weight observations, handle functional form assumptions, and address covariate balance.

Neither method can resolve the issue of unobserved confounding, but each can be a powerful tool when used with care and supported by appropriate diagnostics.

  • Matching emphasizes covariate balance by pruning the dataset to retain only comparable units. It is less parametric than regression and typically targets the ATT.
  • Regression (typically OLS) emphasizes functional form and allows model-based adjustment. It can target the ATE as well as continuous or interacted treatment effects, although under heterogeneous effects OLS recovers a variance-weighted average of covariate-specific effects (Section 35.7.2).

Both matching and regression assign implicit or explicit weights to observations during estimation:

  • Matching: Weights observations more heavily in strata with more treated units, aligning with the ATT estimand.
  • OLS Regression: Places more weight on strata where the variance of treatment assignment is highest, that is, where treated and control groups are roughly balanced (near 50/50).

This results in differing estimands and sensitivities.

Important Caveat: If your OLS estimate is biased due to unobserved confounding, your matching estimate is likely biased too. Both depend on the selection-on-observables assumption.

We explore the difference in estimands between matching and regression, especially for estimating the ATT.

35.7.1 Matching Estimand

Suppose we want the treatment effect on the treated:

$$
\delta_{TOT} = E[Y_{1i} - Y_{0i} \mid D_i = 1]
$$

Using the Law of Iterated Expectation:

$$
\delta_{TOT} = E\big[ E[Y_{1i} \mid X_i, D_i = 1] - E[Y_{0i} \mid X_i, D_i = 1] \,\big|\, D_i = 1 \big]
$$

Assuming conditional independence:

$$
E[Y_{0i} \mid X_i, D_i = 0] = E[Y_{0i} \mid X_i, D_i = 1]
$$

Then,

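$$
\begin{aligned}
\delta_{TOT} &= E\big[ E[Y_{1i} \mid X_i, D_i = 1] - E[Y_{0i} \mid X_i, D_i = 0] \,\big|\, D_i = 1 \big] \\
&= E\big[ E[Y_i \mid X_i, D_i = 1] - E[Y_i \mid X_i, D_i = 0] \,\big|\, D_i = 1 \big] \\
&= E[\delta_X \mid D_i = 1]
\end{aligned}
$$

where $\delta_X = E[Y_i \mid X_i, D_i = 1] - E[Y_i \mid X_i, D_i = 0]$ is the X-specific difference in means at covariate value $X_i$. The second line uses the observed-outcome identity $Y_i = Y_{0i} + (Y_{1i} - Y_{0i}) D_i$, so that $Y_i = Y_{1i}$ for treated units and $Y_i = Y_{0i}$ for controls.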

If Xi is discrete, the matching estimand becomes:

$$
\delta_M = \sum_x \delta_x P(X_i = x \mid D_i = 1)
$$

where $P(X_i = x \mid D_i = 1)$ is the probability mass function of $X_i$ given $D_i = 1$.
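
As a rough illustration of the sample analog of this estimand, the sketch below stratifies on a discrete covariate, takes the treated-minus-control difference in means within each cell, and weights each cell by its number of treated units. The function name `exact_matching_att` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def exact_matching_att(data):
    """Sample analog of the matching estimand for a discrete covariate.

    data: iterable of (x, d, y) tuples with discrete x and d in {0, 1}.
    Each cell's difference in means is weighted by its treated count,
    which estimates P(X = x | D = 1).
    """
    by_cell = defaultdict(lambda: {0: [], 1: []})
    for x, d, y in data:
        by_cell[x][d].append(y)

    weighted_sum, n_treated = 0.0, 0
    for x, groups in by_cell.items():
        treated, control = groups[1], groups[0]
        if not treated:
            continue  # no treated units: the cell gets zero weight
        if not control:
            raise ValueError(f"no controls in cell {x!r}: common support fails")
        delta_x = sum(treated) / len(treated) - sum(control) / len(control)
        weighted_sum += delta_x * len(treated)
        n_treated += len(treated)

    if n_treated == 0:
        raise ValueError("no treated units in the data")
    return weighted_sum / n_treated


# Toy data (made up): cell "a" has delta = 2, cell "b" has delta = 4.
toy = [
    ("a", 1, 3.0), ("a", 1, 5.0), ("a", 0, 2.0), ("a", 0, 2.0),
    ("b", 1, 10.0), ("b", 0, 4.0), ("b", 0, 6.0), ("b", 0, 8.0),
]
att = exact_matching_att(toy)
```

Here two of the three treated units sit in cell "a", so the estimate is (2 × 2 + 1 × 4) / 3 = 8/3. Raising an error when a treated cell has no controls is a design choice of this sketch; a real analysis might instead drop such treated units and report the change in estimand.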

By Bayes’ rule:

$$
P(X_i = x \mid D_i = 1) = \frac{P(D_i = 1 \mid X_i = x) \, P(X_i = x)}{P(D_i = 1)}
$$

So,

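$$
\delta_M = \frac{\sum_x \delta_x P(D_i = 1 \mid X_i = x) P(X_i = x)}{\sum_x P(D_i = 1 \mid X_i = x) P(X_i = x)}
= \sum_x \delta_x \frac{P(D_i = 1 \mid X_i = x) P(X_i = x)}{\sum_{x'} P(D_i = 1 \mid X_i = x') P(X_i = x')}
$$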
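
To make the weights concrete, here is a minimal sketch that evaluates this formula for a made-up two-cell population. The dictionary keys (`delta`, `p_treat`, `p_x`) and all numbers are assumptions introduced for illustration only.

```python
def matching_estimand(cells):
    """Weighted average of cell effects with weights P(D=1 | X=x) * P(X=x).

    cells: list of dicts with keys
      'delta'   : covariate-specific effect delta_x
      'p_treat' : P(D = 1 | X = x)
      'p_x'     : P(X = x)
    """
    weights = [c["p_treat"] * c["p_x"] for c in cells]
    total = sum(weights)  # equals P(D = 1)
    return sum(w * c["delta"] for w, c in zip(weights, cells)) / total


# Hypothetical population: two equally sized cells.
cells = [
    {"delta": 1.0, "p_treat": 0.9, "p_x": 0.5},  # mostly treated
    {"delta": 3.0, "p_treat": 0.5, "p_x": 0.5},  # evenly split
]
delta_m = matching_estimand(cells)
```

The normalized weights are 0.45/0.70 and 0.25/0.70, so the cell with more treated units dominates and the result is 12/7, roughly 1.71.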


35.7.2 Regression Estimand

In regression:

$$
Y_i = \sum_x d_{ix} \beta_x + \delta_R D_i + \varepsilon_i
$$

  • dix = indicator that Xi=x
  • βx = baseline outcome at X=x
  • δR = regression estimand

Then,

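$$
\delta_R = \frac{\sum_x \delta_x \big[ P(D_i = 1 \mid X_i = x) (1 - P(D_i = 1 \mid X_i = x)) \big] P(X_i = x)}{\sum_x \big[ P(D_i = 1 \mid X_i = x) (1 - P(D_i = 1 \mid X_i = x)) \big] P(X_i = x)}
= \sum_x \delta_x \frac{\big[ P(D_i = 1 \mid X_i = x) (1 - P(D_i = 1 \mid X_i = x)) \big] P(X_i = x)}{\sum_{x'} \big[ P(D_i = 1 \mid X_i = x') (1 - P(D_i = 1 \mid X_i = x')) \big] P(X_i = x')}
$$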
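
One way to check this expression numerically is to compare it with the OLS coefficient on $D_i$ in a saturated regression. The sketch below does so on a small noise-free dataset, using the Frisch-Waugh-Lovell result: regressing $D_i$ on a full set of cell dummies leaves the residual $D_i$ minus its cell mean, and the coefficient on $D_i$ equals the regression of $Y_i$ on that residual. The cell values, baseline outcomes, and sample sizes are made up for illustration.

```python
def regression_estimand(cells):
    """Weighted average of cell effects with weights p(1 - p) * P(X = x)."""
    weights = [c["p_treat"] * (1 - c["p_treat"]) * c["p_x"] for c in cells]
    return sum(w * c["delta"] for w, c in zip(weights, cells)) / sum(weights)


def ols_treatment_coefficient(data):
    """Coefficient on D from OLS of Y on D and a full set of cell dummies.

    data: list of (x, d, y). Computed via Frisch-Waugh-Lovell: residualize
    D on the dummies (subtract the cell mean of D), then regress Y on it.
    """
    d_by_cell = {}
    for x, d, _ in data:
        d_by_cell.setdefault(x, []).append(d)
    cell_mean = {x: sum(ds) / len(ds) for x, ds in d_by_cell.items()}
    num = sum((d - cell_mean[x]) * y for x, d, y in data)
    den = sum((d - cell_mean[x]) ** 2 for x, d, _ in data)
    return num / den


# Hypothetical cells: 'beta' is the baseline outcome, 'n' the cell size.
cells = [
    {"delta": 1.0, "p_treat": 0.9, "p_x": 0.5, "beta": 2.0, "n": 10},
    {"delta": 3.0, "p_treat": 0.5, "p_x": 0.5, "beta": 5.0, "n": 10},
]

# Build a noise-free dataset whose cell shares match the probabilities above.
data = []
for x, c in enumerate(cells):
    n_treated = round(c["p_treat"] * c["n"])
    for j in range(c["n"]):
        d = 1 if j < n_treated else 0
        data.append((x, d, c["beta"] + c["delta"] * d))

delta_r_formula = regression_estimand(cells)
delta_r_ols = ols_treatment_coefficient(data)
```

Both quantities come out to 42/17, roughly 2.47. The evenly split cell, which has the larger effect, receives weight 0.125/0.17 even though it holds fewer treated units than the other cell.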


35.7.3 Interpretation: Weighting Differences

The distinction between matching and regression comes down to how covariate-specific treatment effects δx are weighted:

| Type | Weighting Function | Interpretation | Makes Sense Because… |
| --- | --- | --- | --- |
| Matching | $P(D_i = 1 \mid X_i = x)$ | Weights more heavily where more treated units exist (ATT-focused) | We are interested in the effect on the treated, so more weight is placed where treated units are observed |
| Regression | $P(D_i = 1 \mid X_i = x)(1 - P(D_i = 1 \mid X_i = x))$ | Weights more where treatment assignment has high variance (i.e., near 50/50 treated/control) | These cells provide the lowest-variance estimates of $\delta_x$, assuming the treatment effect is homogeneous across X |
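
The contrast between the two weighting functions is easy to tabulate. The short sketch below evaluates both over a grid of propensities $p = P(D_i = 1 \mid X_i = x)$, holding the cell share $P(X_i = x)$ fixed; the grid itself is an arbitrary choice for illustration.

```python
# Propensity grid: 0.1, 0.2, ..., 0.9 (arbitrary illustrative values).
propensities = [i / 10 for i in range(1, 10)]

rows = []
for p in propensities:
    matching_weight = p              # proportional to P(D=1 | X=x)
    regression_weight = p * (1 - p)  # proportional to Var(D | X=x)
    rows.append((p, matching_weight, regression_weight))

# Propensity at which each (unnormalized) weight is largest on the grid.
matching_peak = max(rows, key=lambda r: r[1])[0]
regression_peak = max(rows, key=lambda r: r[2])[0]

for p, m, r in rows:
    print(f"p={p:.1f}  matching={m:.2f}  regression={r:.2f}")
```

On this grid the matching weight keeps rising with the share treated, while the regression weight is symmetric around 0.5 and shrinks toward zero in cells that are almost entirely treated or almost entirely control.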

Summary Table: Matching vs. Regression

| Feature | Matching | Regression (OLS) |
| --- | --- | --- |
| Functional Form | Less parametric; no assumption of linearity | Parametric; usually assumes linearity |
| Primary Estimand | ATT (effect on the treated) | ATE or effects of continuous/interacted treatments |
| Balance | Enforces balance via matched samples | Does not guarantee balance |
| Diagnostics | Covariate SMDs, QQ plots, empirical distributions | Residual plots, R-squared, heteroskedasticity tests |
| Unobserved Confounding | Cannot be resolved; assumes ignorability | Same limitation |
| Standard Errors | Typically larger; often obtained by bootstrapping | Smaller; closed-form under assumptions |
| Best Used When | High control-to-treated ratio; misspecification concerns | Model is correctly specified; sufficient overlap |
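
As an illustration of the balance diagnostics listed for matching, the sketch below computes a standardized mean difference (SMD) for one covariate. It assumes one common convention, dividing by the square root of the average of the two group variances; other conventions (for example, using only the treated-group standard deviation) exist. The function name `smd` and the data are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def smd(treated, control):
    """Standardized mean difference for a single covariate.

    Assumed convention: difference in means divided by
    sqrt((var_treated + var_control) / 2), using sample variances.
    """
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return (mean(treated) - mean(control)) / pooled_sd


# Made-up covariate values before and after a hypothetical matching step.
treated_x = [2.0, 3.0, 4.0]
all_controls_x = [1.0, 2.0, 3.0]
matched_controls_x = [2.0, 3.0, 4.0]

smd_before = smd(treated_x, all_controls_x)
smd_after = smd(treated_x, matched_controls_x)
```

In this toy case the SMD falls from 1.0 to 0.0. In applied work an absolute SMD below about 0.1 is a frequently cited rule of thumb for acceptable balance, though the threshold is a convention rather than a formal test.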

Qualitative Comparisons

| Matching | Regression |
| --- | --- |
| Not sensitive to the form of covariate-outcome relationship | Can estimate continuous or interacted treatment effects |
| Easier to assess balance and interpret diagnostics | Easier to estimate the effects of all covariates, not just treatment |
| Facilitates clear visual evaluation of overlap and balance | Less intuitive diagnostics; model diagnostics used |
| Helps when treatment is rare (prunes clearly incomparable controls) | Performs better with balanced treatment assignment |
| Forces explicit enforcement of common support | May extrapolate outside the support of covariate distributions |
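
To illustrate the common-support point, here is a deliberately crude one-covariate sketch: controls whose covariate value lies outside the range observed among treated units are set aside. Range-based trimming is only one simple way to approximate common support, and the function name and ages below are made up for illustration.

```python
def split_by_common_support(treated_x, control_x):
    """Split controls into those inside and outside the treated range.

    A crude one-dimensional check: a control is kept only if its covariate
    value lies within [min(treated_x), max(treated_x)].
    """
    lo, hi = min(treated_x), max(treated_x)
    kept = [x for x in control_x if lo <= x <= hi]
    dropped = [x for x in control_x if not (lo <= x <= hi)]
    return kept, dropped


# Hypothetical ages of treated and control units.
treated_age = [30, 35, 40, 45]
control_age = [18, 22, 33, 38, 44, 60, 71]

kept, dropped = split_by_common_support(treated_age, control_age)
print("kept controls:   ", kept)
print("dropped controls:", dropped)
```

A regression fit on the full sample would still use the dropped controls (ages 18, 22, 60, and 71 here), so its prediction of untreated outcomes for the treated relies on the assumed functional form holding outside the region where both groups are observed.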