35.4 Steps for Matching

Most matching methods rely on:

Propensity score: summarizes $P(T=1 \mid X)$ in a single number
Distance metric: measures how similar two units are
Covariates: the variables $X$ conditional on which treatment assignment is assumed to be ignorable
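The propensity score is usually estimated with a model such as logistic regression. The sketch below assumes a single covariate `x` and uses plain Python to fit $P(T=1 \mid x)$ by Newton-Raphson. The function names `fit_logit_1d` and `propensity_scores` are hypothetical; a real analysis would use a statistical library and many covariates.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit_1d(x, t, iters=25):
    """Fit P(T=1 | x) = sigmoid(b0 + b1 * x) by Newton-Raphson (one covariate)."""
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [sigmoid(b0 + b1 * xi) for xi in x]
        # Score: gradient of the log-likelihood
        g0 = sum(ti - pi for ti, pi in zip(t, p))
        g1 = sum(xi * (ti - pi) for xi, ti, pi in zip(x, t, p))
        # Information matrix (negative Hessian) entries
        w = [pi * (1.0 - pi) for pi in p]
        h00 = sum(w)
        h01 = sum(wi * xi for wi, xi in zip(w, x))
        h11 = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = h00 * h11 - h01 * h01
        # Newton update b <- b + H^{-1} g, using the closed-form 2x2 inverse
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


def propensity_scores(x, b0, b1):
    return [sigmoid(b0 + b1 * xi) for xi in x]


# Toy data: treatment is more common at larger x but not perfectly separated
x = [0, 1, 2, 3, 4, 5]
t = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logit_1d(x, t)
e = propensity_scores(x, b0, b1)
```

The toy data are deliberately not perfectly separated by `x`; with perfect separation the logistic maximum likelihood estimate does not exist and the iterations would diverge.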

35.4.1 Step 1: Define “Closeness” (Distance Metrics)

Matching requires a distance metric to define similarity between treated and control units.

Variable Selection Guidelines

Include as many pre-treatment covariates as possible to support conditional ignorability.
Avoid post-treatment variables, which introduce bias.
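Be cautious with variables that are highly correlated with the outcome but not with the treatment (e.g., heavy drug use when the outcome is heavy drinking), and exclude any that could themselves be affected by the treatment (e.g., mediators), since these are post-treatment variables.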
If variables are uncorrelated with both treatment and outcome, the cost of inclusion is small.

Distance Measures

| Method | Formula | Notes |
|---|---|---|
| Exact Matching | $D_{ij} = 0$ if $X_i = X_j$, else $\infty$ | Only feasible in low dimensions; can be relaxed via Coarsened Exact Matching |
| Mahalanobis Distance | $D_{ij} = (X_i - X_j)' \Sigma^{-1}(X_i - X_j)$ | $\Sigma$ is the variance-covariance matrix of $X$, estimated from the control group if the ATT is of interest or from the pooled sample if the ATE is of interest; sensitive to dimensionality |
| Propensity Score | $D_{ij} = \lvert e_i - e_j \rvert$ | $e_k$ is the estimated propensity score $P(T=1 \mid X_k)$ for unit $k$. Advanced: prognostic scores (B. B. Hansen 2008) require modeling $E[Y(0) \mid X]$, so they depend on the outcome model. |
| Logit Propensity Score | $D_{ij} = \lvert \text{logit}(e_i) - \text{logit}(e_j) \rvert$ | More stable in the tails of the distribution |
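The four distances in the table can be written as small functions. This is a minimal plain-Python sketch: the helper names are hypothetical, `inverse_covariance_2d` handles only two covariates so the matrix inverse stays explicit, and `mahalanobis_distance` returns the quadratic form exactly as written in the table.

```python
import math


def exact_distance(xi, xj):
    """0 if every covariate agrees exactly, otherwise infinity."""
    return 0.0 if tuple(xi) == tuple(xj) else math.inf


def mahalanobis_distance(xi, xj, sigma_inv):
    """Quadratic form (X_i - X_j)' Sigma^{-1} (X_i - X_j)."""
    d = [a - b for a, b in zip(xi, xj)]
    k = len(d)
    return sum(d[r] * sigma_inv[r][c] * d[c] for r in range(k) for c in range(k))


def inverse_covariance_2d(rows):
    """Inverse of the sample covariance matrix of exactly two covariates."""
    n = len(rows)
    m0 = sum(r[0] for r in rows) / n
    m1 = sum(r[1] for r in rows) / n
    s00 = sum((r[0] - m0) ** 2 for r in rows) / (n - 1)
    s11 = sum((r[1] - m1) ** 2 for r in rows) / (n - 1)
    s01 = sum((r[0] - m0) * (r[1] - m1) for r in rows) / (n - 1)
    det = s00 * s11 - s01 ** 2
    return [[s11 / det, -s01 / det], [-s01 / det, s00 / det]]


def ps_distance(ei, ej):
    return abs(ei - ej)


def logit(p):
    return math.log(p / (1.0 - p))


def logit_ps_distance(ei, ej):
    return abs(logit(ei) - logit(ej))


# ATT setting: Sigma is estimated from the control group only
controls = [(0, 0), (2, 0), (0, 2), (2, 2)]
sigma_inv = inverse_covariance_2d(controls)
d_mahal = mahalanobis_distance((1, 1), (2, 2), sigma_inv)
```

On the logit scale, a gap of 0.01 between scores near 0 or 1 counts as a much larger distance than the same gap near 0.5, which is why the logit version separates units better in the tails.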

Tip: In high dimensions, exact and Mahalanobis matching perform poorly. Combining Mahalanobis with propensity score calipers can improve robustness (Rubin and Thomas 2000).
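One way to combine the two is to restrict each treated unit to controls whose logit propensity score falls within a caliper, then pick the closest of those by Mahalanobis distance. The sketch below assumes greedy 1:1 matching without replacement, with hypothetical function names; the caliper width is left as a parameter (a fraction of the standard deviation of the logit score is a common choice).

```python
import math


def logit(p):
    return math.log(p / (1.0 - p))


def sq_mahalanobis(xi, xj, sigma_inv):
    d = [a - b for a, b in zip(xi, xj)]
    k = len(d)
    return sum(d[r] * sigma_inv[r][c] * d[c] for r in range(k) for c in range(k))


def caliper_mahalanobis_match(X_t, e_t, X_c, e_c, sigma_inv, caliper):
    """Greedy 1:1 matching without replacement.

    For each treated unit (in input order), only unused controls whose logit
    propensity score is within `caliper` are eligible; among them, the control
    with the smallest Mahalanobis distance is chosen. Treated units with no
    eligible control stay unmatched (None).
    """
    used, matches = set(), []
    for xi, ei in zip(X_t, e_t):
        best, best_d = None, math.inf
        for j, (xj, ej) in enumerate(zip(X_c, e_c)):
            if j in used or abs(logit(ei) - logit(ej)) > caliper:
                continue
            d = sq_mahalanobis(xi, xj, sigma_inv)
            if d < best_d:
                best, best_d = j, d
        if best is not None:
            used.add(best)
        matches.append(best)
    return matches


X_t = [(0, 0), (5, 5), (1, 1)]
e_t = [0.30, 0.90, 0.05]
X_c = [(0, 1), (0, 0.5), (5, 4)]
e_c = [0.30, 0.60, 0.85]
identity = [[1, 0], [0, 1]]
matches = caliper_mahalanobis_match(X_t, e_t, X_c, e_c, identity, caliper=0.5)  # [0, 2, None]
```

Control 1 is closest to the first treated unit in covariate space, but its propensity score lies outside the caliper, so control 0 is chosen instead; the third treated unit has no eligible control and is dropped.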

Advanced methods for longitudinal settings:

Marginal Structural Models: for time-varying treatments (Robins, Hernan, and Brumback 2000)
Balanced Risk Set Matching: for survival analysis (Y. P. Li, Propert, and Rosenbaum 2001)

35.4.2 Step 2: Matching Algorithms

Nearest Neighbor Matching

Greedy matching: Fast, but suboptimal under competition for controls.
Optimal matching: Minimizes global distance across all pairs.
Ratio matching (k:1): Useful when controls outnumber treated units; choose $k$ by trading off bias against variance, since additional matches reduce variance but are typically poorer matches that add bias (Rubin and Thomas 1996).
With vs. without replacement:
- With replacement: Improves matching quality, but requires frequency weights for analysis.
- Without replacement: Simpler, but less flexible.
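The contrast between greedy and optimal matching, and the frequency weights needed after matching with replacement, can be seen on a tiny example. This sketch matches on the propensity score only; the function names are hypothetical, and `optimal_match` uses brute-force enumeration, which is feasible only for a handful of units (practical implementations solve an assignment or network-flow problem instead).

```python
from itertools import permutations


def greedy_match(e_t, e_c, replace=False):
    """Greedy 1:1 nearest-neighbor matching on the propensity score.

    Treated units are processed in input order; each takes the closest control
    still available (any control, if replace=True)."""
    used, pairs = set(), []
    for i, ei in enumerate(e_t):
        available = [j for j in range(len(e_c)) if replace or j not in used]
        j = min(available, key=lambda c: abs(ei - e_c[c]))
        used.add(j)
        pairs.append((i, j))
    return pairs


def optimal_match(e_t, e_c):
    """Optimal 1:1 matching without replacement: minimize the total distance
    by brute force over all assignments."""
    best = min(
        permutations(range(len(e_c)), len(e_t)),
        key=lambda perm: sum(abs(e_t[i] - e_c[j]) for i, j in enumerate(perm)),
    )
    return list(enumerate(best))


def total_distance(pairs, e_t, e_c):
    return sum(abs(e_t[i] - e_c[j]) for i, j in pairs)


def frequency_weights(pairs, n_controls):
    """Number of times each control is used (needed after matching with replacement)."""
    w = [0] * n_controls
    for _, j in pairs:
        w[j] += 1
    return w


e_t = [0.50, 0.60]  # treated propensity scores
e_c = [0.58, 0.20]  # control propensity scores
greedy = greedy_match(e_t, e_c)                   # [(0, 0), (1, 1)]
optimal = optimal_match(e_t, e_c)                 # [(0, 1), (1, 0)]
with_repl = greedy_match(e_t, e_c, replace=True)  # [(0, 0), (1, 0)]
```

Greedy matching spends control 0 on the first treated unit and ends with a total distance of 0.48, whereas the optimal assignment reaches 0.32. Matching with replacement reuses control 0, which then carries a frequency weight of 2.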

Subclassification, Full Matching, and Weighting

These methods generalize nearest-neighbor approaches by assigning fractional weights.

Subclassification: Partition into strata based on propensity score (e.g., quintiles).
Full Matching: Forms matched sets, each containing either one treated unit and one or more controls or one control and one or more treated units, chosen to minimize the average within-set distance.
Weighting: Weighting techniques use propensity scores to estimate the ATE (IPTW) or the ATT (odds weighting). If some weights are extreme, the variance of the estimate can be inflated, not because of the underlying probabilities but because of the estimation procedure itself.
- Inverse Probability of Treatment Weighting (IPTW, targets the ATE): $w_i = \frac{T_i}{\hat{e}_i} + \frac{1 - T_i}{1 - \hat{e}_i}$
- Odds weighting (targets the ATT): $w_i = T_i + (1 - T_i)\frac{\hat{e}_i}{1 - \hat{e}_i}$
- Kernel weighting: Each treated unit is compared with a kernel-weighted average of the controls, in which controls with closer propensity scores receive more weight (popular in economics).
- Trimming and doubly robust methods: Reduce the variance caused by extreme weights; both can be used whether propensity scores serve for weighting or for matching.
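The two weighting formulas above translate directly into code. In this sketch (hypothetical function names, toy numbers), the weighted means are normalized within each group, and `cap_weights` shows only one simple form of trimming, truncating weights above a chosen maximum.

```python
def iptw_weights(t, e):
    """ATE weights: 1/e for treated units, 1/(1 - e) for controls."""
    return [ti / ei + (1 - ti) / (1 - ei) for ti, ei in zip(t, e)]


def odds_weights(t, e):
    """ATT weights: 1 for treated units, e/(1 - e) for controls."""
    return [ti + (1 - ti) * ei / (1 - ei) for ti, ei in zip(t, e)]


def cap_weights(w, max_w):
    """One simple form of weight trimming: truncate weights above max_w."""
    return [min(wi, max_w) for wi in w]


def weighted_diff_in_means(y, t, w):
    """Weighted treated mean minus weighted control mean (normalized within group)."""
    def wmean(group):
        pairs = [(wi, yi) for yi, ti, wi in zip(y, t, w) if ti == group]
        return sum(wi * yi for wi, yi in pairs) / sum(wi for wi, _ in pairs)
    return wmean(1) - wmean(0)


t = [1, 1, 0, 0]
e = [0.8, 0.5, 0.5, 0.2]  # estimated propensity scores
y = [10.0, 8.0, 6.0, 4.0]
ate = weighted_diff_in_means(y, t, iptw_weights(t, e))
att = weighted_diff_in_means(y, t, odds_weights(t, e))
```

With odds weights every treated unit keeps weight 1, so the treated mean is unchanged and only the controls are reweighted toward the treated group, which is why this scheme targets the ATT.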

Assessing Common Support

Use propensity score histograms to visualize overlap.
Units in regions of $X$ where the other group has no comparable units (e.g., outside the convex hull of that group's covariate values) can be discarded.
Lack of overlap indicates that some comparisons are extrapolations, not empirical matches.
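A common way to operationalize common support on the propensity score is the min-max rule: keep only units whose score lies between the larger of the two group minima and the smaller of the two group maxima. The sketch below assumes that rule and replaces the histogram with per-bin counts; the function names are hypothetical.

```python
def common_support(e, t):
    """Min-max rule: keep units whose score lies in [lo, hi], where lo is the larger
    of the two group minima and hi is the smaller of the two group maxima."""
    e1 = [ei for ei, ti in zip(e, t) if ti == 1]
    e0 = [ei for ei, ti in zip(e, t) if ti == 0]
    lo, hi = max(min(e1), min(e0)), min(max(e1), max(e0))
    return lo, hi, [lo <= ei <= hi for ei in e]


def overlap_counts(e, t, bins=5):
    """Text stand-in for overlaid histograms: [controls, treated] counts per
    equal-width propensity-score bin on [0, 1]."""
    counts = [[0, 0] for _ in range(bins)]
    for ei, ti in zip(e, t):
        b = min(int(ei * bins), bins - 1)
        counts[b][ti] += 1
    return counts


e = [0.10, 0.30, 0.45, 0.55, 0.70, 0.95]
t = [0, 0, 1, 0, 1, 1]
lo, hi, keep = common_support(e, t)
counts = overlap_counts(e, t)
```

Bins containing only one group (here the lowest and highest bins) flag regions where any comparison would be an extrapolation.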

35.4.3 Step 3: Diagnosing Match Quality

Balance Diagnostics

Matching aims to balance the covariate distributions between treated and control units. A well-matched sample satisfies:

$\tilde{p}(X \mid T=1) \approx \tilde{p}(X \mid T=0)$

where $\tilde{p}$ is the empirical distribution.

Numerical Checks

Standardized differences in means (most common): absolute value should be $< 0.1$
Standardized difference in mean propensity scores: absolute value should be $< 0.25$ (Rubin 2001)
Variance ratio of propensity scores (treated vs. control): between 0.5 and 2.0 (Rubin 2001)
Variance ratio of residuals (treated vs. control) after regressing each covariate on the propensity score: between 0.5 and 2.0 (Rubin 2001)

Avoid using p-values as balance diagnostics: they conflate balance with statistical power and are sensitive to sample size.
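The first two numerical checks can be computed per covariate (or on the propensity score itself) with the standard library. This sketch assumes the pooled standard deviation $\sqrt{(s_t^2 + s_c^2)/2}$ as the denominator; some authors instead use the treated-group standard deviation when the ATT is the target. The function names and thresholds in `balanced` simply restate the rules of thumb above.

```python
import math
import statistics


def std_mean_diff(x_t, x_c):
    """Standardized difference in means, scaled by the pooled SD sqrt((s_t^2 + s_c^2) / 2)."""
    pooled_sd = math.sqrt((statistics.variance(x_t) + statistics.variance(x_c)) / 2)
    return (statistics.mean(x_t) - statistics.mean(x_c)) / pooled_sd


def variance_ratio(x_t, x_c):
    """Ratio of sample variances, treated over control."""
    return statistics.variance(x_t) / statistics.variance(x_c)


x_t = [1.0, 2.0, 3.0, 4.0]  # one covariate in the treated group
x_c = [0.0, 1.0, 2.0, 3.0]  # the same covariate in the matched controls
smd = std_mean_diff(x_t, x_c)
vr = variance_ratio(x_t, x_c)
balanced = abs(smd) < 0.1 and 0.5 <= vr <= 2.0
```

Here the variances match exactly, but the means differ by about 0.77 standard deviations, so the covariate fails the balance check; a p-value would not be needed to reach that conclusion.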

Graphical Diagnostics

Empirical Distribution Plots
Quantile-Quantile (QQ) Plots
Love Plots: Summarize standardized differences before/after matching
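The quantities behind empirical distribution plots and QQ plots can also be summarized numerically. This sketch (hypothetical helper names) computes the mean and maximum gap between the two empirical CDFs; the maximum equals the two-sample Kolmogorov-Smirnov statistic. It also computes matching quantiles for a QQ plot using linear interpolation between order statistics.

```python
import math


def ecdf_at(sample, v):
    """Share of the sample that is <= v."""
    return sum(1 for x in sample if x <= v) / len(sample)


def ecdf_diffs(x_t, x_c):
    """Mean and maximum absolute gap between the two empirical CDFs, evaluated at
    every observed value (the maximum is the two-sample KS statistic)."""
    grid = sorted(set(x_t) | set(x_c))
    gaps = [abs(ecdf_at(x_t, v) - ecdf_at(x_c, v)) for v in grid]
    return sum(gaps) / len(gaps), max(gaps)


def quantile(sample, p):
    """Linear-interpolation quantile for 0 <= p <= 1."""
    s = sorted(sample)
    h = (len(s) - 1) * p
    lo = math.floor(h)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (h - lo) * (s[hi] - s[lo])


def qq_pairs(x_t, x_c, probs=(0.1, 0.25, 0.5, 0.75, 0.9)):
    """Points for a QQ plot; points on the 45-degree line indicate similar distributions."""
    return [(quantile(x_t, p), quantile(x_c, p)) for p in probs]


x_t = [1.0, 2.0, 3.0, 4.0]
x_c = [2.0, 3.0, 4.0, 5.0]  # same shape, shifted up by 1
mean_gap, max_gap = ecdf_diffs(x_t, x_c)
points = qq_pairs(x_t, x_c)
```

A pure location shift shows up as QQ points lying parallel to, but off, the 45-degree line, and as a constant gap between the two empirical CDFs.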

35.4.4 Step 4: Estimating Treatment Effects

After Matching

With k:1 matching or matching with replacement, analyze the matched sample with weights that account for the matching ratio and for controls that are used more than once.
Use regression adjustment on matched samples to improve precision and adjust for residual imbalance.
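A minimal sketch of such weights, assuming matches are stored as a dict from each treated id to the list of its matched control ids (a hypothetical layout): every treated unit gets weight 1, its controls split that weight equally, and a reused control accumulates weight. Regression adjustment is omitted here; the sketch computes only the weighted difference in means.

```python
from collections import defaultdict


def matching_weights(matches):
    """matches maps each treated id to its list of matched control ids.

    Every treated unit gets weight 1; its matched controls split that weight
    equally, and a control reused by several treated units accumulates weight."""
    w_t = {i: 1.0 for i in matches}
    w_c = defaultdict(float)
    for controls in matches.values():
        for j in controls:
            w_c[j] += 1.0 / len(controls)
    return w_t, dict(w_c)


def weighted_att(y_t, y_c, w_t, w_c):
    """Weighted treated mean minus weighted control mean."""
    m1 = sum(w * y_t[i] for i, w in w_t.items()) / sum(w_t.values())
    m0 = sum(w * y_c[j] for j, w in w_c.items()) / sum(w_c.values())
    return m1 - m0


y_t = {0: 10.0, 1: 12.0}
y_c = {0: 7.0, 1: 9.0, 2: 11.0}
matches = {0: [0, 1], 1: [1, 2]}  # 2:1 matching with replacement; control 1 is reused
w_t, w_c = matching_weights(matches)  # w_c == {0: 0.5, 1: 1.0, 2: 0.5}
att = weighted_att(y_t, y_c, w_t, w_c)
```

The weighted estimate equals the average of the per-treated-unit differences (treated outcome minus the mean of its matched controls). Software may rescale the control weights, for example to sum to the number of matched controls, which leaves the weighted means unchanged.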

After Subclassification or Full Matching

ATT: Weight subclass-specific estimates by the number of treated units in each subclass.
ATE: Weight subclass-specific estimates by the total number of units in each subclass.
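The two weighting rules can be sketched as follows, assuming subclasses are formed from propensity score quantiles by rank and each subclass is summarized by a simple difference in means. The function names and toy data are hypothetical.

```python
def subclassify(e, n_sub):
    """Assign each unit to one of n_sub propensity-score quantile subclasses by rank."""
    order = sorted(range(len(e)), key=lambda i: e[i])
    sub = [0] * len(e)
    for rank, i in enumerate(order):
        sub[i] = rank * n_sub // len(e)
    return sub


def subclass_att_ate(e, t, y, n_sub=5):
    sub = subclassify(e, n_sub)
    att_sum = ate_sum = 0.0
    for s in range(n_sub):
        y1 = [y[i] for i in range(len(e)) if sub[i] == s and t[i] == 1]
        y0 = [y[i] for i in range(len(e)) if sub[i] == s and t[i] == 0]
        if not y1 or not y0:
            raise ValueError(f"subclass {s} has no treated or no control units")
        diff = sum(y1) / len(y1) - sum(y0) / len(y0)
        att_sum += len(y1) * diff              # ATT: weight by treated count
        ate_sum += (len(y1) + len(y0)) * diff  # ATE: weight by subclass size
    return att_sum / sum(t), ate_sum / len(e)


e = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
t = [0, 0, 0, 1, 1, 1, 1, 0]
y = [1.0, 2.0, 3.0, 6.0, 10.0, 10.0, 13.0, 9.0]
att, ate = subclass_att_ate(e, t, y, n_sub=2)
```

The two subclasses have effects of 4 and 2. Because three of the four treated units sit in the second subclass, the ATT (2.5) leans toward its effect, while the ATE (3.0) weights both equally-sized subclasses the same.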

Variance Estimation

Variance estimates must reflect uncertainty in both:

The matching procedure (sampling and distance calculation) (Steps 1 and 2)
The outcome model (regression, difference-in-means, etc.) (Step 4)

In practice this is often done by bootstrapping, repeating the matching within each bootstrap sample.
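A sketch of that idea, using subclassification as the matching step: units are resampled with replacement and the subclasses are rebuilt in every replicate, so the spread of the estimates reflects the stratification as well as the outcome comparison. For brevity, the propensity scores are treated as fixed inputs rather than re-estimated in each replicate. The data are simulated with a true effect of 2, and all names are hypothetical.

```python
import random
import statistics


def subclass_ate(e, t, y, n_sub=5):
    """Difference in means within propensity-score quantile subclasses,
    combined with weights proportional to subclass size (ATE)."""
    order = sorted(range(len(e)), key=lambda i: e[i])
    total = 0.0
    for s in range(n_sub):
        idx = [i for r, i in enumerate(order) if r * n_sub // len(e) == s]
        y1 = [y[i] for i in idx if t[i] == 1]
        y0 = [y[i] for i in idx if t[i] == 0]
        if not y1 or not y0:
            raise ValueError("a subclass lacks treated or control units")
        total += len(idx) * (statistics.mean(y1) - statistics.mean(y0))
    return total / len(e)


def bootstrap_se(e, t, y, n_boot=200, seed=0, n_sub=5):
    """Resample units with replacement and redo the subclassification in each
    replicate; replicates with an empty treated or control cell are redrawn."""
    rng = random.Random(seed)
    n = len(e)
    estimates, attempts = [], 0
    while len(estimates) < n_boot and attempts < 10 * n_boot:
        attempts += 1
        idx = [rng.randrange(n) for _ in range(n)]
        try:
            est = subclass_ate([e[i] for i in idx], [t[i] for i in idx],
                               [y[i] for i in idx], n_sub)
        except ValueError:
            continue
        estimates.append(est)
    return statistics.stdev(estimates)


# Simulated data: treatment probability equals e, true treatment effect is 2
rng = random.Random(42)
e = [rng.uniform(0.1, 0.9) for _ in range(200)]
t = [1 if rng.random() < ei else 0 for ei in e]
y = [2.0 * ti + 3.0 * ei + rng.gauss(0, 1) for ti, ei in zip(t, e)]
est = subclass_ate(e, t, y)
se = bootstrap_se(e, t, y)
```

Fixing the seed makes the standard error reproducible. Bootstrap validity depends on the matching method, so this should be read as an illustration of the resampling mechanics rather than a universal recipe.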

References

Hansen, Ben B. 2008. “The Prognostic Analogue of the Propensity Score.” Biometrika 95 (2): 481–88.

Li, Yunfei Paul, Kathleen J Propert, and Paul R Rosenbaum. 2001. “Balanced Risk Set Matching.” Journal of the American Statistical Association 96 (455): 870–82.

Robins, James M, Miguel Angel Hernan, and Babette Brumback. 2000. “Marginal Structural Models and Causal Inference in Epidemiology.” Epidemiology, 550–60.

Rubin, Donald B. 2001. “Using Propensity Scores to Help Design Observational Studies: Application to the Tobacco Litigation.” Health Services and Outcomes Research Methodology 2: 169–88.

Rubin, Donald B, and Neal Thomas. 1996. “Matching Using Estimated Propensity Scores: Relating Theory to Practice.” Biometrics, 249–64.

———. 2000. “Combining Propensity Score Matching with Additional Adjustments for Prognostic Covariates.” Journal of the American Statistical Association 95 (450): 573–85.