35.4 Steps for Matching
Most matching methods rely on:
- Propensity score: summarizes P(T=1|X)
- Distance metric: measures similarity
- Covariates: assumed to satisfy ignorability
35.4.1 Step 1: Define “Closeness” (Distance Metrics)
Matching requires a distance metric to define similarity between treated and control units.
Variable Selection Guidelines
- Include as many pre-treatment covariates as possible to support conditional ignorability.
- Avoid post-treatment variables, which introduce bias.
- Be cautious with variables that are highly correlated with the outcome but not with treatment (e.g., heavy drug use when the outcome is heavy drinking); such variables may act as mediators rather than confounders.
- If variables are uncorrelated with both treatment and outcome, the cost of inclusion is small.
Distance Measures
Method | Formula | Notes |
---|---|---|
Exact Matching | $D_{ij} = 0$ if $X_i = X_j$, else $\infty$ | Only feasible in low dimensions; can be relaxed via Coarsened Exact Matching |
Mahalanobis Distance | $D_{ij} = (X_i - X_j)' \Sigma^{-1} (X_i - X_j)$ | $\Sigma$ is the variance-covariance matrix of $X$, taken from the control group if the ATT is of interest or from the pooled sample if the ATE is of interest; sensitive to dimensionality |
Propensity Score | $D_{ij} = \lvert e_i - e_j \rvert$ | $e_k$ is the estimated propensity score $P(T = 1 \mid X_k)$ for unit $k$. Advanced: Prognostic scores (B. B. Hansen 2008) require modeling $E[Y(0) \mid X]$, so they depend on the outcome model. |
Logit Propensity Score | $D_{ij} = \lvert \operatorname{logit}(e_i) - \operatorname{logit}(e_j) \rvert$ | More stable in the tails of the distribution |
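The distance measures in the table can be sketched in a few lines of NumPy. This is an illustrative sketch, not a reference implementation: the function names and the toy covariate matrix `X_control` are hypothetical, and the propensity scores are assumed to have been estimated already.

```python
import numpy as np

def exact_distance(x_i, x_j):
    """Return 0 when all covariates agree and infinity otherwise."""
    return 0.0 if np.array_equal(x_i, x_j) else np.inf

def mahalanobis_distance(x_i, x_j, cov):
    """Quadratic form (x_i - x_j)' cov^{-1} (x_i - x_j) from the table."""
    diff = np.asarray(x_i, dtype=float) - np.asarray(x_j, dtype=float)
    # Solve cov @ z = diff rather than inverting cov explicitly.
    return float(diff @ np.linalg.solve(cov, diff))

def propensity_distance(e_i, e_j, use_logit=False):
    """Absolute difference in propensity scores, optionally on the logit scale."""
    if use_logit:
        e_i, e_j = np.log(e_i / (1 - e_i)), np.log(e_j / (1 - e_j))
    return float(abs(e_i - e_j))

# Hypothetical control-group covariates; using their covariance matrix
# corresponds to the ATT choice of Sigma described in the table.
X_control = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]])
sigma = np.cov(X_control, rowvar=False)

print(mahalanobis_distance([1.0, 2.0], [3.0, 4.0], sigma))
print(propensity_distance(0.01, 0.02))                  # raw scale
print(propensity_distance(0.01, 0.02, use_logit=True))  # logit scale
```

Comparing the last two printed values shows why the logit scale is preferred in the tails: two scores that look almost identical on the raw scale are clearly separated after the logit transform.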
Tip: In high dimensions, exact and Mahalanobis matching perform poorly. Combining Mahalanobis with propensity score calipers can improve robustness (Rubin and Thomas 2000).
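One way to read the tip above is: compute Mahalanobis distances as usual, then forbid any pair whose logit propensity scores differ by more than a caliper. The sketch below assumes that reading. The function name `caliper_mahalanobis` and the caliper width are hypothetical choices for illustration, not values taken from the cited paper.

```python
import numpy as np

def caliper_mahalanobis(X_t, X_c, e_t, e_c, cov, caliper):
    """Treated-by-control Mahalanobis distances; pairs outside the caliper get inf."""
    X_t, X_c = np.asarray(X_t, dtype=float), np.asarray(X_c, dtype=float)
    e_t, e_c = np.asarray(e_t, dtype=float), np.asarray(e_c, dtype=float)

    # Pairwise covariate differences, shape (n_treated, n_control, n_covariates).
    diff = X_t[:, None, :] - X_c[None, :, :]
    cov_inv = np.linalg.inv(cov)
    dist = np.einsum("ijk,kl,ijl->ij", diff, cov_inv, diff)

    # Caliper on the logit of the propensity score.
    logit_t = np.log(e_t / (1 - e_t))
    logit_c = np.log(e_c / (1 - e_c))
    gap = np.abs(logit_t[:, None] - logit_c[None, :])
    dist[gap > caliper] = np.inf
    return dist

# One treated unit and two candidate controls (toy values).
dist = caliper_mahalanobis(
    X_t=[[0.0, 0.0]],
    X_c=[[1.0, 0.0], [0.0, 2.0]],
    e_t=[0.5],
    e_c=[0.5, 0.9],
    cov=np.eye(2),
    caliper=0.5,  # hypothetical width on the logit scale
)
print(dist)
```

The resulting matrix can be passed to any matching algorithm that treats infinite distances as forbidden pairs.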
Advanced methods for longitudinal settings:
- Marginal Structural Models: for time-varying treatments (Robins, Hernan, and Brumback 2000)
- Balanced Risk Set Matching: for survival analysis (Y. P. Li, Propert, and Rosenbaum 2001)
35.4.2 Step 2: Matching Algorithms
- Nearest Neighbor Matching
- Greedy matching: Fast, but suboptimal under competition for controls.
- Optimal matching: Minimizes global distance across all pairs.
- Ratio matching (k:1): Useful when controls outnumber treated; choose k using trade-off between bias and variance (Rubin and Thomas 1996).
- With vs. without replacement:
- With replacement: Improves matching quality, but requires frequency weights for analysis.
- Without replacement: Simpler, but less flexible.
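The trade-offs among greedy matching, optimal matching, and matching with replacement are easiest to see on a toy problem. The sketch below is illustrative only: the function names are hypothetical, the optimal matcher uses brute force and is usable only for a handful of units, and matching without replacement assumes at least as many controls as treated units.

```python
from itertools import permutations
import numpy as np

def greedy_match(dist, replace=False):
    """1:1 nearest-neighbor matching; treated units are processed in row order."""
    dist = np.asarray(dist, dtype=float)
    used, pairs = set(), []
    for i in range(dist.shape[0]):
        order = np.argsort(dist[i])
        # Take the closest control that is still available.
        j = next(int(c) for c in order if replace or int(c) not in used)
        used.add(j)
        pairs.append((i, j))
    return pairs

def optimal_match(dist):
    """Brute-force 1:1 matching that minimizes the total distance."""
    dist = np.asarray(dist, dtype=float)
    n_t, n_c = dist.shape
    best = min(
        permutations(range(n_c), n_t),
        key=lambda cols: sum(dist[i, j] for i, j in enumerate(cols)),
    )
    return list(enumerate(best))

def total_distance(dist, pairs):
    return float(sum(dist[i, j] for i, j in pairs))

# Toy propensity scores: both treated units are closest to control 0.
e_treated = np.array([0.50, 0.60])
e_control = np.array([0.58, 0.30])
dist = np.abs(np.subtract.outer(e_treated, e_control))

greedy = greedy_match(dist)
optimal = optimal_match(dist)
with_repl = greedy_match(dist, replace=True)

# Frequency weights: how often each control was used when reuse is allowed.
freq_weights = np.bincount([j for _, j in with_repl], minlength=len(e_control))

print(greedy, total_distance(dist, greedy))
print(optimal, total_distance(dist, optimal))
print(with_repl, freq_weights)
```

Here the greedy pass gives the first treated unit the control that the second treated unit needed more, so its total distance exceeds the optimal one. Matching with replacement avoids the competition but reuses one control, which is why frequency weights are needed in the analysis.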
- Subclassification, Full Matching, and Weighting
These methods generalize nearest-neighbor approaches by assigning fractional weights.
- Subclassification: Partition into strata based on propensity score (e.g., quintiles).
- Full Matching: Each treated unit is matched to a weighted group of controls (and vice versa) to minimize average within-set distance.
- Weighting: Propensity scores are used directly as weights to estimate the ATE (or, with odds weighting, the ATT). Extreme weights can inflate the variance, and this inflation often reflects how the propensity scores were estimated rather than the true underlying probabilities. Two common remedies are (1) weight trimming and (2) doubly robust methods.
- Inverse Probability of Treatment Weighting (IPTW): $w_i = \frac{T_i}{\hat{e}_i} + \frac{1 - T_i}{1 - \hat{e}_i}$
- Odds weighting: $w_i = T_i + (1 - T_i)\frac{\hat{e}_i}{1 - \hat{e}_i}$
- Kernel weighting: Smooth average over control group (popular in economics).
- Trimming and Doubly-Robust Methods: Reduce variance due to extreme weights.
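The two weighting formulas translate directly into code. In the sketch below the function names are hypothetical, and "trimming" is implemented as capping weights at an upper quantile. That is one common convention, assumed here for illustration; other definitions (such as dropping units with extreme scores) exist.

```python
import numpy as np

def iptw_weights(t, e_hat):
    """IPTW: 1 / e for treated units and 1 / (1 - e) for controls."""
    t, e_hat = np.asarray(t, dtype=float), np.asarray(e_hat, dtype=float)
    return t / e_hat + (1 - t) / (1 - e_hat)

def odds_weights(t, e_hat):
    """Odds weighting: 1 for treated units and e / (1 - e) for controls."""
    t, e_hat = np.asarray(t, dtype=float), np.asarray(e_hat, dtype=float)
    return t + (1 - t) * e_hat / (1 - e_hat)

def trim_weights(w, upper_quantile=0.99):
    """Cap weights at the given quantile (an assumed trimming rule)."""
    w = np.asarray(w, dtype=float)
    return np.minimum(w, np.quantile(w, upper_quantile))

t = np.array([1, 0, 1, 0])
e_hat = np.array([0.8, 0.2, 0.1, 0.5])  # toy estimated propensity scores

w_ate = iptw_weights(t, e_hat)
w_att = odds_weights(t, e_hat)
w_trimmed = trim_weights(w_ate, upper_quantile=0.75)

print(w_ate)
print(w_att)
print(w_trimmed)
```

The third unit is treated despite a propensity score of 0.1, so it receives a weight far larger than the others; capping the weights limits how much this single unit can drive the estimate.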
- Assessing Common Support
- Use propensity score histograms to visualize overlap.
- Units outside the convex hull of X (i.e., unmatched regions) can be discarded.
- Lack of overlap indicates that some comparisons are extrapolations, not empirical matches.
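A simple numerical companion to the histogram check is to keep only units whose propensity score lies inside the range shared by both groups. Note that this is a one-dimensional stand-in on the propensity score, not the convex-hull check on $X$ mentioned above; the function name is hypothetical.

```python
import numpy as np

def common_support_mask(t, e_hat):
    """True for units whose score lies in the overlap of both groups' ranges."""
    t = np.asarray(t).astype(bool)
    e_hat = np.asarray(e_hat, dtype=float)
    lower = max(e_hat[t].min(), e_hat[~t].min())
    upper = min(e_hat[t].max(), e_hat[~t].max())
    return (e_hat >= lower) & (e_hat <= upper)

t = np.array([1, 1, 1, 0, 0, 0])
e_hat = np.array([0.90, 0.60, 0.40, 0.50, 0.30, 0.05])  # toy scores

keep = common_support_mask(t, e_hat)
print(keep)
print(f"dropped {np.sum(~keep)} of {len(keep)} units")
```

Discarding units changes the population to which the estimate applies, so the number and type of dropped units should be reported alongside the effect estimate.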
35.4.3 Step 3: Diagnosing Match Quality
Balance Diagnostics
Matching aims to balance the covariate distributions between treated and control units. A well-matched sample satisfies:
$$\tilde{p}(X \mid T = 1) \approx \tilde{p}(X \mid T = 0)$$
where $\tilde{p}$ is the empirical distribution.
- Numerical Checks
- Standardized differences in means (most common): Should be <0.1
- Standardized difference of propensity scores: Should be <0.25 (Rubin 2001)
- Variance ratio of propensity scores: Between 0.5 and 2.0 (Rubin 2001)
- Ratio of residual variances (treated vs. control) for each covariate, after regressing the covariate on the propensity score: should also be close to 1
Avoid using p-values as balance diagnostics: they conflate balance with statistical power and are sensitive to sample size.
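The first three numerical checks can be computed as follows. This is a sketch under stated assumptions: the function names are hypothetical, and the standardized difference divides by the unweighted standard deviation of the treated group so that the same denominator is used before and after matching. Other denominators (such as a pooled standard deviation) are also used in practice.

```python
import numpy as np

def standardized_mean_difference(x, t, weights=None):
    """(Weighted) difference in means divided by the treated-group SD."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t).astype(bool)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    mean_t = np.average(x[t], weights=w[t])
    mean_c = np.average(x[~t], weights=w[~t])
    return (mean_t - mean_c) / x[t].std(ddof=1)

def variance_ratio(x, t):
    """Sample variance in the treated group over that in the control group."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t).astype(bool)
    return x[t].var(ddof=1) / x[~t].var(ddof=1)

x = np.array([1.0, 2.0, 3.0, 2.0, 4.0, 6.0])  # toy covariate
t = np.array([1, 1, 1, 0, 0, 0])

smd_before = standardized_mean_difference(x, t)
# Matching weights that keep only the control with x = 2 (toy example).
smd_after = standardized_mean_difference(x, t, weights=[1, 1, 1, 1, 0, 0])

print(smd_before, smd_after, variance_ratio(x, t))
```

Passing the propensity score (or its logit) as `x` gives the two propensity-score diagnostics; passing each covariate in turn gives the values usually shown in a Love plot.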
- Graphical Diagnostics
- Empirical Distribution Plots
- Quantile-Quantile (QQ) Plots
- Love Plots: Summarize standardized differences before/after matching
35.4.4 Step 4: Estimating Treatment Effects
- After Matching
- With k:1 matching with replacement, use weights to adjust for reuse of controls.
- Use regression adjustment on matched samples to improve precision and adjust for residual imbalance.
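The two points above can be illustrated together: a weighted difference in means uses the matching weights, and a weighted regression on the matched sample additionally adjusts for leftover covariate imbalance. The data below are synthetic and noiseless, with a true effect of 2 built in; the function names and the weights are hypothetical.

```python
import numpy as np

def weighted_mean_difference(y, t, w):
    """Weighted mean outcome of treated units minus that of controls."""
    t = np.asarray(t).astype(bool)
    return np.average(y[t], weights=w[t]) - np.average(y[~t], weights=w[~t])

def weighted_ols(y, design, w):
    """Weighted least squares via ordinary least squares on sqrt(w)-scaled rows."""
    root_w = np.sqrt(w)
    beta, *_ = np.linalg.lstsq(design * root_w[:, None], y * root_w, rcond=None)
    return beta

# Matched sample: three treated units and two controls, one of them used twice.
t = np.array([1, 1, 1, 0, 0])
x = np.array([1.0, 2.0, 3.0, 1.5, 2.5])
w = np.array([1.0, 1.0, 1.0, 2.0, 1.0])  # frequency weights from matching
y = 1.0 + 2.0 * t + 3.0 * x              # synthetic outcome, true effect = 2

naive = weighted_mean_difference(y, t, w)
design = np.column_stack([np.ones_like(x), t, x])
adjusted = weighted_ols(y, design, w)[1]  # coefficient on treatment

print(naive, adjusted)
```

Because the matched controls have a slightly lower mean of `x` than the treated units, the weighted difference in means overstates the effect, while the regression-adjusted coefficient recovers the built-in value.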
- After Subclassification or Full Matching
- ATT: Weight subclass-specific estimates by number of treated units.
- ATE: Weight by total units per subclass.
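The two weighting rules differ only in which counts are used to average the stratum-specific differences. The sketch below assumes strata are formed from quantiles of the estimated propensity score and that strata missing either group are skipped; the function name and the toy data are hypothetical.

```python
import numpy as np

def subclass_effects(y, t, e_hat, n_strata=5):
    """Return (ATT, ATE) estimates from propensity-score subclassification."""
    y, e_hat = np.asarray(y, dtype=float), np.asarray(e_hat, dtype=float)
    t = np.asarray(t).astype(bool)

    # Stratum boundaries at equally spaced quantiles of the propensity score.
    edges = np.quantile(e_hat, np.linspace(0, 1, n_strata + 1))
    strata = np.clip(np.searchsorted(edges, e_hat, side="right") - 1, 0, n_strata - 1)

    diffs, n_treated, n_total = [], [], []
    for s in range(n_strata):
        in_s = strata == s
        if not (t[in_s].any() and (~t[in_s]).any()):
            continue  # no within-stratum comparison is possible
        diffs.append(y[in_s & t].mean() - y[in_s & ~t].mean())
        n_treated.append(int((in_s & t).sum()))
        n_total.append(int(in_s.sum()))

    att = float(np.average(diffs, weights=n_treated))  # weight by treated counts
    ate = float(np.average(diffs, weights=n_total))    # weight by stratum sizes
    return att, ate

# Two strata: effect 1 where few units are treated, effect 3 where most are.
e_hat = np.array([0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9])
t = np.array([1, 0, 0, 0, 1, 1, 1, 0])
y = np.array([2.0, 1.0, 1.0, 1.0, 5.0, 5.0, 5.0, 2.0])

att, ate = subclass_effects(y, t, e_hat, n_strata=2)
print(att, ate)
```

The ATT is pulled toward the high-propensity stratum because most treated units sit there, while the ATE weights both strata equally since they contain the same number of units.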
- Variance Estimation
Must reflect uncertainty in both:
- The matching procedure (sampling and distance calculation) (Steps 1 and 2)
- The outcome model (regression, difference-in-means, etc.) (Step 4)
Often estimated via bootstrapping.
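To reflect both sources of uncertainty, the bootstrap has to repeat the whole pipeline, matching included, on every resample rather than resampling the matched pairs. The sketch below assumes a deliberately simple pipeline (greedy 1:1 matching on a single covariate without replacement, then a mean of pair differences) and resamples within treatment groups. All names and the simulated data are hypothetical. The naive bootstrap is known to be unreliable for some matching estimators, notably nearest-neighbor matching with replacement, so this should be read as an illustration of the mechanics only.

```python
import numpy as np

def nn_att(y, t, x):
    """ATT from greedy 1:1 nearest-neighbor matching on x, without replacement."""
    treated = np.flatnonzero(t == 1)
    controls = list(np.flatnonzero(t == 0))
    diffs = []
    for i in treated:
        if not controls:
            break  # ran out of controls to match
        j = min(controls, key=lambda c: abs(x[i] - x[c]))
        controls.remove(j)
        diffs.append(y[i] - y[j])
    return float(np.mean(diffs))

def bootstrap_se(estimator, y, t, x, n_boot=500, seed=0):
    """Bootstrap standard error, re-running matching and estimation each time."""
    rng = np.random.default_rng(seed)
    idx_t, idx_c = np.flatnonzero(t == 1), np.flatnonzero(t == 0)
    estimates = []
    for _ in range(n_boot):
        # Resample within each group so both groups are always present.
        idx = np.concatenate([
            rng.choice(idx_t, size=idx_t.size),
            rng.choice(idx_c, size=idx_c.size),
        ])
        estimates.append(estimator(y[idx], t[idx], x[idx]))
    return float(np.std(estimates, ddof=1))

# Simulated data: treatment is more likely at high x, true effect is 2.
rng = np.random.default_rng(1)
n = 100
x = rng.normal(size=n)
t = (rng.random(n) < 1 / (1 + np.exp(-(x - 1)))).astype(int)
y = 2.0 * t + x + rng.normal(size=n)

estimate = nn_att(y, t, x)
se = bootstrap_se(nn_att, y, t, x, n_boot=200)
print(estimate, se)
```

Any function with the same `(y, t, x)` signature can be passed as `estimator`, so the same loop works for a pipeline that re-estimates propensity scores inside each resample.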