39.4 Choosing Controls

Identifying which variables to control for is one of the most important — and difficult — steps in causal inference. The goal is to block all back-door paths between the treatment X and the outcome Y, without introducing bias from colliders or mediators.

When done correctly, adjustment removes confounding bias. When done incorrectly, it can introduce bias, increase variance, or obscure the true causal relationship.


39.4.1 Step 1: Use a Causal Diagram (DAG)

Causal diagrams provide a graphical representation of assumptions about the data-generating process. With a DAG, we can:

  • Identify all back-door paths from X to Y
  • Determine which paths are blocked or opened by conditioning
  • Use software to identify minimal sufficient adjustment sets

For example, using dagitty:

library(dagitty)

# X -> Y is the effect of interest; Z and U are common causes (confounders)
# of both the treatment X and the outcome Y
dag <- dagitty("dag {
  X -> Y
  Z -> X
  Z -> Y
  U -> X
  U -> Y
}")

# Minimal sufficient adjustment set(s) under the back-door criterion
adjustmentSets(dag, exposure = "X", outcome = "Y")

This will return the set(s) of covariates that must be controlled for to estimate the causal effect of X on Y under the back-door criterion.

39.4.2 Step 2: Use Algorithmic Tools

Several tools can automate the process of selecting appropriate controls given a DAG:

DAGitty provides an intuitive browser-based interface to:

  • Draw causal diagrams

  • Identify minimal sufficient adjustment sets

  • Simulate interventions (do-calculus)

  • Diagnose overcontrol or collider bias

It integrates directly with R through the dagitty package, which keeps the workflow scripted and reproducible.
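
For instance, the DAG object from Step 1 can be checked and exercised directly from R (a minimal sketch using two dagitty functions; dag is the object defined above):

library(dagitty)

# Conditional independencies implied by the diagram; these are testable
# against the data and can falsify parts of the model
impliedConditionalIndependencies(dag)

# Simulate data from a standardized linear SEM consistent with the DAG,
# useful for rehearsing an analysis plan before touching real data
sim <- simulateSEM(dag, N = 1000)
head(sim)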

Fusion is a powerful tool for:

  • Computing identification formulas using do-calculus

  • Handling complex longitudinal data and selection bias

  • Formalizing queries for total, direct, and mediated effects

Fusion implements algorithms that go beyond standard adjustment and allow for nonparametric identification when latent confounders are present.

39.4.3 Step 3: Theoretical Principles

Key guidelines include:

  • Do not control for mediators if estimating the total effect

  • Control for pre-treatment confounders (common causes of treatment and outcome)

  • Avoid colliders and their descendants (see the short simulation after this list)

  • Consider the use of instrumental variables when no suitable adjustment set exists
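
To see why the collider rule matters, a short simulation helps (a sketch with made-up coefficients): X and Y are generated independently, both cause a collider C, and adjusting for C manufactures an association that is not there.

set.seed(1)
n <- 1e4

x <- rnorm(n)             # treatment, unrelated to the outcome
y <- rnorm(n)             # outcome, unrelated to the treatment
c <- x + y + rnorm(n)     # collider: caused by both x and y

coef(lm(y ~ x))["x"]      # close to 0, as it should be
coef(lm(y ~ x + c))["x"]  # clearly negative: collider bias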

39.4.4 Step 4: Consider Sensitivity Analysis

Even with well-reasoned DAGs, our control set may be imperfect, especially if some variables are unobserved or measured with error. In these cases, sensitivity analysis tools help quantify how robust our causal conclusions are.

The sensemakr package (Cinelli et al. 2019; Cinelli and Hazlett 2020) allows for:

  • Quantifying how strong unmeasured confounding would have to be to change conclusions

  • Reporting robustness values: the minimal strength of confounding needed to explain away the effect

  • Producing graphical summaries of confounding thresholds

This allows researchers to report assumptions transparently, even in the presence of unmeasured bias.
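
As a concrete illustration, the analysis can be run on the darfur data bundled with sensemakr (a sketch following the package's own example; variable names are those of the bundled dataset):

library(sensemakr)

# Outcome model with the observed controls from the bundled darfur data
model <- lm(peacefactor ~ directlyharmed + age + farmer_dar + herder_dar +
              pastvoted + hhsize_darfur + female + village,
            data = darfur)

# Sensitivity of the "directlyharmed" estimate, benchmarking hypothetical
# confounders against the observed covariate "female" (1x, 2x, 3x as strong)
sens <- sensemakr(model = model,
                  treatment = "directlyharmed",
                  benchmark_covariates = "female",
                  kd = 1:3)

summary(sens)  # robustness values and bias bounds
plot(sens)     # contour plot of confounding thresholds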

39.4.5 Step 5: Know When Not to Control

Remember, not every variable should be adjusted for. The table below summarizes when to control and when to avoid; a short simulation after the table illustrates the mediator row.

Variable type                               Control?   Why
Confounder (Z → X and Z → Y)                Yes        Blocks a back-door path
Mediator (X → Z → Y)                        No         Blocks part of the effect (unless estimating the direct effect)
Collider (X → Z ← Y)                        No         Conditioning opens non-causal paths
Instrument (Z → X, no direct effect on Y)   No         Used differently (e.g., for IV estimation), not for adjustment
Pre-treatment proxy for the outcome         Caution    May amplify bias or introduce overcontrol
Predictor of the outcome only, not of X     Optional   Improves precision; does not affect identification
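
The mediator row can be checked with the same kind of simulation as above (a sketch with made-up coefficients): Z lies on the path X → Z → Y, so adjusting for it recovers only the direct effect, not the total effect.

set.seed(2)
n <- 1e4

x <- rnorm(n)
z <- 0.8 * x + rnorm(n)            # mediator on the path X -> Z -> Y
y <- 0.5 * x + 0.6 * z + rnorm(n)  # total effect of x: 0.5 + 0.8 * 0.6 = 0.98

coef(lm(y ~ x))["x"]      # about 0.98: the total effect
coef(lm(y ~ x + z))["x"]  # about 0.50: only the direct effect remains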

39.4.6 Summary: Control Selection Pipeline

  1. Define your causal question clearly (total effect, direct effect, etc.)

  2. Draw a DAG that reflects substantive knowledge

  3. Use DAGitty/Fusion to identify minimal sufficient control sets

  4. Double-check for bad controls (colliders, mediators)

  5. If in doubt, conduct sensitivity analysis using sensemakr

  6. Report assumptions transparently — causal conclusions are only as valid as the assumptions they rely on

The most important confounders are often unmeasured. But recognizing which ones should have been measured is already half the battle.

References

Cinelli, Carlos, and Chad Hazlett. 2020. “Making Sense of Sensitivity: Extending Omitted Variable Bias.” Journal of the Royal Statistical Society Series B: Statistical Methodology 82 (1): 39–67.
Cinelli, Carlos, Daniel Kumor, Bryant Chen, Judea Pearl, and Elias Bareinboim. 2019. “Sensitivity Analysis of Linear Structural Causal Models.” In International Conference on Machine Learning, 1252–61. PMLR; Proceedings of Machine Learning Research.