39.4 Choosing Controls
Identifying which variables to control for is one of the most important — and difficult — steps in causal inference. The goal is to block all back-door paths between the treatment X and the outcome Y, without introducing bias from colliders or mediators.
When done correctly, adjustment removes confounding bias. When done incorrectly, it can introduce bias, increase variance, or obscure the true causal relationship.
39.4.1 Step 1: Use a Causal Diagram (DAG)
Causal diagrams provide a graphical representation of assumptions about the data-generating process. With a DAG, we can:
- Identify all back-door paths from X to Y
- Determine which paths are blocked or opened by conditioning
- Use software to identify minimal sufficient adjustment sets
For example, using the `dagitty` package in R:

```r
library(dagitty)

dag <- dagitty("dag {
  X -> Y
  Z -> X
  Z -> Y
  U -> X
  U -> Y
}")

adjustmentSets(dag, exposure = "X", outcome = "Y")
```
This will return the set(s) of covariates that must be controlled for to estimate the causal effect of X on Y under the back-door criterion.
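To see why adjustment is needed, we can also list the back-door paths directly with `paths()` from the same package (a sketch using the DAG above):

```r
library(dagitty)

dag <- dagitty("dag {
  X -> Y
  Z -> X
  Z -> Y
  U -> X
  U -> Y
}")

# List all paths from X to Y and whether each is open before any
# adjustment; the paths through Z and U are open back-door paths.
paths(dag, from = "X", to = "Y")

# After conditioning on {Z, U}, the back-door paths are blocked
# and only the direct causal path X -> Y remains open.
paths(dag, from = "X", to = "Y", Z = c("Z", "U"))
```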
39.4.2 Step 2: Use Algorithmic Tools
Several tools can automate the process of selecting appropriate controls given a DAG:
DAGitty provides an intuitive browser-based interface to:

- Draw causal diagrams
- Identify minimal sufficient adjustment sets
- Simulate interventions (do-calculus)
- Diagnose overcontrol or collider bias

It supports direct integration with R and allows reproducible workflows.
Fusion is a powerful tool for:

- Computing identification formulas using do-calculus
- Handling complex longitudinal data and selection bias
- Formalizing queries for total, direct, and mediated effects
Fusion implements algorithms that go beyond standard adjustment and allow for nonparametric identification when latent confounders are present.
39.4.3 Step 3: Theoretical Principles
Key guidelines include:
- Do not control for mediators when estimating the total effect
- Control for pre-treatment confounders (common causes of treatment and outcome)
- Avoid colliders and their descendants
- Consider instrumental variables when no suitable adjustment set exists
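The danger of conditioning on a collider can be seen in a small simulation (hypothetical data: X and Y are generated independently, so any estimated association between them is spurious):

```r
set.seed(1)
n <- 10000

# X and Y are independent by construction
x <- rnorm(n)
y <- rnorm(n)

# Z is a collider: caused by both X and Y
z <- x + y + rnorm(n)

# Unadjusted regression: the coefficient on x is near zero, as it should be
coef(lm(y ~ x))

# Conditioning on the collider Z opens a non-causal path and induces
# a spurious association between X and Y
coef(lm(y ~ x + z))
```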
39.4.4 Step 4: Consider Sensitivity Analysis
Even with well-reasoned DAGs, our control set may be imperfect, especially if some variables are unobserved or measured with error. In these cases, sensitivity analysis tools help quantify how robust our causal conclusions are.
The `sensemakr` package (Cinelli et al. 2019; Cinelli and Hazlett 2020) allows for:

- Quantifying how strong unmeasured confounding would have to be to change conclusions
- Reporting robustness values: the minimal strength of confounding needed to explain away the effect
- Graphical summaries of confounding thresholds
This allows researchers to report assumptions transparently, even in the presence of unmeasured bias.
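As an illustration, the canonical example from the `sensemakr` documentation (using its bundled `darfur` data; variable names follow that dataset) looks roughly like this:

```r
library(sensemakr)

# Bundled example data from the sensemakr package
data("darfur")

# OLS model of interest: effect of being directly harmed on pro-peace attitudes
model <- lm(peacefactor ~ directlyharmed + age + farmer_dar + herder_dar +
              pastvoted + hhsize_darfur + female + village, data = darfur)

# Sensitivity analysis: how strong would an unmeasured confounder have to be,
# relative to the observed covariate "female", to explain away the estimate?
sens <- sensemakr(model,
                  treatment = "directlyharmed",
                  benchmark_covariates = "female",
                  kd = 1:3)

summary(sens)   # robustness values and bounds
plot(sens)      # contour plot of confounding thresholds
```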
39.4.5 Step 5: Know When Not to Control
Remember, not every variable should be adjusted for. The table below summarizes when to control and when to avoid:
| Variable Type | Control? | Why |
|---|---|---|
| Confounder (Z→X and Z→Y) | Yes | Blocks back-door path |
| Mediator (X→Z→Y) | No | Blocks part of the effect (unless estimating direct effect) |
| Collider (X→Z←Y) | No | Opens non-causal paths |
| Instrument (Z→X, Z↛Y) | No | Used differently, not for adjustment |
| Pre-treatment proxy for outcome | Caution | May amplify bias or introduce overcontrol |
| Predictor of outcome only (not X) | Optional | Improves precision, does not affect identification |
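The mediator row can be illustrated with another small simulation (hypothetical data): controlling for a mediator M removes the indirect portion of the total effect of X on Y.

```r
set.seed(2)
n <- 10000

x <- rnorm(n)
m <- 0.8 * x + rnorm(n)             # mediator: X -> M
y <- 0.5 * m + 0.3 * x + rnorm(n)   # Y depends on X directly and via M

# Total effect of X on Y is 0.3 + 0.8 * 0.5 = 0.7
coef(lm(y ~ x))       # coefficient on x is close to 0.7

# Adjusting for the mediator recovers only the direct effect (~0.3)
coef(lm(y ~ x + m))
```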
39.4.6 Summary: Control Selection Pipeline
1. Define your causal question clearly (total effect, direct effect, etc.)
2. Draw a DAG that reflects substantive knowledge
3. Use DAGitty/Fusion to identify minimal sufficient control sets
4. Double-check for bad controls (colliders, mediators)
5. If in doubt, conduct sensitivity analysis using `sensemakr`
6. Report assumptions transparently — causal conclusions are only as valid as the assumptions they rely on
The most important confounders are often unmeasured. But recognizing which ones should have been measured is already half the battle.