38.1 Basic Notation and Graph Structures
Directed acyclic graphs (DAGs) are built from three basic structures (chains, forks, and colliders), each of which determines how association flows between variables.
- Mediators (Chains)
X→Z→Y
- Variable Z mediates the effect of X on Y.
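- Controlling for Z blocks the indirect effect of X on Y, so Z should be left uncontrolled when the total effect of X on Y is the target.
- Use case in marketing: Email promotion (X) → customer interest (Z) → purchase (Y). Controlling for interest blocks the indirect path, leaving only whatever direct effect the email has on purchase; in a pure chain like this one, no effect remains.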
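The chain above can be illustrated with a small simulation. This is a minimal sketch, assuming linear relationships with Gaussian noise; the coefficients 0.8 and 0.5, the sample size, and the helper names `slope` and `residualize` are illustrative choices, not values from the text. It compares the total effect of X on Y with the effect that remains after Z is partialled out.

```python
import random

def slope(x, y):
    """Least-squares slope from regressing y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(target, control):
    """Remove the part of `target` that is linearly explained by `control`."""
    n = len(target)
    mc, mt = sum(control) / n, sum(target) / n
    b = slope(control, target)
    return [t - mt - b * (c - mc) for t, c in zip(target, control)]

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20_000

# Hypothetical chain: email promotion (X) -> interest (Z) -> purchase (Y).
x = [rng.gauss(0, 1) for _ in range(n)]
z = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.5 * zi + rng.gauss(0, 1) for zi in z]

# Total effect: should be close to 0.8 * 0.5 = 0.4.
total_effect = slope(x, y)

# Effect of X holding Z fixed: partial Z out of both X and Y, then regress
# the residuals on each other. Should be close to 0 in a pure chain.
effect_given_z = slope(residualize(x, z), residualize(y, z))

print(f"total effect of X on Y:      {total_effect:.3f}")
print(f"effect after controlling Z:  {effect_given_z:.3f}")
```

Regressing residuals on residuals gives the same coefficient on X as a regression of Y on both X and Z, which keeps the sketch free of third-party dependencies.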
- Common Causes (Forks)
X←Z→Y
- Z is a confounder, creating a spurious association between X and Y.
- To estimate the causal effect of X on Y, Z must be controlled for, for example by including it as a regression covariate or by stratifying on it.
- Use case in finance: An economic indicator (Z) affects both stock investment decisions (X) and market returns (Y).
Key concept: If Z is not controlled for, X and Y may appear correlated because of their shared cause rather than because one causes the other.
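A short simulation makes the confounding pattern concrete. This is a sketch under stated assumptions: linear effects, Gaussian noise, and no causal effect of X on Y at all; the coefficients 0.7 and 0.6 and the helper names are hypothetical. The unadjusted slope picks up the shared cause, while the adjusted slope is close to zero.

```python
import random

def slope(x, y):
    """Least-squares slope from regressing y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residualize(target, control):
    """Remove the part of `target` that is linearly explained by `control`."""
    n = len(target)
    mc, mt = sum(control) / n, sum(target) / n
    b = slope(control, target)
    return [t - mt - b * (c - mc) for t, c in zip(target, control)]

rng = random.Random(1)  # fixed seed so the run is reproducible
n = 20_000

# Hypothetical fork: economic indicator (Z) drives both investment
# decisions (X) and market returns (Y). X has NO effect on Y here.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.7 * zi + rng.gauss(0, 1) for zi in z]
y = [0.6 * zi + rng.gauss(0, 1) for zi in z]

# Unadjusted slope: biased away from 0 by the shared cause Z.
naive_slope = slope(x, y)

# Adjusted slope: partial Z out of X and Y first. Should be close to 0.
adjusted_slope = slope(residualize(x, z), residualize(y, z))

print(f"slope of Y on X, ignoring Z:     {naive_slope:.3f}")
print(f"slope of Y on X, controlling Z:  {adjusted_slope:.3f}")
```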
- Common Effects (Colliders)
X→Z←Y
- Z is a collider. The path through Z is blocked by default; controlling for Z opens it and induces a spurious association between X and Y.
- Do not control for Z or its descendants.
- Use case in HR analytics: Two independent hiring factors (X = education, Y = experience) both influence a decision variable Z (hiring outcome). Conditioning on being hired, for example by analyzing only current employees, can create an artificial correlation between education and experience.
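The hiring example can be reproduced in a few lines. This is a minimal sketch: the hiring rule (education plus experience plus noise exceeding a threshold of 1.0) is a hypothetical assumption chosen only to make Z a common effect of X and Y. Education and experience are generated independently, yet they become negatively correlated once the sample is restricted to hired candidates.

```python
import random

def correlation(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

rng = random.Random(2)  # fixed seed so the run is reproducible
n = 20_000

# X and Y are drawn independently of each other.
education = [rng.gauss(0, 1) for _ in range(n)]
experience = [rng.gauss(0, 1) for _ in range(n)]

# Hypothetical collider: a candidate is hired when the combined score
# (plus some noise) exceeds a threshold.
hired = [
    edu + exp + rng.gauss(0, 0.5) > 1.0
    for edu, exp in zip(education, experience)
]

# Correlation in the full applicant pool: close to 0.
corr_all = correlation(education, experience)

# Correlation among hired candidates only (conditioning on the collider).
edu_hired = [e for e, h in zip(education, hired) if h]
exp_hired = [x for x, h in zip(experience, hired) if h]
corr_hired = correlation(edu_hired, exp_hired)

print(f"all applicants: {corr_all:.3f}")
print(f"hired only:     {corr_hired:.3f}")
```

The negative sign among hired candidates reflects the selection rule: a hired candidate with low education must have had high experience to clear the threshold, and vice versa.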
Other Concepts
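- Descendants: Any variable downstream of a node, that is, reachable from it by following directed edges. Controlling for a descendant partially controls for its ancestor; for example, controlling for a descendant of a collider partially opens the path through that collider.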
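- d-Separation: A graphical criterion for reading conditional independence off a DAG. A path is blocked by a set of variables Z if it contains a chain or fork whose middle node is in Z, or a collider such that neither the collider nor any of its descendants is in Z. If every path between X and Y is blocked by Z, then X is d-separated from Y given Z, which implies that X and Y are conditionally independent given Z.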
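d-Separation can be checked mechanically. The sketch below does not enumerate paths. It uses an equivalent test on the ancestral moral graph instead: keep only X, Y, the conditioning set, and their ancestors; connect any two parents that share a child; drop edge directions; delete the conditioning nodes; then check whether X can still reach Y. The function names and the edge-list representation are illustrative assumptions, and the function assumes X and Y are not themselves in the conditioning set.

```python
from itertools import combinations

def ancestors_of(nodes, parents):
    """Return the given nodes together with all of their ancestors."""
    seen, stack = set(), list(nodes)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen

def d_separated(edges, x, y, given=()):
    """True if x and y are d-separated by `given` in the DAG `edges`.

    `edges` is a list of (parent, child) pairs.
    """
    given = set(given)
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, set()).add(src)

    # Step 1: restrict to x, y, the conditioning set, and their ancestors.
    keep = ancestors_of({x, y} | given, parents)

    # Step 2: moralize (link parents sharing a child) and drop directions.
    neighbors = {node: set() for node in keep}
    for child in keep:
        pars = parents.get(child, set())
        for p in pars:
            neighbors[p].add(child)
            neighbors[child].add(p)
        for p, q in combinations(pars, 2):
            neighbors[p].add(q)
            neighbors[q].add(p)

    # Step 3: skip conditioning nodes and search for any path from x to y.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False  # still connected, so not d-separated
        if node in seen or node in given:
            continue
        seen.add(node)
        stack.extend(neighbors[node])
    return True

chain = [("X", "Z"), ("Z", "Y")]
fork = [("Z", "X"), ("Z", "Y")]
collider = [("X", "Z"), ("Y", "Z"), ("Z", "W")]  # W is a descendant of Z

print(d_separated(chain, "X", "Y"), d_separated(chain, "X", "Y", {"Z"}))
print(d_separated(fork, "X", "Y"), d_separated(fork, "X", "Y", {"Z"}))
print(d_separated(collider, "X", "Y"), d_separated(collider, "X", "Y", {"Z"}))
print(d_separated(collider, "X", "Y", {"W"}))
```

The three example graphs mirror the structures in this section: conditioning on Z separates X and Y in the chain and the fork, while in the collider X and Y start out separated and become connected once Z or its descendant W is conditioned on.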