38.1 Basic Notation and Graph Structures
Directed acyclic graphs (DAGs) are built from three basic structures (chains, forks, and colliders), each of which encodes a different relationship among variables.
- Mediators (Chains)
\[ X \to Z \to Y \]
- Variable \(Z\) mediates the effect of \(X\) on \(Y\).
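- Controlling for \(Z\) blocks the indirect effect of \(X\) on \(Y\) that runs through \(Z\).
- Use case in marketing: Email promotion (\(X\)) → customer interest (\(Z\)) → purchase (\(Y\)). Controlling for interest blocks the path through \(Z\), so any remaining association reflects only a direct effect of the promotion, which is zero in this pure chain. If the goal is the total effect of \(X\) on \(Y\), do not control for \(Z\).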
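The chain behavior can be checked with a small simulation. This is a minimal sketch under assumed linear Gaussian relationships; the coefficients (0.8 and 0.7), the sample size, the seed, and the helper names `corr` and `partial_corr` are illustrative choices, not part of the text above.

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly adjusting both for c."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 20_000

# Chain: X -> Z -> Y (assumed linear effects plus independent noise)
x = [rng.gauss(0, 1) for _ in range(n)]
z = [0.8 * xi + rng.gauss(0, 1) for xi in x]
y = [0.7 * zi + rng.gauss(0, 1) for zi in z]

r_marginal = corr(x, y)            # association transmitted through Z
r_given_z = partial_corr(x, y, z)  # association after controlling for Z

print(f"corr(X, Y)     = {r_marginal:.3f}")
print(f"corr(X, Y | Z) = {r_given_z:.3f}")
```

In this setup the marginal correlation should be clearly positive, while the partial correlation given \(Z\) should be close to zero, because the only path from \(X\) to \(Y\) runs through \(Z\).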
- Common Causes (Forks)
\[ X \leftarrow Z \to Y \]
- \(Z\) is a confounder, creating a spurious association between \(X\) and \(Y\).
- To estimate the causal effect of \(X\) on \(Y\), control for \(Z\) (for example, by including it as a covariate or by stratifying on it).
- Use case in finance: An economic indicator (\(Z\)) affects both stock investment decisions (\(X\)) and market returns (\(Y\)).
Key concept: If \(Z\) is not controlled, \(X\) and \(Y\) may appear correlated due to a shared cause rather than a causal link.
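A simulated fork makes the spurious association concrete. The sketch below assumes linear Gaussian data in which \(X\) has no effect on \(Y\) at all; the coefficient 0.8, the sample size, the seed, and the helper names are hypothetical choices for illustration.

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

def partial_corr(a, b, c):
    """Correlation of a and b after linearly adjusting both for c."""
    r_ab, r_ac, r_bc = corr(a, b), corr(a, c), corr(b, c)
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac**2) * (1 - r_bc**2))

rng = random.Random(1)   # fixed seed so the run is repeatable
n = 20_000

# Fork: X <- Z -> Y. Note that X never appears in the equation for Y.
z = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + rng.gauss(0, 1) for zi in z]
y = [0.8 * zi + rng.gauss(0, 1) for zi in z]

r_marginal = corr(x, y)            # spurious: driven entirely by Z
r_given_z = partial_corr(x, y, z)  # association after controlling for Z

print(f"corr(X, Y)     = {r_marginal:.3f}")
print(f"corr(X, Y | Z) = {r_given_z:.3f}")
```

Here the marginal correlation should be clearly positive even though \(X\) does not cause \(Y\), and it should shrink to roughly zero once \(Z\) is controlled for.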
- Common Effects (Colliders)
\[ X \to Z \leftarrow Y \]
- \(Z\) is a collider. The path through \(Z\) is blocked by default; controlling for \(Z\) opens it and induces a spurious association between \(X\) and \(Y\).
- Do not control for \(Z\) or its descendants.
- Use case in HR analytics: Two independent hiring factors (\(X\) = education, \(Y\) = experience) both influence a decision variable \(Z\) (hiring outcome). Conditioning on being hired can create an artificial correlation between education and experience.
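The hiring example can be sketched as a simulation. The additive hiring score, the cutoff of 1.0, the sample size, and the seed are assumptions made for illustration only; education and experience are generated independently by construction.

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(2)   # fixed seed so the run is repeatable
n = 20_000

# Collider: education -> hired <- experience
education = [rng.gauss(0, 1) for _ in range(n)]
experience = [rng.gauss(0, 1) for _ in range(n)]
score = [e + x + rng.gauss(0, 1) for e, x in zip(education, experience)]
hired = [s > 1.0 for s in score]   # hypothetical hiring rule

# Whole applicant pool: no conditioning on the collider
r_all = corr(education, experience)

# Hired applicants only: conditioning on the collider
edu_hired = [e for e, h in zip(education, hired) if h]
exp_hired = [x for x, h in zip(experience, hired) if h]
r_hired = corr(edu_hired, exp_hired)

print(f"all applicants: corr = {r_all:.3f}")
print(f"hired only:     corr = {r_hired:.3f}")
```

Across all applicants the correlation should be near zero. Among those hired it should turn negative: a hired candidate with little education must have had strong experience to clear the cutoff, and vice versa.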
Other Concepts
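- Descendants: Any variable downstream of a node, that is, reachable from it by following the arrows. Controlling for a descendant acts like partially controlling for the node itself: conditioning on a descendant of a collider partially opens the path, and conditioning on a descendant of a mediator partially blocks it.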
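- d-Separation: A graphical criterion to determine conditional independence. A path is blocked by a set of variables \(Z\) if it contains a chain or fork whose middle node is in \(Z\), or a collider such that neither the collider nor any of its descendants is in \(Z\). If all paths between \(X\) and \(Y\) are blocked by \(Z\), then \(X\) is d-separated from \(Y\) given \(Z\), which implies that \(X\) and \(Y\) are conditionally independent given \(Z\).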
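d-Separation can also be checked mechanically. The sketch below uses one standard approach (restrict the graph to the relevant ancestors, moralize it, delete the conditioning set, then test connectivity). The function name `d_separated` and the edge-list representation are assumptions for this example, not an established API.

```python
from itertools import combinations

def d_separated(edges, x, y, given=()):
    """Return True if x and y are d-separated by `given` in the DAG `edges`.

    `edges` is a list of (parent, child) pairs.
    """
    given = set(given)
    parents = {}
    for a, b in edges:
        parents.setdefault(a, set())
        parents.setdefault(b, set()).add(a)

    # 1. Keep only x, y, the conditioning set, and all of their ancestors.
    keep, stack = set(), [x, y, *given]
    while stack:
        node = stack.pop()
        if node not in keep:
            keep.add(node)
            stack.extend(parents.get(node, ()))

    # 2. Moralize: link each node to its parents, link parents that share
    #    a child, and treat every link as undirected.
    nbrs = {node: set() for node in keep}
    for child in keep:
        ps = parents.get(child, set())
        for p in ps:
            nbrs[child].add(p)
            nbrs[p].add(child)
        for p, q in combinations(ps, 2):
            nbrs[p].add(q)
            nbrs[q].add(p)

    # 3. Search from x while refusing to pass through conditioning nodes.
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node == y:
            return False          # an open path exists
        if node in seen or node in given:
            continue
        seen.add(node)
        stack.extend(nbrs[node])
    return True

chain = [("X", "Z"), ("Z", "Y")]
fork = [("Z", "X"), ("Z", "Y")]
collider = [("X", "Z"), ("Y", "Z"), ("Z", "W")]  # W is a descendant of Z

print(d_separated(chain, "X", "Y"), d_separated(chain, "X", "Y", {"Z"}))
print(d_separated(fork, "X", "Y"), d_separated(fork, "X", "Y", {"Z"}))
print(d_separated(collider, "X", "Y"), d_separated(collider, "X", "Y", {"Z"}))
print(d_separated(collider, "X", "Y", {"W"}))
```

The three structures behave as described above: conditioning on \(Z\) separates \(X\) and \(Y\) in the chain and the fork, while in the collider they are separated only when neither \(Z\) nor its descendant \(W\) is conditioned on.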