38.1 Basic Notation and Graph Structures

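Directed acyclic graphs (DAGs) are built from three basic structures: chains, forks, and colliders. Each one determines how association flows between two variables and how that flow changes when the middle variable is controlled for.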

  1. Mediators (Chains)

\[ X \to Z \to Y \]

  • Variable \(Z\) mediates the effect of \(X\) on \(Y\).
  • Controlling for \(Z\) blocks the indirect effect of \(X\) on \(Y\).
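  • Use case in marketing: Email promotion (\(X\)) → customer interest (\(Z\)) → purchase (\(Y\)). Because interest is the only path drawn from promotion to purchase, controlling for it blocks that path; any association that remains would reflect a direct effect of the promotion that this graph does not show.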
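
The blocking behavior of a chain can be checked with a small simulation. The sketch below assumes a linear model with Gaussian noise; the coefficients (0.8, 0.7), the sample size, and the seed are illustrative choices, not values from the text. It compares the plain correlation of \(X\) and \(Y\) with their partial correlation given \(Z\).

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

def partial_corr(x, y, z):
    """Correlation of x and y after linearly adjusting both for z."""
    rxy, rxz, rzy = corr(x, y), corr(x, z), corr(z, y)
    return (rxy - rxz * rzy) / math.sqrt((1 - rxz ** 2) * (1 - rzy ** 2))

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20_000

# Assumed chain X -> Z -> Y with made-up coefficients.
x = [rng.gauss(0, 1) for _ in range(n)]        # email promotion
z = [0.8 * xi + rng.gauss(0, 1) for xi in x]   # customer interest
y = [0.7 * zi + rng.gauss(0, 1) for zi in z]   # purchase

marginal = corr(x, y)
adjusted = partial_corr(x, y, z)
print(f"corr(X, Y)     = {marginal:.3f}")
print(f"corr(X, Y | Z) = {adjusted:.3f}")
```

Under these assumptions the marginal correlation is clearly positive, while the partial correlation given \(Z\) should be close to zero, because \(Z\) carries all of the association between \(X\) and \(Y\).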

  2. Common Causes (Forks)

\[ X \leftarrow Z \to Y \]

  • \(Z\) is a confounder, creating a spurious association between \(X\) and \(Y\).
  • To estimate the causal effect of \(X\) on \(Y\), \(Z\) must be controlled for, for example by including it as a covariate or by stratifying on it.
  • Use case in finance: An economic indicator (\(Z\)) affects both stock investment decisions (\(X\)) and market returns (\(Y\)).

Key concept: If \(Z\) is not controlled, \(X\) and \(Y\) may appear correlated due to a shared cause rather than a causal link.
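
A minimal sketch of this confounding pattern, assuming a linear model in which \(Z\) drives both \(X\) and \(Y\) and \(X\) has no effect on \(Y\) at all. The coefficients, sample size, and helper names (`slope_unadjusted`, `slope_adjusted`) are hypothetical. It compares the regression slope of \(Y\) on \(X\) with and without \(Z\) as a covariate.

```python
import random

def centered(v):
    """Subtract the mean, which is equivalent to fitting an intercept."""
    m = sum(v) / len(v)
    return [a - m for a in v]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def slope_unadjusted(x, y):
    """OLS slope of y on x alone."""
    x, y = centered(x), centered(y)
    return dot(x, y) / dot(x, x)

def slope_adjusted(x, y, z):
    """Coefficient on x in the OLS regression of y on x and z."""
    x, y, z = centered(x), centered(y), centered(z)
    sxx, szz, sxz = dot(x, x), dot(z, z), dot(x, z)
    sxy, szy = dot(x, y), dot(z, y)
    return (sxy * szz - sxz * szy) / (sxx * szz - sxz ** 2)

rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20_000

# Assumed fork X <- Z -> Y; note that y does not depend on x.
z = [rng.gauss(0, 1) for _ in range(n)]        # economic indicator
x = [1.0 * zi + rng.gauss(0, 1) for zi in z]   # investment decisions
y = [1.5 * zi + rng.gauss(0, 1) for zi in z]   # market returns

naive = slope_unadjusted(x, y)
controlled = slope_adjusted(x, y, z)
print(f"slope of Y on X, ignoring Z:      {naive:.3f}")
print(f"slope of Y on X, adjusting for Z: {controlled:.3f}")
```

In this setup the unadjusted slope is well away from zero even though the true effect of \(X\) on \(Y\) is zero by construction; adjusting for \(Z\) should bring the estimate back to roughly zero.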

  3. Common Effects (Colliders)

\[ X \to Z \leftarrow Y \]

  • \(Z\) is a collider, and controlling for it induces a spurious association between \(X\) and \(Y\).
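  • When estimating the relationship between \(X\) and \(Y\), do not control for \(Z\) or its descendants: the path through an uncontrolled collider is already blocked, and conditioning on it opens the path.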
  • Use case in HR analytics: Two independent hiring factors (\(X\) = education, \(Y\) = experience) both influence a decision variable \(Z\) (hiring outcome). Conditioning on being hired can create an artificial correlation between education and experience.
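
The hiring example can be illustrated with a short simulation. This is a sketch under assumed conditions: education and experience are drawn independently, and the hiring rule (a noisy combined score above a threshold of 1.0) is a hypothetical stand-in for a real decision process.

```python
import math
import random

def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)

rng = random.Random(1)  # fixed seed so the run is reproducible
n = 20_000

education = [rng.gauss(0, 1) for _ in range(n)]
experience = [rng.gauss(0, 1) for _ in range(n)]  # independent of education

# Hypothetical collider: hired when a noisy combined score clears a threshold.
hired = [e + x + rng.gauss(0, 0.5) > 1.0 for e, x in zip(education, experience)]

# Conditioning on the collider means looking only at hired applicants.
edu_hired = [e for e, h in zip(education, hired) if h]
exp_hired = [x for x, h in zip(experience, hired) if h]

corr_all = corr(education, experience)
corr_hired = corr(edu_hired, exp_hired)
print(f"all applicants: corr = {corr_all:.3f}")
print(f"hired only:     corr = {corr_hired:.3f}")
```

Across all applicants the correlation should be near zero, while among those hired it turns negative: a hired candidate with low education must have had high experience to clear the threshold, and vice versa.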

Other Concepts

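  • Descendants: Any variable reachable from a node by following arrows forward. Controlling for a descendant acts like partially controlling for the node itself: conditioning on a descendant of a collider partially opens the path through that collider, and conditioning on a descendant of a mediator partially blocks the path through that mediator.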
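  • d-Separation: A graphical criterion for reading conditional independence off a DAG. A path is blocked by a set of variables \(Z\) if it contains a chain or fork whose middle node is in \(Z\), or a collider such that neither the collider nor any of its descendants is in \(Z\). If every path between \(X\) and \(Y\) is blocked by \(Z\), then \(X\) is d-separated from \(Y\) given \(Z\), which implies that \(X\) and \(Y\) are conditionally independent given \(Z\) in any distribution compatible with the graph.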
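
d-Separation can also be checked mechanically. The sketch below uses the moralized ancestral graph method, one standard way to test the criterion: keep only the variables of interest and their ancestors, connect any two parents that share a child, drop arrow directions, delete the conditioning set, and check whether \(X\) can still reach \(Y\). The function names and the dictionary representation (each node mapped to its set of children, with every node present as a key) are assumptions made for this example.

```python
from itertools import combinations

def ancestors_of(dag, nodes):
    """Return `nodes` together with all of their ancestors."""
    parents = {v: set() for v in dag}
    for parent, children in dag.items():
        for child in children:
            parents[child].add(parent)
    seen, stack = set(nodes), list(nodes)
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(dag, x, y, given):
    """True if x and y are d-separated by `given` (x, y must not be in it)."""
    keep = ancestors_of(dag, {x, y} | set(given))
    nbrs = {v: set() for v in keep}
    for parent in keep:
        for child in dag[parent] & keep:       # undirected copy of each edge
            nbrs[parent].add(child)
            nbrs[child].add(parent)
    for child in keep:                         # moralize: link co-parents
        pars = [p for p in keep if child in dag[p]]
        for a, b in combinations(pars, 2):
            nbrs[a].add(b)
            nbrs[b].add(a)
    blocked = set(given)
    seen, stack = {x}, [x]
    while stack:                               # search that avoids `given`
        for w in nbrs[stack.pop()]:
            if w == y:
                return False
            if w not in seen and w not in blocked:
                seen.add(w)
                stack.append(w)
    return True

chain = {"X": {"Z"}, "Z": {"Y"}, "Y": set()}
fork = {"Z": {"X", "Y"}, "X": set(), "Y": set()}
# Collider with an extra, hypothetical descendant D of Z.
collider = {"X": {"Z"}, "Y": {"Z"}, "Z": {"D"}, "D": set()}

for name, dag in [("chain", chain), ("fork", fork), ("collider", collider)]:
    print(name, d_separated(dag, "X", "Y", set()), d_separated(dag, "X", "Y", {"Z"}))
print("collider given D:", d_separated(collider, "X", "Y", {"D"}))
```

For the chain and the fork, \(X\) and \(Y\) are connected until \(Z\) is conditioned on; for the collider the pattern reverses, and conditioning on the descendant `D` opens the path just as conditioning on \(Z\) does.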