20.8 Putting It All Together: Comparing Objectives

As an overarching illustration, let \(\hat{f}\) be any trained predictor (a machine learning model, a regression, etc.) and let \(\hat{\beta}\) be a parameter estimator from a structural or causal model. The two serve different tasks:

  • Form of Output
    • \(\hat{f}\) is a function from \(\mathcal{X} \to \mathcal{Y}\).
    • \(\hat{\beta}\) is a vector of parameters with theoretical meaning.
  • Criterion
    • Prediction: Minimizes the expected predictive loss \(\mathbb{E}[L(Y,\hat{f}(X))]\).
    • Causal Inference: Seeks \(\beta\) such that \(Y = m_\beta(X)\) is a correct structural representation, for example by minimizing bias in \(\hat{\beta}\) or by imposing orthogonality (moment) conditions, as in method-of-moments estimation.
  • Validity
    • Prediction: Usually validated by out-of-sample experiments or cross-validation.
    • Causal Inference: Validated by theoretical identification arguments that rest on assumptions such as exogeneity, randomization, or the absence of omitted confounders.
  • Interpretation
    • Prediction: “\(\hat{f}(x)\) is our best guess of \(Y\) for new \(x\).”
    • Causal Inference: “\(\beta\) measures how \(Y\) changes if we intervene on \(X\).”
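
The contrast above can be made concrete with a small simulation. The sketch below is illustrative only: the data-generating process, the coefficient values (`beta = 1.0`, `gamma = 2.0`), and the helper names `simulate` and `ols` are all assumptions introduced here, not part of the text. A confounder \(U\) drives both \(X\) and \(Y\). Regressing \(Y\) on \(X\) alone yields a predictor that performs well out of sample, yet its slope converges to 2 rather than to the structural \(\beta = 1\). Adjusting for \(U\), which is treated as observed purely for the sake of illustration, recovers a slope close to \(\beta\).

```python
import numpy as np


def simulate(n, rng, beta=1.0, gamma=2.0):
    """Draw (x, u, y) from an assumed linear model with a confounder u."""
    u = rng.normal(size=n)                 # confounder: affects both x and y
    x = u + rng.normal(size=n)             # x depends on u
    y = beta * x + gamma * u + rng.normal(size=n)
    return x, u, y


def ols(design, y):
    """Least-squares coefficients for the given design matrix."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
x, u, y = simulate(20_000, rng)              # training sample
x_new, u_new, y_new = simulate(20_000, rng)  # fresh sample for validation

# Prediction objective: fit y on x only, then validate out of sample.
coef_short = ols(np.column_stack([np.ones_like(x), x]), y)
pred_new = np.column_stack([np.ones_like(x_new), x_new]) @ coef_short
mse_short = np.mean((y_new - pred_new) ** 2)
mse_baseline = np.mean((y_new - y.mean()) ** 2)  # predict the training mean

# Causal objective: the slope on x is meaningful only after adjusting for u.
coef_long = ols(np.column_stack([np.ones_like(x), x, u]), y)

print(f"slope, y ~ x     : {coef_short[1]:.3f}  (good predictor, not causal)")
print(f"slope, y ~ x + u : {coef_long[1]:.3f}  (close to the structural beta)")
print(f"out-of-sample MSE: {mse_short:.3f} vs baseline {mse_baseline:.3f}")
```

Cross-validation would correctly certify the first model as a predictor, but no amount of out-of-sample testing reveals that its slope is not the effect of intervening on \(X\). That conclusion depends on the assumption about confounding, which is exactly the distinction drawn under Validity and Interpretation.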