20.8 Putting It All Together: Comparing Objectives
As an overarching illustration, let \(\hat{f}\) be any trained predictor (an ML model, a regression fit, etc.) and let \(\hat{\beta}\) be a parameter estimator from a structural or causal model. The two objects serve different tasks, which differ along four dimensions:
- Form of Output
- \(\hat{f}\) is a function from \(\mathcal{X} \to \mathcal{Y}\).
- \(\hat{\beta}\) is a vector of parameters with theoretical meaning.
- Criterion
- Prediction: Minimizes predictive loss \(\mathbb{E}[L(Y,\hat{f}(X))]\).
- Causal Inference: Seeks \(\beta\) such that \(Y = m_\beta(X)\) is a correct structural representation. The estimator \(\hat{\beta}\) is chosen to minimize bias, or to satisfy orthogonality (moment) conditions as in method-of-moments estimation.
- Validity
- Prediction: Usually validated by out-of-sample experiments or cross-validation.
- Causal Inference: Validated by theoretical identification arguments, which rest on assumptions such as exogeneity, randomization, or no omitted confounders.
- Interpretation
- Prediction: “\(\hat{f}(x)\) is our best guess of \(Y\) for new \(x\).”
- Causal Inference: “\(\beta\) measures how \(Y\) changes if we intervene on \(X\).”
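The contrast above can be made concrete with a small simulation. The sketch below is illustrative only: the data-generating process, the coefficient values (`beta_true = 2.0`, `gamma = 3.0`, the loading `0.8`), and the helper names `design`, `fit_ols`, and `r_squared` are assumptions made for this example, not part of the discussion above. A confounder \(U\) drives both \(X\) and \(Y\). A regression of \(Y\) on \(X\) alone plays the role of \(\hat{f}\) and is judged by out-of-sample fit, while the slope on \(X\) plays the role of \(\hat{\beta}\) and is judged by whether it recovers the structural effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000
beta_true, gamma = 2.0, 3.0  # assumed structural effect of X, and confounder strength

# Assumed data-generating process: U confounds the X -> Y relationship.
u = rng.normal(size=n)                # confounder (visible only because we simulate it)
x = 0.8 * u + rng.normal(size=n)      # X is partly driven by U
y = beta_true * x + gamma * u + rng.normal(size=n)

train, test = slice(0, n // 2), slice(n // 2, n)

def design(*cols):
    """Stack an intercept column with the given regressors."""
    return np.column_stack([np.ones(len(cols[0])), *cols])

def fit_ols(X, y):
    """Ordinary least squares coefficients via numpy's least-squares solver."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

def r_squared(y_true, y_pred):
    """Coefficient of determination, here evaluated on held-out data."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Prediction: regress Y on X only and validate out of sample.
coef_naive = fit_ols(design(x[train]), y[train])
r2_naive = r_squared(y[test], design(x[test]) @ coef_naive)
slope_naive = coef_naive[1]

# Causal estimation: adjust for the confounder, as an identification argument would require.
coef_adjusted = fit_ols(design(x[train], u[train]), y[train])
slope_adjusted = coef_adjusted[1]

print(f"out-of-sample R^2, Y on X only: {r2_naive:.3f}")
print(f"slope on X without adjustment:  {slope_naive:.3f}")
print(f"slope on X adjusting for U:     {slope_adjusted:.3f}")
print(f"true structural effect:         {beta_true:.3f}")
```

Under these assumptions the unadjusted regression predicts well out of sample, yet its slope converges to \(\beta + \gamma\,\mathrm{Cov}(X,U)/\mathrm{Var}(X) = 2 + 3(0.8)/1.64 \approx 3.46\) rather than to \(\beta = 2\). Cross-validation would not reveal this, because the fitted line remains the best linear predictor of \(Y\) given \(X\). Only the identification step (here, adjusting for \(U\)) makes the slope interpretable as the effect of intervening on \(X\). In real data \(U\) is typically unobserved, which is why that step rests on assumptions and not on held-out accuracy.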