20.4 Parameter Estimation and Causal Inference

20.4.1 Estimation in Parametric Models

In a simple parametric form:

Y = X\beta + \varepsilon, \qquad \mathbb{E}[\varepsilon \mid X] = 0, \qquad \mathrm{Var}(\varepsilon \mid X) = \sigma^2 I.

The Ordinary Least Squares estimator is:

\hat{\beta}_{\text{OLS}} = \arg\min_{\beta}\; \lVert Y - X\beta \rVert^2 = (X^\top X)^{-1} X^\top Y.

Under the Gauss–Markov assumptions (linearity in parameters, strict exogeneity, no perfect collinearity, and homoskedastic, uncorrelated errors), \hat{\beta}_{\text{OLS}} is BLUE: the Best Linear Unbiased Estimator.
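
As a minimal sketch (the simulated design, coefficient values, and variable names below are illustrative assumptions, not part of the text), the closed-form OLS estimator can be computed directly in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate a small design: n observations, an intercept plus p regressors.
n, p = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p))])
beta_true = np.array([1.0, 2.0, -0.5, 0.3])   # assumed true coefficients
y = X @ beta_true + rng.normal(size=n)        # homoskedastic errors

# Normal equations: beta_hat = (X'X)^{-1} X'y, solved without forming the inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
print(beta_hat)                               # close to beta_true for large n
```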

In a more general form, parameter estimation, denoted \hat{\beta}, focuses on estimating the relationship between y and x, often with a view toward causality. In many econometric or statistical settings, we write:

y = x^\top \beta + \varepsilon,

or more generally y = g\bigl(x;\beta\bigr) + \varepsilon, where \beta encodes the structural or causal parameters we wish to recover.
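
When the model is nonlinear in \beta, the same idea carries over to nonlinear least squares. The sketch below assumes an exponential form g(x;\beta) = \beta_0 e^{\beta_1 x} purely for illustration and fits it with scipy.optimize.curve_fit:

```python
import numpy as np
from scipy.optimize import curve_fit

rng = np.random.default_rng(1)

def g(x, b0, b1):
    """Assumed nonlinear structural form g(x; beta) = b0 * exp(b1 * x)."""
    return b0 * np.exp(b1 * x)

x = rng.uniform(0.0, 2.0, size=300)
y = g(x, 1.5, 0.8) + rng.normal(scale=0.1, size=300)  # additive noise

# Nonlinear least squares: minimize the sum of squared residuals over (b0, b1).
beta_hat, beta_cov = curve_fit(g, x, y, p0=[1.0, 1.0])
print(beta_hat)  # close to the assumed (1.5, 0.8)
```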

The core aim is consistency—that is, for large n, we want \hat{\beta} to converge to the true \beta that defines the underlying relationship. In other words:

\hat{\beta} \xrightarrow{p} \beta, \quad \text{as } n \to \infty.

Some texts phrase it informally as requiring that

\mathbb{E}\bigl[\hat{f}\bigr] = f,

meaning the estimator is (asymptotically) unbiased for the true function or parameters; note that unbiasedness is related to, but distinct from, consistency.

However, consistency alone may not suffice for scientific inference. One often also examines:

  • Asymptotic Normality: \sqrt{n}(\hat{\beta} - \beta) \;\;\xrightarrow{d}\;\; \mathcal{N}(0,\Sigma).
  • Confidence Intervals: \hat{\beta}_j \;\pm\; z_{\alpha/2}\,\mathrm{SE}\bigl(\hat{\beta}_j\bigr).
  • Hypothesis Tests: H_0\colon \beta_j = 0 \quad\text{vs.}\quad H_1\colon \beta_j \neq 0.
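
A short simulation (the data-generating values are assumptions made for illustration) shows both properties at work: the slope estimate concentrates around the true value as n grows, and the normal-approximation 95% confidence interval \hat{\beta}_j \pm 1.96\,\mathrm{SE}(\hat{\beta}_j) shrinks accordingly:

```python
import numpy as np

rng = np.random.default_rng(2)
beta_true = np.array([1.0, 2.0])   # assumed (intercept, slope)

def ols_slope_and_se(n):
    """Fit OLS on a simulated sample of size n; return the slope and its standard error."""
    X = np.column_stack([np.ones(n), rng.normal(size=n)])
    y = X @ beta_true + rng.normal(size=n)
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ (X.T @ y)
    resid = y - X @ beta_hat
    sigma2_hat = resid @ resid / (n - X.shape[1])   # homoskedastic variance estimate
    se = np.sqrt(sigma2_hat * np.diag(XtX_inv))
    return beta_hat[1], se[1]

for n in (100, 1_000, 10_000, 100_000):
    b, se = ols_slope_and_se(n)
    print(f"n={n:>6}  slope={b:.4f}  95% CI=({b - 1.96*se:.4f}, {b + 1.96*se:.4f})")
```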

20.4.2 Causal Inference Fundamentals

To interpret \beta in Y = X\beta + \varepsilon as “causal,” we typically require that changes in X (or at least in one component of X) lead to changes in Y that are not confounded by omitted variables or simultaneity. In a prototypical potential-outcomes framework (for a binary treatment D):

  • Y_i(1): outcome if unit i receives treatment D = 1.
  • Y_i(0): outcome if unit i receives no treatment D = 0.

The observed outcome Y_i is

Y_i = D_i Y_i(1) + (1 - D_i) Y_i(0).

The Average Treatment Effect (ATE) is:

\tau = \mathbb{E}[Y(1) - Y(0)].

Identification of \tau requires an assumption like unconfoundedness:

\{Y(0), Y(1)\} \perp D \mid X,

i.e., after conditioning on X, the treatment assignment is as-if random. Estimation strategies then revolve around properly adjusting for X.
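
A sketch of the adjustment idea (the confounding structure, coefficients, and linear outcome model below are assumptions chosen for illustration): a naive difference in means is biased because the confounder X drives both treatment and outcome, while regression adjustment on X recovers \tau:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000
tau = 1.0                                   # assumed true ATE

# Confounder X raises both the treatment probability and the outcome.
X = rng.normal(size=n)
D = (rng.uniform(size=n) < 1.0 / (1.0 + np.exp(-X))).astype(float)
Y = tau * D + 2.0 * X + rng.normal(size=n)

# Naive difference in means: biased upward, since treated units have higher X.
naive = Y[D == 1].mean() - Y[D == 0].mean()

def fit_linear(x, y):
    """Least-squares fit of y on (1, x); returns (intercept, slope)."""
    A = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(A, y, rcond=None)[0]

# Regression adjustment: model E[Y | X, D=d] in each arm, average the contrast over X.
b1 = fit_linear(X[D == 1], Y[D == 1])
b0 = fit_linear(X[D == 0], Y[D == 0])
mu1 = b1[0] + b1[1] * X                     # predicted Y(1) for every unit
mu0 = b0[0] + b0[1] * X                     # predicted Y(0) for every unit
ate_adjusted = (mu1 - mu0).mean()

print(f"naive: {naive:.3f}   regression-adjusted: {ate_adjusted:.3f}   true tau: {tau}")
```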

Such assumptions are not necessary for raw prediction of Y: a black-box function can yield \hat{Y} \approx Y without ensuring that \hat{Y}(1) - \hat{Y}(0) is an unbiased estimate of \tau.

20.4.3 Role of Identification

Identification means that the parameter of interest (\beta or \tau) is uniquely pinned down by the distribution of observables (under assumptions). If \beta is not identified (e.g., because of endogeneity or insufficient variation in X), no matter how large the sample, we cannot estimate \beta consistently.
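
As a toy illustration (the design below is an assumption, not from the text), perfect collinearity makes \beta non-identified: distinct coefficient vectors fit the data equally well, and a larger sample cannot break the tie:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 10_000
x1 = rng.normal(size=n)
x2 = 2.0 * x1                          # perfectly collinear with x1
y = 3.0 * x1 + rng.normal(size=n)      # outcome generated from x1 alone

X = np.column_stack([x1, x2])
print(np.linalg.matrix_rank(X.T @ X))  # rank 1, not 2: the normal equations are singular

# Every (b1, b2) with b1 + 2*b2 = 3 yields identical residuals, so the data
# cannot distinguish among them no matter how large n becomes.
for b1, b2 in [(3.0, 0.0), (1.0, 1.0), (-1.0, 2.0)]:
    resid = y - (b1 * x1 + b2 * x2)
    print(f"b1={b1:+.1f}, b2={b2:+.1f}, residual variance={resid.var():.3f}")
```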

In prediction, “identification” is not usually the main concern. The function \hat{f}(x) could be a complicated ensemble method that just fits well, without guaranteeing any structural or causal interpretation of its parameters.

20.4.4 Challenges

  1. High-Dimensional Spaces: With large p (number of predictors), strong correlation among predictors (multicollinearity) can hamper classical estimation. This is the setting of the well-known bias-variance tradeoff (Hastie et al. 2009; Bishop and Nasrabadi 2006).
  2. Endogeneity: If x is correlated with the error term \varepsilon, ordinary least squares (OLS) is biased and inconsistent. Causal inference demands identifying exogenous variation in x, which requires additional assumptions or designs (e.g., randomization); a simulation sketch of this problem follows the list.
  3. Model Misspecification: If the functional form g\bigl(x;\beta\bigr) is incorrect, parameter estimates can systematically deviate from capturing the true underlying mechanism.
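
The sketch below (the simulated coefficients and the omitted-variable structure are assumptions for illustration) demonstrates the endogeneity problem from item 2: the OLS slope converges to the wrong value, so the bias does not disappear with more data:

```python
import numpy as np

rng = np.random.default_rng(5)
beta_true = 1.0

def ols_slope(n):
    """Simulate y = beta*x + 2*u + noise with u omitted, then regress y on x alone."""
    u = rng.normal(size=n)                           # omitted variable
    x = u + rng.normal(size=n)                       # x is correlated with u
    y = beta_true * x + 2.0 * u + rng.normal(size=n)
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)    # simple-regression slope

for n in (1_000, 100_000):
    print(n, round(ols_slope(n), 3))                 # about 2.0, not 1.0: the bias persists
```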

References

Bishop, Christopher M., and Nasser M. Nasrabadi. 2006. Pattern Recognition and Machine Learning. Springer.
Hastie, Trevor, Robert Tibshirani, and Jerome H. Friedman. 2009. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. Springer.