34.7 Negative \(R^2\) in IV

In IV estimation, particularly 2SLS and 3SLS, it is common and not problematic to encounter negative \(R^2\) values in the second-stage regression. Unlike Ordinary Least Squares, where \(R^2\) is often used to assess the fit of the model, in IV regression the primary concern is consistent estimation of the coefficients of interest, not goodness-of-fit.

What Should You Look At Instead of \(R^2\) in IV?

  1. Instrument Relevance (First-stage \(F\)-statistics, Partial \(R^2\))
  2. Weak Instrument Diagnostics (Kleibergen-Paap rk statistic, weak-instrument-robust Anderson-Rubin test)
  3. Validity of Instruments (Overidentification tests like Sargan/Hansen J-test)
  4. Endogeneity Tests (Durbin-Wu-Hausman test for endogeneity)
  5. Confidence Intervals and Standard Errors, focusing on inference for \(\hat{\beta}\).
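To make the first diagnostic concrete, here is a minimal Python/NumPy sketch, using a simulated, purely hypothetical data-generating process, that computes the first-stage \(F\)-statistic for a single instrument by comparing restricted and unrestricted residual sums of squares:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)                   # instrument (hypothetical)
u = rng.normal(size=n)                   # unobserved confounder
x = 0.5 * z + u + rng.normal(size=n)     # endogenous regressor

# First stage: regress x on [1, z]; F-test of H0: coefficient on z = 0
Z = np.column_stack([np.ones(n), z])
pi, *_ = np.linalg.lstsq(Z, x, rcond=None)
rss_u = np.sum((x - Z @ pi) ** 2)        # unrestricted RSS
rss_r = np.sum((x - x.mean()) ** 2)      # restricted RSS (intercept only)
F = (rss_r - rss_u) / (rss_u / (n - 2))  # 1 restriction, n - 2 df

print(F > 10)  # rule-of-thumb threshold for instrument strength
```

In this design the instrument is deliberately strong, so the statistic clears the conventional threshold of 10; with a weaker first-stage coefficient it would not.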

Geometric Intuition

  • In OLS, the fitted values \(\hat{y}\) are the orthogonal projection of \(y\) onto the column space of \(X\).
  • In 2SLS, the endogenous regressors are first projected onto the space spanned by \(Z\); the second-stage fitted values are therefore built from \(\hat{X} = P_Z X\), not from \(X\) itself.
  • As a result, \(\hat{y}\) is not the orthogonal projection of \(y\) onto the column space of \(X\), so the residual sum of squares is not minimized and can exceed its OLS counterpart.
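This geometric point can be checked numerically. The Python/NumPy sketch below (simulated data; the design is illustrative) fits both OLS and 2SLS on the same sample and confirms that the 2SLS residual sum of squares is never smaller, since OLS is by construction the RSS-minimizing fit:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 300
z = rng.normal(size=n)                 # instrument
u = rng.normal(size=n)                 # confounder
x = z + u + rng.normal(size=n)         # endogenous regressor
y = x + u + rng.normal(size=n)         # outcome

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# OLS fitted values: orthogonal projection of y onto col(X)
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
rss_ols = np.sum((y - X @ b_ols) ** 2)

# 2SLS: project X onto col(Z), regress y on the projection,
# then form residuals with the *observed* X
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b_iv, *_ = np.linalg.lstsq(Xhat, y, rcond=None)
rss_iv = np.sum((y - X @ b_iv) ** 2)

print(rss_ols <= rss_iv)  # True: nothing beats OLS at minimizing RSS
```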

Recall the formula for the coefficient of determination (\(R^2\)) in a regression model:

\[ R^2 = 1 - \frac{RSS}{TSS} = \frac{MSS}{TSS} \]

(the second equality holds for OLS with an intercept, where \(TSS = MSS + RSS\))

Where:

  • \(TSS\) is the Total Sum of Squares: \[ TSS = \sum_{i=1}^n (y_i - \bar{y})^2 \]
  • \(MSS\) is the Model Sum of Squares: \[ MSS = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 \]
  • \(RSS\) is the Residual Sum of Squares: \[ RSS = \sum_{i=1}^n (y_i - \hat{y}_i)^2 \]

In OLS, the \(R^2\) measures the proportion of variance in \(Y\) that is explained by the regressors \(X\).

Key Properties in OLS:

  • \(R^2 \in [0, 1]\)
  • Adding more regressors (even irrelevant ones) never decreases \(R^2\).
  • \(R^2\) measures in-sample goodness-of-fit, not causal interpretation.
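These OLS properties are easy to verify. The short Python/NumPy sketch below (simulated data) checks the decomposition \(TSS = MSS + RSS\) that keeps \(R^2\) inside \([0, 1]\) when an intercept is included:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 200
x = rng.normal(size=n)
y = 2 * x + rng.normal(size=n)         # simple exogenous design

X = np.column_stack([np.ones(n), x])   # include an intercept
b, *_ = np.linalg.lstsq(X, y, rcond=None)
yhat = X @ b

tss = np.sum((y - y.mean()) ** 2)
mss = np.sum((yhat - y.mean()) ** 2)
rss = np.sum((y - yhat) ** 2)
r2 = 1 - rss / tss

# With an intercept, OLS guarantees TSS = MSS + RSS, hence R^2 in [0, 1]
print(np.isclose(tss, mss + rss), 0 <= r2 <= 1)
```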

34.7.1 Why Does \(R^2\) Lose Its Meaning in IV Regression?

In IV regression, the second stage regression replaces the endogenous variable \(X_2\) with its predicted values from the first stage:

Stage 1:

\[ X_2 = Z \pi + v \]

Stage 2:

\[ Y = X_1 \beta_1 + \hat{X}_2 \beta_2 + \epsilon \]

  • \(\hat{X}_2\) is not the observed \(X_2\), but a proxy constructed from \(Z\).
  • \(\hat{X}_2\) isolates the exogenous variation in \(X_2\) that is independent of \(\epsilon\).
  • This reduces bias, but comes at a cost:
    • The variation in \(\hat{X}_2\) is typically less than that in \(X_2\).
    • The predicted values \(\hat{y}_i\) from the second stage are not necessarily close to \(y_i\).
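The loss of variation can be seen directly. In this Python/NumPy sketch (a hypothetical first-stage design), the fitted values \(\hat{X}_2 = P_Z X_2\) retain only the part of \(X_2\) explained by the instrument, so their sample variance is strictly smaller:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
z = rng.normal(size=n)                  # instrument
x2 = 0.6 * z + rng.normal(size=n)       # z explains only part of x2

Z = np.column_stack([np.ones(n), z])
x2_hat = Z @ np.linalg.lstsq(Z, x2, rcond=None)[0]  # first-stage fit

print(np.var(x2_hat) < np.var(x2))  # True: projection discards variation
```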

34.7.2 Why \(R^2\) Can Be Negative

  1. \(R^2\) is calculated using: \[ R^2 = 1 - \frac{RSS}{TSS} \] But in IV:
  • The predicted values of \(Y\) are not chosen to minimize RSS, because IV does not minimize the residuals in the second stage.
  • Unlike OLS, 2SLS chooses \(\hat{\beta}\) to satisfy moment conditions rather than minimizing the sum of squared errors.
  2. It is therefore possible (and common in IV) for the residual sum of squares to be greater than the total sum of squares: \[ RSS > TSS \] Which makes: \[ R^2 = 1 - \frac{RSS}{TSS} < 0 \]

  3. This happens because:

    • The predicted values \(\hat{y}_i\) in IV are not optimized to fit the observed \(y_i\).
    • The residuals can be larger, because IV focuses on identifying causal effects, not prediction.
For example, assume we have:

  • \(TSS = 100\)

  • \(RSS = 120\)

Then: \[ R^2 = 1 - \frac{120}{100} = -0.20 \]

This happens because the IV procedure does not minimize RSS. It prioritizes solving the endogeneity problem over explaining the variance in \(Y\).
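A simulated example makes this concrete. The design below is illustrative (not from the text): it has a valid instrument and strong endogeneity, so 2SLS recovers a slope near the true value of 1, yet the second-stage \(R^2\), computed from the observed regressors, comes out negative precisely because the IV coefficient is far from the RSS-minimizing (OLS) coefficient:

```python
import numpy as np

rng = np.random.default_rng(4)
n = 2000
z = rng.normal(size=n)                     # valid instrument
u = rng.normal(size=n)                     # unobserved confounder
x = z + u                                  # endogenous regressor
y = x - 2 * u + 0.5 * rng.normal(size=n)   # true causal slope is 1

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# 2SLS via projection of X onto the instrument space
Xhat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]
b_iv, *_ = np.linalg.lstsq(Xhat, y, rcond=None)

rss = np.sum((y - X @ b_iv) ** 2)   # residuals use the observed X
tss = np.sum((y - y.mean()) ** 2)
r2 = 1 - rss / tss

print(r2 < 0)  # True: a consistent slope estimate, yet RSS > TSS
```

Here \(\mathrm{Cov}(X, Y)\) is close to zero in sample, so the mean-only model fits better than the line with slope 1, and \(R^2\) turns negative even though the causal estimate is good.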


34.7.3 Why We Don’t Care About \(R^2\) in IV

  1. IV Estimates Focus on Consistency, Not Prediction
  • The goal of IV is to obtain a consistent estimate of \(\beta_2\).
  • IV sacrifices fit (higher variance in \(\hat{y}_i\)) to remove endogeneity bias.
  2. \(R^2\) Does Not Reflect the Quality of an IV Estimator
  • A high \(R^2\) in IV may be misleading (for instance, when instruments are weak or invalid).
  • A negative \(R^2\) does not imply a bad IV estimator if the assumptions of instrument validity are met.
  3. IV Regression Is About Identification, Not In-Sample Fit
  • IV relies on relevance and exogeneity of instruments, not residual minimization.

34.7.4 Technical Details on \(R^2\)

In OLS: \[ \hat{\beta}^{OLS} = (X'X)^{-1} X'Y \] which minimizes: \[ RSS = (Y - X \hat{\beta}^{OLS})'(Y - X \hat{\beta}^{OLS}) \]

In IV: \[ \hat{\beta}^{IV} = (X'P_Z X)^{-1} X'P_Z Y \]

where:

  • \(P_Z = Z (Z'Z)^{-1} Z'\) is the projection matrix onto \(Z\).

  • In the just-identified case, the IV estimator solves the moment condition: \[ Z'(Y - X\hat{\beta}) = 0 \]

  • No guarantee that this minimizes RSS.

Residuals:

\[ e^{IV} = Y - X \hat{\beta}^{IV} \]

The norm of \(e^{IV}\) is typically larger than in OLS because \(\hat{\beta}^{IV}\) does not minimize the residual sum of squares; only the OLS coefficient vector does, so any other estimator, IV included, yields residuals at least as long.
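Both facts, the moment condition holding exactly and the IV residual norm never beating OLS, can be verified in a few lines of Python/NumPy (simulated, just-identified design, where the estimator reduces to \(\hat{\beta}^{IV} = (Z'X)^{-1} Z'Y\)):

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
z = rng.normal(size=n)                   # instrument
u = rng.normal(size=n)                   # confounder
x = z + u                                # endogenous regressor
y = x + u + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# Just-identified IV in closed form: beta = (Z'X)^{-1} Z'Y
b_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
e_iv = y - X @ b_iv

# The estimator is defined by the moment condition Z'e = 0 ...
print(np.allclose(Z.T @ e_iv, 0, atol=1e-6))

# ... not by RSS minimization: the OLS residual vector is never longer
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e_ols = y - X @ b_ols
print(np.sum(e_ols**2) <= np.sum(e_iv**2))
```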

A Note on \(R^2\) in 3SLS and GMM

  • In 3SLS or GMM IV, \(R^2\) can be similarly misleading.
  • These estimators are defined by moment conditions (often in a system context), not by residual minimization, so the same caveats apply.