34.7 Negative $R^2$ in IV

In IV estimation, particularly 2SLS and 3SLS, a negative $R^2$ reported for the second-stage (structural) equation is common and is not in itself a problem. In Ordinary Least Squares, $R^2$ is often used to assess the fit of the model. In IV regression, the primary concern is consistent estimation of the coefficients of interest, not goodness-of-fit. IV estimators are consistent, although they are generally biased in finite samples.

What Should You Look At Instead of $R^2$ in IV?

  1. Instrument Relevance (First-stage F-statistics, Partial $R^2$)
  2. Weak Instrument Diagnostics and Robust Inference (Cragg-Donald or Kleibergen-Paap F-statistics, Anderson-Rubin tests)
  3. Validity of Instruments (Overidentification tests such as the Sargan/Hansen J-test)
  4. Endogeneity Tests (Durbin-Wu-Hausman test for endogeneity)
  5. Confidence Intervals and Standard Errors, focusing on inference for $\hat{\beta}$.
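To illustrate item 1, the sketch below uses NumPy on simulated data to compute a first-stage F-statistic and partial $R^2$ for the excluded instruments. It assumes homoskedastic errors. Heteroskedasticity- or cluster-robust versions, such as the Kleibergen-Paap statistic, are computed differently. The helper names `rss` and `first_stage_diagnostics`, the data-generating process, and all coefficient values are hypothetical choices made for this example.

```python
import numpy as np


def rss(y, X):
    """Residual sum of squares from an OLS fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)


def first_stage_diagnostics(x2, X1, Z2):
    """Homoskedastic first-stage F-statistic and partial R^2 for the
    excluded instruments Z2, given the exogenous regressors X1."""
    n = len(x2)
    q = Z2.shape[1]                    # number of excluded instruments
    Z = np.column_stack([X1, Z2])      # full instrument matrix
    k = Z.shape[1]
    rss_r = rss(x2, X1)                # restricted: excluded instruments dropped
    rss_u = rss(x2, Z)                 # unrestricted: excluded instruments included
    partial_r2 = (rss_r - rss_u) / rss_r
    f_stat = ((rss_r - rss_u) / q) / (rss_u / (n - k))
    return f_stat, partial_r2


rng = np.random.default_rng(0)
n = 2000
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + exogenous control
Z2 = rng.normal(size=(n, 2))                            # two excluded instruments (simulated)
x2 = X1 @ np.array([1.0, 0.5]) + Z2 @ np.array([0.4, 0.3]) + rng.normal(size=n)

f_stat, partial_r2 = first_stage_diagnostics(x2, X1, Z2)
print(f"first-stage F = {f_stat:.1f}, partial R^2 = {partial_r2:.3f}")
```

The F-statistic compares the first-stage fit with and without the excluded instruments. A large value suggests the instruments are relevant. A commonly cited rule of thumb for a single endogenous regressor is F > 10.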

Geometric Intuition

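  • In OLS, the fitted values $\hat{y} = X\hat{\beta}_{OLS}$ are the orthogonal projection of $y$ onto the column space of $X$, so the residuals are orthogonal to every column of $X$.
  • In 2SLS, the second stage projects $y$ onto the column space of $\hat{X} = P_Z X$, which lies in the space spanned by $Z$, not $X$. The reported fitted values $X\hat{\beta}_{IV}$ are therefore not an orthogonal projection of $y$ onto the columns of $X$.
  • As a result, the residuals $y - X\hat{\beta}_{IV}$ are not orthogonal to $X$, and the RSS is never smaller than the OLS RSS. It is often considerably larger.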
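The following NumPy sketch illustrates this geometry on simulated data with one endogenous regressor and one instrument. The data-generating process and coefficients are assumptions chosen for illustration. The code checks that OLS residuals are orthogonal to $X$. It also checks that 2SLS residuals computed with the observed $X$ are orthogonal to $\hat{X} = P_Z X$ but not to $X$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)                  # instrument
v = rng.normal(size=n)                  # first-stage error
u = -2.0 * v + rng.normal(size=n)       # structural error, correlated with v
x = 0.5 * z + v                         # endogenous regressor
y = 1.0 * x + u

X = np.column_stack([np.ones(n), x])    # regressors (intercept + x)
Z = np.column_stack([np.ones(n), z])    # instruments (intercept + z)

# OLS: fitted values are the orthogonal projection of y onto col(X)
b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
e_ols = y - X @ b_ols

# 2SLS: project X onto col(Z), then regress y on X_hat
pi_hat, *_ = np.linalg.lstsq(Z, X, rcond=None)
X_hat = Z @ pi_hat
b_iv, *_ = np.linalg.lstsq(X_hat, y, rcond=None)
e_iv = y - X @ b_iv                     # residuals use the observed X

print("X' e_ols    :", X.T @ e_ols)     # approximately zero
print("X_hat' e_iv :", X_hat.T @ e_iv)  # approximately zero
print("x' e_iv     :", x @ e_iv)        # far from zero
```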

Recall the formula for the coefficient of determination ($R^2$) in a regression model:

$$R^2 = 1 - \frac{RSS}{TSS} = \frac{MSS}{TSS}$$

Where:

  • TSS is the Total Sum of Squares: $TSS = \sum_{i=1}^{n} (y_i - \bar{y})^2$
  • MSS is the Model Sum of Squares: $MSS = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$
  • RSS is the Residual Sum of Squares: $RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$

In OLS with an intercept, $TSS = MSS + RSS$. The two expressions for $R^2$ therefore coincide, and $R^2$ measures the proportion of variance in $Y$ explained by the regressors $X$.

Key Properties in OLS:

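  • $R^2 \in [0, 1]$ when the model includes an intercept.
  • Adding more regressors (even irrelevant ones) never decreases $R^2$.
  • $R^2$ measures in-sample goodness-of-fit. It says nothing about whether the coefficients have a causal interpretation.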
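The short NumPy sketch below checks these definitions and OLS properties on simulated data. The helper names `sums_of_squares` and `ols_fit`, as well as the simulated model, are hypothetical. The sketch confirms that $TSS = MSS + RSS$ under OLS with an intercept, that both expressions for $R^2$ agree, and that adding a pure-noise regressor does not lower $R^2$.

```python
import numpy as np


def sums_of_squares(y, y_hat):
    """Return (TSS, MSS, RSS) for observed y and fitted values y_hat."""
    y_bar = y.mean()
    tss = np.sum((y - y_bar) ** 2)
    mss = np.sum((y_hat - y_bar) ** 2)
    rss = np.sum((y - y_hat) ** 2)
    return tss, mss, rss


def ols_fit(y, X):
    """OLS fitted values of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return X @ beta


rng = np.random.default_rng(2)
n = 500
x = rng.normal(size=n)
y = 2.0 + 1.5 * x + rng.normal(size=n)

X = np.column_stack([np.ones(n), x])
tss, mss, rss = sums_of_squares(y, ols_fit(y, X))
r2 = 1 - rss / tss
print(f"1 - RSS/TSS = {r2:.3f}; MSS/TSS = {mss / tss:.3f}")  # equal under OLS with intercept

# Adding an irrelevant (pure noise) regressor cannot lower R^2
X_big = np.column_stack([X, rng.normal(size=n)])
_, _, rss_big = sums_of_squares(y, ols_fit(y, X_big))
r2_big = 1 - rss_big / tss
print(f"R^2 with a noise regressor = {r2_big:.3f}")
```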

34.7.1 Why Does $R^2$ Lose Its Meaning in IV Regression?

In IV regression, the second-stage regression replaces the endogenous variable $X_2$ with its predicted values $\hat{X}_2$ from the first stage:

Stage 1:

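$$X_2 = Z\pi + v$$

where $Z$ contains the exogenous regressors $X_1$ together with the excluded instruments.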

Stage 2:

$$Y = X_1\beta_1 + \hat{X}_2\beta_2 + \epsilon$$

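  • $\hat{X}_2 = Z\hat{\pi}$ is not the observed $X_2$. It is a proxy constructed from $Z$.
  • $\hat{X}_2$ isolates the variation in $X_2$ that comes from the instruments. If the instruments are valid, this variation is uncorrelated with the error term.
  • This removes the endogeneity bias (asymptotically), but comes at a cost:
    • The variation in $\hat{X}_2$ is never larger than that in $X_2$, because a projection cannot add variation. It is often much smaller.
    • The predicted values $\hat{y}_i$ from the second stage are not necessarily close to $y_i$.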
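The two stages can be run explicitly with ordinary least squares. The sketch below uses simulated data and assumes one exogenous control `w` and one excluded instrument `z`. The error term is built to be correlated with the first-stage error, which makes $X_2$ endogenous. All names and coefficients are illustrative.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


rng = np.random.default_rng(3)
n = 5000
w = rng.normal(size=n)                       # exogenous control (part of X1)
z = rng.normal(size=n)                       # excluded instrument
v = rng.normal(size=n)
eps = -2.0 * v + rng.normal(size=n)          # error correlated with v -> X2 endogenous
x2 = 0.5 * z + 0.3 * w + v                   # endogenous regressor
y = 1.0 + 0.7 * w + 1.0 * x2 + eps           # true beta_2 = 1.0

X1 = np.column_stack([np.ones(n), w])
Z = np.column_stack([X1, z])                 # instruments: X1 plus the excluded z

# Stage 1: regress X2 on Z and keep the fitted values
x2_hat = Z @ ols(x2, Z)

# Stage 2: regress Y on X1 and X2_hat
beta_2sls = ols(y, np.column_stack([X1, x2_hat]))

print("2SLS estimates (const, w, x2):", beta_2sls.round(3))
print("OLS estimate of beta_2:", ols(y, np.column_stack([X1, x2]))[2].round(3))
print("var(x2_hat) / var(x2):", round(np.var(x2_hat) / np.var(x2), 3))
```

Running the stages by hand reproduces the 2SLS point estimates. However, the standard errors from a naive second-stage OLS are not valid, because its residuals use $\hat{X}_2$ instead of $X_2$. Dedicated IV routines correct for this.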

34.7.2 Why $R^2$ Can Be Negative

  1. $R^2$ is calculated as $R^2 = 1 - \frac{RSS}{TSS}$. But in IV:
    • The second stage does minimize squared residuals, but with respect to the fitted regressors $\hat{X}$. The residuals behind the reported $R^2$ (and the standard errors) are computed with the observed regressors, $e = Y - X\hat{\beta}_{IV}$, and nothing in the procedure minimizes their sum of squares.
    • Unlike OLS, 2SLS chooses $\hat{\beta}$ to satisfy moment conditions rather than to minimize the sum of squared errors.
  2. Because the decomposition $TSS = MSS + RSS$ no longer holds, the residual sum of squares can exceed the total sum of squares ($RSS > TSS$), and this is common in IV. In that case $R^2 = 1 - \frac{RSS}{TSS} < 0$.
  3. This happens because:
    • The predicted values $\hat{y}_i$ in IV are not optimized to fit the observed $y_i$.
    • The residuals can be larger, because IV targets the causal effect rather than prediction.
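A small simulation reproduces this pattern. It assumes strong negative endogeneity, with the structural error built as $u = -2v + e$ where $v$ is the first-stage error, and a strong instrument. The code computes three versions of $R^2 = 1 - RSS/TSS$: the usual IV version with residuals $y - X\hat{\beta}_{IV}$, the naive second-stage version with residuals $y - \hat{X}\hat{\beta}_{IV}$, and OLS. The data-generating process and the helper names `ols` and `r_squared` are hypothetical.

```python
import numpy as np


def ols(y, X):
    """OLS coefficients of y on X (y may have several columns)."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


def r_squared(y, y_hat):
    """R^2 = 1 - RSS/TSS."""
    rss = np.sum((y - y_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return 1 - rss / tss


rng = np.random.default_rng(4)
n = 5000
z = rng.normal(size=n)
v = rng.normal(size=n)
u = -2.0 * v + rng.normal(size=n)   # strong negative endogeneity
x = 0.5 * z + v
y = 1.0 * x + u                     # true slope = 1.0

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])
X_hat = Z @ ols(X, Z)               # regress each column of X on Z (first stage)

b_iv = ols(y, X_hat)                # 2SLS coefficients
b_ols = ols(y, X)

r2_iv = r_squared(y, X @ b_iv)          # residuals use the observed X
r2_stage2 = r_squared(y, X_hat @ b_iv)  # naive second-stage R^2
r2_ols = r_squared(y, X @ b_ols)

print(f"slope: IV = {b_iv[1]:.2f}, OLS = {b_ols[1]:.2f}")
print(f"R^2:   IV = {r2_iv:.2f}, second stage = {r2_stage2:.2f}, OLS = {r2_ols:.2f}")
```

In this setup, the IV slope should land near the true value of 1 while its $R^2$ is well below zero. OLS, by contrast, reports a positive $R^2$ with a slope of roughly $-0.6$, which has the wrong sign. The naive second-stage $R^2$ stays in $[0, 1]$ because that regression is ordinary OLS with an intercept. This is why the $R^2$ reported by IV software can differ from what a hand-run second stage prints.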

For example, assume we have:

  • $TSS = 100$
  • $RSS = 120$

Then $R^2 = 1 - \frac{120}{100} = -0.20$.

This happens because the IV procedure does not minimize RSS. It prioritizes solving the endogeneity problem over explaining the variance in Y.


34.7.3 Why We Don’t Care About $R^2$ in IV

  1. IV Estimates Focus on Consistency, Not Prediction
    • The goal of IV is to obtain a consistent estimate of $\beta_2$.
    • IV accepts a worse in-sample fit (larger residuals) in exchange for removing endogeneity bias.
  2. $R^2$ Does Not Reflect the Quality of an IV Estimator
    • A high $R^2$ in IV may be misleading, for instance when instruments are weak or invalid.
    • A negative $R^2$ does not imply a bad IV estimator if the instrument assumptions (relevance and exogeneity) are met.
  3. IV Regression Is About Identification, Not In-Sample Fit
    • IV relies on the relevance and exogeneity of the instruments, not on residual minimization.

34.7.4 Technical Details on $R^2$

In OLS:

$$\hat{\beta}_{OLS} = (X'X)^{-1}X'Y$$

which minimizes

$$RSS(\beta) = (Y - X\beta)'(Y - X\beta)$$

In IV (2SLS):

$$\hat{\beta}_{IV} = (X'P_Z X)^{-1}X'P_Z Y$$

where:

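  • $P_Z = Z(Z'Z)^{-1}Z'$ is the projection matrix onto the column space of $Z$.
  • The 2SLS estimator solves the moment condition $X'P_Z(Y - X\hat{\beta}) = \hat{X}'(Y - X\hat{\beta}) = 0$. In the just-identified case, this reduces to $Z'(Y - X\hat{\beta}) = 0$.
  • Nothing in these conditions minimizes RSS.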
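The closed-form estimator and its moment conditions can be checked numerically. The sketch below uses simulated data with two excluded instruments, which makes the model overidentified. All values are illustrative assumptions. It forms $P_Z$ explicitly, which is only sensible for small $n$ because $P_Z$ is $n \times n$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 400
z = rng.normal(size=(n, 2))                  # two excluded instruments (overidentified)
v = rng.normal(size=n)
u = -1.5 * v + rng.normal(size=n)
x = z @ np.array([0.6, 0.4]) + v
y = 0.5 + 1.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

P_Z = Z @ np.linalg.solve(Z.T @ Z, Z.T)      # projection matrix onto col(Z)
b_iv = np.linalg.solve(X.T @ P_Z @ X, X.T @ P_Z @ y)
e_iv = y - X @ b_iv

# Same coefficients from the explicit two-stage procedure
X_hat = P_Z @ X
b_two_stage, *_ = np.linalg.lstsq(X_hat, y, rcond=None)

print("b_iv:", b_iv.round(3), " two-stage:", b_two_stage.round(3))
print("X' P_Z e:", X.T @ P_Z @ e_iv)         # 2SLS moment condition: ~0
print("Z' e    :", Z.T @ e_iv)               # not all zero when overidentified

# Just-identified case: 2SLS reduces to (Z'X)^{-1} Z'y
Z1 = Z[:, :2]                                # intercept + first instrument only
P_Z1 = Z1 @ np.linalg.solve(Z1.T @ Z1, Z1.T)
b_just = np.linalg.solve(X.T @ P_Z1 @ X, X.T @ P_Z1 @ y)
print(np.allclose(b_just, np.linalg.solve(Z1.T @ X, Z1.T @ y)))  # True
```

For large samples, compute $\hat{X}$ with a least-squares solve instead of building $P_Z$.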

Residuals:

$$e_{IV} = Y - X\hat{\beta}_{IV}$$

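The norm of $e_{IV}$ is never smaller than that of the OLS residuals. This is because $\hat{\beta}_{OLS}$ minimizes $RSS(\beta)$ over all possible $\beta$, and $\hat{\beta}_{IV}$ is just one such value. Specifically, $RSS(\hat{\beta}_{IV}) = RSS(\hat{\beta}_{OLS}) + (\hat{\beta}_{IV} - \hat{\beta}_{OLS})'X'X(\hat{\beta}_{IV} - \hat{\beta}_{OLS})$, so the gap grows with the distance between the two estimates. The more endogeneity bias IV removes, the further it moves from OLS and the lower its $R^2$.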
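Because $X'e_{OLS} = 0$, any $\beta$ satisfies $RSS(\beta) = RSS(\hat{\beta}_{OLS}) + (\beta - \hat{\beta}_{OLS})'X'X(\beta - \hat{\beta}_{OLS})$. The NumPy sketch below checks this decomposition at $\beta = \hat{\beta}_{IV}$ on simulated data. The model and the helper `rss` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1000
z = rng.normal(size=n)
v = rng.normal(size=n)
u = -1.0 * v + rng.normal(size=n)
x = 0.8 * z + v
y = 2.0 + 1.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])


def rss(beta):
    """Residual sum of squares using the observed regressors X."""
    e = y - X @ beta
    return float(e @ e)


b_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
X_hat = Z @ np.linalg.lstsq(Z, X, rcond=None)[0]   # first-stage fitted values
b_iv, *_ = np.linalg.lstsq(X_hat, y, rcond=None)   # 2SLS coefficients

d = b_iv - b_ols
gap = float(d @ (X.T @ X) @ d)   # (b_iv - b_ols)' X'X (b_iv - b_ols)
print(f"RSS_OLS = {rss(b_ols):.1f}, RSS_IV = {rss(b_iv):.1f}, gap = {gap:.1f}")
```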

A Note on $R^2$ in 3SLS and GMM

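  • In 3SLS or GMM IV, $R^2$ can be similarly misleading.
  • These methods estimate parameters from moment conditions rather than by minimizing the residual sum of squares. In the case of 3SLS, a system of equations is also estimated jointly. The same argument therefore applies.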