5.7 Ridge Regression

When the predictors are collinear, ridge regression is one possible remedy.

The main problem with multicollinearity is that \(\mathbf{X'X}\) is “ill-conditioned”. The idea behind ridge regression is that adding a positive constant to the diagonal of \(\mathbf{X'X}\) improves the conditioning:

\[ \mathbf{X'X} + c\mathbf{I}, \quad c > 0 \]
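The effect on conditioning can be checked numerically. The following is a minimal sketch, assuming NumPy and a simulated pair of nearly collinear predictors; the data, the seed, and the helper name `cond_ridge` are illustrative and not part of the text. Because \(\mathbf{X'X}\) is symmetric and positive semi-definite, its condition number after the shift is \((\lambda_{\max} + c)/(\lambda_{\min} + c)\), which falls as \(c\) grows.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50

# Two nearly collinear predictors (simulated, for illustration only).
x1 = rng.normal(size=n)
x2 = x1 + 1e-3 * rng.normal(size=n)
X = np.column_stack([x1, x2])
XtX = X.T @ X


def cond_ridge(XtX, c):
    """Condition number (2-norm) of X'X + cI; hypothetical helper."""
    return np.linalg.cond(XtX + c * np.eye(XtX.shape[0]))


# Condition number for a few values of the ridge constant c.
conds = {c: cond_ridge(XtX, c) for c in (0.0, 0.01, 0.1, 1.0)}
for c, k in conds.items():
    print(f"c = {c:<5} condition number = {k:.3e}")
```

Even a small \(c\) lowers the condition number by orders of magnitude here, because it is added to an eigenvalue that is close to zero.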

Choosing \(c\) is hard. The estimator

\[ \mathbf{b}^R = (\mathbf{X'X}+c\mathbf{I})^{-1}\mathbf{X'y} \]

is biased.

  • It has smaller variance than the OLS estimator; as \(c\) increases, the bias increases but the variance decreases.
  • There always exists some value of \(c\) for which the ridge regression estimator has a smaller total MSE than the OLS estimator.
  • The optimal \(c\) varies with the application and the data set.
  • To find the “optimal” \(c\), we can use the “ridge trace”.
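The bias-variance trade-off in the list above can be illustrated with a small simulation. This is a sketch under stated assumptions, not a general proof: the design matrix, the "true" coefficients `beta`, the noise level, the value \(c = 0.1\), and the function names `ridge_estimate` and `total_mse` are all chosen for illustration, and the model is fitted without an intercept to keep the code short.

```python
import numpy as np


def ridge_estimate(X, y, c):
    """Return b^R = (X'X + cI)^{-1} X'y; c = 0 gives the OLS estimate."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + c * np.eye(p), X.T @ y)


rng = np.random.default_rng(1)
n, sigma = 30, 1.0

# Fixed design with two nearly collinear predictors (simulated).
x1 = rng.normal(size=n)
x2 = x1 + 0.01 * rng.normal(size=n)
X = np.column_stack([x1, x2])
beta = np.array([1.0, 1.0])  # hypothetical true coefficients


def total_mse(c, reps=2000):
    """Monte Carlo estimate of E||b - beta||^2 for the fixed design X."""
    err = 0.0
    for _ in range(reps):
        y = X @ beta + sigma * rng.normal(size=n)
        err += np.sum((ridge_estimate(X, y, c) - beta) ** 2)
    return err / reps


mse_ols = total_mse(0.0)    # c = 0: unbiased but very high variance
mse_ridge = total_mse(0.1)  # c > 0: slightly biased, much lower variance
print(f"total MSE, OLS:   {mse_ols:.3f}")
print(f"total MSE, ridge: {mse_ridge:.3f}")
```

In this setup the small bias introduced by \(c = 0.1\) is far outweighed by the reduction in variance. A different design or a different \(c\) would give different numbers.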

The ridge trace plots the \(p - 1\) parameter estimates simultaneously against different values of \(c\).

  • Typically, as \(c\) increases toward 1, the coefficient estimates shrink toward 0.
  • The VIF values tend to decrease rapidly as \(c\) increases from 0, and then change only slowly as \(c \to 1\).
  • We then examine the ridge trace and the VIF values, and choose the smallest value of \(c\) at which the regression coefficients first become stable in the ridge trace and the VIF values have become sufficiently small (a judgment that is very subjective).
  • Typically, this procedure is applied to the standardized regression model.
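The procedure above can be sketched numerically. This is a minimal illustration, assuming NumPy and simulated data. It assumes the standardized model is obtained by the correlation transformation, so that \(\mathbf{X'X}\) becomes the predictor correlation matrix \(\mathbf{r}_{XX}\), and that the ridge VIF values are the diagonal elements of \((\mathbf{r}_{XX} + c\mathbf{I})^{-1}\mathbf{r}_{XX}(\mathbf{r}_{XX} + c\mathbf{I})^{-1}\). Neither definition is spelled out in the text above. The function names and the grid of \(c\) values are hypothetical.

```python
import numpy as np


def correlation_transform(A):
    """Center each column and scale it to unit length (sum of squares = 1)."""
    A = np.asarray(A, dtype=float)
    centered = A - A.mean(axis=0)
    return centered / np.sqrt((centered ** 2).sum(axis=0))


def ridge_trace(X, y, c_values):
    """Standardized ridge coefficients and VIFs for each c in c_values."""
    Xs = correlation_transform(X)
    ys = correlation_transform(y)
    r_xx = Xs.T @ Xs  # correlation matrix of the predictors
    r_yx = Xs.T @ ys  # correlations between predictors and response
    p = r_xx.shape[0]
    coefs, vifs = [], []
    for c in c_values:
        inv = np.linalg.inv(r_xx + c * np.eye(p))
        coefs.append(inv @ r_yx)                 # standardized ridge estimate
        vifs.append(np.diag(inv @ r_xx @ inv))   # assumed ridge VIF definition
    return np.array(coefs), np.array(vifs)


# Simulated data: x1 and x2 are nearly collinear, x3 is not.
rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = 2.0 + x1 + x2 + 0.5 * x3 + rng.normal(size=n)

c_values = np.array([0.0, 0.001, 0.01, 0.05, 0.1, 0.5, 1.0])
coefs, vifs = ridge_trace(X, y, c_values)
for c, b, v in zip(c_values, coefs, vifs):
    print(f"c = {c:<6} coefs = {np.round(b, 3)}  max VIF = {v.max():.2f}")
```

Each row of `coefs` is one point on the ridge trace; plotting the columns against `c_values` gives the trace itself. At \(c = 0\) the VIF values reduce to the diagonal of \(\mathbf{r}_{XX}^{-1}\), the usual OLS VIFs.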