5.7 Ridge Regression
When we have a multicollinearity problem, we can use ridge regression.
The main problem with multicollinearity is that X′X is “ill-conditioned”. The idea behind ridge regression is that adding a constant to the diagonal of X′X improves the conditioning:
X′X + cI, c > 0.
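The effect on conditioning can be seen numerically. The following sketch (an illustrative example, not taken from the text) builds two nearly collinear predictors and compares the condition number of X′X with and without the ridge constant:

```python
import numpy as np

# Two nearly collinear predictors make X'X ill-conditioned.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=1e-3, size=100)  # almost a copy of x1
X = np.column_stack([x1, x2])

XtX = X.T @ X
c = 0.1  # ridge constant, c > 0
cond_before = np.linalg.cond(XtX)               # very large
cond_after = np.linalg.cond(XtX + c * np.eye(2))  # much smaller
print(cond_before, cond_after)
```

Adding cI raises the smallest eigenvalue of X′X by c, which shrinks the ratio of largest to smallest eigenvalue and hence the condition number.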
The choice of c is difficult. The estimator
bR = (X′X + cI)⁻¹ X′y
is biased.
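A minimal sketch of the ridge estimator above (the function name and the test data are illustrative, not from the text):

```python
import numpy as np

def ridge_estimator(X, y, c):
    """Compute bR = (X'X + cI)^(-1) X'y.

    A sketch, not a library routine. Solves the linear system
    instead of forming the inverse explicitly, which is more stable.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + c * np.eye(p), X.T @ y)

# With c = 0 the formula reduces to the OLS estimator.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_ridge0 = ridge_estimator(X, y, 0.0)
```

For c > 0 the estimate is pulled toward zero, which is the source of both the bias and the variance reduction discussed next.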
- It has a smaller variance than the OLS estimator: as c increases, the bias increases but the variance decreases.
- There always exists some value of c for which the ridge estimator has a smaller total MSE than the OLS estimator.
- The optimal c varies with application and data set.
- To find the “optimal” c we can use the “ridge trace”: we plot the p − 1 parameter estimates simultaneously for different values of c.
- Typically, as c increases toward 1, the coefficients decrease toward 0.
- The VIF values tend to decrease rapidly as c increases from 0, and then change slowly as c → 1.
- We then examine the ridge trace and the VIF values and choose the smallest value of c at which the regression coefficients first become stable in the ridge trace and the VIF values have become sufficiently small (an admittedly subjective criterion).
- Typically, this procedure is applied to the standardized regression model.
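The whole procedure can be sketched on a standardized model. This example is illustrative: the data are simulated, the standardization is the usual correlation transformation, and the ridge VIF formula used here, diag((R + cI)⁻¹ R (R + cI)⁻¹) with R the correlation matrix of the predictors, is the common textbook definition and is assumed to match the course's:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # strongly collinear with x1
x3 = rng.normal(size=n)
y = 2 * x1 - x2 + 0.5 * x3 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Correlation transformation: after this, Z'Z is the correlation matrix R.
Z = (X - X.mean(0)) / (X.std(0, ddof=1) * np.sqrt(n - 1))
w = (y - y.mean()) / (y.std(ddof=1) * np.sqrt(n - 1))
R = Z.T @ Z
I = np.eye(3)

# Ridge trace: standardized coefficients and VIFs over a grid of c values.
trace = {}
for c in np.linspace(0.0, 1.0, 21):
    inv = np.linalg.inv(R + c * I)
    b = inv @ Z.T @ w            # standardized ridge coefficients
    vif = np.diag(inv @ R @ inv)  # ridge VIF (assumed definition)
    trace[round(c, 2)] = (b, vif)
# One would plot the coefficients and VIFs against c and pick the
# smallest c where the trace first looks stable and the VIFs are small.
```

At c = 0 the VIFs are large because of the collinearity between x1 and x2; even a modest c shrinks both the coefficients and the VIFs, which is the rapid-then-slow pattern described above.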