5.7 Ridge Regression
When we have a multicollinearity problem, we can use ridge regression.
The main problem with multicollinearity is that X′X is “ill-conditioned”. The idea behind ridge regression is that adding a constant to the diagonal of X′X improves the conditioning:
X′X + cI, c > 0.
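The effect on conditioning can be seen numerically. The following sketch (an illustrative example, not taken from the text) builds two nearly collinear predictors and compares the condition number of X′X with and without the ridge constant:

```python
import numpy as np

# Two nearly collinear predictors make X'X ill-conditioned.
rng = np.random.default_rng(0)
x1 = rng.normal(size=100)
x2 = x1 + rng.normal(scale=1e-3, size=100)  # almost a copy of x1
X = np.column_stack([x1, x2])

XtX = X.T @ X
c = 0.1  # ridge constant, c > 0
cond_before = np.linalg.cond(XtX)               # very large
cond_after = np.linalg.cond(XtX + c * np.eye(2))  # much smaller
print(cond_before, cond_after)
```

Adding cI raises the smallest eigenvalue of X′X by c, which shrinks the ratio of largest to smallest eigenvalue and hence the condition number.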
The choice of c is difficult. The estimator
bR = (X′X + cI)⁻¹ X′y
is biased.
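A minimal sketch of the ridge estimator above (the function name and the test data are illustrative, not from the text):

```python
import numpy as np

def ridge_estimator(X, y, c):
    """Compute bR = (X'X + cI)^(-1) X'y.

    A sketch, not a library routine. Solves the linear system
    instead of forming the inverse explicitly, which is more stable.
    """
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + c * np.eye(p), X.T @ y)

# With c = 0 the formula reduces to the OLS estimator.
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=50)
b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
b_ridge0 = ridge_estimator(X, y, 0.0)
```

For c > 0 the estimate is pulled toward zero, which is the source of both the bias and the variance reduction discussed next.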
- It has a smaller variance than the OLS estimator: as c increases, the bias increases but the variance decreases.
- There always exists some value of c for which the ridge estimator has a smaller total MSE than the OLS estimator.
- The optimal c varies with application and data set.
- To find the “optimal” c we can use the “ridge trace”: we plot the p − 1 parameter estimates simultaneously for different values of c.
- Typically, as c increases toward 1, the coefficients decrease toward 0.
- The VIF values tend to decrease rapidly as c increases from 0, and then change slowly as c → 1.
- We then examine the ridge trace and the VIF values and choose the smallest value of c at which the regression coefficients first become stable in the ridge trace and the VIF values have become sufficiently small (an admittedly subjective criterion).
- Typically, this procedure is applied to the standardized regression model.
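The whole procedure can be sketched on a standardized model. This example is illustrative: the data are simulated, the standardization is the usual correlation transformation, and the ridge VIF formula used here, diag((R + cI)⁻¹ R (R + cI)⁻¹) with R the correlation matrix of the predictors, is the common textbook definition and is assumed to match the course's:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # strongly collinear with x1
x3 = rng.normal(size=n)
y = 2 * x1 - x2 + 0.5 * x3 + rng.normal(size=n)
X = np.column_stack([x1, x2, x3])

# Correlation transformation: after this, Z'Z is the correlation matrix R.
Z = (X - X.mean(0)) / (X.std(0, ddof=1) * np.sqrt(n - 1))
w = (y - y.mean()) / (y.std(ddof=1) * np.sqrt(n - 1))
R = Z.T @ Z
I = np.eye(3)

# Ridge trace: standardized coefficients and VIFs over a grid of c values.
trace = {}
for c in np.linspace(0.0, 1.0, 21):
    inv = np.linalg.inv(R + c * I)
    b = inv @ Z.T @ w            # standardized ridge coefficients
    vif = np.diag(inv @ R @ inv)  # ridge VIF (assumed definition)
    trace[round(c, 2)] = (b, vif)
# One would plot the coefficients and VIFs against c and pick the
# smallest c where the trace first looks stable and the VIFs are small.
```

At c = 0 the VIFs are large because of the collinearity between x1 and x2; even a modest c shrinks both the coefficients and the VIFs, which is the rapid-then-slow pattern described above.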