## 7.1 Ridge

Ridge regression estimates the linear model coefficients by minimizing an augmented loss function that adds a penalty, weighted by $\lambda$, on the magnitude of the coefficient estimates,

$$L = \sum_{i = 1}^n \left(y_i - x_i' \hat\beta \right)^2 + \lambda \sum_{j=1}^k \hat{\beta}_j^2.$$

The resulting estimate for the coefficients is

$$\hat{\beta} = \left(X'X + \lambda I\right)^{-1}X'Y.$$
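To make the formula concrete, here is a minimal base-R sketch that computes the closed-form estimate on the built-in `mtcars` data (centering the response and standardizing the predictors so no intercept term is needed). Note that `glmnet` scales its penalty by the sample size, so these coefficients will not exactly match the caret fit below.

```r
# Closed-form ridge estimate: beta_hat = (X'X + lambda * I)^{-1} X'y.
X <- scale(as.matrix(mtcars[, -1]))  # standardized predictors
y <- mtcars$mpg - mean(mtcars$mpg)   # centered response, so no intercept term
lambda <- 1                          # illustrative penalty value
beta_hat <- solve(t(X) %*% X + lambda * diag(ncol(X))) %*% t(X) %*% y
round(t(beta_hat), 3)
```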

As $\lambda \rightarrow 0$, ridge regression approaches OLS. The bias and variance for the ridge estimator are

$$Bias(\hat{\beta}) = -\lambda \left(X'X + \lambda I \right)^{-1} \beta$$

$$Var(\hat{\beta}) = \sigma^2 \left(X'X + \lambda I \right)^{-1}X'X \left(X'X + \lambda I \right)^{-1}$$

The estimator bias increases with $\lambda$ and the estimator variance decreases with $\lambda$. The optimal $\lambda$ balances this tradeoff: it is the value that minimizes a criterion such as the root mean squared error (RMSE) or the Akaike or Bayesian Information Criterion (AIC or BIC), or that maximizes R-squared.
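The tradeoff is easy to visualize directly from the two formulas. The sketch below computes the total squared bias plus variance of the ridge estimator over a grid of $\lambda$ values, using an assumed true $\beta$ and error variance purely for illustration.

```r
# Bias-variance tradeoff for the ridge estimator (hypothetical beta and sigma^2).
X <- scale(as.matrix(mtcars[, -1]))
beta <- rep(1, ncol(X))   # assumed true coefficients (illustrative only)
sigma2 <- 4               # assumed error variance (illustrative only)
XtX <- t(X) %*% X
lambdas <- seq(0, 5, by = 0.1)

mse <- sapply(lambdas, function(lambda) {
  W <- solve(XtX + lambda * diag(ncol(X)))
  bias <- -lambda * W %*% beta       # Bias(beta_hat)
  V <- sigma2 * W %*% XtX %*% W      # Var(beta_hat)
  sum(bias^2) + sum(diag(V))         # total squared bias + total variance
})
plot(lambdas, mse, type = "l", xlab = "lambda", ylab = "squared bias + variance")
```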

#### Example

Specify `alpha = 0` in the tuning grid for ridge regression (the following sections explain how alpha distinguishes ridge, lasso, and elastic net). Note that I standardize the predictors in the `preProcess` step - ridge regression requires standardized predictors because the penalty is applied to the coefficient magnitudes, which depend on the predictor scales.

```r
set.seed(1234)
mdl_ridge <- train(
  mpg ~ .,
  data = training,
  method = "glmnet",
  metric = "RMSE",  # caret regression metrics: RMSE, Rsquared, or MAE
  preProcess = c("center", "scale"),
  tuneGrid = expand.grid(
    .alpha = 0,  # alpha = 0 fits a ridge regression
    .lambda = seq(0, 5, length.out = 101)
  ),
  trControl = train_control
)
mdl_ridge
```
```
## glmnet
##
## 28 samples
## 10 predictors
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (5 fold, repeated 5 times)
## Summary of sample sizes: 21, 24, 22, 21, 24, 21, ...
## Resampling results across tuning parameters:
##
##   lambda  RMSE  Rsquared  MAE
##   0.00    2.6   0.88      2.2
##   0.05    2.6   0.88      2.2
##   0.10    2.6   0.88      2.2
##   0.15    2.6   0.88      2.2
##   0.20    2.6   0.88      2.2
##   0.25    2.6   0.88      2.2
##   0.30    2.6   0.88      2.2
##   0.35    2.6   0.88      2.2
##   0.40    2.6   0.88      2.2
##   0.45    2.6   0.88      2.2
##   0.50    2.6   0.88      2.2
##   0.55    2.5   0.88      2.2
##   0.60    2.5   0.88      2.2
##   0.65    2.5   0.88      2.2
##   0.70    2.5   0.88      2.2
##   0.75    2.5   0.88      2.2
##   0.80    2.5   0.88      2.2
##   0.85    2.5   0.88      2.2
##   0.90    2.5   0.88      2.2
##   0.95    2.5   0.88      2.2
##   1.00    2.5   0.88      2.2
##   1.05    2.5   0.88      2.2
##   1.10    2.5   0.88      2.2
##   1.15    2.5   0.88      2.1
##   1.20    2.4   0.88      2.1
##   1.25    2.4   0.88      2.1
##   1.30    2.4   0.88      2.1
##   1.35    2.4   0.88      2.1
##   1.40    2.4   0.88      2.1
##   1.45    2.4   0.88      2.1
##   1.50    2.4   0.88      2.1
##   1.55    2.4   0.88      2.1
##   1.60    2.4   0.88      2.1
##   1.65    2.4   0.88      2.1
##   1.70    2.4   0.88      2.1
##   1.75    2.4   0.88      2.1
##   1.80    2.4   0.88      2.1
##   1.85    2.4   0.88      2.1
##   1.90    2.4   0.88      2.1
##   1.95    2.4   0.88      2.1
##   2.00    2.4   0.88      2.1
##   2.05    2.4   0.88      2.1
##   2.10    2.4   0.88      2.1
##   2.15    2.4   0.88      2.1
##   2.20    2.4   0.88      2.1
##   2.25    2.4   0.88      2.1
##   2.30    2.4   0.88      2.1
##   2.35    2.4   0.88      2.1
##   2.40    2.4   0.88      2.1
##   2.45    2.4   0.88      2.1
##   2.50    2.4   0.88      2.1
##   2.55    2.4   0.88      2.1
##   2.60    2.4   0.88      2.1
##   2.65    2.4   0.88      2.1
##   2.70    2.4   0.88      2.1
##   2.75    2.4   0.88      2.1
##   2.80    2.4   0.88      2.1
##   2.85    2.4   0.88      2.1
##   2.90    2.4   0.88      2.1
##   2.95    2.4   0.88      2.1
##   3.00    2.4   0.88      2.1
##   3.05    2.4   0.88      2.1
##   3.10    2.4   0.88      2.1
##   3.15    2.4   0.88      2.1
##   3.20    2.4   0.88      2.1
##   3.25    2.4   0.88      2.1
##   3.30    2.4   0.88      2.1
##   3.35    2.4   0.88      2.1
##   3.40    2.4   0.88      2.1
##   3.45    2.4   0.88      2.1
##   3.50    2.4   0.88      2.1
##   3.55    2.4   0.88      2.1
##   3.60    2.4   0.88      2.1
##   3.65    2.4   0.88      2.1
##   3.70    2.4   0.88      2.1
##   3.75    2.4   0.88      2.1
##   3.80    2.4   0.88      2.1
##   3.85    2.4   0.88      2.1
##   3.90    2.4   0.88      2.1
##   3.95    2.4   0.88      2.1
##   4.00    2.4   0.88      2.1
##   4.05    2.4   0.88      2.1
##   4.10    2.4   0.88      2.1
##   4.15    2.4   0.88      2.1
##   4.20    2.4   0.88      2.1
##   4.25    2.4   0.88      2.1
##   4.30    2.4   0.88      2.1
##   4.35    2.4   0.88      2.1
##   4.40    2.4   0.88      2.1
##   4.45    2.4   0.88      2.1
##   4.50    2.4   0.88      2.1
##   4.55    2.4   0.88      2.1
##   4.60    2.4   0.88      2.1
##   4.65    2.4   0.88      2.1
##   4.70    2.4   0.88      2.1
##   4.75    2.4   0.88      2.1
##   4.80    2.4   0.88      2.1
##   4.85    2.4   0.88      2.1
##   4.90    2.4   0.88      2.1
##   4.95    2.4   0.88      2.1
##   5.00    2.4   0.88      2.1
##
## Tuning parameter 'alpha' was held constant at a value of 0
## RMSE was used to select the optimal model using the smallest value.
## The final values used for the model were alpha = 0 and lambda = 2.8.
```

The model printout shows the RMSE, R-squared, and mean absolute error (MAE) values at each lambda specified in the tuning grid. The final three lines summarize what happened: caret did not tune alpha because I held it at 0 for ridge regression; it optimized using RMSE; and the optimal tuning values (at the minimum RMSE) were alpha = 0 and lambda = 2.8. Plot the model to see the tuning results.

```r
ggplot(mdl_ridge) +
  labs(title = "Ridge Regression Parameter Tuning", x = "lambda")
```

`varImp()` ranks the predictors by the absolute value of their coefficients in the tuned model. The most important variables here were `wt`, `disp`, and `am`.

```r
plot(varImp(mdl_ridge))
```
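To see the coefficients behind these rankings, you can pull them from the underlying `glmnet` fit at the cross-validated lambda.

```r
# Coefficients of the final model at the selected penalty value.
coef(mdl_ridge$finalModel, s = mdl_ridge$bestTune$lambda)
```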