Model Summary

Make predictions on the held-out validation set (`testing`) for each of the three models.

pr_ridge <- postResample(pred = predict(mdl_ridge, newdata = testing), obs = testing$mpg)
pr_lasso <- postResample(pred = predict(mdl_lasso, newdata = testing), obs = testing$mpg)
pr_elnet <- postResample(pred = predict(mdl_elnet, newdata = testing), obs = testing$mpg)
rbind(pr_ridge, pr_lasso, pr_elnet)
##          RMSE Rsquared MAE
## pr_ridge  3.7     0.90 2.8
## pr_lasso  4.0     0.97 3.0
## pr_elnet  3.7     0.90 2.8

It looks like ridge and elnet tied for the win on RMSE and MAE (elnet's best tune was evidently the ridge solution, alpha = 0), while lasso had the best Rsquared. On average, ridge/elnet misses the true value of mpg by 2.76 mpg (MAE); the RMSE of 3.75 weights the larger misses more heavily. The model explains about 90% of the variation in mpg.
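
If the caret helpers feel like a black box, the same numbers are easy to reproduce by hand. A minimal sketch for the ridge model (caret's Rsquared is the squared correlation between predictions and observations):

pred_ridge <- predict(mdl_ridge, newdata = testing)
err <- testing$mpg - pred_ridge
sqrt(mean(err^2))               # RMSE
cor(pred_ridge, testing$mpg)^2  # Rsquared (squared correlation)
mean(abs(err))                  # MAE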

You can also compare the models on their cross-validation resampling results.

model.resamples <- resamples(list(Ridge = mdl_ridge,
                                  Lasso = mdl_lasso,
                                  ELNet = mdl_elnet))
summary(model.resamples)
## 
## Call:
## summary.resamples(object = model.resamples)
## 
## Models: Ridge, Lasso, ELNet 
## Number of resamples: 25 
## 
## MAE 
##       Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Ridge 0.66     1.6    2.2  2.1     2.5  3.5    0
## Lasso 0.81     1.9    2.2  2.3     2.6  4.0    0
## ELNet 0.66     1.6    2.2  2.1     2.5  3.5    0
## 
## RMSE 
##       Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Ridge 0.73     1.9    2.5  2.4     2.8  4.3    0
## Lasso 0.91     2.1    2.5  2.6     2.9  4.5    0
## ELNet 0.73     1.9    2.5  2.4     2.8  4.3    0
## 
## Rsquared 
##       Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Ridge 0.69    0.84   0.89 0.88    0.94 0.98    0
## Lasso 0.63    0.81   0.87 0.86    0.94 1.00    0
## ELNet 0.69    0.84   0.89 0.88    0.94 0.98    0
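
Beyond eyeballing the summary, caret can test whether the resampled differences between models are meaningful: diff() computes pairwise differences on each resample, and summary() reports paired t-tests on them.

# Pairwise model differences in MAE/RMSE/Rsquared across the 25 resamples,
# with a paired t-test for each model pair.
summary(diff(model.resamples))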

You want the smallest mean RMSE and a narrow spread of RMSEs across the resamples. Ridge/elnet had the smallest mean and a relatively tight range. Boxplots are a common way to visualize this information.

bwplot(model.resamples, metric = "RMSE", main = "Model Comparison on Resamples")
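
If you prefer point estimates with confidence intervals over boxplots, a dot plot of the same resamples works too:

# Dot plot of mean RMSE per model with confidence intervals.
dotplot(model.resamples, metric = "RMSE", main = "Model Comparison on Resamples")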

Now that you have identified the optimal model, capture its tuning parameters and refit the model with those values held fixed. (The code below refits on the training set; once you are done evaluating, you could just as well refit on the full data set.)

set.seed(123)
mdl_final <- train(
  mpg ~ .,
  data = training,
  method = "glmnet",
  metric = "RMSE",
  preProcess = c("center", "scale"),
  tuneGrid = data.frame(
    .alpha = mdl_ridge$bestTune$alpha,     # winning mixing parameter (0 = ridge)
    .lambda = mdl_ridge$bestTune$lambda),  # winning penalty strength
  trControl = train_control
  )
mdl_final
## glmnet 
## 
## 28 samples
## 10 predictors
## 
## Pre-processing: centered (10), scaled (10) 
## Resampling: Cross-Validated (5 fold, repeated 5 times) 
## Summary of sample sizes: 22, 22, 23, 22, 23, 23, ... 
## Resampling results:
## 
##   RMSE  Rsquared  MAE
##   2.4   0.89      2.1
## 
## Tuning parameter 'alpha' was held constant at a value of 0
## Tuning
##  parameter 'lambda' was held constant at a value of 2.8
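
To see which predictors the final model leans on, pull the coefficients from the underlying glmnet fit at the chosen penalty (s selects the point on the lambda path):

# Coefficients of the final glmnet fit at the tuned lambda.
coef(mdl_final$finalModel, s = mdl_final$bestTune$lambda)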

The model is ready to predict on new data! As a quick sanity check, score a few rows (a minimal sketch; any data frame with the same predictor columns would work):
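
# Score a few held-out rows as stand-ins for genuinely new observations.
new_data <- testing[1:3, ]
predict(mdl_final, newdata = new_data)

Here are some final conclusions on the models.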

  • Lasso can set some coefficients to zero, thus performing variable selection.
  • Lasso and Ridge address multicollinearity differently: in ridge regression, correlated predictors receive similar coefficients; in lasso, one of the correlated predictors gets a larger coefficient while the rest are (nearly) zeroed.
  • Lasso tends to do well when a small number of predictors have substantial coefficients and the rest are close to zero. Ridge tends to work well when there are many predictors with coefficients of roughly the same size.
  • In practice, you don’t know which will work best, so run cross-validation and pick the winner (see the sketch below).
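
As a sketch of that cross-validated search, you can let train() tune alpha and lambda jointly over a grid (this reuses the training data and train_control object from above; the grid bounds are arbitrary starting points):

set.seed(123)
mdl_search <- train(
  mpg ~ .,
  data = training,
  method = "glmnet",
  metric = "RMSE",
  preProcess = c("center", "scale"),
  tuneGrid = expand.grid(
    alpha = seq(0, 1, length.out = 5),        # 0 = ridge, 1 = lasso, between = elnet
    lambda = 10^seq(-3, 1, length.out = 20)), # penalty strengths to try
  trControl = train_control
  )
mdl_search$bestTune  # the CV-winning (alpha, lambda) pair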