## Model Summary

Make predictions on the validation data set (`testing`) for each of the three models.

```r
pr_ridge <- postResample(pred = predict(mdl_ridge, newdata = testing), obs = testing$mpg)
pr_lasso <- postResample(pred = predict(mdl_lasso, newdata = testing), obs = testing$mpg)
pr_elnet <- postResample(pred = predict(mdl_elnet, newdata = testing), obs = testing$mpg)
rbind(pr_ridge, pr_lasso, pr_elnet)
##          RMSE Rsquared MAE
## pr_ridge  3.7     0.90 2.8
## pr_lasso  4.0     0.97 3.0
## pr_elnet  3.7     0.90 2.8
```

It looks like ridge/elnet was the big winner today based on RMSE and MAE, although lasso had the best Rsquared. On average, ridge/elnet will miss the true value of mpg by 3.75 mpg (RMSE) or 2.76 mpg (MAE). The model explains about 90% of the variation in mpg.

You can also compare the models by resampling.

```r
model.resamples <- resamples(list(Ridge = mdl_ridge, Lasso = mdl_lasso, ELNet = mdl_elnet))
summary(model.resamples)
##
## Call:
## summary.resamples(object = model.resamples)
##
## Models: Ridge, Lasso, ELNet
## Number of resamples: 25
##
## MAE
##        Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Ridge  0.66     1.6    2.2  2.1     2.5  3.5    0
## Lasso  0.81     1.9    2.2  2.3     2.6  4.0    0
## ELNet  0.66     1.6    2.2  2.1     2.5  3.5    0
##
## RMSE
##        Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Ridge  0.73     1.9    2.5  2.4     2.8  4.3    0
## Lasso  0.91     2.1    2.5  2.6     2.9  4.5    0
## ELNet  0.73     1.9    2.5  2.4     2.8  4.3    0
##
## Rsquared
##        Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Ridge  0.69    0.84   0.89 0.88    0.94 0.98    0
## Lasso  0.63    0.81   0.87 0.86    0.94 1.00    0
## ELNet  0.69    0.84   0.89 0.88    0.94 0.98    0
```

You want the smallest mean RMSE and a small range of RMSEs. Ridge/elnet had the smallest mean and a relatively small range. Boxplots are a common way to visualize this information.

```r
bwplot(model.resamples, metric = "RMSE", main = "Model Comparison on Resamples")
```

Now that you have identified the optimal model, capture its tuning parameters and refit the model to the full training set.

```r
set.seed(123)
mdl_final <- train(
  mpg ~ .,
  data = training,
  method = "glmnet",
  metric = "RMSE",
  preProcess = c("center", "scale"),
  tuneGrid = data.frame(
    .alpha = mdl_ridge$bestTune$alpha,    # optimized hyperparameters
    .lambda = mdl_ridge$bestTune$lambda), # optimized hyperparameters
  trControl = train_control
)
mdl_final
```
```
## glmnet
##
## 28 samples
## 10 predictors
##
## Pre-processing: centered (10), scaled (10)
## Resampling: Cross-Validated (5 fold, repeated 5 times)
## Summary of sample sizes: 22, 22, 23, 22, 23, 23, ...
## Resampling results:
##
##   RMSE  Rsquared  MAE
##   2.4   0.89      2.1
##
## Tuning parameter 'alpha' was held constant at a value of 0
## Tuning parameter 'lambda' was held constant at a value of 2.8
```
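As a quick sanity check on the refit model, you can generate predictions with caret's `predict` method. This sketch assumes the `testing` split created earlier is still in scope:

```r
# Predict mpg for the held-out rows and compare to the observed values
pred_final <- predict(mdl_final, newdata = testing)
head(data.frame(observed = testing$mpg, predicted = round(pred_final, 1)))
```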

The model is ready to predict on new data! Here are some final conclusions on the models.

• Lasso can set some coefficients to zero, thus performing variable selection.
• Lasso and ridge address multicollinearity differently: in ridge regression, the coefficients of correlated predictors are shrunk toward similar values; in lasso, one of the correlated predictors gets a larger coefficient while the rest are (nearly) zeroed.
• Lasso tends to do well if there are a small number of significant parameters and the others are close to zero. Ridge tends to work well if there are many large parameters of about the same value.
• In practice, you don’t know which will perform best, so run cross-validation and pick the winner.
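To make the lasso-versus-ridge contrast in the bullets concrete, here is a minimal sketch that calls the glmnet package directly on `mtcars` (the penalty `s = 1` is an arbitrary illustrative choice, not a tuned value):

```r
library(glmnet)

x <- as.matrix(mtcars[, -1])  # all predictors except mpg
y <- mtcars$mpg

fit_lasso <- glmnet(x, y, alpha = 1)  # alpha = 1: lasso penalty
fit_ridge <- glmnet(x, y, alpha = 0)  # alpha = 0: ridge penalty

# At the same penalty, lasso sets some coefficients exactly to zero
# (printed as "."), while ridge only shrinks them toward zero.
coef(fit_lasso, s = 1)
coef(fit_ridge, s = 1)
```

In the lasso printout, the zeroed rows are the variables lasso has dropped; ridge keeps every predictor with a small but nonzero coefficient.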