Practical 6 - Variable Selection
Chapter 1 Intro
1.1 Intended Learning Outcomes
After attending this lab, you should be able to use R to:
- select predictors using backward elimination and forward selection.
1.2 Introduction
In Chapter 3 of the lectures, we explored the use of confidence intervals in the selection of a model. We had already seen the use of \(R^2\) as a measure of the model’s goodness of fit and residual plots to check assumptions. These methods work best when there are only a small number of explanatory variables. We shall explore another technique for model selection.
One approach to model selection is to start with a model containing all of the explanatory variables and use an automatic procedure to determine the ‘simplest/most appropriate’ model. We can do this using the Akaike Information Criterion (AIC).
The AIC is defined as
\[\mbox{AIC} = 2p - 2\ln(L),\]
which, for a linear regression model with normally distributed errors, is equivalent (up to an additive constant) to
\[\mbox{AIC} = n\ln\left(\frac{\mbox{RSS}}{n}\right) + 2p,\]
where \(L\) is the maximised value of the likelihood function for the fitted statistical model, RSS is the residual sum of squares, \(n\) is the sample size, and \(p\) is the number of parameters.
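As a quick illustration of this formula, the sketch below computes the RSS-based AIC by hand for a linear model fitted to the built-in mtcars data (used purely as a stand-in; any fitted lm object would do) and compares it with the value reported by extractAIC, which for linear models is reported on the same scale that the step function uses.

```r
# A minimal sketch, assuming the built-in mtcars data as a stand-in.
fit <- lm(mpg ~ wt + hp, data = mtcars)

n   <- nrow(mtcars)
p   <- length(coef(fit))       # number of estimated regression coefficients
rss <- sum(residuals(fit)^2)   # residual sum of squares

# AIC on the RSS scale: n * ln(RSS/n) + 2p
aic_rss <- n * log(rss / n) + 2 * p
aic_rss

# The second element of extractAIC(fit) should agree with aic_rss;
# note that AIC(fit) differs from this value by an additive constant.
extractAIC(fit)
```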
We compare models using AIC in R with the step command. The full model is fitted, and the step function then computes the AIC for each model obtained by removing one explanatory variable at a time. The removal that gives the lowest AIC is made, and the process is repeated until no further removal reduces the AIC; the resulting model is selected as the ‘best’ model.
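As a minimal sketch of how this looks in practice (again using mtcars as a stand-in; replace the formula and data with your own), backward elimination starts from the full model, while forward selection starts from the intercept-only model and adds variables from the full model's scope.

```r
# A minimal sketch, assuming mtcars as a stand-in dataset.
full_model <- lm(mpg ~ ., data = mtcars)

# Backward elimination: drop one variable at a time, keeping the
# change that gives the lowest AIC, until no removal improves it.
backward_fit <- step(full_model, direction = "backward")

# Forward selection: start from the intercept-only model and add
# variables (from the full model's formula) one at a time.
null_model  <- lm(mpg ~ 1, data = mtcars)
forward_fit <- step(null_model,
                    scope = formula(full_model),
                    direction = "forward")

summary(backward_fit)
summary(forward_fit)
```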