Practical 6 - Variable Selection
Chapter 1 Intro
1.1 Intended Learning Outcomes
After attending this lab, you should be able to use R to:
- select predictors using backward elimination and forward selection.
1.2 Introduction
In Chapter 3 of the lectures, we explored the use of confidence intervals in the selection of a model. We had already seen the use of $R^2$ as a measure of the model’s goodness of fit, and of residual plots to check assumptions. These methods work best when there are only a small number of explanatory variables. Here we explore another technique for model selection.
One approach to model selection is to start with a model containing all variables and use an automatic procedure to determine the ‘simplest/most appropriate’ model. We can do this using the Akaike Information Criterion (AIC).
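The AIC is defined here as
$$\mathrm{AIC} = 2p - 2\ln(L),$$
which, for a linear model with normally distributed errors, is equal (up to an additive constant that is the same for every model fitted to the same data) to
$$\mathrm{AIC} = n\ln\left(\frac{\mathrm{RSS}}{n}\right) + 2p,$$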
where L is the maximised value of the likelihood function for the fitted statistical model, RSS is the residual sum of squares, n is the sample size, and p is the number of parameters.
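As a quick check of the RSS form of the AIC, the sketch below computes n ln(RSS/n) + 2p by hand and compares it with the value reported by extractAIC(), which is the quantity step() uses for linear models. The built-in mtcars data set and the choice of predictors (wt and hp) are assumptions made purely for illustration; they are not part of this practical.

```r
# Illustrative model only: mtcars and these predictors are assumptions,
# not the data set used in this practical.
fit <- lm(mpg ~ wt + hp, data = mtcars)

n   <- nrow(mtcars)            # sample size
p   <- length(coef(fit))       # number of regression coefficients (incl. intercept)
rss <- sum(residuals(fit)^2)   # residual sum of squares

# AIC in the RSS form given above
aic_manual <- n * log(rss / n) + 2 * p

# extractAIC() returns c(equivalent degrees of freedom, AIC)
aic_extract <- extractAIC(fit)[2]

print(c(manual = aic_manual, extractAIC = aic_extract))
```

Note that AIC(fit) reports a different number, because it keeps the constant terms of the log-likelihood and also counts the error variance as a parameter. For models fitted to the same data, the two versions differ by the same constant, so they rank the models identically.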
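We compare models using AIC in R using the step() command. The full model is fitted and then the step() function computes the AIC for each model obtained by removing one explanatory variable at a time. The model with the lowest AIC value is then selected, and the process is repeated from that model until removing a further variable no longer lowers the AIC. The model at which the procedure stops is taken as the ‘best’ model.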
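A minimal sketch of both search directions named in the learning outcomes is given below. It again assumes the built-in mtcars data set and an arbitrary set of four candidate predictors, so the object names (full, null, backward, forward) and the variables are hypothetical stand-ins for whatever data the practical goes on to use.

```r
# Illustrative data and predictors only (assumed, not from this practical).
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)  # all candidate variables
null <- lm(mpg ~ 1, data = mtcars)                      # intercept-only model

# Backward elimination: start from the full model and drop terms
# while doing so lowers the AIC. trace = 0 suppresses the step-by-step log.
backward <- step(full, direction = "backward", trace = 0)

# Forward selection: start from the intercept-only model and add terms
# from the candidate set given in 'scope' while doing so lowers the AIC.
forward <- step(null,
                scope = list(lower = ~ 1, upper = ~ wt + hp + qsec + drat),
                direction = "forward", trace = 0)

# Inspect the selected models
print(formula(backward))
print(formula(forward))
```

Setting trace = 1 (the default) prints the AIC table at each step, which is useful for seeing which variable is removed or added and why. Backward elimination and forward selection need not arrive at the same model.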