Chapter 1 Intro


1.1 Intended Learning Outcomes

After attending this lab, you should be able to use R to:

  • select predictors using backward elimination and forward selection.

1.2 Introduction

In Chapter 3 of the lectures, we explored the use of confidence intervals in the selection of a model. We had already seen the use of R² as a measure of the model’s goodness of fit and residual plots to check assumptions. These methods work best when there are only a small number of explanatory variables. We shall explore another technique for model selection.

One approach to model selection is to start with a model containing all variables and use an automatic procedure to determine the ‘simplest/most appropriate’ model. We can do this using the Akaike Information Criterion (AIC).

Akaike Information Criterion (AIC)

The AIC is defined here as

$$\mathrm{AIC} = 2p - 2\ln(L)$$

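or, for a linear model with normally distributed errors, equivalently (up to an additive constant that is the same for every model fitted to the same data),

$$\mathrm{AIC} = n\ln\left(\frac{\mathrm{RSS}}{n}\right) + 2p,$$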

where L is the maximised value of the likelihood function for the fitted statistical model, RSS is the residual sum of squares, n is the sample size, and p is the number of parameters.
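
As a quick check of the second formula, the sketch below computes the AIC by hand from the RSS and compares it with the value R reports. The built-in mtcars data and the particular predictors are assumptions made purely for illustration; they are not part of this lab. Here p is taken to be the number of regression coefficients, including the intercept.

```r
# Illustrative model on the built-in mtcars data (an assumed example,
# not a data set from this lab).
fit <- lm(mpg ~ wt + hp + qsec, data = mtcars)

n   <- nrow(mtcars)           # sample size
p   <- length(coef(fit))      # number of regression coefficients, incl. intercept
rss <- sum(residuals(fit)^2)  # residual sum of squares

# AIC from the RSS form: n * ln(RSS / n) + 2p
aic_rss <- n * log(rss / n) + 2 * p

# extractAIC() returns c(equivalent degrees of freedom, AIC);
# this is the version of the AIC that step() reports.
aic_step <- extractAIC(fit)[2]

# AIC() uses the full log-likelihood and also counts the error variance
# as a parameter, so it differs from the RSS form by a constant that
# depends only on n.
aic_lik <- AIC(fit)
offset  <- n * (1 + log(2 * pi)) + 2

print(c(aic_rss = aic_rss, aic_step = aic_step, aic_lik = aic_lik))
```

Because the two versions differ only by a constant for a given data set, they rank candidate models identically; only differences in AIC between models matter.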

In R, we compare models by AIC using the step() function. In backward elimination, the full model is fitted first; step() then computes the AIC of each model obtained by removing one explanatory variable at a time and moves to the model with the lowest AIC. This is repeated until no removal lowers the AIC any further, and the resulting model is selected as the ‘best’ model.
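
A minimal sketch of both procedures named in the learning outcomes is given below. It again assumes the built-in mtcars data and an arbitrary set of candidate predictors, chosen only for illustration; substitute the lab's own data and variables as appropriate.

```r
# Full model containing all candidate predictors (an assumed example set).
full <- lm(mpg ~ wt + hp + qsec + drat + disp, data = mtcars)

# Backward elimination: start from the full model and drop one term at a
# time while doing so lowers the AIC. trace = 0 suppresses the step log;
# leave it at the default to see each step printed.
backward <- step(full, direction = "backward", trace = 0)

# Forward selection: start from the intercept-only model and add one term
# at a time; scope gives the largest model that may be reached.
null    <- lm(mpg ~ 1, data = mtcars)
forward <- step(null, scope = formula(full), direction = "forward", trace = 0)

# Inspect the selected models and their AIC values.
print(formula(backward))
print(formula(forward))
print(c(full     = extractAIC(full)[2],
        backward = extractAIC(backward)[2],
        forward  = extractAIC(forward)[2]))
```

The two procedures need not select the same model, since each only explores models reachable by single additions or single removals from its starting point.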