# Chapter 1 Intro

## 1.1 Intended Learning Outcomes

After attending this lab, you should be able to use R to:

• select predictors using backward elimination and forward selection.

## 1.2 Introduction

In Chapter 3 of the lectures, we explored the use of confidence intervals in model selection. We have also seen $$R^2$$ used as a measure of a model’s goodness of fit, and residual plots used to check assumptions. These methods work best when there are only a few explanatory variables. In this lab we explore another technique for model selection.

One approach to model selection is to start with a model containing all variables and use an automatic procedure to determine the ‘simplest/most appropriate’ model. We can do this using the Akaike Information Criterion (AIC).

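The AIC is defined here as

$\mbox{AIC} = 2p - 2\ln(L)$

or, for a linear model with normally distributed errors, equivalently up to an additive constant that is the same for every model fitted to the same data,

$\mbox{AIC} = n\ln\left(\frac{\mbox{RSS}}{n}\right)+2p,$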

where $$L$$ is the maximised value of the likelihood function for the fitted statistical model, RSS is the residual sum of squares, $$n$$ is the sample size, and $$p$$ is the number of parameters.
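The two forms of the AIC can be checked numerically. The sketch below is illustrative only: it assumes R's built-in mtcars data and an arbitrary choice of predictors (wt and hp), neither of which comes from this lab. It computes the RSS form by hand and compares it with extractAIC() (which step uses), then computes the likelihood form and compares it with AIC().

```r
# Illustrative only: mtcars and the predictors wt and hp are assumptions,
# not the data used in this lab.
fit <- lm(mpg ~ wt + hp, data = mtcars)

n   <- nrow(mtcars)            # sample size
rss <- sum(residuals(fit)^2)   # residual sum of squares
p   <- length(coef(fit))       # regression coefficients, including the intercept

# RSS form of the AIC
aic_rss <- n * log(rss / n) + 2 * p

# extractAIC() returns c(edf, AIC); for lm objects it uses the RSS form
aic_extract <- extractAIC(fit)[2]

# Likelihood form; logLik() for lm also counts the error variance as a parameter
ll      <- logLik(fit)
aic_lik <- 2 * attr(ll, "df") - 2 * as.numeric(ll)

all.equal(aic_rss, aic_extract)   # RSS form agrees with extractAIC()
all.equal(aic_lik, AIC(fit))      # likelihood form agrees with AIC()
aic_lik - aic_rss                 # equals n * (1 + log(2 * pi)) + 2
```

The final difference does not depend on which predictors are in the model, so both forms rank models fitted to the same data in the same order.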

We compare models by AIC in R using the step function. The full model is fitted first; step then computes the AIC of each model obtained by removing one explanatory variable at a time and moves to the one with the lowest AIC. This is repeated until no removal lowers the AIC any further, and the resulting model is selected as the ‘best’ model.
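A minimal sketch of both selection directions is given below. It again assumes the built-in mtcars data and an arbitrary set of candidate predictors (wt, hp, qsec, drat), chosen purely for illustration. drop1() shows a single round of the comparison, step() with direction = "backward" runs backward elimination, and starting from the intercept-only model with direction = "forward" gives forward selection.

```r
# Illustrative only: mtcars and these four predictors are assumptions,
# not the data used in this lab.
full <- lm(mpg ~ wt + hp + qsec + drat, data = mtcars)

# One round of backward elimination: the AIC after dropping each term in turn
drop1(full)

# Backward elimination: keep dropping terms while the AIC decreases
# (trace = 0 suppresses the step-by-step printout)
backward <- step(full, direction = "backward", trace = 0)

# Forward selection: start from the intercept-only model and add terms,
# with the full model's formula as the upper limit of the search
null    <- lm(mpg ~ 1, data = mtcars)
forward <- step(null, scope = formula(full), direction = "forward", trace = 0)

formula(backward)         # model chosen by backward elimination
formula(forward)          # model chosen by forward selection
extractAIC(backward)[2]   # AIC of the backward-selected model
```

The two directions need not arrive at the same model, so it is worth comparing the selected formulas and their AIC values. Setting trace = 1 (the default) prints the AIC table at every step.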