Methods for variable selection
Let us now discuss a few strategies for variable selection. These methods search systematically through the candidate explanatory variables and determine which ones we should include in our model.
Selection criterion
We need to define a rule, or selection criterion, that we may use to determine which variables to include. Several selection criteria are available: adjusted R-squared (R^2(adj)), the Akaike information criterion (AIC), the Bayesian information criterion (BIC), Mallows's C_p statistic and the PRESS statistic. We have already met adjusted R-squared (R^2(adj)), so we will only briefly describe the others listed.
- R^2 or R^2(adj)
- The Akaike information criterion (AIC) is defined as \mbox{AIC}= -2l(\boldsymbol{\beta}) + 2p, where l(\boldsymbol{\beta}) is the maximised log-likelihood and p is the number of parameters in the model.
- The (Schwarz) Bayesian information criterion (BIC or SBC) is defined as \mbox{BIC}= -2l(\boldsymbol{\beta}) + p \log(n), where n is the number of observations. BIC penalises model size more heavily than AIC whenever \log(n) > 2, i.e. n \geq 8.
- Mallows's C_p is defined as C_p= \frac{RSS_{p}}{\hat{\sigma}_{F}^2}-n + 2p, where RSS_p is the residual sum of squares of the model with p parameters and \hat{\sigma}_{F}^2 is the estimate of the error variance from the full model.
- The PRESS (predicted residual sum of squares) statistic is defined as \mbox{PRESS} = \sum_{i=1}^{n} (y_i - \hat{y}_{i(i)})^2, where \hat{y}_{i(i)} is the prediction for observation i from the model fitted with observation i omitted.
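As an illustration, the AIC and BIC above can be computed directly from a least-squares fit. The sketch below assumes a Gaussian linear model, so the log-likelihood is evaluated at the maximum-likelihood variance estimate RSS/n; the function name is illustrative, not from the text:

```python
import numpy as np

def ols_information_criteria(X, y):
    """Fit OLS by least squares and return (AIC, BIC) under a
    Gaussian likelihood, with the error variance profiled out.
    Here p counts the regression coefficients (columns of X)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    # maximised Gaussian log-likelihood with sigma^2 = RSS / n
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    aic = -2 * loglik + 2 * p
    bic = -2 * loglik + p * np.log(n)
    return aic, bic

# Example on synthetic data (illustrative only)
rng = np.random.default_rng(0)
X = np.column_stack([np.ones(50), rng.normal(size=(50, 2))])
y = X @ np.array([1.0, 2.0, 0.5]) + rng.normal(scale=0.5, size=50)
print(ols_information_criteria(X, y))
```

Since n = 50 here, \log(n) > 2 and the BIC value exceeds the AIC value for the same fit, reflecting BIC's heavier penalty on model size.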
Once we have chosen a selection criterion, we need to choose a method that systematically removes and/or includes variables based on that criterion. The first method we will consider is all-subsets regression.
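All-subsets regression evaluates every possible combination of candidate variables and keeps the one that optimises the chosen criterion. A minimal sketch, assuming AIC as the criterion and a Gaussian OLS fit (function names are illustrative); note the search cost grows as 2^k in the number of candidate variables k:

```python
import numpy as np
from itertools import combinations

def aic(X, y):
    """AIC = -2*loglik + 2p for an OLS fit under a Gaussian likelihood."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    loglik = -0.5 * n * (np.log(2 * np.pi) + np.log(rss / n) + 1)
    return -2 * loglik + 2 * p

def best_subset(X, y):
    """All-subsets search: fit every non-empty column subset of X
    and return (best AIC, column indices) of the winning model."""
    best_score, best_cols = np.inf, None
    for size in range(1, X.shape[1] + 1):
        for cols in combinations(range(X.shape[1]), size):
            score = aic(X[:, cols], y)
            if score < best_score:
                best_score, best_cols = score, cols
    return best_score, best_cols

# Example: only the first two columns truly affect y (illustrative data)
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 4))
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=100)
print(best_subset(X, y))
```

With a strong signal, the selected subset always contains the truly active variables; whether a noise variable sneaks in depends on the sample, which is exactly the behaviour the penalty term 2p is meant to control.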