Stepwise regression
In this section, we learn about stepwise regression as a method of model building. The procedure is useful when there are numerous potential explanatory variables. The overall strategy is to build a linear regression model from a set of candidate predictors by entering and/or removing predictors in a stepwise manner based on a defined criterion (e.g. AIC or BIC), and to stop when we have a justifiable model (i.e. when adding or removing a predictor no longer improves the chosen criterion). The main approaches to stepwise selection are forward selection, backward elimination, and a combination of the two. The R function for implementing this is step(), or you can use the ols_step_ series of functions from the olsrr library. We will not be able to cover every option of these functions in this course.
One can start with either the full model (all predictors) or the null model (no predictors).
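As a minimal sketch of these two starting points, the code below fits the full and the null model and compares their AIC. It assumes the mtcars data and the four candidate predictors used in the examples of this section; the names full_model, null_model and aic_values are chosen here for illustration only.

```r
# Full model: all four candidate predictors
full_model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Null model: intercept only, no predictors
null_model <- lm(mpg ~ 1, data = mtcars)

# AIC of each starting point (lower is better)
aic_values <- c(full = AIC(full_model), null = AIC(null_model))
aic_values
```

These two values should correspond to the "Step => 0" AIC lines printed by the olsrr functions in the backward and forward examples, respectively.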
Backward selection
For ‘backward’ selection we start with the full model. At each ‘step’ we fit every model obtained by removing one variable from the current model, and then compare the criterion (let’s say AIC) of each of these models and of the current model. The model with the best value of the criterion becomes the current model for the next step. If the current model itself scores best, then the stepwise model selection process is finished and we use the current model.
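library(olsrr)
# Full model containing all candidate predictors
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
# Backward elimination by AIC; details = TRUE prints every step
model.selection <- ols_step_backward_aic(model, details = TRUE)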
## Backward Elimination Method
## ---------------------------
##
## Candidate Terms:
##
## 1. disp
## 2. hp
## 3. wt
## 4. qsec
##
##
## Step => 0
## Model => mpg ~ disp + hp + wt + qsec
## AIC => 159.0696
##
## Initiating stepwise selection...
##
## Table: Removing Existing Variables
## ---------------------------------------------------------------------
## Predictor DF AIC SBC SBIC R2 Adj. R2
## ---------------------------------------------------------------------
## disp 1 157.143 164.471 67.724 0.83477 0.81706
## qsec 1 158.643 165.972 68.825 0.82684 0.80828
## hp 1 158.720 166.049 68.881 0.82642 0.80782
## wt 1 169.853 177.181 77.315 0.75420 0.72786
## ---------------------------------------------------------------------
##
## Step => 1
## Removed => disp
## Model => mpg ~ hp + wt + qsec
## AIC => 157.1426
##
## Table: Removing Existing Variables
## ---------------------------------------------------------------------
## Predictor DF AIC SBC SBIC R2 Adj. R2
## ---------------------------------------------------------------------
## qsec 1 156.652 162.515 66.575 0.82679 0.81484
## hp 1 156.720 162.583 66.630 0.82642 0.81444
## wt 1 180.339 186.202 86.329 0.63688 0.61183
## ---------------------------------------------------------------------
##
## Step => 2
## Removed => qsec
## Model => mpg ~ hp + wt
## AIC => 156.6523
##
## Table: Removing Existing Variables
## ---------------------------------------------------------------------
## Predictor DF AIC SBC SBIC R2 Adj. R2
## ---------------------------------------------------------------------
## hp 1 166.029 170.427 74.292 0.75283 0.74459
## wt 1 181.239 185.636 87.875 0.60244 0.58919
## ---------------------------------------------------------------------
##
##
## No more variables to be removed.
##
## Variables Removed:
##
## => disp
## => qsec
## Length Class Mode
## metrics 7 data.frame list
## model 12 lm list
## others 3 -none- list
##
##
## Stepwise Summary
## ------------------------------------------------------------------------
## Step Variable AIC SBC SBIC R2 Adj. R2
## ------------------------------------------------------------------------
## 0 Full Model 159.070 167.864 70.041 0.83514 0.81072
## 1 disp 157.143 164.471 67.433 0.83477 0.81706
## 2 qsec 156.652 162.515 66.440 0.82679 0.81484
## ------------------------------------------------------------------------
##
## Final Model Output
## ------------------
##
## Model Summary
## ---------------------------------------------------------------
## R 0.909 RMSE 2.469
## R-Squared 0.827 MSE 6.726
## Adj. R-Squared 0.815 Coef. Var 12.909
## Pred R-Squared 0.781 AIC 156.652
## MAE 1.901 SBC 162.515
## ---------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
## AIC: Akaike Information Criteria
## SBC: Schwarz Bayesian Criteria
##
## ANOVA
## --------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## --------------------------------------------------------------------
## Regression 930.999 2 465.500 69.211 0.0000
## Residual 195.048 29 6.726
## Total 1126.047 31
## --------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) 37.227 1.599 23.285 0.000 33.957 40.497
## hp -0.032 0.009 -0.361 -3.519 0.001 -0.050 -0.013
## wt -3.878 0.633 -0.630 -6.129 0.000 -5.172 -2.584
## ----------------------------------------------------------------------------------------
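To make the logic behind the "Removing Existing Variables" tables concrete, here is a hand-rolled sketch of backward elimination by AIC using only base R. It is an illustration of the procedure described above, not the code olsrr runs internally; the variable names (current, terms_now, aic_drop, weakest) are hypothetical.

```r
# Manual backward elimination by AIC (illustrative sketch only)
current <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

repeat {
  terms_now <- attr(terms(current), "term.labels")
  if (length(terms_now) == 0) break  # nothing left to remove

  # AIC of every model with exactly one term removed from the current model
  aic_drop <- sapply(terms_now, function(v) {
    AIC(update(current, as.formula(paste(". ~ . -", v))))
  })

  # Stop when no reduced model improves on the current model
  if (min(aic_drop) >= AIC(current)) break

  # Otherwise drop the term whose removal gives the lowest AIC
  weakest <- names(which.min(aic_drop))
  current <- update(current, as.formula(paste(". ~ . -", weakest)))
}

formula(current)
AIC(current)
```

On the first pass, aic_drop holds the same four AIC values as the first table in the output above, and the loop should end at the same model that olsrr reports.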
Forward selection
A similar procedure is performed for ‘forward’ selection, except that we start with the null model and add one variable at each step, stopping when no variable can be added to improve the criterion score.
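library(olsrr)
# The full model supplies the candidate predictors; selection starts from mpg ~ 1
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
# Forward selection by AIC; details = TRUE prints every step
model.selection <- ols_step_forward_aic(model, details = TRUE)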
## Forward Selection Method
## ------------------------
##
## Candidate Terms:
##
## 1. disp
## 2. hp
## 3. wt
## 4. qsec
##
##
## Step => 0
## Model => mpg ~ 1
## AIC => 208.7555
##
## Initiating stepwise selection...
##
## Table: Adding New Variables
## ----------------------------------------------------------------------
## Predictor DF AIC SBC SBIC R2 Adj. R2
## ----------------------------------------------------------------------
## wt 1 166.029 170.427 74.292 0.75283 0.74459
## disp 1 170.209 174.607 77.984 0.71834 0.70895
## hp 1 181.239 185.636 87.875 0.60244 0.58919
## qsec 1 204.588 208.985 109.559 0.17530 0.14781
## ----------------------------------------------------------------------
##
## Step => 1
## Added => wt
## Model => mpg ~ wt
## AIC => 166.0294
##
## Table: Adding New Variables
## ---------------------------------------------------------------------
## Predictor DF AIC SBC SBIC R2 Adj. R2
## ---------------------------------------------------------------------
## hp 1 156.652 162.515 66.575 0.82679 0.81484
## qsec 1 156.720 162.583 66.630 0.82642 0.81444
## disp 1 164.168 170.031 72.684 0.78093 0.76582
## ---------------------------------------------------------------------
##
## Step => 2
## Added => hp
## Model => mpg ~ wt + hp
## AIC => 156.6523
##
## Table: Adding New Variables
## ---------------------------------------------------------------------
## Predictor DF AIC SBC SBIC R2 Adj. R2
## ---------------------------------------------------------------------
## qsec 1 157.143 164.471 67.724 0.83477 0.81706
## disp 1 158.643 165.972 68.825 0.82684 0.80828
## ---------------------------------------------------------------------
##
##
## No more variables to be added.
##
## Variables Selected:
##
## => wt
## => hp
## Length Class Mode
## metrics 7 data.frame list
## model 12 lm list
## others 4 -none- list
##
##
## Stepwise Summary
## -------------------------------------------------------------------------
## Step Variable AIC SBC SBIC R2 Adj. R2
## -------------------------------------------------------------------------
## 0 Base Model 208.756 211.687 115.039 0.00000 0.00000
## 1 wt 166.029 170.427 74.292 0.75283 0.74459
## 2 hp 156.652 162.515 66.575 0.82679 0.81484
## -------------------------------------------------------------------------
##
## Final Model Output
## ------------------
##
## Model Summary
## ---------------------------------------------------------------
## R 0.909 RMSE 2.469
## R-Squared 0.827 MSE 6.726
## Adj. R-Squared 0.815 Coef. Var 12.909
## Pred R-Squared 0.781 AIC 156.652
## MAE 1.901 SBC 162.515
## ---------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
## AIC: Akaike Information Criteria
## SBC: Schwarz Bayesian Criteria
##
## ANOVA
## --------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## --------------------------------------------------------------------
## Regression 930.999 2 465.500 69.211 0.0000
## Residual 195.048 29 6.726
## Total 1126.047 31
## --------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) 37.227 1.599 23.285 0.000 33.957 40.497
## wt -3.878 0.633 -0.630 -6.129 0.000 -5.172 -2.584
## hp -0.032 0.009 -0.361 -3.519 0.001 -0.050 -0.013
## ----------------------------------------------------------------------------------------
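The same forward search can be sketched with the step() function from base R. This is a minimal example assuming the same mtcars predictors; the object names null_model and forward_fit are illustrative. Note that step() reports AIC values computed by extractAIC(), which for a fixed data set differ from AIC() by an additive constant, so the printed numbers will not match the olsrr tables even though the ranking of models does.

```r
# Forward selection by AIC with base R's step()
null_model <- lm(mpg ~ 1, data = mtcars)

forward_fit <- step(
  null_model,
  scope = list(lower = ~ 1, upper = ~ disp + hp + wt + qsec),
  direction = "forward",
  trace = 0  # set to 1 to print each step
)

formula(forward_fit)
```

The scope argument is what tells step() which predictors it may add; without an upper formula, a forward search from the null model has nothing to consider.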
Forward-Backward selection
The combined procedure is similar again, except that we start with the null model and have the option to add or remove a variable at each step, stopping when no variable can be added or removed to improve the criterion score. Clearly, in this case any move in the first two steps must be an addition, since removing the only variable in the model would simply return us to the null model.
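library(olsrr)
# The full model supplies the candidate predictors; selection starts from mpg ~ 1
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
# Stepwise (add or remove) selection by AIC; details = TRUE prints every step
model.selection <- ols_step_both_aic(model, details = TRUE)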
## Stepwise Selection Method
## -------------------------
##
## Candidate Terms:
##
## 1. disp
## 2. hp
## 3. wt
## 4. qsec
##
##
## Step => 0
## Model => mpg ~ 1
## AIC => 208.7555
##
## Initiating stepwise selection...
##
## Table: Adding New Variables
## ----------------------------------------------------------------------
## Predictor DF AIC SBC SBIC R2 Adj. R2
## ----------------------------------------------------------------------
## disp 1 170.209 174.607 77.984 0.71834 0.70895
## hp 1 181.239 185.636 87.875 0.60244 0.58919
## wt 1 166.029 170.427 74.292 0.75283 0.74459
## qsec 1 204.588 208.985 109.559 0.17530 0.14781
## ----------------------------------------------------------------------
##
## Step => 1
## Added => wt
## Model => mpg ~ wt
## AIC => 166.0294
##
## Table: Adding New Variables
## ---------------------------------------------------------------------
## Predictor DF AIC SBC SBIC R2 Adj. R2
## ---------------------------------------------------------------------
## disp 1 164.168 170.031 72.684 0.78093 0.76582
## hp 1 156.652 162.515 66.575 0.82679 0.81484
## qsec 1 156.720 162.583 66.630 0.82642 0.81444
## ---------------------------------------------------------------------
##
## Step => 2
## Added => hp
## Model => mpg ~ wt + hp
## AIC => 156.6523
##
## Table: Removing Existing Variables
## ---------------------------------------------------------------------
## Predictor DF AIC SBC SBIC R2 Adj. R2
## ---------------------------------------------------------------------
## wt 1 181.239 185.636 87.875 0.60244 0.58919
## hp 1 166.029 170.427 74.292 0.75283 0.74459
## ---------------------------------------------------------------------
##
## Table: Adding New Variables
## ---------------------------------------------------------------------
## Predictor DF AIC SBC SBIC R2 Adj. R2
## ---------------------------------------------------------------------
## disp 1 158.643 165.972 68.825 0.82684 0.80828
## qsec 1 157.143 164.471 67.724 0.83477 0.81706
## ---------------------------------------------------------------------
##
##
## No more variables to be added or removed.
##
## Variables Selected:
##
## => wt
## => hp
## Length Class Mode
## metrics 8 data.frame list
## model 12 lm list
## others 4 -none- list
##
##
## Stepwise Summary
## -------------------------------------------------------------------------
## Step Variable AIC SBC SBIC R2 Adj. R2
## -------------------------------------------------------------------------
## 0 Base Model 208.756 211.687 115.039 0.00000 0.00000
## 1 wt (+) 166.029 170.427 74.292 0.75283 0.74459
## 2 hp (+) 156.652 162.515 66.575 0.82679 0.81484
## -------------------------------------------------------------------------
##
## Final Model Output
## ------------------
##
## Model Summary
## ---------------------------------------------------------------
## R 0.909 RMSE 2.469
## R-Squared 0.827 MSE 6.726
## Adj. R-Squared 0.815 Coef. Var 12.909
## Pred R-Squared 0.781 AIC 156.652
## MAE 1.901 SBC 162.515
## ---------------------------------------------------------------
## RMSE: Root Mean Square Error
## MSE: Mean Square Error
## MAE: Mean Absolute Error
## AIC: Akaike Information Criteria
## SBC: Schwarz Bayesian Criteria
##
## ANOVA
## --------------------------------------------------------------------
## Sum of
## Squares DF Mean Square F Sig.
## --------------------------------------------------------------------
## Regression 930.999 2 465.500 69.211 0.0000
## Residual 195.048 29 6.726
## Total 1126.047 31
## --------------------------------------------------------------------
##
## Parameter Estimates
## ----------------------------------------------------------------------------------------
## model Beta Std. Error Std. Beta t Sig lower upper
## ----------------------------------------------------------------------------------------
## (Intercept) 37.227 1.599 23.285 0.000 33.957 40.497
## wt -3.878 0.633 -0.630 -6.129 0.000 -5.172 -2.584
## hp -0.032 0.009 -0.361 -3.519 0.001 -0.050 -0.013
## ----------------------------------------------------------------------------------------
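The criterion need not be AIC. As a hedged sketch, the combined search can be repeated with BIC by passing k = log(n) to base R's step(), where n is the number of observations. The object names null_model, n_obs and both_bic are illustrative, and this uses step() rather than an olsrr function.

```r
# Stepwise (both directions) selection using BIC as the criterion
null_model <- lm(mpg ~ 1, data = mtcars)
n_obs <- nrow(mtcars)

both_bic <- step(
  null_model,
  scope = list(lower = ~ 1, upper = ~ disp + hp + wt + qsec),
  direction = "both",
  k = log(n_obs),  # penalty per parameter: 2 gives AIC, log(n) gives BIC
  trace = 0
)

formula(both_bic)
BIC(both_bic)
```

Judging by the SBC columns in the outputs above, BIC should lead to the same final model as AIC for this data set, although in general the heavier penalty tends to favour smaller models.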