Stepwise regression

In this section, we learn about stepwise regression as a method of model building. The procedure is useful when there are numerous potential explanatory variables, because fitting and comparing every possible subset of predictors quickly becomes impractical. The overall strategy is to build a linear regression model from a set of candidate predictors by entering and/or removing predictors one step at a time according to a defined criterion (e.g. AIC or BIC), stopping when we have a justifiable model (i.e. when adding or removing a predictor no longer improves the chosen criterion). The main approaches to stepwise selection are forward selection, backward elimination and a combination of the two. In R, the procedure is implemented by the function step; alternatively, you can use the ols_step_* family of functions from the olsrr package. We will not be able to consider all possible options for these functions in this course.
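As a minimal sketch of the base R route mentioned above, the following runs a backward search with step on the built-in mtcars data. The object names (full, fit_aic, fit_bic) are chosen here for illustration only. Note that step reports the criterion through extractAIC, which differs from AIC() by an additive constant for a fixed data set, so the printed values will not match the olsrr tables even though the ranking of models is the same.

```r
# Full model: all four candidate predictors from the built-in mtcars data
full <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Backward search with the default penalty k = 2 (AIC);
# trace = 0 suppresses the step-by-step log
fit_aic <- step(full, direction = "backward", trace = 0)

# Same search with penalty k = log(n), which corresponds to BIC
fit_bic <- step(full, direction = "backward",
                k = log(nrow(mtcars)), trace = 0)

# Inspect the formulas of the selected models
formula(fit_aic)
formula(fit_bic)
```

Setting trace = 1 (the default) prints each step, which is roughly comparable to details = TRUE in the olsrr functions.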

One can start with either the full model (all predictors) or the null model (no predictors).
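To make the two starting points concrete, here is a small sketch that fits both models for the mtcars example used in this section and compares their AIC. The names null and full are illustrative.

```r
# Null model: intercept only, no predictors
null <- lm(mpg ~ 1, data = mtcars)

# Full model: every candidate predictor included
full <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)

# Side-by-side comparison; a lower AIC indicates the preferred model
AIC(null, full)
```

Any model chosen by a stepwise search lies between these two extremes.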

Backward selection

For ‘backward’ selection we start with the full model. At each ‘step’ we fit all possible models with one variable removed from the current model. We then compare the criterion (let’s say AIC) of each of these models with that of the current model. The model with the best value of the criterion (the lowest, in the case of AIC) becomes the current model for the next step. If the current model itself scores best, then the stepwise model selection process is finished and we use the current model.
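A single backward step can be sketched by hand as follows. This is only an illustration of the description above, not how olsrr is implemented; the names vars, current, aic_drop and candidates are made up for the example.

```r
# Candidate predictors and the current (here: full) model
vars <- c("disp", "hp", "wt", "qsec")
current <- lm(reformulate(vars, response = "mpg"), data = mtcars)

# AIC of each model obtained by dropping one predictor from the current model
aic_drop <- sapply(vars, function(v) {
  AIC(lm(reformulate(setdiff(vars, v), response = "mpg"), data = mtcars))
})

# The current model takes part in the comparison too ("none" = drop nothing)
candidates <- c(none = AIC(current), aic_drop)

# Models ordered from best (lowest AIC) to worst
sort(candidates)

# Name of the variable whose removal gives the best model at this step
names(which.min(candidates))
```

If the best entry were "none", the search would stop at the current model; otherwise the chosen variable is dropped and the step is repeated on the reduced model.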

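library(olsrr)
# Full model containing all four candidate predictors
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
# Backward elimination by AIC; details = TRUE prints every step
model.selection <- ols_step_backward_aic(model, details = TRUE)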
## Backward Elimination Method 
## ---------------------------
## 
## Candidate Terms: 
## 
## 1. disp 
## 2. hp 
## 3. wt 
## 4. qsec 
## 
## 
## Step     => 0 
## Model    => mpg ~ disp + hp + wt + qsec 
## AIC      => 159.0696 
## 
## Initiating stepwise selection... 
## 
##                  Table: Removing Existing Variables                   
## ---------------------------------------------------------------------
## Predictor    DF      AIC        SBC       SBIC       R2       Adj. R2 
## ---------------------------------------------------------------------
## disp          1    157.143    164.471    67.724    0.83477    0.81706 
## qsec          1    158.643    165.972    68.825    0.82684    0.80828 
## hp            1    158.720    166.049    68.881    0.82642    0.80782 
## wt            1    169.853    177.181    77.315    0.75420    0.72786 
## ---------------------------------------------------------------------
## 
## Step     => 1 
## Removed  => disp 
## Model    => mpg ~ hp + wt + qsec 
## AIC      => 157.1426 
## 
##                  Table: Removing Existing Variables                   
## ---------------------------------------------------------------------
## Predictor    DF      AIC        SBC       SBIC       R2       Adj. R2 
## ---------------------------------------------------------------------
## qsec          1    156.652    162.515    66.575    0.82679    0.81484 
## hp            1    156.720    162.583    66.630    0.82642    0.81444 
## wt            1    180.339    186.202    86.329    0.63688    0.61183 
## ---------------------------------------------------------------------
## 
## Step     => 2 
## Removed  => qsec 
## Model    => mpg ~ hp + wt 
## AIC      => 156.6523 
## 
##                  Table: Removing Existing Variables                   
## ---------------------------------------------------------------------
## Predictor    DF      AIC        SBC       SBIC       R2       Adj. R2 
## ---------------------------------------------------------------------
## hp            1    166.029    170.427    74.292    0.75283    0.74459 
## wt            1    181.239    185.636    87.875    0.60244    0.58919 
## ---------------------------------------------------------------------
## 
## 
## No more variables to be removed.
## 
## Variables Removed: 
## 
## => disp 
## => qsec
summary(model.selection)
##         Length Class      Mode
## metrics  7     data.frame list
## model   12     lm         list
## others   3     -none-     list
model.selection
## 
## 
##                              Stepwise Summary                             
## ------------------------------------------------------------------------
## Step    Variable        AIC        SBC       SBIC       R2       Adj. R2 
## ------------------------------------------------------------------------
##  0      Full Model    159.070    167.864    70.041    0.83514    0.81072 
##  1      disp          157.143    164.471    67.433    0.83477    0.81706 
##  2      qsec          156.652    162.515    66.440    0.82679    0.81484 
## ------------------------------------------------------------------------
## 
## Final Model Output 
## ------------------
## 
##                          Model Summary                          
## ---------------------------------------------------------------
## R                       0.909       RMSE                 2.469 
## R-Squared               0.827       MSE                  6.726 
## Adj. R-Squared          0.815       Coef. Var           12.909 
## Pred R-Squared          0.781       AIC                156.652 
## MAE                     1.901       SBC                162.515 
## ---------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
##  AIC: Akaike Information Criteria 
##  SBC: Schwarz Bayesian Criteria 
## 
##                                ANOVA                                 
## --------------------------------------------------------------------
##                 Sum of                                              
##                Squares        DF    Mean Square      F         Sig. 
## --------------------------------------------------------------------
## Regression     930.999         2        465.500    69.211    0.0000 
## Residual       195.048        29          6.726                     
## Total         1126.047        31                                    
## --------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)    37.227         1.599                 23.285    0.000    33.957    40.497 
##          hp    -0.032         0.009       -0.361    -3.519    0.001    -0.050    -0.013 
##          wt    -3.878         0.633       -0.630    -6.129    0.000    -5.172    -2.584 
## ----------------------------------------------------------------------------------------

Forward selection

A similar procedure is performed for ‘forward’ selection, except that we start with the null model and add one variable at each step, stopping when no remaining variable can be added to improve the criterion score.
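The forward procedure can be written out as a short loop. The sketch below is a hand-rolled illustration under the assumption that AIC is the criterion; it is not the olsrr implementation, and the variable names are chosen for this example only.

```r
candidates <- c("disp", "hp", "wt", "qsec")
selected <- character(0)

# Start from the null model (intercept only)
current_aic <- AIC(lm(mpg ~ 1, data = mtcars))

repeat {
  remaining <- setdiff(candidates, selected)
  if (length(remaining) == 0) break  # nothing left to add

  # AIC of each model that adds one more predictor to the current model
  aic_add <- sapply(remaining, function(v) {
    AIC(lm(reformulate(c(selected, v), response = "mpg"), data = mtcars))
  })

  # Stop when no addition improves on the current model
  if (min(aic_add) >= current_aic) break

  # Otherwise accept the best addition and continue
  selected <- c(selected, names(which.min(aic_add)))
  current_aic <- min(aic_add)
}

selected     # predictors in the order they were added
current_aic  # AIC of the final model
```

Because each step only looks one variable ahead, the search is greedy: it is not guaranteed to find the subset with the lowest AIC overall.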

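library(olsrr)
# The full model defines the candidate set; the search itself starts from mpg ~ 1
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
# Forward selection by AIC; details = TRUE prints every step
model.selection <- ols_step_forward_aic(model, details = TRUE)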
## Forward Selection Method 
## ------------------------
## 
## Candidate Terms: 
## 
## 1. disp 
## 2. hp 
## 3. wt 
## 4. qsec 
## 
## 
## Step     => 0 
## Model    => mpg ~ 1 
## AIC      => 208.7555 
## 
## Initiating stepwise selection... 
## 
##                      Table: Adding New Variables                       
## ----------------------------------------------------------------------
## Predictor    DF      AIC        SBC       SBIC        R2       Adj. R2 
## ----------------------------------------------------------------------
## wt            1    166.029    170.427     74.292    0.75283    0.74459 
## disp          1    170.209    174.607     77.984    0.71834    0.70895 
## hp            1    181.239    185.636     87.875    0.60244    0.58919 
## qsec          1    204.588    208.985    109.559    0.17530    0.14781 
## ----------------------------------------------------------------------
## 
## Step     => 1 
## Added    => wt 
## Model    => mpg ~ wt 
## AIC      => 166.0294 
## 
##                      Table: Adding New Variables                      
## ---------------------------------------------------------------------
## Predictor    DF      AIC        SBC       SBIC       R2       Adj. R2 
## ---------------------------------------------------------------------
## hp            1    156.652    162.515    66.575    0.82679    0.81484 
## qsec          1    156.720    162.583    66.630    0.82642    0.81444 
## disp          1    164.168    170.031    72.684    0.78093    0.76582 
## ---------------------------------------------------------------------
## 
## Step     => 2 
## Added    => hp 
## Model    => mpg ~ wt + hp 
## AIC      => 156.6523 
## 
##                      Table: Adding New Variables                      
## ---------------------------------------------------------------------
## Predictor    DF      AIC        SBC       SBIC       R2       Adj. R2 
## ---------------------------------------------------------------------
## qsec          1    157.143    164.471    67.724    0.83477    0.81706 
## disp          1    158.643    165.972    68.825    0.82684    0.80828 
## ---------------------------------------------------------------------
## 
## 
## No more variables to be added.
## 
## Variables Selected: 
## 
## => wt 
## => hp
summary(model.selection)
##         Length Class      Mode
## metrics  7     data.frame list
## model   12     lm         list
## others   4     -none-     list
model.selection
## 
## 
##                              Stepwise Summary                              
## -------------------------------------------------------------------------
## Step    Variable        AIC        SBC       SBIC        R2       Adj. R2 
## -------------------------------------------------------------------------
##  0      Base Model    208.756    211.687    115.039    0.00000    0.00000 
##  1      wt            166.029    170.427     74.292    0.75283    0.74459 
##  2      hp            156.652    162.515     66.575    0.82679    0.81484 
## -------------------------------------------------------------------------
## 
## Final Model Output 
## ------------------
## 
##                          Model Summary                          
## ---------------------------------------------------------------
## R                       0.909       RMSE                 2.469 
## R-Squared               0.827       MSE                  6.726 
## Adj. R-Squared          0.815       Coef. Var           12.909 
## Pred R-Squared          0.781       AIC                156.652 
## MAE                     1.901       SBC                162.515 
## ---------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
##  AIC: Akaike Information Criteria 
##  SBC: Schwarz Bayesian Criteria 
## 
##                                ANOVA                                 
## --------------------------------------------------------------------
##                 Sum of                                              
##                Squares        DF    Mean Square      F         Sig. 
## --------------------------------------------------------------------
## Regression     930.999         2        465.500    69.211    0.0000 
## Residual       195.048        29          6.726                     
## Total         1126.047        31                                    
## --------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)    37.227         1.599                 23.285    0.000    33.957    40.497 
##          wt    -3.878         0.633       -0.630    -6.129    0.000    -5.172    -2.584 
##          hp    -0.032         0.009       -0.361    -3.519    0.001    -0.050    -0.013 
## ----------------------------------------------------------------------------------------

Forward-Backward selection

The procedure is again similar: we start with the null model, but at each step we have the option either to add or to remove a variable, stopping when no variable can be added or removed to improve the criterion score. Clearly, the first two steps can only add a variable: at the first step there is nothing to remove, and at the second step removing the only variable would simply return us to the null model.
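For comparison, the same kind of search can be sketched with base R's step by starting from the null model and supplying the allowed range of models through scope. The names null and fit_both are illustrative. As with any use of step, the reported criterion comes from extractAIC and so differs from the olsrr AIC values by an additive constant.

```r
# Start from the null model (intercept only)
null <- lm(mpg ~ 1, data = mtcars)

# direction = "both" allows adding or removing a term at every step;
# scope bounds the search between the null and the full model
fit_both <- step(null,
                 scope = list(lower = ~ 1,
                              upper = ~ disp + hp + wt + qsec),
                 direction = "both",
                 trace = 0)

# Formula of the selected model
formula(fit_both)
```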

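library(olsrr)
# The full model defines the candidate set; the search itself starts from mpg ~ 1
model <- lm(mpg ~ disp + hp + wt + qsec, data = mtcars)
# Stepwise (add or remove) selection by AIC; details = TRUE prints every step
model.selection <- ols_step_both_aic(model, details = TRUE)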
## Stepwise Selection Method 
## -------------------------
## 
## Candidate Terms: 
## 
## 1. disp 
## 2. hp 
## 3. wt 
## 4. qsec 
## 
## 
## Step     => 0 
## Model    => mpg ~ 1 
## AIC      => 208.7555 
## 
## Initiating stepwise selection... 
## 
##                      Table: Adding New Variables                       
## ----------------------------------------------------------------------
## Predictor    DF      AIC        SBC       SBIC        R2       Adj. R2 
## ----------------------------------------------------------------------
## disp          1    170.209    174.607     77.984    0.71834    0.70895 
## hp            1    181.239    185.636     87.875    0.60244    0.58919 
## wt            1    166.029    170.427     74.292    0.75283    0.74459 
## qsec          1    204.588    208.985    109.559    0.17530    0.14781 
## ----------------------------------------------------------------------
## 
## Step     => 1 
## Added    => wt 
## Model    => mpg ~ wt 
## AIC      => 166.0294 
## 
##                      Table: Adding New Variables                      
## ---------------------------------------------------------------------
## Predictor    DF      AIC        SBC       SBIC       R2       Adj. R2 
## ---------------------------------------------------------------------
## disp          1    164.168    170.031    72.684    0.78093    0.76582 
## hp            1    156.652    162.515    66.575    0.82679    0.81484 
## qsec          1    156.720    162.583    66.630    0.82642    0.81444 
## ---------------------------------------------------------------------
## 
## Step     => 2 
## Added    => hp 
## Model    => mpg ~ wt + hp 
## AIC      => 156.6523 
## 
##                  Table: Removing Existing Variables                   
## ---------------------------------------------------------------------
## Predictor    DF      AIC        SBC       SBIC       R2       Adj. R2 
## ---------------------------------------------------------------------
## wt            1    181.239    185.636    87.875    0.60244    0.58919 
## hp            1    166.029    170.427    74.292    0.75283    0.74459 
## ---------------------------------------------------------------------
## 
##                      Table: Adding New Variables                      
## ---------------------------------------------------------------------
## Predictor    DF      AIC        SBC       SBIC       R2       Adj. R2 
## ---------------------------------------------------------------------
## disp          1    158.643    165.972    68.825    0.82684    0.80828 
## qsec          1    157.143    164.471    67.724    0.83477    0.81706 
## ---------------------------------------------------------------------
## 
## 
## No more variables to be added or removed.
## 
## Variables Selected: 
## 
## => wt 
## => hp
summary(model.selection)
##         Length Class      Mode
## metrics  8     data.frame list
## model   12     lm         list
## others   4     -none-     list
model.selection
## 
## 
##                              Stepwise Summary                              
## -------------------------------------------------------------------------
## Step    Variable        AIC        SBC       SBIC        R2       Adj. R2 
## -------------------------------------------------------------------------
##  0      Base Model    208.756    211.687    115.039    0.00000    0.00000 
##  1      wt (+)        166.029    170.427     74.292    0.75283    0.74459 
##  2      hp (+)        156.652    162.515     66.575    0.82679    0.81484 
## -------------------------------------------------------------------------
## 
## Final Model Output 
## ------------------
## 
##                          Model Summary                          
## ---------------------------------------------------------------
## R                       0.909       RMSE                 2.469 
## R-Squared               0.827       MSE                  6.726 
## Adj. R-Squared          0.815       Coef. Var           12.909 
## Pred R-Squared          0.781       AIC                156.652 
## MAE                     1.901       SBC                162.515 
## ---------------------------------------------------------------
##  RMSE: Root Mean Square Error 
##  MSE: Mean Square Error 
##  MAE: Mean Absolute Error 
##  AIC: Akaike Information Criteria 
##  SBC: Schwarz Bayesian Criteria 
## 
##                                ANOVA                                 
## --------------------------------------------------------------------
##                 Sum of                                              
##                Squares        DF    Mean Square      F         Sig. 
## --------------------------------------------------------------------
## Regression     930.999         2        465.500    69.211    0.0000 
## Residual       195.048        29          6.726                     
## Total         1126.047        31                                    
## --------------------------------------------------------------------
## 
##                                   Parameter Estimates                                    
## ----------------------------------------------------------------------------------------
##       model      Beta    Std. Error    Std. Beta      t        Sig      lower     upper 
## ----------------------------------------------------------------------------------------
## (Intercept)    37.227         1.599                 23.285    0.000    33.957    40.497 
##          wt    -3.878         0.633       -0.630    -6.129    0.000    -5.172    -2.584 
##          hp    -0.032         0.009       -0.361    -3.519    0.001    -0.050    -0.013 
## ----------------------------------------------------------------------------------------