# Chapter 2 Examples

## 2.1 Example 1 - Backward Elimination and Forward Selection

Create regression models for predicting wrestlersâ€™ hydrostatic fat (`hwfat`

) using the data frame
`HSWRESTLER`

.

The dataset can be found in the `R`

package `PASWR2`

, which can be loaded in using

`library(PASWR2)`

- Perform quality checks on the data in
`HSWRESTLER`

as well as some basic exploratory analysis to see how the predictors (`age`

,`ht`

,`wt`

,`abs`

,`triceps`

, and`subscap`

) are related to`hwfat.`

Consider using the `pairs()`

function to perform preliminary analysis of how the predictors relate to `hwfat`

, as well as the `cor()`

function.

Since we are only interested in the predictors listed in the question, we can remove the last two.

`<- HSWRESTLER[,-c(8:9)] HSWRESTLER `

`pairs(HSWRESTLER)`

The predictors that appear to best relate to `hwfat`

are `wt`

, `abs`

, `triceps`

and `subscap`

.

`cor(HSWRESTLER)`

```
## age ht wt abs triceps subscap
## age 1.00000000 0.4177471 0.2406721 -0.05970301 -0.1349368 -0.1077762
## ht 0.41774714 1.0000000 0.6044705 0.25882739 0.1578298 0.2525007
## wt 0.24067211 0.6044705 1.0000000 0.84808755 0.7526595 0.8149226
## abs -0.05970301 0.2588274 0.8480876 1.00000000 0.9057768 0.9254833
## triceps -0.13493683 0.1578298 0.7526595 0.90577680 1.0000000 0.9455530
## subscap -0.10777616 0.2525007 0.8149226 0.92548332 0.9455530 1.0000000
## hwfat -0.17053777 0.1399135 0.7334894 0.91813556 0.9166337 0.9059425
## hwfat
## age -0.1705378
## ht 0.1399135
## wt 0.7334894
## abs 0.9181356
## triceps 0.9166337
## subscap 0.9059425
## hwfat 1.0000000
```

The correlations back up the initial interpretation, with `wt`

having a slightly weaker linear relationship than the others.

The package `car`

has several useful functions, and the function `scatterplotMatrix()`

may be useful for exploring relationships among the predictors and `hwfat.`

There are a number of arguments/options for the function `scatterplotMatrix()`

, so it's advised to refer to its help page for further
details.

```
library(car)
scatterplotMatrix(x = HSWRESTLER)
```

`#or `

```
scatterplotMatrix(formula = ~ hwfat + age + ht + wt + abs + triceps +
+ subscap, data = HSWRESTLER)
```

The plots generated by `scatterplotMatrix()`

display with more clarity why `wt`

doesn't follow as linear a relationship as the other predictors mentioned above.

- Use backward elimination with the predictors
`age`

,`ht`

,`wt`

,`abs`

,`triceps`

, and`subscap`

and an \(\alpha_{crit}\) of 0.20 to create a regression model where`hwfat`

is the response variable.

Start by defining the full model, with all listed predictors. Then use the function `drop1()`

with a `test="F"`

argument to find the first term to remove based on highest (least significant) p-value.

```
<- lm(hwfat ~ ., data = HSWRESTLER)
model.all
drop1(model.all, test = "F") # single term deletions
```

```
## Single term deletions
##
## Model:
## hwfat ~ age + ht + wt + abs + triceps + subscap
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 651.05 179.51
## age 1 9.594 660.64 178.65 1.0463 0.309839
## ht 1 1.613 652.66 177.70 0.1759 0.676225
## wt 1 2.546 653.60 177.81 0.2777 0.599879
## abs 1 162.000 813.05 194.84 17.6669 7.549e-05 ***
## triceps 1 72.683 723.73 185.76 7.9264 0.006301 **
## subscap 1 5.921 656.97 178.21 0.6458 0.424315
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Since the largest p-value from `drop1()`

is 0.6762, which corresponds to the variable `ht`

, one needs to update the original model by dropping the variable `ht`

. The function `update()`

allows the user to update a previous model (`model.all`

) using the short-hand notation. ~ . -ht. The periods on the left and right side of the tilde tell `R`

to use what is specified in the left- and right-hand sides of the tilde in `model.all`

minus the variable `ht.`

`<- update(model.all, . ~ . - ht) model.new `

The first variable we remove is

The second variable to be removed is

The third variable dropped is

The fourth variable dropped is

- Use forward selection with the predictors
`age`

,`ht`

,`wt`

,`abs`

,`triceps`

, and`subscap`

and an \(\alpha_{crit}\) of 0.20 to create a regression model where`hwfat`

is the response variable.

The functions add1() and update() are used to create a model using forward selection.

```
<- (~ . + age + ht + wt + abs + triceps + subscap)
SCOPE
<- lm(hwfat ~ 1, data = HSWRESTLER) # create a new empty model for Forwards Selection
mod.fs
add1(mod.fs, scope = SCOPE, test = "F") # SCOPE defined above determines the variables to be added
```

```
## Single term additions
##
## Model:
## hwfat ~ 1
## Df Sum of Sq RSS AIC F value Pr(>F)
## <none> 6017.8 340.97
## age 1 175.0 5842.8 340.67 2.2765 0.1355
## ht 1 117.8 5900.0 341.43 1.5175 0.2218
## wt 1 3237.6 2780.2 282.74 88.5045 2.219e-14 ***
## abs 1 5072.8 945.0 198.57 407.9929 < 2.2e-16 ***
## triceps 1 5056.3 961.5 199.92 399.6462 < 2.2e-16 ***
## subscap 1 4939.0 1078.8 208.90 347.9456 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

`<- update(mod.fs, . ~ . + abs) mod.fs `

We look to add the term which results in the lowest AIC for the model, since multiple variables are below the \(\alpha_{crit}\) of 0.20.

The first variable we add is

The second variable to be added is

The third variable added is

The third variable added is