# Practical 6 Solutions

Make sure to at least finish Question 1 from Practical 5 before continuing with this practical! It might be tempting to skip some stuff if you’re behind, but you may end up more lost\(...\)we will happily answer questions from any previous practical each session.

In Questions 1 and 2, we again analyse the `seatpos`

data from library `faraway`

.

### Q1 Splines

In this question, we will again analyse the effects of predictor variable `Ht`

on the response variable `hipcenter`

.

- Load the necessary packages for the dataset and spline functions. Assign the
`hipcenter`

values to a vector`y`

, and the`Ht`

variable values to a vector`x`

.

```
library("MASS")
library("faraway")
library("splines")
= seatpos$hipcenter
y = seatpos$Ht
x
# Plot labels for later.
= 'hip center (mm)'
y.lab = 'Height (bare foot) in cm' x.lab
```

- Find the 25th, 50th and 75th percentiles of
`x`

, storing them in a vector`cuts`

.

`summary(x)`

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 150.2 163.6 169.5 169.1 175.7 198.4
```

`<- summary(x)[c(2,3,5)] cuts `

- Use a linear spline to model
`hipcenter`

as a function of`Ht`

, putting knots at the 25th, 50th and 75th percentiles of`x`

.

`= lm(y ~ bs(x, degree = 1, knots = cuts)) spline1 `

- Plot the fitted linear spline from part (c) over the data, along with \(\pm2\) standard deviation confidence intervals.

```
# Sort the values of x from low to high.
<- sort(x)
sort.x
# Obtain predictions for the fitted values along with confidence intervals.
= predict(spline1, newdata = list(x = sort.x), se = TRUE)
pred1 = cbind(pred1$fit + 2 * pred1$se.fit, pred1$fit - 2 * pred1$se.fit)
se.bands1
# Plot the fitted linear spline model over the data.
plot(x, y, cex.lab = 1.1, col="darkgrey", xlab = x.lab, ylab = y.lab,
main = "Linear Spline", bty = 'l')
lines(sort.x, pred1$fit, lwd = 2, col = "red")
matlines(sort.x, se.bands1, lwd = 2, col = "red", lty = 3)
```

- Use a smoothing spline to model
`hipcenter`

as a function of`Ht`

, selecting \(\lambda\) with cross-validation, and generate a relevant plot. What do you notice?

`= smooth.spline(x, y, cv = TRUE) smooth1 `

```
## Warning in smooth.spline(x, y, cv = TRUE): cross-validation with non-unique 'x'
## values seems doubtful
```

```
plot(x, y, cex.lab = 1.1, col="darkgrey", xlab = x.lab, ylab = y.lab,
main = "Smoothing Spline (3df)", bty = 'l')
lines(smooth1, lwd = 2, col = "brown")
```

`# The fitted model is almost completely linear.`

### Q2 GAMs

- Fit a GAM for
`hipcenter`

that consists of three terms:

- a natural spline with 5 degrees of freedom for
`Age`

, - a smoothing spline with 3 degrees of freedom for
`Thigh`

, and - a simple linear model term for
`Ht`

.

```
# Require library gam.
library(gam)
# Fit a GAM.
= gam( hipcenter ~ ns( Age, df = 5 ) + s( Thigh, df = 3 ) + Ht,
gam data = seatpos )
```

- Plot the resulting contributions of each term to the GAM, and compare them with plots of
`hipcenter`

against each of the three variables`Age`

,`Thigh`

and`Ht`

.

```
# Plot the contributions.
par( mfrow = c(2,3) )
plot( gam, se = TRUE, col = "blue" )
# Compare with the following plots.
plot( seatpos$Age, seatpos$hipcenter, pch = 16, col = 2,
ylab = y.lab, xlab = "Age (years)" )
plot( seatpos$Thigh, seatpos$hipcenter, pch = 16, col = 2,
ylab = y.lab, xlab = "Thigh length (cm)" )
plot( seatpos$Ht, seatpos$hipcenter, pch = 16, col = 2,
ylab = y.lab, xlab = "Ht (bare foot) (cm)" )
```

- Does the contribution of each term of the GAM make sense in light of these pair-wise plots. Is the GAM fitting the data well?

```
# The linear fit of Ht is quite clear. Perhaps the fits for Age and Thigh
# are less clear, although I think we can see that the curves are fitting
# the data somewhat.
# Perhaps they are overfitting, capturing variation across the predictor
# ranges (in the sample) that isn't really there (in the population).
```

### Q3 `Boston`

Data

- Investigate modelling
`medv`

using regression, natural and smoothing splines, taking`indus`

as a single predictor.

```
library(MASS)
library(splines)
= Boston$medv
y = Boston$indus
x = 'Median Property Value'
y.lab = 'Non-retail business acres per town'
x.lab
# Use summary on x to find 25th, 50th and 75th percentiles. We will use
# these for the positions of the knots for regression and natural splines.
summary(x)
```

```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.46 5.19 9.69 11.14 18.10 27.74
```

```
<- summary(x)[c(2,3,5)]
cuts cuts
```

```
## 1st Qu. Median 3rd Qu.
## 5.19 9.69 18.10
```

```
# sort x for later.
= sort(x)
sort.x
# Fit a cubic spline
= lm(y ~ bs(x, knots = cuts))
spline.bs = predict(spline.bs, newdata = list(x = sort.x), se = TRUE)
pred.bs = cbind(pred.bs$fit + 2 * pred.bs$se.fit,
se.bands.bs $fit - 2 * pred.bs$se.fit)
pred.bs
# Fit a natural cubic spline.
= lm(y ~ ns(x, knots = cuts))
spline.ns = predict(spline.ns, newdata = list(x = sort.x), se = TRUE)
pred.ns = cbind(pred.ns$fit + 2 * pred.ns$se.fit,
se.bands.ns $fit - 2 * pred.ns$se.fit)
pred.ns
# Fit a smoothing spline, with 3 effective degrees of freedom.
= smooth.spline(x, y, df = 3)
spline.smooth
# Split the plotting device into 3.
par(mfrow = c(1,3))
# Plot the cubic spline.
plot(x, y, cex.lab = 1.1, col="darkgrey", xlab = x.lab, ylab = y.lab,
main = "Cubic Spline", bty = 'l')
lines(sort.x, pred.bs$fit, lwd = 2, col = "red")
matlines(sort.x, se.bands.bs, lwd = 2, col = "red", lty = 3)
# Plot the natural cubic spline.
plot(x, y, cex.lab = 1.1, col="darkgrey", xlab = x.lab, ylab = y.lab,
main = "Natural Cubic Spline", bty = 'l')
lines(sort.x, pred.ns$fit, lwd = 2, col = "darkred")
matlines(sort.x, se.bands.ns, lwd = 2, col = "darkred", lty = 3)
# Plot the smoothing spline.
plot(x, y, cex.lab = 1.1, col="darkgrey", xlab = x.lab, ylab = y.lab,
main = "Smoothing Spline (3 df)", bty = 'l')
lines(spline.smooth, lwd = 2, col = "brown")
```

- Choose a set of predictor variables to use in order to fit a GAM for
`medv`

. Feel free to experiment with different modelling options.

```
# Of course we discussed one possible GAM in practical demonstration
# section of Chapter 10. We try another one here.
# Let's set chas to be a factor (you may want to make rad a factor as well,
# if used).
= Boston
Boston1 $chas = factor(Boston1$chas)
Boston1
# Let's fit a GAM - can you see how each predictor is contributing to
# modelling the response?
= gam( medv ~ ns( lstat, df = 5 ) + ns( nox, df = 7 ) +
gam1 s( indus, df = 7 ) + poly( age, 5 ) + chas, data = Boston1 )
par(mfrow = c(2,3))
plot(gam1, se = TRUE, col = "blue")
```

Boyd, S., and L. Vandenberghe. 2009.

*Convex Optimization*. 7th ed. Cambridge: Cambridge University Press.
Faraway, J. J. 2009.

*Linear Models with r*. Edited by F. Dominici, J. J. Faraway, M. Tanner, and J. Zidek. London: CRC press.
Hoerl, Arthur E., and Robert W. Kennard. 1976. “Ridge Regression Iterative Estimation of the Biasing Parameter.”

*Communications in Statistics - Theory and Methods*5 (1): 77–88. https://doi.org/10.1080/03610927608827333.
Hurwitz, J., and D. Kirsch. 2018.

*Machine Learning for Dummies*. New Jersey: John Wiley & Sons.
James, G., D. Witten, T. Hastie, and R. Tibshirani. 2013.

*An Introduction to Statistical Learning*. Edited by G. Casella, Fienberg S, and I. Olkin. New York: Springer.
Tibshiriani, R. 1996. “Regression Shrinkage and Selection via the Lasso.”

*Journal of the Royal Statistical Society: Series B (Methodological)*58 (1): 267–88.