Chapter 3 Exercises

3.1 Exercise 1 - Cars

Context: A sample of 86 cars had their city fuel economy measured in miles per gallon and for each car the size of the engine, horsepower (an indication of how powerful the engine is) and the length of the wheelbase (distance between the centres of the front and rear wheels) were also recorded. It is of interest to model city fuel economy using the engine size, horsepower and wheelbase as predictors.

Data: cars.csv

Read in the data using:

cars <- read.csv("cars.csv")

  1. Use an appropriate exploratory analysis to explore the relationships between cmpg, engine size, horsepower and wheelbase. Is there anything that might concern you here?

The pairs() function may help in performing an exploratory analysis.
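As a rough illustration of the hint above, the sketch below draws a scatterplot matrix and a correlation matrix. It assumes cars.csv is not to hand, so it uses R's built-in mtcars data set as a stand-in; the columns mpg, disp, hp and wt are not the variables of this exercise and the object name demo is purely illustrative.

```r
# Stand-in data: the built-in mtcars data set, NOT cars.csv.
demo <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Scatterplot matrix of every pair of variables.
pairs(demo)

# Correlations summarise the strength of each linear relationship.
cor.demo <- cor(demo)
round(cor.demo, 2)

# Repeat with the response on the log scale to see whether any
# curvature in the response panels is reduced.
demo$log.mpg <- log(demo$mpg)
pairs(demo[, c("log.mpg", "disp", "hp", "wt")])
```

With the exercise data, the same calls would be applied to the cars data frame instead of demo.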

The relationships appear to be fairly ______, with the exception of the relationships between ______ and the other variables. This may potentially be fixed by performing a transformation.

  2. Fit a multiple linear regression model to the data in order to predict cmpg from engine size, horsepower and wheelbase.

Fit a multiple linear regression model and then test to see if removing any of the variables would improve the model.
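One possible way to carry out the step above is sketched here. It again assumes the built-in mtcars data as a stand-in for cars.csv, and the names model.full, drop.tests and model.reduced are illustrative. Dropping disp in the last step is an arbitrary choice made only to show the syntax, not a recommendation.

```r
# Stand-in data: the built-in mtcars data set, NOT cars.csv.
demo <- mtcars[, c("mpg", "disp", "hp", "wt")]

# Full model with all three predictors.
model.full <- lm(mpg ~ disp + hp + wt, data = demo)
summary(model.full)  # t-test for each coefficient, given the others

# F-test for removing each predictor in turn from the full model.
drop.tests <- drop1(model.full, test = "F")
drop.tests

# Refit without one predictor and compare the nested models directly.
model.reduced <- update(model.full, . ~ . - disp)
anova(model.reduced, model.full)
```

A large p-value for a term suggests the model fits about as well without it.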

The best fitting model is one using ______.

  3. Examine the assumptions of the selected model and comment on the model fit.

You can use plot() on the model you created to get a series of diagnostic plots that help with examining model assumptions.
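The diagnostic plots mentioned above could be produced along the following lines. This is only a sketch on the built-in mtcars data, used as a stand-in for cars.csv; model.demo and res.demo are illustrative names.

```r
# Stand-in data: the built-in mtcars data set, NOT cars.csv.
demo <- mtcars[, c("mpg", "disp", "hp", "wt")]
model.demo <- lm(log(mpg) ~ disp + hp + wt, data = demo)

# Four standard diagnostic plots on one page: residuals vs fitted,
# normal Q-Q, scale-location and residuals vs leverage.
par(mfrow = c(2, 2))
plot(model.demo)
par(mfrow = c(1, 1))

# Histogram of the residuals as an extra check on normality.
res.demo <- residuals(model.demo)
hist(res.demo, main = "Histogram of residuals", xlab = "Residual")
```

In the residuals vs fitted plot, look for random scatter about zero with roughly constant spread. In the Q-Q plot, look for points lying close to the reference line.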

The residuals ______ and they ______.

The residuals in the normal Q-Q plot ______ to the line.

The histogram of the model residuals ______ bell-shaped.

  4. Using an appropriate regression model, find and interpret a 95% confidence interval for the mean logarithm of city fuel economy of cars with:
    Engine Size = 2, HorsePower = 65, Wheelbase = 95

The function predict() in R may come in useful.

The lower end of the interval is ______ (to 4 decimal places).

The upper end of the interval is ______ (to 4 decimal places).

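# Fit the model with log(cmpg) as the response and all remaining columns as predictors
model.cars <- lm(log(cmpg) ~ ., data = cars)

# 95% confidence interval for the mean of log(cmpg) at the given predictor values
predict(model.cars, newdata = data.frame(EngSize=2, HorsePow=65, Wheelbase=95), interval = "confidence")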
##        fit      lwr      upr
## 1 3.284275 3.229901 3.338648

  5. Using the same regression model, find and interpret a 95% prediction interval for the logarithm of city fuel economy of a future car with:
    Engine Size = 2, HorsePower = 65, Wheelbase = 95

The function predict() in R with the argument interval = "prediction" (which R also accepts abbreviated to "predict") may come in useful.

The lower end of the interval is ______ (to 4 decimal places).

The upper end of the interval is ______ (to 4 decimal places).

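# Same model as in the previous part, refitted so this solution runs on its own
model.cars <- lm(log(cmpg) ~ ., data = cars)

# 95% prediction interval for log(cmpg) of a single future car
predict(model.cars, newdata = data.frame(EngSize=2, HorsePow=65, Wheelbase=95), interval = "prediction")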
##        fit      lwr      upr
## 1 3.284275 3.083691 3.484858
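
To see why the prediction interval is wider than the confidence interval around the same fitted value, both can be rebuilt by hand from the standard error of the fit and the residual standard deviation. The sketch below does this on the built-in mtcars data, used as a stand-in for cars.csv. The predictor values in new.car are hypothetical, and all object names are illustrative.

```r
# Stand-in data: the built-in mtcars data set, NOT cars.csv.
demo <- mtcars[, c("mpg", "disp", "hp", "wt")]
model.demo <- lm(log(mpg) ~ ., data = demo)

# Hypothetical predictor values, chosen only for illustration.
new.car <- data.frame(disp = 160, hp = 110, wt = 2.6)

# Intervals on the log scale, as returned by predict().
ci.log <- predict(model.demo, newdata = new.car, interval = "confidence")
pi.log <- predict(model.demo, newdata = new.car, interval = "prediction")

# Rebuild both intervals by hand.
p <- predict(model.demo, newdata = new.car, se.fit = TRUE)
t.crit <- qt(0.975, df = p$df)

# Confidence interval: uncertainty in the estimated mean only.
ci.manual <- p$fit + c(-1, 1) * t.crit * p$se.fit

# Prediction interval: adds the residual variance of a single new car.
pi.manual <- p$fit + c(-1, 1) * t.crit * sqrt(p$se.fit^2 + p$residual.scale^2)

# Back-transform the prediction interval to the original (mpg) scale.
pi.mpg <- exp(pi.log)
pi.mpg
```

Exponentiating the prediction interval gives an interval for the fuel economy of a single future car on the original scale. Exponentiating the confidence interval gives, under the model assumptions, an interval for the median rather than the mean fuel economy.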