Chapter 9 Multiple Regression

9.1 Sports Car Example

We aren’t going to calculate the equations for multiple regression models in class (i.e. models that use more than one \(X\) variable), but we’ll look a bit at how to use them.

9.2 Two Continuous \(X\) Variables

We’ll revisit the AccordPrices data set, where we will try to predict the Price of a used Honda Accord based on both Mileage and Age. I used software to create some graphs and to find the equation. The residual plot (i.e. the fitted values \(\hat{Y}\) on the \(x\)-axis and the residuals \(e\) on the \(y\)-axis) flares out like a horn or funnel, rather than just being a random assortment of points. This indicates more variability as the fitted value \(\hat{Y}\) increases (this is when the residuals are bigger).

\[\hat{Price} = 21.738 - 0.791Age - 0.053Mileage\]

## (Intercept)         Age     Mileage 
## 21.73825110 -0.79138059 -0.05314751

If I have a used Honda Accord that is 5 years old with 60,000 miles, I can predict its Price using Age\(=5\) and Mileage\(=60\).

\[\hat{Y}=21.738-0.791(5)-0.053(60)=14.603\]

The predicted price is about $14,600. If the actual price I sold it for was $13,000, then my residual would be \[e=Y-\hat{Y}=13-14.6=-1.6\] indicating I got about $1600 less than predicted (bad for me, good for the buyer).

9.3 One Continuous \(X\) and One Categorical \(X\)

Suppose a middle-aged man is having a mid-life crisis (not me, I didn’t buy a sports car) and is going to buy himself either a Jaguar or a Porsche. Suppose we want to predict its Price, based on both the cars’ Mileage (as before) and what type of Car he is going to buy.

Let’s look at a scatterplot, where different colors and shapes have been used to distinguish between Jaguars and Porsches.

If I were to create two regression models, one to fit the Jaguars (the red circles) and another to fit the Porsches (the blue triangles), how do you think the two equations would be similar? How would they be different?

Below I’ve superimposed the regression lines for the two types of sports car on the plot.

If you guessed that the two lines would have different intercepts, which represents the Price when Mileage\(=0\), a new car, you were right.

If you also guessed that the two lines would have about the same slope, asthe blue and red lines are almost parallel, indicating both cars have a pretty similar rate of depreciation in Price as Mileage increases, you were right again.

Using statistical software, I can get a single equation to predict the price of a car, where CarPorsche\(=0\) represents a Jaguar and CarPorsche\(=1\) represents a Porsche.

## (Intercept)     Mileage  CarPorsche 
##   53.605550   -0.602977   17.958331

\[\hat{Price} = 53.606 - 0.603Mileage + 17.958CarPorsche\]

What does the positive coefficient \(17.598\) for Car represent? This is the difference in Price between a Jaguar and a Porsche when they have the same Mileage.

For a new car with Mileage=0 \[\hat{Y}_{Jaguar} = 53.606 - 0.603(0) + 17.958(0) = 53.606\] \[\hat{Y}_{Porsche} = 53.606 - 0.603(0) + 17.598(1) = 71.204\]