18.8 Chapter 15: Regression
The following questions apply to the auction dataset in the yarrr package. This dataset contains information about 1,000 ships sold at a pirate auction.
- The column jbb is the “Jack’s Blue Book” value of a ship. Create a regression object called jbb.cannon.lm predicting the JBB value of ships based on the number of cannons each ship has. Based on your result, how much value does each additional cannon bring to a ship?
library(yarrr)
# jbb.cannon.lm model
# DV = jbb, IV = cannons
jbb.cannon.lm <- lm(formula = jbb ~ cannons,
                    data = auction)
# Print jbb.cannon.lm coefficients
summary(jbb.cannon.lm)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1396 61 23 1.3e-94
## cannons 101 3 34 1.9e-169
- Repeat your previous regression, but do two separate regressions: one on modern ships and one on classic ships. Is the relationship between cannons and JBB the same for both types of ships?
# jbb.cannon.modern.lm model
# DV = jbb, IV = cannons. Only include modern ships
jbb.cannon.modern.lm <- lm(formula = jbb ~ cannons,
                           data = subset(auction, style == "modern"))
# jbb.cannon.classic.lm model
# DV = jbb, IV = cannons. Only include classic ships
jbb.cannon.classic.lm <- lm(formula = jbb ~ cannons,
                            data = subset(auction, style == "classic"))
# Print jbb.cannon.modern.lm coefficients
summary(jbb.cannon.modern.lm)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1217 71.8 17 3.5e-51
## cannons 100 3.5 29 3.1e-107
# Print jbb.cannon.classic.lm coefficients
summary(jbb.cannon.classic.lm)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1537 75.9 20 5.9e-67
## cannons 104 3.7 28 2.0e-103
- Is there a significant interaction between a ship’s style and its age on its JBB value? If so, how do you interpret the interaction?
# int.lm model
# DV = jbb, IV = interaction between style and age
int.lm <- lm(formula = jbb ~ style * age,
             data = auction)
# Print int.lm coefficients
summary(int.lm)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3414.2 79.20 43.11 6.0e-230
## stylemodern -15.7 111.74 -0.14 8.9e-01
## age 1.9 0.76 2.57 1.0e-02
## stylemodern:age -3.7 1.07 -3.43 6.2e-04
- Create a regression object called jbb.all.lm predicting the JBB value of ships based on cannons, rooms, age, condition, color, and style. Which aspects of a ship significantly affect its JBB value?
# jbb.all.lm model
# DV = jbb, IV = everything (except price)
jbb.all.lm <- lm(formula = jbb ~ cannons + rooms + age + condition + color + style,
                 data = auction)
# Print jbb.all.lm coefficients
summary(jbb.all.lm)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 134.4 52.9 2.54 1.1e-02
## cannons 100.7 1.6 64.92 0.0e+00
## rooms 50.5 1.6 30.80 1.2e-146
## age 1.1 0.2 5.58 3.1e-08
## condition 107.6 3.9 27.51 3.4e-124
## colorbrown 4.9 16.6 0.30 7.7e-01
## colorplum -29.8 31.3 -0.95 3.4e-01
## colorred 15.1 18.3 0.82 4.1e-01
## colorsalmon -19.4 20.7 -0.94 3.5e-01
## stylemodern -397.8 12.8 -30.98 6.7e-148
- Create a regression object called price.all.lm predicting the actual selling price of ships based on cannons, rooms, age, condition, color, and style. Based on the results, does the JBB do a good job of capturing the effect of each variable on a ship’s selling price?
# price.all.lm model
# DV = price, IV = everything (except jbb)
price.all.lm <- lm(formula = price ~ cannons + rooms + age + condition + color + style,
                   data = auction)
# Print price.all.lm coefficients
summary(price.all.lm)$coefficients
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 302.5 73.81 4.10 4.5e-05
## cannons 100.0 2.17 46.17 4.0e-249
## rooms 48.8 2.29 21.34 2.0e-83
## age 1.2 0.29 4.28 2.0e-05
## condition 104.1 5.46 19.05 3.4e-69
## colorbrown -119.2 23.19 -5.14 3.3e-07
## colorplum 15.6 43.74 0.36 7.2e-01
## colorred -603.6 25.59 -23.59 5.4e-98
## colorsalmon 70.4 28.97 2.43 1.5e-02
## stylemodern -419.2 17.93 -23.38 1.3e-96
- Repeat your previous regression analysis, but instead of using the price as the dependent variable, use the binary variable price.gt.3500, which indicates whether or not the ship had a selling price greater than 3500. Call the new regression object price.all.blr. Make sure to use the appropriate regression function!
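As a rough illustration of why the choice of function matters for a binary outcome, the sketch below compares lm() and glm() with family = binomial. It uses simulated data, not the auction dataset, and every object name in it (sim, sim.lm, sim.blr, new.x) is made up for this example.

```r
# Simulated data (NOT the auction data): a binary outcome driven by one predictor
set.seed(1)
n <- 500
x <- runif(n, min = 0, max = 10)
y <- rbinom(n, size = 1, prob = plogis(-5 + 1 * x))
sim <- data.frame(x = x, y = y)

# Linear regression treats the 0/1 outcome as a continuous number
sim.lm <- lm(formula = y ~ x,
             data = sim)

# Logistic regression models the probability that y equals 1
sim.blr <- glm(formula = y ~ x,
               data = sim,
               family = binomial)

# Predict at the two ends of the predictor range
new.x <- data.frame(x = c(0, 10))
lm.pred <- predict(object = sim.lm, newdata = new.x)
blr.pred <- predict(object = sim.blr, newdata = new.x, type = "response")

# Linear predictions are not constrained and may fall outside 0 and 1
range(lm.pred)

# Logistic predictions on the response scale always lie between 0 and 1
range(blr.pred)
```

Because the logistic model works on the log-odds scale, its coefficients are not directly comparable in size to those of a linear model fit to the same predictors.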
# Create new binary variable indicating whether
# a ship sold for more than 3500
auction$price.gt.3500 <- auction$price > 3500
# price.all.blr model
# DV = price.gt.3500, IV = everything (except jbb)
price.all.blr <- glm(formula = price.gt.3500 ~ cannons + rooms + age + condition + color + style,
                     data = auction,
                     family = binomial)   # Logistic regression
# price.all.blr coefficients
summary(price.all.blr)$coefficients
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -19.7401 1.4240 -13.86 1.1e-43
## cannons 0.6251 0.0442 14.14 2.2e-45
## rooms 0.2688 0.0296 9.07 1.2e-19
## age 0.0097 0.0033 2.93 3.4e-03
## condition 0.6825 0.0745 9.16 5.4e-20
## colorbrown -0.8924 0.2549 -3.50 4.6e-04
## colorplum -0.1291 0.5090 -0.25 8.0e-01
## colorred -4.0764 0.4107 -9.93 3.2e-23
## colorsalmon 0.2479 0.3172 0.78 4.3e-01
## stylemodern -2.4037 0.2432 -9.88 4.9e-23
- Using price.all.lm, predict the selling price of the following three new ships:
| cannons | rooms | age | condition | color | style |
|---|---|---|---|---|---|
| 12 | 34 | 43 | 7 | black | classic |
| 8 | 26 | 54 | 3 | black | modern |
| 32 | 65 | 100 | 5 | red | modern |
# Create a dataframe with new ship data
new.ships <- data.frame(cannons = c(12, 8, 32),
                        rooms = c(34, 26, 65),
                        age = c(43, 54, 100),
                        condition = c(7, 3, 5),
                        color = c("black", "black", "red"),
                        style = c("classic", "modern", "modern"),
                        stringsAsFactors = FALSE)
# Predict new ship data based on price.all.lm model
predict(object = price.all.lm,
        newdata = new.ships)
## 1 2 3
## 3944 2331 6296
- Using price.all.blr, predict the probability that each of the three new ships will have a selling price greater than 3500.
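One way to approach this is predict() with type = "response", which returns probabilities rather than log-odds. The sketch below shows the mechanics on a simulated stand-in dataset, so it runs without the yarrr package. The data values and the names sim.auction, sim.blr, and sim.new.ships are hypothetical, and the model uses only three of the predictors.

```r
# Simulated stand-in for the auction data (hypothetical values, not from yarrr)
set.seed(42)
n <- 300
sim.auction <- data.frame(cannons = sample(2:40, n, replace = TRUE),
                          rooms = sample(10:70, n, replace = TRUE),
                          style = sample(c("classic", "modern"), n, replace = TRUE),
                          stringsAsFactors = FALSE)

# Hypothetical selling price, and the binary outcome derived from it
sim.auction$price <- 500 + 100 * sim.auction$cannons +
  50 * sim.auction$rooms -
  400 * (sim.auction$style == "modern") +
  rnorm(n, sd = 1500)
sim.auction$price.gt.3500 <- sim.auction$price > 3500

# Logistic regression on the simulated data
sim.blr <- glm(formula = price.gt.3500 ~ cannons + rooms + style,
               data = sim.auction,
               family = binomial)

# New ships, using only the columns that the simulated model needs
sim.new.ships <- data.frame(cannons = c(12, 8, 32),
                            rooms = c(34, 26, 65),
                            style = c("classic", "modern", "modern"),
                            stringsAsFactors = FALSE)

# type = "response" gives probabilities; the default (type = "link") gives log-odds
sim.prob <- predict(object = sim.blr,
                    newdata = sim.new.ships,
                    type = "response")
sim.logodds <- predict(object = sim.blr,
                       newdata = sim.new.ships)

# The two scales are related by the inverse logit function
all.equal(sim.prob, plogis(sim.logodds))
```

With the objects created earlier in this section, the corresponding call would presumably be predict(object = price.all.blr, newdata = new.ships, type = "response").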