18.8 Chapter 15: Regression

The following questions apply to the auction dataset in the yarrr package. This dataset contains information about 1,000 ships sold at a pirate auction.

The column jbb is the “Jack’s Blue Book” value of a ship. Create a regression object called jbb.cannon.lm predicting the JBB value of ships based on the number of cannons each ship has. Based on your result, how much value does each additional cannon bring to a ship?

library(yarrr)

# jbb.cannon.lm model
# DV = jbb, IV = cannons
jbb.cannon.lm <- lm(formula = jbb ~ cannons, 
                    data = auction)

# Print jbb.cannon.lm coefficients
summary(jbb.cannon.lm)$coefficients
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)     1396         61      23  1.3e-94
## cannons          101          3      34 1.9e-169
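
The cannons row holds the answer: each additional cannon is associated with roughly 101 more in JBB value. As a minimal sketch of why the slope can be read that way, the lines below use the rounded estimates printed above instead of the fitted object; the names b0, b1, jbb.10, and jbb.11 are illustrative and do not appear in the exercise.

```r
# Rounded estimates copied from the coefficient table above
b0 <- 1396   # intercept: predicted jbb for a ship with 0 cannons
b1 <- 101    # slope: change in predicted jbb per additional cannon

# Predicted jbb for ships with 10 and 11 cannons
jbb.10 <- b0 + b1 * 10
jbb.11 <- b0 + b1 * 11

# The difference between the two predictions is the slope itself
jbb.11 - jbb.10
```

With the fitted object available, coef(jbb.cannon.lm)["cannons"] should return the unrounded slope.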

Repeat your previous regression, but do two separate regressions: one on modern ships and one on classic ships. Is the relationship between cannons and JBB the same for both types of ships?

# jbb.cannon.modern.lm  model
# DV = jbb, IV = cannons. Only include modern ships
jbb.cannon.modern.lm <- lm(formula = jbb ~ cannons, 
                          data = subset(auction, style == "modern"))

# jbb.cannon.classic.lm model
# DV = jbb, IV = cannons. Only include classic ships
jbb.cannon.classic.lm <- lm(formula = jbb ~ cannons, 
                          data = subset(auction, style == "classic"))

# Print jbb.cannon.modern.lm coefficients
summary(jbb.cannon.modern.lm)$coefficients
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)     1217       71.8      17  3.5e-51
## cannons          100        3.5      29 3.1e-107


# Print jbb.cannon.classic.lm coefficients
summary(jbb.cannon.classic.lm)$coefficients
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)     1537       75.9      20  5.9e-67
## cannons          104        3.7      28 2.0e-103
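
The two slopes (100 and 104) look very similar. One rough way to check whether they differ by more than their standard errors allow is an approximate z statistic for the difference. This is only a sketch, not a method used in the exercise: it assumes the modern and classic subsets are independent samples and relies on a normal approximation, and the variable names are illustrative.

```r
# Slopes and standard errors copied from the two tables above
b.modern  <- 100; se.modern  <- 3.5
b.classic <- 104; se.classic <- 3.7

# Approximate z statistic for the difference between two independent slopes
z.diff <- (b.classic - b.modern) / sqrt(se.modern^2 + se.classic^2)
z.diff

# Two-sided p-value from the normal approximation
p.diff <- 2 * pnorm(-abs(z.diff))
p.diff
```

A z statistic well below 1.96 in absolute value suggests the cannon slopes are about the same for both styles. The intercepts do differ (1217 versus 1537), so classic ships start from a higher baseline value.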

Is there a significant interaction between a ship’s style and its age on its JBB value? If so, how do you interpret the interaction?

# int.lm model
# DV = jbb, IV = interaction between style and age
int.lm <- lm(formula = jbb ~ style * age,
             data = auction
             )

# Print int.lm coefficients
summary(int.lm)$coefficients
##                 Estimate Std. Error t value Pr(>|t|)
## (Intercept)       3414.2      79.20   43.11 6.0e-230
## stylemodern        -15.7     111.74   -0.14  8.9e-01
## age                  1.9       0.76    2.57  1.0e-02
## stylemodern:age     -3.7       1.07   -3.43  6.2e-04
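
The stylemodern:age row is significant (p = 6.2e-04), so the effect of age depends on style. One way to interpret it is to work out the age slope for each style separately. The sketch below does this arithmetic with the rounded estimates printed above; because the table shows a stylemodern term, classic is taken to be the reference level. The variable names are illustrative.

```r
# Rounded estimates copied from the coefficient table above
b.age <- 1.9    # age slope for the reference style (classic)
b.int <- -3.7   # stylemodern:age, the change in the age slope for modern ships

# Age slope within each style
slope.classic <- b.age
slope.modern  <- b.age + b.int

c(classic = slope.classic, modern = slope.modern)
```

Read this way, classic ships gain a little JBB value with each year of age, while modern ships lose value as they age.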

Create a regression object called jbb.all.lm predicting the JBB value of ships based on cannons, rooms, age, condition, color, and style. Which aspects of a ship significantly affect its JBB value?

# jbb.all.lm model
# DV = jbb, IV = everything (except price)
jbb.all.lm <- lm(jbb ~ cannons + rooms + age + condition + color + style,
                 data = auction
                 )

# Print jbb.all.lm coefficients
summary(jbb.all.lm)$coefficients
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    134.4       52.9    2.54  1.1e-02
## cannons        100.7        1.6   64.92  0.0e+00
## rooms           50.5        1.6   30.80 1.2e-146
## age              1.1        0.2    5.58  3.1e-08
## condition      107.6        3.9   27.51 3.4e-124
## colorbrown       4.9       16.6    0.30  7.7e-01
## colorplum      -29.8       31.3   -0.95  3.4e-01
## colorred        15.1       18.3    0.82  4.1e-01
## colorsalmon    -19.4       20.7   -0.94  3.5e-01
## stylemodern   -397.8       12.8  -30.98 6.7e-148
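
To answer the question, look for rows with a small p-value. The sketch below copies the p-values from the table above into a named vector and keeps the terms below the conventional 0.05 cutoff; the cutoff and the names p.values and sig.terms are assumptions made for illustration.

```r
# p-values copied from the jbb.all.lm coefficient table above
# (the intercept is left out because it is not an aspect of a ship)
p.values <- c(cannons     = 0.0e+00,
              rooms       = 1.2e-146,
              age         = 3.1e-08,
              condition   = 3.4e-124,
              colorbrown  = 7.7e-01,
              colorplum   = 3.4e-01,
              colorred    = 4.1e-01,
              colorsalmon = 3.5e-01,
              stylemodern = 6.7e-148)

# Names of the terms with p < 0.05
sig.terms <- names(p.values)[p.values < 0.05]
sig.terms
```

Cannons, rooms, age, condition, and style pass the cutoff; none of the color terms do. With the fitted object, the same filter can be applied to the matrix returned by summary(jbb.all.lm)$coefficients using its "Pr(>|t|)" column.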

Create a regression object called price.all.lm predicting the actual selling value of ships based on cannons, rooms, age, condition, color, and style. Based on the results, does the JBB do a good job of capturing the effect of each variable on a ship’s selling price?

# price.all.lm model
# DV = price, IV = everything (except jbb)
price.all.lm <- lm(price ~ cannons + rooms + age + condition + color + style,
                 data = auction
                 )

# Print price.all.lm coefficients
summary(price.all.lm)$coefficients
##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    302.5      73.81    4.10  4.5e-05
## cannons        100.0       2.17   46.17 4.0e-249
## rooms           48.8       2.29   21.34  2.0e-83
## age              1.2       0.29    4.28  2.0e-05
## condition      104.1       5.46   19.05  3.4e-69
## colorbrown    -119.2      23.19   -5.14  3.3e-07
## colorplum       15.6      43.74    0.36  7.2e-01
## colorred      -603.6      25.59  -23.59  5.4e-98
## colorsalmon     70.4      28.97    2.43  1.5e-02
## stylemodern   -419.2      17.93  -23.38  1.3e-96
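
Putting the two sets of estimates side by side makes the comparison easier. The sketch below copies the rounded estimates from the jbb.all.lm and price.all.lm tables into one data frame and sorts the terms by how far apart they are; the data frame name coef.compare and its column names are illustrative.

```r
# Rounded estimates copied from the two coefficient tables above
coef.compare <- data.frame(
  term  = c("cannons", "rooms", "age", "condition", "colorbrown",
            "colorplum", "colorred", "colorsalmon", "stylemodern"),
  jbb   = c(100.7, 50.5, 1.1, 107.6,    4.9, -29.8,   15.1, -19.4, -397.8),
  price = c(100.0, 48.8, 1.2, 104.1, -119.2,  15.6, -603.6,  70.4, -419.2)
)

# Gap between the price estimate and the jbb estimate for each term
coef.compare$diff <- coef.compare$price - coef.compare$jbb

# Terms with the largest gaps first
coef.compare[order(-abs(coef.compare$diff)), ]
```

The estimates for cannons, rooms, age, condition, and style are close in both models, so the JBB captures those effects well. Color is where it falls short: red and brown ships sell for much less than the JBB estimates imply.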

Repeat your previous regression analysis, but instead of using the price as the dependent variable, use the binary variable price.gt.3500 indicating whether or not the ship had a selling price greater than 3500. Call the new regression object price.all.blr. Make sure to use the appropriate regression function!

# Create new binary variable indicating whether
#   a ship sold for more than 3500
auction$price.gt.3500 <- auction$price > 3500

# price.all.blr model
# DV = price.gt.3500, IV = everything (except jbb)
price.all.blr <- glm(price.gt.3500 ~ cannons + rooms + age + condition + color + style,
                 data = auction,
                 family = binomial   # Logistic regression
                 )

# price.all.blr coefficients
summary(price.all.blr)$coefficients
##             Estimate Std. Error z value Pr(>|z|)
## (Intercept) -19.7401     1.4240  -13.86  1.1e-43
## cannons       0.6251     0.0442   14.14  2.2e-45
## rooms         0.2688     0.0296    9.07  1.2e-19
## age           0.0097     0.0033    2.93  3.4e-03
## condition     0.6825     0.0745    9.16  5.4e-20
## colorbrown   -0.8924     0.2549   -3.50  4.6e-04
## colorplum    -0.1291     0.5090   -0.25  8.0e-01
## colorred     -4.0764     0.4107   -9.93  3.2e-23
## colorsalmon   0.2479     0.3172    0.78  4.3e-01
## stylemodern  -2.4037     0.2432   -9.88  4.9e-23
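
Logistic regression estimates are on the log-odds scale, which is hard to read directly. Exponentiating them gives odds ratios. The sketch below does this for a few of the rounded estimates printed above; the names log.odds and odds.ratios are illustrative.

```r
# Selected estimates (log-odds scale) copied from the table above
log.odds <- c(cannons     =  0.6251,
              rooms       =  0.2688,
              age         =  0.0097,
              condition   =  0.6825,
              stylemodern = -2.4037)

# Odds ratio: multiplicative change in the odds of selling for more
# than 3500 for a one-unit increase in the predictor
odds.ratios <- exp(log.odds)
round(odds.ratios, 2)
```

An odds ratio above 1 means the predictor raises the odds of a sale above 3500, and a ratio below 1 means it lowers them. For example, each additional cannon multiplies the odds by roughly 1.87, holding the other predictors constant.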

Using price.all.lm, predict the selling price of the following three new ships:

cannons  rooms  age  condition  color  style
12       34     43   7          black  classic
8        26     54   3          black  modern
32       65     100  5          red    modern

# Create a dataframe with new ship data
new.ships <- data.frame(cannons = c(12, 8, 32),
                  rooms = c(34, 26, 65),
                  age = c(43, 54, 100),
                  condition = c(7, 3, 5),
                  color = c("black", "black", "red"),
                  style = c("classic", "modern", "modern"),
                  stringsAsFactors = FALSE)

# Predict new ship data based on price.all.lm model
predict(object = price.all.lm,
        newdata = new.ships
        )
##    1    2    3 
## 3944 2331 6296
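
These predictions can be checked by hand, because a linear prediction is just the intercept plus each estimate times the ship's value on that predictor. The sketch below uses the rounded price.all.lm estimates printed earlier, so the results should land within a few units of the predict() output rather than match it exactly. Black and classic are treated as the reference levels (they have no row in the coefficient table); the names b, X, and by.hand are illustrative.

```r
# Rounded price.all.lm estimates needed for these three ships
b <- c(intercept = 302.5, cannons = 100.0, rooms = 48.8, age = 1.2,
       condition = 104.1, colorred = -603.6, stylemodern = -419.2)

# One row per ship: 1 for the intercept, the numeric predictors,
# then 0/1 dummies for colorred and stylemodern
X <- rbind(
  ship1 = c(1, 12, 34,  43, 7, 0, 0),   # black, classic
  ship2 = c(1,  8, 26,  54, 3, 0, 1),   # black, modern
  ship3 = c(1, 32, 65, 100, 5, 1, 1)    # red, modern
)

# Matrix product gives the three hand-computed predictions
by.hand <- drop(X %*% b)
by.hand
```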

Using price.all.blr, predict the probability that the three new ships will have a selling price greater than 3500.

# Calculate logit of predictions
log.pred <- predict(object = price.all.blr,
                    newdata = new.ships
                    )

# Convert logits to probabilities
1 / (1 + exp(-log.pred))
##       1       2       3 
## 0.89038 0.00051 1.00000
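
The conversion 1 / (1 + exp(-x)) is the inverse logit, which base R also provides as plogis(). The sketch below rebuilds the logit for the first ship from the rounded price.all.blr estimates printed earlier and converts it both ways; because of rounding, the result should be close to, not identical to, the 0.89038 shown above. The names b and logit.1 are illustrative.

```r
# Rounded price.all.blr estimates needed for ship 1
# (black and classic are reference levels, so no dummy terms apply)
b <- c(intercept = -19.7401, cannons = 0.6251, rooms = 0.2688,
       age = 0.0097, condition = 0.6825)

# Ship 1: 12 cannons, 34 rooms, age 43, condition 7
logit.1 <- sum(b * c(1, 12, 34, 43, 7))

# Two equivalent ways to turn a logit into a probability
p.manual <- 1 / (1 + exp(-logit.1))
p.plogis <- plogis(logit.1)
c(manual = p.manual, plogis = p.plogis)
```

With the fitted object, predict(price.all.blr, newdata = new.ships, type = "response") should return the probabilities directly, without the manual conversion step.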