Chapter 5 The Normal Error Linear Regression Model
5.1 Section 5.1 pt()
and q(t)
If we want to calculate a t-statistic ourselves, we can divide the estimate, minus the associated hypothesized parameter value, by its standard error, and then using the pt()
to obtain the p-value. We need to multiply by 2 to get values as extreme as ours in either tail of the t-distribution. Also enter the degrees of freedom associated with the residual standard error in the R output.
5.1.1 Calculating t-Statistic
b1 <- summary(Cars_M_A060)$coeff[2,1]
SEb1 <- summary(Cars_M_A060)$coeff[2,2]
ts <- (b1-0)/SEb1
2*pt(-abs(ts), df=108)
## [1] 1.485428e-20
If we want to calculate confidence intervals with a level of confidence other than 95%, we can get the appropriate multiplier using the qt()
command. For a A% interval, use an argument of (1-A/100)/2 to get the appropriate multiplier.
5.1.2 Obtaining p-value using qt()
abs(qt((1-.99)/2, df=108))
## [1] 2.62212
5.2 Confidence and Prediction Intervals
To calculate a 95% confidence interval for the average response, given a specified value of the explanatory variable, use the predict()
command and specify interval="confidence"
.
To calculate a 95% prediction interval for a new observation, given a specified value of the explanatory variable, use the predict()
command and specify interval="predict"
.
5.2.1 Confidence and Prediction Interval Example:
95% confidence interval for mean price among all cars that can accelerate from 0 to 60 mph in 7 seconds.
predict(Cars_M_A060, newdata=data.frame(Acc060=7), interval="confidence", level=0.95)
## fit lwr upr
## 1 39.5502 37.21856 41.88184
95% prediction interval for price of an individual car that can accelerate from 0 to 60 mph in 7 seconds.
predict(Cars_M_A060, newdata=data.frame(Acc060=7), interval="prediction", level=0.95)
## fit lwr upr
## 1 39.5502 18.19826 60.90215
5.3 Plots for Checking Model Assumptions
To check model assumptions, we use a plot of residuals vs predicted values, a histogram of the residuals, and a normal quantile-quantile plot. Code to produce each of these is given below.
To put the graphs side-by-side, we use the grid.arrange()
function in the gridExtra()
package.
5.3.1 Residual Plot, Histogram, and Normal QQ Plot
library(gridExtra)
P1 <- ggplot(data=data.frame(Cars_M_A060$residuals),
aes(y=Cars_M_A060$residuals, x=Cars_M_A060$fitted.values)) +
geom_point() + ggtitle("Cars Model Residual Plot") +
xlab("Predicted Values") + ylab("Residuals")
P2 <- ggplot(data=data.frame(Cars_M_A060$residuals),
aes(x=Cars_M_A060$residuals)) + geom_histogram() +
ggtitle("Histogram of Residuals") + xlab("Residual")
P3 <- ggplot(data=data.frame(Cars_M_A060$residuals),
aes(sample = scale(Cars_M_A060$residuals))) +
stat_qq() + stat_qq_line() + xlab("Normal Quantiles") +
ylab("Residual Quantiles") + ggtitle("Cars Model QQ Plot")
grid.arrange(P1, P2, P3, ncol=3)
5.4 Section 5.4: Transformations
When residual plots reveal issues with model assumptions, we can try modeling a transformation of the response variable, such as log()
.
5.4.1 Model with Transformation
Cars_M_Log <- lm(data=Cars2015, log(LowPrice)~Acc060)
summary(Cars_M_Log)
##
## Call:
## lm(formula = log(LowPrice) ~ Acc060, data = Cars2015)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.84587 -0.19396 0.00908 0.18615 0.53350
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.13682 0.13021 39.45 <2e-16 ***
## Acc060 -0.22064 0.01607 -13.73 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.276 on 108 degrees of freedom
## Multiple R-squared: 0.6359, Adjusted R-squared: 0.6325
## F-statistic: 188.6 on 1 and 108 DF, p-value: < 2.2e-16
When making predictions, we need to exponentiate.
exp(predict(Cars_M_Log, newdata=data.frame(Acc060=c(7)), interval="prediction", level=0.95))
## fit lwr upr
## 1 36.31908 20.94796 62.96917
Likewise, we are interested in \(e^{b_j}\).
exp(confint(Cars_M_Log))
## 2.5 % 97.5 %
## (Intercept) 131.4610408 220.284693
## Acc060 0.7768669 0.827959