4.4 Prediction
The standard error in the expected value of \(\hat{y}\) at some new set of predictors \(X_n\) is
\[SE(\mu_\hat{y}) = \sqrt{\hat{\sigma}^2 (X_n (X'X)^{-1} X_n')}.\]
The standard error increases the further \(X_n\) is from \(\bar{X}\). If \(X_n = \bar{X}\), the equation reduces to \(SE(\mu_\hat{y}) = \sigma / \sqrt{n}\). If \(n\) is large, or the predictor values are spread out, \(SE(\mu_\hat{y})\) will be relatively small. The \((1 - \alpha)\%\) confidence interval is \(\hat{y} \pm t_{\alpha / 2} SE(\mu_\hat{y})\).
The standard error in the predicted value of \(\hat{y}\) at some \(X_{new}\) is
\[SE(\hat{y}) = SE(\mu_\hat{y})^2 + \sqrt{\hat{\sigma}^2}.\]
Notice the standard error for a predicted value is always greater than the standard error of the expected value. The \((1 - \alpha)\%\) prediction interval is \(\hat{y} \pm t_{\alpha / 2} SE(\hat{y})\).
4.4.0.1 Example
What is the expected value of mpg
if the predictor values equal their mean values?
R performs this calucation with the predict()
function with parameter interval = confidence
.
m <-lm(mpg ~ ., data = d[,1:9])
X_new <- data.frame(Const = 1,
cyl = factor(round(mean(as.numeric(as.character(d$cyl))),0), levels = levels(d$cyl)),
disp = mean(d$disp),
hp = mean(d$hp),
drat = mean(d$drat),
wt = mean(d$wt),
qsec = mean(d$qsec),
vs = factor("S", levels = levels(d$vs)),
am = factor("manual", levels = levels(d$am)))
predict.lm(object = m,
newdata = X_new,
interval = "confidence")
## fit lwr upr
## 1 21 17 25
You can verify this by manually calculating \(SE(\mu_\hat{y}) = \sqrt{\hat{\sigma}^2 (X_{new} (X'X)^{-1} X_{new}')}\) using matrix algebra.
X2 <- lapply(data.frame(model.matrix(m)), mean) %>% unlist() %>% t()
X2[2] <- contr.poly(3)[2,1] # cyl linear
X2[3] <- contr.poly(3)[2,2] # cyl quadratic
X2[9] <- 1
X2[10] <- 1
y_exp <- sum(m$coefficients * as.numeric(X2))
se_y_exp <- as.numeric(sqrt(rse^2 *
X2 %*%
solve(t(X) %*% X) %*%
t(X2)))
t_crit <- qt(p = .05 / 2, df = n - k - 1, lower.tail = FALSE)
me <- t_crit * se_y_exp
cat("fit: ", round(y_exp, 6),
", 95% CI: (", round(y_exp - me, 6), ", ", round(y_exp + me, 6), ")")
## fit: 21 , 95% CI: ( 17 , 25 )
4.4.0.2 Example
What is the predicted value of mpg if the predictor values equal their mean values?
R performs this calucation with the predict()
with parameter interval = prediction
.
## fit lwr upr
## 1 21 15 28
se_y_hat <- sqrt(rse^2 + se_y_exp^2)
me <- t_crit * se_y_hat
cat("fit: ", round(y_exp, 6),
", 95% CI: (", round(y_exp - me, 6), ", ", round(y_exp + me, 6), ")")
## fit: 21 , 95% CI: ( 15 , 28 )