## 4.4 Prediction

The standard error in the expected value of $$\hat{y}$$ at some new set of predictors $$X_n$$ is

$SE(\mu_\hat{y}) = \sqrt{\hat{\sigma}^2 (X_n (X'X)^{-1} X_n')}.$

The standard error increases the further $$X_n$$ is from $$\bar{X}$$. If $$X_n = \bar{X}$$, the equation reduces to $$SE(\mu_\hat{y}) = \sigma / \sqrt{n}$$. If $$n$$ is large, or the predictor values are spread out, $$SE(\mu_\hat{y})$$ will be relatively small. The $$(1 - \alpha)\%$$ confidence interval is $$\hat{y} \pm t_{\alpha / 2} SE(\mu_\hat{y})$$.

The standard error in the predicted value of $$\hat{y}$$ at some $$X_{new}$$ is

$SE(\hat{y}) = SE(\mu_\hat{y})^2 + \sqrt{\hat{\sigma}^2}.$

Notice the standard error for a predicted value is always greater than the standard error of the expected value. The $$(1 - \alpha)\%$$ prediction interval is $$\hat{y} \pm t_{\alpha / 2} SE(\hat{y})$$.

#### 4.4.0.1 Example

What is the expected value of mpg if the predictor values equal their mean values?

R performs this calucation with the predict() function with parameter interval = confidence.

m <-lm(mpg ~ ., data = d[,1:9])
X_new <- data.frame(Const = 1,
cyl = factor(round(mean(as.numeric(as.character(d$cyl))),0), levels = levels(d$cyl)),
disp = mean(d$disp), hp = mean(d$hp),
drat = mean(d$drat), wt = mean(d$wt),
qsec = mean(d$qsec), vs = factor("S", levels = levels(d$vs)),
am = factor("manual", levels = levels(d$am))) predict.lm(object = m, newdata = X_new, interval = "confidence") ## fit lwr upr ## 1 21 17 25 You can verify this by manually calculating $$SE(\mu_\hat{y}) = \sqrt{\hat{\sigma}^2 (X_{new} (X'X)^{-1} X_{new}')}$$ using matrix algebra. X2 <- lapply(data.frame(model.matrix(m)), mean) %>% unlist() %>% t() X2 <- contr.poly(3)[2,1] # cyl linear X2 <- contr.poly(3)[2,2] # cyl quadratic X2 <- 1 X2 <- 1 y_exp <- sum(m$coefficients * as.numeric(X2))
se_y_exp <- as.numeric(sqrt(rse^2 *
X2 %*%
solve(t(X) %*% X) %*%
t(X2)))

t_crit <- qt(p = .05 / 2, df = n - k - 1, lower.tail = FALSE)
me <- t_crit * se_y_exp
cat("fit: ", round(y_exp, 6),
", 95% CI: (", round(y_exp - me, 6), ", ", round(y_exp + me, 6), ")")
## fit:  21 , 95% CI: ( 17 ,  25 )

#### 4.4.0.2 Example

What is the predicted value of mpg if the predictor values equal their mean values?

R performs this calucation with the predict() with parameter interval = prediction.

predict.lm(object = m,
newdata = X_new,
interval = "prediction")
##   fit lwr upr
## 1  21  15  28
se_y_hat <- sqrt(rse^2 + se_y_exp^2)
me <- t_crit * se_y_hat
cat("fit: ", round(y_exp, 6),
", 95% CI: (", round(y_exp - me, 6), ", ", round(y_exp + me, 6), ")")
## fit:  21 , 95% CI: ( 15 ,  28 )