2.5 Prediction
The forecast of \(Y\) from \(\mathbf{X}=\mathbf{x}\) (that is, \(X_1=x_1,\ldots,X_p=x_p\)) is approached in two different ways:
- Through the estimation of the conditional mean of \(Y\) given \(\mathbf{X}=\mathbf{x},\) \(\mathbb{E}[Y|\mathbf{X}=\mathbf{x}].\) This is a deterministic quantity, which equals \(\beta_0+\beta_1x_1+\cdots+\beta_{p}x_p.\)
- Through the prediction of the conditional response \(Y|\mathbf{X}=\mathbf{x}.\) This is a random variable distributed as \(\mathcal{N}(\beta_0+\beta_1x_1+\cdots+\beta_{p}x_p,\sigma^2).\)
There are similarities and differences in the prediction of the conditional mean \(\mathbb{E}[Y|\mathbf{X}=\mathbf{x}]\) and conditional response \(Y|\mathbf{X}=\mathbf{x},\) which we highlight next:
- Similarities. The estimate is the same35, \(\hat y=\hat\beta_0+\hat\beta_1x_1+\cdots+\hat\beta_px_p.\) The CIs for both quantities are centered at \(\hat y.\)
- Differences. \(\mathbb{E}[Y|\mathbf{X}=\mathbf{x}]\) is deterministic, whereas \(Y|\mathbf{X}=\mathbf{x}\) is a random variable. Predicting the latter is noisier, because the randomness of \(Y\) has to be taken into account. Therefore, the variance is larger for the prediction of \(Y|\mathbf{X}=\mathbf{x}\) than for the prediction of \(\mathbb{E}[Y|\mathbf{X}=\mathbf{x}].\) This has a direct consequence for the length of the intervals, which are longer for \(Y|\mathbf{X}=\mathbf{x}\) than for \(\mathbb{E}[Y|\mathbf{X}=\mathbf{x}].\)
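These similarities and differences can be seen at work in a quick sketch with R's `predict` function, here on simulated data (all names, such as `fit` and `newd`, are illustrative and not part of the case study):

```r
# Simulate a simple linear model Y = -1 + 3x + eps
set.seed(1)
x <- runif(100)
y <- -1 + 3 * x + rnorm(100)
fit <- lm(y ~ x)

# Intervals at the same location x = 0.5
newd <- data.frame(x = 0.5)
confInt <- predict(fit, newdata = newd, interval = "confidence")  # for E[Y|X=x]
predInt <- predict(fit, newdata = newd, interval = "prediction")  # for Y|X=x

confInt[1, "fit"] == predInt[1, "fit"]  # same center: the estimate coincides
diff(predInt[1, c("lwr", "upr")]) >
  diff(confInt[1, c("lwr", "upr")])  # the interval for Y|X=x is longer
```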
The inspection of the CIs for the conditional mean and conditional response in the simple linear model offers great insight into the previous similarities and differences, and also into which components precisely affect the quality of the prediction:
- The \(100(1-\alpha)\%\) CI for the conditional mean \(\beta_0+\beta_1x\) is
\[\begin{align} \left(\hat y \pm t_{n-2;\alpha/2}\sqrt{\frac{\hat\sigma^2}{n}\left(1+\frac{(x-\bar x)^2}{s_x^2}\right)}\right).\tag{2.18} \end{align}\]
- The \(100(1-\alpha)\%\) CI for the conditional response \(Y|X=x\) is
\[\begin{align} \left(\hat y \pm t_{n-2;\alpha/2}\sqrt{\hat\sigma^2+\frac{\hat\sigma^2}{n}\left(1+\frac{(x-\bar x)^2}{s_x^2}\right)}\right).\tag{2.19} \end{align}\]
Notice the dependence of both CIs on \(x,\) \(n,\) and \(\hat\sigma^2,\) each of them with a clear effect on the resulting length of the interval. Note also the high similarity between (2.18) and (2.19) (both intervals are centered at \(\hat y\) and have similar variances) and their single revealing difference: the extra \(\hat\sigma^2\) in (2.19)36, a consequence of the “extra randomness” of the conditional response with respect to the conditional mean. Figure 2.15 helps to visualize these concepts and the difference between the CIs interactively.
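The formulas (2.18) and (2.19) can be verified by computing them by hand and comparing with `predict`. A minimal sketch on simulated data (the names `x`, `y`, `fit`, and `x0` are illustrative; recall that \(s_x^2\) is the variance of the predictor computed with the \(1/n\) factor):

```r
# Simulated simple linear model
set.seed(42)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
fit <- lm(y ~ x)

# Ingredients of (2.18) and (2.19) at x0 = 1
x0 <- 1
yHat <- sum(coef(fit) * c(1, x0))
sigma2Hat <- sum(residuals(fit)^2) / (n - 2)  # hat(sigma)^2
sx2 <- mean((x - mean(x))^2)  # s_x^2, with the 1/n factor
tAlpha <- qt(0.975, df = n - 2)

# (2.18): CI for the conditional mean
halfCI <- tAlpha * sqrt(sigma2Hat / n * (1 + (x0 - mean(x))^2 / sx2))
# (2.19): CI for the conditional response -- extra sigma2Hat under the root
halfPI <- tAlpha * sqrt(sigma2Hat + sigma2Hat / n * (1 + (x0 - mean(x))^2 / sx2))

ciMean <- c(yHat - halfCI, yHat + halfCI)
ciResp <- c(yHat - halfPI, yHat + halfPI)

# Both match R's built-in intervals
newd <- data.frame(x = x0)
ciMeanR <- predict(fit, newdata = newd, interval = "confidence")
ciRespR <- predict(fit, newdata = newd, interval = "prediction")
```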
2.5.1 Case study application
The prediction and the computation of prediction CIs can be done with `predict`. The objects required by `predict` are: first, an `lm` object; second, a `data.frame` containing the locations \(\mathbf{x}=(x_1,\ldots,x_p)\) where we want to predict \(\beta_0+\beta_1x_1+\cdots+\beta_{p}x_p.\) The prediction is \(\hat\beta_0+\hat\beta_1x_1+\cdots+\hat\beta_{p}x_p\) and the CIs returned are either (2.18) or (2.19).

It is mandatory to name the columns of the `data.frame` with the same names as the predictors used in `lm`. Otherwise, `predict` will throw an error.
# Fit a linear model for the price on WinterRain, HarvestRain, and AGST
modWine4 <- lm(Price ~ WinterRain + HarvestRain + AGST, data = wine)
summary(modWine4)
##
## Call:
## lm(formula = Price ~ WinterRain + HarvestRain + AGST, data = wine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.62816 -0.17923 0.02274 0.21990 0.62859
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.9506001 1.9694011 -2.514 0.01940 *
## WinterRain 0.0012820 0.0005765 2.224 0.03628 *
## HarvestRain -0.0036242 0.0009646 -3.757 0.00103 **
## AGST 0.7123192 0.1087676 6.549 1.11e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3436 on 23 degrees of freedom
## Multiple R-squared: 0.7407, Adjusted R-squared: 0.7069
## F-statistic: 21.9 on 3 and 23 DF, p-value: 6.246e-07
# Data for which we want a prediction
# Important! You have to name the column with the predictor's name!
weather <- data.frame(WinterRain = 500, HarvestRain = 123, AGST = 18)
weatherBad <- data.frame(500, 123, 18)
# Prediction of the mean
# Prediction of the mean at 95% -- the defaults
predict(modWine4, newdata = weather)
## 1
## 8.066342
predict(modWine4, newdata = weatherBad) # Error
## Error in eval(predvars, data, env): object 'WinterRain' not found
# Prediction of the mean with 95% confidence interval (the default)
# CI: (lwr, upr)
predict(modWine4, newdata = weather, interval = "confidence")
## fit lwr upr
## 1 8.066342 7.714178 8.418507
predict(modWine4, newdata = weather, interval = "confidence", level = 0.95)
## fit lwr upr
## 1 8.066342 7.714178 8.418507
# Other levels
predict(modWine4, newdata = weather, interval = "confidence", level = 0.90)
## fit lwr upr
## 1 8.066342 7.774576 8.358108
predict(modWine4, newdata = weather, interval = "confidence", level = 0.99)
## fit lwr upr
## 1 8.066342 7.588427 8.544258
# Prediction of the response
# Point prediction of the response (coincides with that of the mean)
predict(modWine4, newdata = weather)
## 1
## 8.066342
# Prediction of the response with 95% prediction interval
# CI: (lwr, upr)
predict(modWine4, newdata = weather, interval = "prediction")
## fit lwr upr
## 1 8.066342 7.273176 8.859508
predict(modWine4, newdata = weather, interval = "prediction", level = 0.95)
## fit lwr upr
## 1 8.066342 7.273176 8.859508
# Other levels
predict(modWine4, newdata = weather, interval = "prediction", level = 0.90)
## fit lwr upr
## 1 8.066342 7.409208 8.723476
predict(modWine4, newdata = weather, interval = "prediction", level = 0.99)
## fit lwr upr
## 1 8.066342 6.989951 9.142733
# Predictions for several values
weather2 <- data.frame(WinterRain = c(500, 200), HarvestRain = c(123, 200),
AGST = c(17, 18))
predict(modWine4, newdata = weather2, interval = "prediction")
## fit lwr upr
## 1 7.354023 6.613835 8.094211
## 2 7.402691 6.533945 8.271437
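The intervals returned by `predict` can also be rebuilt from its `se.fit` output in the multiple regression setting, mirroring (2.18) and (2.19): the CI for the conditional mean uses the standard error of \(\hat y\) alone, and the CI for the conditional response adds \(\hat\sigma^2\) under the square root. A sketch on simulated data (the names `d`, `fit`, and `newd` are illustrative, not taken from the `wine` case study):

```r
# Simulated multiple linear model with two predictors
set.seed(7)
n <- 40
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 1 + 0.5 * d$x1 - 2 * d$x2 + rnorm(n)
fit <- lm(y ~ x1 + x2, data = d)

# se.fit = TRUE returns the standard error of hat(y) and hat(sigma)
newd <- data.frame(x1 = 0, x2 = 1)
p <- predict(fit, newdata = newd, se.fit = TRUE)
tAlpha <- qt(0.975, df = p$df)

# CI for the conditional mean: hat(y) +/- t * se.fit
ciMean <- p$fit + c(-1, 1) * tAlpha * p$se.fit
# CI for the conditional response: add hat(sigma)^2 under the root
ciResp <- p$fit + c(-1, 1) * tAlpha * sqrt(p$se.fit^2 + p$residual.scale^2)

# Both match the built-in intervals
ciMeanR <- predict(fit, newdata = newd, interval = "confidence")[1, c("lwr", "upr")]
ciRespR <- predict(fit, newdata = newd, interval = "prediction")[1, c("lwr", "upr")]
```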
For the `wine` dataset, do the following:

- Regress `WinterRain` on `HarvestRain` and `AGST`. Name the fitted model `modExercise`.
- Compute the estimate for the conditional mean of `WinterRain` for `HarvestRain` \(=123.0\) and `AGST` \(=16.15.\) What is the CI at \(\alpha=0.01\)?
- Compute the estimate for the conditional response for `HarvestRain` \(=125.0\) and `AGST` \(=15.\) What is the CI at \(\alpha=0.10\)?
- Check that `modExercise$fitted.values` is the same as `predict(modExercise, newdata = data.frame(HarvestRain = wine$HarvestRain, AGST = wine$AGST))`. Why is this so?
35. Because the prediction of a new observation from the random variable \(\mathcal{N}(\beta_0+\beta_1x_1+\cdots+\beta_{p}x_p,\sigma^2)\) is simply its mean, \(\beta_0+\beta_1x_1+\cdots+\beta_{p}x_p,\) which is also the most likely value of a normal distribution.
36. A consequence of this extra \(\hat\sigma^2\) is that the length of (2.19) cannot be reduced arbitrarily as the sample size \(n\) grows.