## 2.5 Prediction

The forecast of $$Y$$ from $$\mathbf{X}=\mathbf{x}$$ (that is, $$X_1=x_1,\ldots,X_p=x_p$$) can be approached in two different ways:

1. Through the estimation of the conditional mean of $$Y$$ given $$\mathbf{X}=\mathbf{x},$$ $$\mathbb{E}[Y|\mathbf{X}=\mathbf{x}].$$ This is a deterministic quantity, which equals $$\beta_0+\beta_1x_1+\cdots+\beta_{p}x_p.$$
2. Through the prediction of the conditional response $$Y|\mathbf{X}=\mathbf{x}.$$ This is a random variable distributed as $$\mathcal{N}(\beta_0+\beta_1x_1+\cdots+\beta_{p}x_p,\sigma^2).$$

There are similarities and differences in the prediction of the conditional mean $$\mathbb{E}[Y|\mathbf{X}=\mathbf{x}]$$ and conditional response $$Y|\mathbf{X}=\mathbf{x},$$ which we highlight next:

• Similarities. The estimate is the same^35 for both: $$\hat y=\hat\beta_0+\hat\beta_1x_1+\cdots+\hat\beta_px_p.$$ The CIs for both quantities are centered at $$\hat y.$$
• Differences. $$\mathbb{E}[Y|\mathbf{X}=\mathbf{x}]$$ is deterministic, whereas $$Y|\mathbf{X}=\mathbf{x}$$ is a random variable. The prediction of the latter is noisier, because it has to take into account the randomness of $$Y$$ about its mean. Therefore, the variance is larger for the prediction of $$Y|\mathbf{X}=\mathbf{x}$$ than for the prediction of $$\mathbb{E}[Y|\mathbf{X}=\mathbf{x}].$$ This has a direct consequence for the length of the intervals, which are longer for $$Y|\mathbf{X}=\mathbf{x}$$ than for $$\mathbb{E}[Y|\mathbf{X}=\mathbf{x}].$$

The inspection of the CIs for the conditional mean and the conditional response in the simple linear model offers great insight into the previous similarities and differences, as well as into which components precisely affect the quality of the prediction:

• The $$100(1-\alpha)\%$$ CI for the conditional mean $$\beta_0+\beta_1x$$ is

\begin{align} \left(\hat y \pm t_{n-2;\alpha/2}\sqrt{\frac{\hat\sigma^2}{n}\left(1+\frac{(x-\bar x)^2}{s_x^2}\right)}\right).\tag{2.18} \end{align}

• The $$100(1-\alpha)\%$$ CI for the conditional response $$Y|X=x$$ is

\begin{align} \left(\hat y \pm t_{n-2;\alpha/2}\sqrt{\hat\sigma^2+\frac{\hat\sigma^2}{n}\left(1+\frac{(x-\bar x)^2}{s_x^2}\right)}\right).\tag{2.19} \end{align}

Figure 2.15: Illustration of the CIs for the conditional mean and response. Note how the width of the CIs is influenced by $$x,$$ especially for the conditional mean (the conditional response has a constant term affecting the width).

Notice the dependence of both CIs on $$x,$$ $$n,$$ and $$\hat\sigma^2,$$ each of them with a clear effect on the resulting length of the interval. Note also the high similarity between (2.18) and (2.19) (both intervals are centered at $$\hat y$$ and share a common variance term) and their single revealing difference: the extra $$\hat\sigma^2$$ in (2.19),^36 a consequence of the “extra randomness” of the conditional response with respect to the conditional mean. Figure 2.15 helps to visualize these concepts and the difference between the two CIs.
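To make the components of (2.18) and (2.19) concrete, the following sketch (on simulated data, with $$s_x^2$$ taken with denominator $$n,$$ as the formulas require) computes both intervals by hand and checks them against `predict`:

```r
# Sketch on simulated data: compute (2.18) and (2.19) by hand and compare
# with predict()'s "confidence" and "prediction" intervals
set.seed(42)
n <- 50
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)
mod <- lm(y ~ x)

x0 <- 1  # location at which we predict
yHat <- unname(coef(mod)[1] + coef(mod)[2] * x0)  # hat y
sigma2Hat <- sum(residuals(mod)^2) / (n - 2)      # hat sigma^2
sx2 <- sum((x - mean(x))^2) / n                   # s_x^2, denominator n
tAlpha <- qt(0.975, df = n - 2)                   # alpha = 0.05

# (2.18): CI for the conditional mean
hwMean <- tAlpha * sqrt(sigma2Hat / n * (1 + (x0 - mean(x))^2 / sx2))
# (2.19): CI for the conditional response -- note the extra sigma2Hat
hwResp <- tAlpha * sqrt(sigma2Hat + sigma2Hat / n * (1 + (x0 - mean(x))^2 / sx2))

c(yHat - hwMean, yHat + hwMean)  # matches interval = "confidence"
predict(mod, newdata = data.frame(x = x0), interval = "confidence")
c(yHat - hwResp, yHat + hwResp)  # matches interval = "prediction"
predict(mod, newdata = data.frame(x = x0), interval = "prediction")
```

The two hand-made intervals coincide with the ones returned by `predict`, and the response interval is visibly wider due to the extra $$\hat\sigma^2.$$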

### 2.5.1 Case study application

The prediction and the computation of prediction CIs can be done with `predict`. `predict` requires two main arguments: first, an `lm` object; second, a `data.frame` containing the locations $$\mathbf{x}=(x_1,\ldots,x_p)$$ at which we want to predict $$\beta_0+\beta_1x_1+\cdots+\beta_{p}x_p.$$ The prediction is $$\hat\beta_0+\hat\beta_1x_1+\cdots+\hat\beta_{p}x_p$$ and the CIs returned are either (2.18) or (2.19).

It is mandatory to name the columns of the `data.frame` with the same names as the predictors used in `lm`. Otherwise `predict` will throw an error.

```r
# Fit a linear model for the price on WinterRain, HarvestRain, and AGST
modWine4 <- lm(Price ~ WinterRain + HarvestRain + AGST, data = wine)
summary(modWine4)
##
## Call:
## lm(formula = Price ~ WinterRain + HarvestRain + AGST, data = wine)
##
## Residuals:
##      Min       1Q   Median       3Q      Max
## -0.62816 -0.17923  0.02274  0.21990  0.62859
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.9506001  1.9694011  -2.514  0.01940 *
## WinterRain   0.0012820  0.0005765   2.224  0.03628 *
## HarvestRain -0.0036242  0.0009646  -3.757  0.00103 **
## AGST         0.7123192  0.1087676   6.549 1.11e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3436 on 23 degrees of freedom
## Multiple R-squared:  0.7407, Adjusted R-squared:  0.7069
## F-statistic:  21.9 on 3 and 23 DF,  p-value: 6.246e-07
```

```r
# Data for which we want a prediction
# Important! You have to name the columns with the predictors' names!
weather <- data.frame(WinterRain = 500, HarvestRain = 123, AGST = 18)
```

```r
# Prediction of the mean

# Point prediction of the mean -- the default
predict(modWine4, newdata = weather)
##        1
## 8.066342

# A data.frame with unnamed columns makes predict fail
weatherBad <- data.frame(500, 123, 18)
predict(modWine4, newdata = weatherBad) # Error

# Prediction of the mean with 95% confidence interval (the default)
# CI: (lwr, upr)
predict(modWine4, newdata = weather, interval = "confidence")
##        fit      lwr      upr
## 1 8.066342 7.714178 8.418507
predict(modWine4, newdata = weather, interval = "confidence", level = 0.95)
##        fit      lwr      upr
## 1 8.066342 7.714178 8.418507

# Other levels
predict(modWine4, newdata = weather, interval = "confidence", level = 0.90)
##        fit      lwr      upr
## 1 8.066342 7.774576 8.358108
predict(modWine4, newdata = weather, interval = "confidence", level = 0.99)
##        fit      lwr      upr
## 1 8.066342 7.588427 8.544258
```

```r
# Prediction of the response

# Point prediction (the same as for the conditional mean)
predict(modWine4, newdata = weather)
##        1
## 8.066342

# Prediction of the response with 95% prediction interval
# PI: (lwr, upr)
predict(modWine4, newdata = weather, interval = "prediction")
##        fit      lwr      upr
## 1 8.066342 7.273176 8.859508
predict(modWine4, newdata = weather, interval = "prediction", level = 0.95)
##        fit      lwr      upr
## 1 8.066342 7.273176 8.859508

# Other levels
predict(modWine4, newdata = weather, interval = "prediction", level = 0.90)
##        fit      lwr      upr
## 1 8.066342 7.409208 8.723476
predict(modWine4, newdata = weather, interval = "prediction", level = 0.99)
##        fit      lwr      upr
## 1 8.066342 6.989951 9.142733
```

```r
# Predictions for several values
weather2 <- data.frame(WinterRain = c(500, 200), HarvestRain = c(123, 200),
                       AGST = c(17, 18))
predict(modWine4, newdata = weather2, interval = "prediction")
##        fit      lwr      upr
## 1 7.354023 6.613835 8.094211
## 2 7.402691 6.533945 8.271437
```
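As a final check of the “extra $$\hat\sigma^2$$” effect, the following sketch (on simulated data, since only the interval widths matter here) shows that the confidence interval for the mean shrinks as $$n$$ grows, while the prediction interval's width stabilizes instead of vanishing:

```r
# Sketch on simulated data: confidence intervals shrink with n, but
# prediction intervals cannot, due to the extra sigma^2 in (2.19)
set.seed(1)
widths <- sapply(c(50, 500, 5000), function(n) {
  x <- rnorm(n)
  y <- 1 + 2 * x + rnorm(n)
  mod <- lm(y ~ x)
  newX <- data.frame(x = 0)
  ci <- predict(mod, newdata = newX, interval = "confidence")
  pi <- predict(mod, newdata = newX, interval = "prediction")
  c(conf = unname(ci[1, "upr"] - ci[1, "lwr"]),
    pred = unname(pi[1, "upr"] - pi[1, "lwr"]))
})
colnames(widths) <- c("n = 50", "n = 500", "n = 5000")
widths  # "conf" row shrinks towards 0; "pred" row stabilizes near 2 * qnorm(0.975) * sigma
```

Since the simulated errors have $$\sigma=1,$$ the prediction-interval width approaches $$2z_{\alpha/2}\approx 3.92$$ regardless of $$n.$$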

For the wine dataset, do the following:

1. Regress WinterRain on HarvestRain and AGST. Name the fitted model modExercise.
2. Compute the estimate for the conditional mean of WinterRain for HarvestRain $$=123.0$$ and AGST $$=16.15.$$ What is the CI at $$\alpha=0.01$$?
3. Compute the estimate for the conditional response for HarvestRain $$=125.0$$ and AGST $$=15.$$ What is the CI at $$\alpha=0.10$$?
4. Check that `modExercise$fitted.values` is the same as `predict(modExercise, newdata = data.frame(HarvestRain = wine$HarvestRain, AGST = wine$AGST))`. Why is this so?

35. Because the prediction of a new observation from the random variable $$\mathcal{N}(\beta_0+\beta_1x_1+\cdots+\beta_{p}x_p,\sigma^2)$$ is simply its mean, $$\beta_0+\beta_1x_1+\cdots+\beta_{p}x_p,$$ which is also the most likely value of a normal.↩︎

36. A consequence of this extra $$\hat\sigma^2$$ is that the length of (2.19) cannot be reduced arbitrarily if the sample size $$n$$ grows.↩︎