3.5 Prediction
As in the simple linear model, the forecast of \(Y\) from \(\mathbf{X}=\mathbf{x}\) (this is, \(X_1=x_1,\ldots,X_k=x_k\)) is approached by two different ways:
- Inference on the conditional mean of \(Y\) given \(\mathbf{X}=\mathbf{x}\), \(\mathbb{E}[Y|\mathbf{X}=\mathbf{x}]\). This is a deterministic quantity, which equals \(\beta_0+\beta_1x_1+\cdots+\beta_{k}x_k\).
- Prediction of the conditional response \(Y|\mathbf{X}=\mathbf{x}\). This is a random variable distributed as \(\mathcal{N}(\beta_0+\beta_1x_1+\cdots+\beta_{k}x_k,\sigma^2)\).
The prediction and computation of CIs can be done with the R function predict (unfortunately, there is no R Commander shortcut for this one). The objects required for predict are: first, the output of lm; second, a data.frame containing the locations \(\mathbf{x}=(x_1,\ldots,x_k)\) where we want to predict \(\beta_0+\beta_1x_1+\cdots+\beta_{k}x_k\). The prediction is \(\hat\beta_0+\hat\beta_1x_1+\cdots+\hat\beta_{k}x_k\)
It is mandatory to name the columns of the data frame with the same names of the predictors used in lm. Otherwise predict will generate an error, see below.
To illustrate the use of predict, we return to the wine dataset.
# Fit a linear model for the price on WinterRain, HarvestRain and AGST
modelW <- lm(Price ~ WinterRain + HarvestRain + AGST, data = wine)
summary(modelW)
##
## Call:
## lm(formula = Price ~ WinterRain + HarvestRain + AGST, data = wine)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.62816 -0.17923 0.02274 0.21990 0.62859
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -4.9506001 1.9694011 -2.514 0.01940 *
## WinterRain 0.0012820 0.0005765 2.224 0.03628 *
## HarvestRain -0.0036242 0.0009646 -3.757 0.00103 **
## AGST 0.7123192 0.1087676 6.549 1.11e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3436 on 23 degrees of freedom
## Multiple R-squared: 0.7407, Adjusted R-squared: 0.7069
## F-statistic: 21.9 on 3 and 23 DF, p-value: 6.246e-07
# Data for which we want a prediction
# Important! You have to name the column with the predictor name!
weather <- data.frame(WinterRain = 500, HarvestRain = 123,
AGST = 18)
## Prediction of the mean
# Prediction of the mean at 95% - the defaults
predict(modelW, newdata = weather)
## 1
## 8.066342
# Prediction of the mean with 95% confidence interval (the default)
# CI: (lwr, upr)
predict(modelW, newdata = weather, interval = "confidence")
## fit lwr upr
## 1 8.066342 7.714178 8.418507
predict(modelW, newdata = weather, interval = "confidence", level = 0.95)
## fit lwr upr
## 1 8.066342 7.714178 8.418507
# Other levels
predict(modelW, newdata = weather, interval = "confidence", level = 0.90)
## fit lwr upr
## 1 8.066342 7.774576 8.358108
predict(modelW, newdata = weather, interval = "confidence", level = 0.99)
## fit lwr upr
## 1 8.066342 7.588427 8.544258
## Prediction of the response
# Prediction of the mean at 95% - the defaults
predict(modelW, newdata = weather)
## 1
## 8.066342
# Prediction of the response with 95% confidence interval (the default)
# CI: (lwr, upr)
predict(modelW, newdata = weather, interval = "prediction")
## fit lwr upr
## 1 8.066342 7.273176 8.859508
predict(modelW, newdata = weather, interval = "prediction", level = 0.95)
## fit lwr upr
## 1 8.066342 7.273176 8.859508
# Other levels
predict(modelW, newdata = weather, interval = "prediction", level = 0.90)
## fit lwr upr
## 1 8.066342 7.409208 8.723476
predict(modelW, newdata = weather, interval = "prediction", level = 0.99)
## fit lwr upr
## 1 8.066342 6.989951 9.142733
# Predictions for several values
weather2 <- data.frame(WinterRain = c(500, 200), HarvestRain = c(123, 200),
AGST = c(17, 18))
predict(modelW, newdata = weather2, interval = "prediction")
## fit lwr upr
## 1 7.354023 6.613835 8.094211
## 2 7.402691 6.533945 8.271437For the wine dataset, do the following:
- Regress
WinterRainonHarvestRainandAGST. Name the fitted modelmodExercise. - Compute the estimate for the conditional mean of
WinterRainforHarvestRain\(=123.0\) andAGST\(=16.15\). What is the CI at \(\alpha=0.01\)? - Compute the estimate for the conditional response for
HarvestRain\(=125.0\) andAGST\(=15\). What is the CI at \(\alpha=0.10\)? - Check that
modExercise$fitted.valuesis the same aspredict(modExercise, newdata = data.frame(HarvestRain = wine$HarvestRain, AGST = wine$AGST)). Why is so?
Similarities and differences in the prediction of the conditional mean \(\mathbb{E}[Y|\mathbf{X}=\mathbf{x}]\) and conditional response \(Y|\mathbf{X}=\mathbf{x}\):
- Similarities. The estimate is the same, \(\hat y=\hat\beta_0+\hat\beta_1x_1+\cdots+\hat\beta_kx_k\). Both CI are centered in \(\hat y\).
- Differences. \(\mathbb{E}[Y|\mathbf{X}=\mathbf{x}]\) is deterministic and \(Y|\mathbf{X}=\mathbf{x}\) is random. Therefore, the variance is larger for the prediction of \(Y|\mathbf{X}=\mathbf{x}\) than for the prediction of \(\mathbb{E}[Y|\mathbf{X}=\mathbf{x}]\).