5.10 Predictions

To get a prediction (again, technically, an estimate of the mean outcome), you could plug the predictor values into the regression equation: multiply each value by its corresponding regression coefficient, sum the products, and add the intercept. However, it is much easier to use predict() to do the calculation for you. When called with interval = "confidence", predict() also returns a 95% confidence interval for the mean.
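To see that the two approaches agree, the following minimal sketch computes a prediction by hand and with predict(). It assumes a small illustrative model fit to the built-in mtcars data; the model, the object names (fit, new, by.hand, pred), and the predictor values are chosen only for illustration and are not part of Example 5.1.

```r
# Illustration with the built-in mtcars data (not the NHANES example data)
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Predictor values at which to estimate the mean outcome
new <- data.frame(wt = 3, hp = 110)

# By hand: intercept + sum of (coefficient x predictor value)
b <- coef(fit)
by.hand <- unname(b["(Intercept)"] + b["wt"] * new$wt + b["hp"] * new$hp)

# Using predict(), which also returns the 95% confidence interval
pred <- predict(fit, new, interval = "confidence")

by.hand
pred
```

The value of by.hand should match the fit column of pred; the lwr and upr columns are the confidence limits, which would be tedious to compute by hand.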

An optional second argument to predict(), named newdata, is a data.frame containing the predictor values at which to predict. The column names must match the predictor names used to fit the model, and the values must be in the same format. For a numeric predictor, specify a number (not in quotes). For a categorical predictor, specify a level (in quotes). To make sure you enter a legitimate factor level, use levels() to check the spelling.
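As a minimal sketch of the expected format, the following uses the built-in iris data, which has one convenient factor (Species). The model and the chosen predictor values are assumptions made only for illustration and are unrelated to Example 5.1.

```r
# Illustration with the built-in iris data (not the NHANES example data)
fit <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)

# Check the exact spelling of the factor levels
levels(iris$Species)

# Numeric predictor as a number, categorical predictor as a quoted level
new <- data.frame(Petal.Length = 4.5, Species = "versicolor")
pred <- predict(fit, newdata = new, interval = "confidence")
pred
```

The column names in the data.frame are the same as the variable names in the model formula, which is how predict() matches each value to its predictor.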

NOTE: If any categorical predictor level is misspelled, predict() will return an error. However, specifying one or more continuous predictor values beyond the range observed in the data will, unfortunately, not return an error or even a warning. Therefore, when making predictions, be careful to predict only at values within the range of the data used to fit the model. For example, if the model were fit using data from those aged 18 years and older, a prediction for 10-year-olds would be invalid (see Section 5.25).
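Both behaviors described in the note can be checked with a short sketch. It assumes an illustrative model fit to the built-in mtcars data, with the transmission variable am recoded as a factor; the object names and the range check at the end are hypothetical additions showing one possible safeguard, not a feature of predict().

```r
# Illustration with the built-in mtcars data (not the NHANES example data)
dat <- mtcars
dat$am <- factor(dat$am, levels = c(0, 1), labels = c("Automatic", "Manual"))
fit <- lm(mpg ~ wt + am, data = dat)

# Misspelled level ("manual" instead of "Manual"): predict() stops with an error
res <- try(predict(fit, data.frame(wt = 3, am = "manual")), silent = TRUE)
inherits(res, "try-error")

# Value far outside the observed range: no error and no warning
range(dat$wt)
predict(fit, data.frame(wt = 10, am = "Manual"))

# One possible safeguard: compare the value to the observed range first
wt.new <- 10
in.range <- wt.new >= min(dat$wt) & wt.new <= max(dat$wt)
in.range
```

Wrapping the first call in try() is only there so the sketch runs to completion; in ordinary use, the error message from predict() identifies the factor and the unrecognized level.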

Example 5.1 (continued): Estimate the mean fasting glucose and its 95% confidence interval at the following values of waist circumference, smoking status, age, gender, race/ethnicity, and income:

  • WC = 130 cm
  • Smoker = Current
  • Age = 50 years
  • Gender = Male
  • Race/ethnicity = Non-Hispanic Black
  • Income = $55,000+

First, check the spelling of the levels (results not shown).

levels(nhanesf.complete$smoker)
levels(nhanesf.complete$RIAGENDR)
levels(nhanesf.complete$race_eth)
levels(nhanesf.complete$income)

Next, use predict() with the appropriate data.frame as the second argument, and interval = "confidence".

# Use predict() with a data.frame with predictor levels
predict(fit.ex5.1, data.frame(
  BMXWAIST = 130,
  smoker   = "Current",
  RIDAGEYR = 50,
  RIAGENDR = "Male",
  race_eth = "Non-Hispanic Black",
  income   = "$55,000+"),
interval = "confidence")
##     fit   lwr   upr
## 1 7.168 6.711 7.625

Conclusion: The estimated mean fasting glucose among individuals with the specified predictor values is 7.17 mmol/L (95% CI = 6.71, 7.63).