6.8 Prediction

As with MLR, predict() computes predicted values from the model. The default output will be on the log-odds scale, but adding type = "response" will instead give a predicted probability. Unfortunately, predict() does not have a CI option for glm() models. Instead, use gmodels::estimable() to obtain the prediction along with its 95% CI.

Example 6.3 (continued): What is the predicted probability (and 95% CI) of lifetime marijuana use for someone who first used alcohol at age 13 years, is currently age 30 years, male, and has an income of < $20,000? What if they first used alcohol at age 21 years?

The following demonstrates the use of predict() to compute these predicted probabilities.

predict(fit.ex6.3.adj,
        newdata = data.frame(alc_agefirst   = c(13, 21),
                             demog_age_cat6 = "26-34",
                             demog_sex      = "Male",
                             demog_income   = "Less than $20,000"),
        type = "response")
##      1      2 
## 0.9103 0.5283

However, if you also want a CI, then use gmodels::estimable(). The code is slightly more complicated than using predict(). First, predict() automatically includes the intercept in its calculation but since gmodels::estimable() is also used for estimates that are not predictions you have to explicitly tell it you want to include the intercept. Second, you must enter the names of the terms as spelled in the coefficient output rather than the variable names in the model (these are different for factor predictors). Third, you must use the inverse logit function to convert the result to a probability. Lastly, unlike predict, where we included two values for alc_agefirst to get two predictions, if you want multiple predictions you must run gmodels::estimable() multiple times.

First, examine the rownames() of the coefficient output to make sure you spell each term correctly.

rownames(summary(fit.ex6.3.adj)$coef)
##  [1] "(Intercept)"                   "alc_agefirst"                 
##  [3] "demog_age_cat626-34"           "demog_age_cat635-49"          
##  [5] "demog_age_cat650-64"           "demog_age_cat665+"            
##  [7] "demog_sexMale"                 "demog_income$20,000 - $49,999"
##  [9] "demog_income$50,000 - $74,999" "demog_income$75,000 or more"

Then enter the needed terms and their multiplicative factors in gmodels::estimable(), surrounded by ilogit(). Note that we only extract the estimate and the CI, not the other parts of the gmodels::estimable() output as the other parts would not make sense after applying ilogit().

# Always include the intercept for prediction
# Specify a 1 for the intercept, a # for each continuous predictor
# and a 1 for each non-reference level of a categorical variable.
# If a predictor is at its reference level, it should be left out.
ilogit(gmodels::estimable(fit.ex6.3.adj,
  c("(Intercept)"         = 1,
    "alc_agefirst"        = 13,
    "demog_age_cat626-34" = 1,
    "demog_sexMale"       = 1),
  conf.int = 0.95))[c("Estimate", "Lower.CI", "Upper.CI")]
##                        Estimate Lower.CI Upper.CI
## (1 13 1 0 0 0 1 0 0 0)   0.9103   0.8387   0.9519
ilogit(gmodels::estimable(fit.ex6.3.adj,
  c("(Intercept)"         = 1,
    "alc_agefirst"        = 21,
    "demog_age_cat626-34" = 1,
    "demog_sexMale"       = 1),
  conf.int = 0.95))[c("Estimate", "Lower.CI", "Upper.CI")]
##                        Estimate Lower.CI Upper.CI
## (1 21 1 0 0 0 1 0 0 0)   0.5283   0.3724   0.6789

Conclusion: We predict that among individuals who first used alcohol at age 13 years, are age 30 years, male, and have an income less than $20,000 per year, 91.0% have used marijuana (95% CI = 83.9%, 95.2%). Among those who instead waited until age 21 years to first use alcohol, the proportion drops to 52.8% (95% CI = 37.2%, 67.9%).