6.18 The Logistic Model: Predictions

  • Estimation in R:
    • R output shows coefficients on the log-odds scale: e.g., a one-unit increase in age is associated with a change of -0.05 in the log odds of is_recid, i.e., a decrease of about 0.05 (an awkward scale to interpret)
## 
## Call:
## glm(formula = as.factor(is_recid) ~ age + priors_count, family = binomial, 
##     data = data.train)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -3.0872  -1.0680  -0.5689   1.0953   2.6065  
## 
## Coefficients:
##               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)   1.101001   0.097597   11.28   <2e-16 ***
## age          -0.049831   0.002861  -17.42   <2e-16 ***
## priors_count  0.159982   0.008236   19.43   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 6921.3  on 4999  degrees of freedom
## Residual deviance: 6186.8  on 4997  degrees of freedom
## AIC: 6192.8
## 
## Number of Fisher Scoring iterations: 4
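Exponentiating the coefficients turns them into odds ratios, which are often easier to read than log odds. A minimal sketch, with the estimates copied by hand from the output above; the vector name `b` is an assumption, and with the fitted model object available one would typically apply `exp(coef(...))` to it instead:

```r
# Estimates copied from the glm() summary above (not re-estimated here).
b <- c("(Intercept)" = 1.101001, age = -0.049831, priors_count = 0.159982)

# exp() maps log-odds coefficients to odds ratios.
odds_ratios <- exp(b)
round(odds_ratios, 3)

# Percent change in the odds for a one-unit increase in each predictor.
round(100 * (odds_ratios[c("age", "priors_count")] - 1), 1)
```

Under these estimates, each additional year of age multiplies the odds of recidivism by roughly 0.95 (about a 5% decrease), and each additional prior by roughly 1.17 (about a 17% increase), holding the other predictor fixed.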
  • Prediction in R:
    • Once the coefficients have been estimated, it is a simple matter to compute the probability of the outcome for given values of our predictors (James et al. 2013, 134)
    • predict(): can be used to predict the probability that a person will recidivate, given values of the predictors
    • type="response": tells R to output probabilities of the form P(Y=1|X) (as opposed to the default for a glm, which is the linear predictor on the logit scale)
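The predicted probabilities listed next can be reproduced by hand, since P(Y=1|X) is the inverse logit of the linear predictor. A minimal sketch, assuming the estimates are copied from the glm() output; the names `b`, `new_data`, and `fit` are illustrative and do not appear in the original:

```r
# Estimates copied from the glm() summary (not re-estimated here).
b <- c(intercept = 1.101001, age = -0.049831, priors_count = 0.159982)

# Predictor values for the three cases of interest.
new_data <- data.frame(age = c(30, 30, 50), priors_count = c(2, 4, 2))

# Linear predictor (log odds), then inverse logit via plogis().
log_odds <- b[["intercept"]] + b[["age"]] * new_data$age +
  b[["priors_count"]] * new_data$priors_count
new_data$Pr <- plogis(log_odds)
new_data

# With the fitted model stored in an object, here called `fit` (assumed name),
# the equivalent call would be:
# predict(fit, newdata = new_data, type = "response")
```

Because the copied coefficients are rounded to six decimals, the hand-computed probabilities may differ from the predict() output in the last digits.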
age  priors_count  Pr
 30             2  0.4815163
 30             4  0.5611905
 50             2  0.2552909
  • Q: How would you interpret these values?
  • Source: James et al. (2013, Chap. 4.3.3, 4.6.2)

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.