8.1 The Logistic Model

Predicting recidivism (0/1): How should we model the relationship between $p(X)=\Pr(Y=1\mid X)$ and $X$?
- See Figure 4.2 in James et al. (2013, 131)
- Use either linear probability model or logistic regression
Linear probability model: $p(X)=\beta_{0}+\beta_{1}X$
- Linear predictions of our outcome (probabilities) can fall outside the $[0,1]$ range
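
To make the range problem concrete, here is a minimal sketch in Python. The coefficients `BETA0` and `BETA1` and the predictor values are hypothetical numbers chosen for illustration, not estimates from any data set.

```python
# Linear probability model: p(X) = beta0 + beta1 * X
def linear_probability(x, beta0, beta1):
    return beta0 + beta1 * x

# Hypothetical coefficients, chosen only to illustrate the problem.
BETA0, BETA1 = 0.2, 0.15

xs = [-2, 0, 2, 4, 6]
lpm_predictions = [linear_probability(x, BETA0, BETA1) for x in xs]

# Collect the predictor values whose predicted "probability" is not in [0, 1].
out_of_range = [x for x, p in zip(xs, lpm_predictions) if p < 0 or p > 1]

for x, p in zip(xs, lpm_predictions):
    print(f"x = {x:>2}: predicted p = {p:.2f}")
print("x values with predictions outside [0, 1]:", out_of_range)
```

With these assumed coefficients, the smallest and largest predictor values give predictions below 0 and above 1, which cannot be interpreted as probabilities.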
Logistic regression (uses logistic function): $p(X)=\frac{e^{\beta_{0}+\beta_{1}X}}{1+e^{\beta_{0}+\beta_{1}X}}$
- odds: $\frac{p(X)}{1-p(X)}=e^{\beta_{0}+\beta_{1}X}$ (range: $[0,\infty)$; the higher the odds, the higher the probability of recidivism/default)
- log-odds/logit: $\log\left(\frac{p(X)}{1-p(X)}\right) = \beta_{0}+\beta_{1}X$ (James et al. 2013, 132)
  - Increasing $X$ by one unit changes the log-odds by $\beta_{1}$, i.e., multiplies the odds by $e^{\beta_{1}}$ (the log-odds scale is what R usually outputs)
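
The three quantities above (probability, odds, log-odds) can be checked numerically. The following is a small sketch using only Python's standard library; the coefficient values are hypothetical and serve only to illustrate the relationships.

```python
import math

def logistic(x, beta0, beta1):
    """p(X) = e^(beta0 + beta1*X) / (1 + e^(beta0 + beta1*X))"""
    z = beta0 + beta1 * x
    return math.exp(z) / (1.0 + math.exp(z))

def odds(p):
    return p / (1.0 - p)

def log_odds(p):
    return math.log(odds(p))

# Hypothetical coefficients, chosen only for illustration.
BETA0, BETA1 = -2.0, 0.5

for x in [0, 1, 2, 3]:
    p_now = logistic(x, BETA0, BETA1)
    p_next = logistic(x + 1, BETA0, BETA1)
    # A one-unit increase in X adds beta1 to the log-odds ...
    log_odds_change = log_odds(p_next) - log_odds(p_now)
    # ... and multiplies the odds by e^beta1.
    odds_ratio = odds(p_next) / odds(p_now)
    print(f"x = {x}: p = {p_now:.3f}, odds = {odds(p_now):.3f}, "
          f"change in log-odds = {log_odds_change:.3f}, odds ratio = {odds_ratio:.3f}")
```

Unlike the linear probability model, `logistic` always returns a value strictly between 0 and 1, while the change in log-odds per unit of $X$ stays constant at $\beta_{1}$.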
Estimation of $\beta_{0}$ and $\beta_{1}$ usually relies on maximum likelihood
- See James et al. (2013, chap. 4.3.4) for an overview
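
As a rough illustration of what maximum likelihood estimation does, the sketch below maximizes the log-likelihood $\sum_i \left[ y_i \log p(x_i) + (1-y_i)\log(1-p(x_i)) \right]$ with a few Newton-Raphson steps. The toy data and all function names are assumptions made for this example. It is not meant to reproduce how any particular software fits the model.

```python
import math

# Hypothetical toy data (not from the source): one predictor x and a
# binary outcome y (1 = recidivism). The classes overlap, so the
# maximum likelihood estimate is finite.
XS = [0, 1, 2, 3, 4, 5, 6, 7]
YS = [0, 0, 1, 0, 1, 0, 1, 1]

def predict(x, beta0, beta1):
    """Logistic function p(X)."""
    return 1.0 / (1.0 + math.exp(-(beta0 + beta1 * x)))

def log_likelihood(beta0, beta1):
    total = 0.0
    for x, y in zip(XS, YS):
        p = predict(x, beta0, beta1)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total

def score(beta0, beta1):
    """Gradient of the log-likelihood with respect to (beta0, beta1)."""
    residuals = [y - predict(x, beta0, beta1) for x, y in zip(XS, YS)]
    g0 = sum(residuals)
    g1 = sum(r * x for r, x in zip(residuals, XS))
    return g0, g1

def fit_newton(n_iter=25):
    beta0, beta1 = 0.0, 0.0
    for _ in range(n_iter):
        g0, g1 = score(beta0, beta1)
        # Information matrix [[a, b], [b, c]] with weights p * (1 - p).
        ps = [predict(x, beta0, beta1) for x in XS]
        ws = [p * (1 - p) for p in ps]
        a = sum(ws)
        b = sum(w * x for w, x in zip(ws, XS))
        c = sum(w * x * x for w, x in zip(ws, XS))
        det = a * c - b * b
        # Newton step: beta <- beta + information^{-1} * score
        beta0 += (c * g0 - b * g1) / det
        beta1 += (a * g1 - b * g0) / det
    return beta0, beta1

beta0_hat, beta1_hat = fit_newton()
print("estimated beta0:", round(beta0_hat, 3))
print("estimated beta1:", round(beta1_hat, 3))
print("log-likelihood at estimate:", round(log_likelihood(beta0_hat, beta1_hat), 3))
```

At the estimate the gradient of the log-likelihood is (numerically) zero, which is the defining condition of the maximum for this concave objective.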
Source: James et al. (2013, chaps. 4.3.1, 4.3.2, 4.3.4)

References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.