8.2 LR in R: Predicting Recidivism (1)
- Logistic regression (LR) models the probability that \(Y\) belongs to a particular category (0 or 1), rather than modeling the response \(Y\) directly
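For reference, with a single predictor \(X\) (e.g., age), the logistic function described in James et al. (2013, chap. 4.3) can be written as below. The symbols \(\beta_0\) and \(\beta_1\) are the coefficients to be estimated; they are not named elsewhere on this slide.

\[
p(X) = \Pr(Y = 1 \mid X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
\]

This form keeps \(p(X)\) strictly between 0 and 1 for any value of \(X\).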
- COMPAS data: Model probability to recidivate (reoffend)
    - Outcome \(y\): Recidivism `is_recid` (0,1,0,0,1,1,...)
    - Various predictors \(x's\)
        - age = `age`
        - prior offenses = `priors_count`
- Use LR to obtain predicted values \(\hat{y}\)
    - As probabilities, predicted values will range between 0 and 1
    - Depend on input/features (e.g., age, prior offenses)
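A minimal sketch of how such predicted probabilities might be obtained in R. The data frame `compas_sim` is a synthetic stand-in generated here, not the COMPAS data; only the column names `is_recid`, `age` and `priors_count` come from the slide. The coefficients used to simulate the outcome and the column name `probability` are assumptions for illustration.

```r
# Synthetic stand-in for the COMPAS data (NOT the real data); only the
# column names is_recid, age and priors_count follow the slide.
set.seed(1)
n <- 500
compas_sim <- data.frame(
  age          = sample(18:70, n, replace = TRUE),
  priors_count = rpois(n, lambda = 2)
)

# Simulate the outcome from an assumed logistic relationship
p_true <- plogis(0.5 - 0.04 * compas_sim$age + 0.25 * compas_sim$priors_count)
compas_sim$is_recid <- rbinom(n, size = 1, prob = p_true)

# Fit a logistic regression: glm() with a binomial family
fit <- glm(is_recid ~ age + priors_count,
           family = binomial,
           data = compas_sim)

# type = "response" returns predicted probabilities instead of log-odds
compas_sim$probability <- predict(fit, type = "response")

# The predicted values lie between 0 and 1
summary(compas_sim$probability)
```

With the real data, only the data frame would change; the calls to `glm()` and `predict()` would keep the same shape.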
- Convert predicted values (probabilities) to a binary variable
    - e.g., individuals are predicted to recidivate (`is_recid = Yes`) if Pr(`is_recid = Yes` | `age`) > 0.5 (p(`age`) > 0.5)
    - Here we call this variable `classified`
    - More conservative: Use lower threshold, e.g., individuals are predicted to recidivate (`is_recid = Yes`) if Pr(`is_recid = Yes` | `age`) > 0.1
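The thresholding step can be sketched as follows. The probabilities are hypothetical values chosen for illustration, not model output from the COMPAS data, and `classified_low` is a name introduced here for the lower-threshold variant.

```r
# Hypothetical predicted probabilities for six individuals
# (illustrative values, not estimates from the COMPAS data)
probability <- c(0.05, 0.12, 0.35, 0.50, 0.62, 0.91)

# Default rule: classify as "Yes" if the probability exceeds 0.5
classified <- ifelse(probability > 0.5, "Yes", "No")

# Lower threshold of 0.1: more individuals are classified as "Yes"
classified_low <- ifelse(probability > 0.1, "Yes", "No")

# Compare how many individuals each rule flags
table(classified)
table(classified_low)
```

Lowering the threshold from 0.5 to 0.1 raises the number of individuals classified as `Yes` from 2 to 5 in this toy example, which illustrates the trade-off between missing recidivists and flagging non-recidivists.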
- Source: James et al. (2013, chap. 4.3)
References
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.