6.17 LR in R: Predicting Recidivism (1)
- Logistic regression (LR) models the probability that \(Y\) belongs to a particular category (0 or 1), rather than modeling the response \(Y\) directly
- COMPAS data: Model the probability to recidivate (reoffend)
  - Outcome \(y\): Recidivism `is_recid` (0,1,0,0,1,1,...)
  - Predictors \(x's\): age = `age`, prior offenses = `priors_count`
  - Predicted values \(\hat{y}\): Pr(`is_recid=Yes|age`)
    - Values of Pr(`is_recid=Yes|age`) (abbr. p(age)) will range between 0 and 1
    - For a given value of `age` (and other covariates in the model), a prediction can be made for the outcome `is_recid`
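The setup above can be sketched in R as follows. This is a minimal sketch, not the original analysis: it uses simulated stand-in data rather than the real COMPAS data, and the coefficients that generate the outcome are invented for illustration. Only the variable names `is_recid`, `age` and `priors_count` come from the text; `data`, `fit`, `p_hat` and `new_person` are hypothetical names.

```r
# Simulated stand-in for the COMPAS data (NOT the real data);
# the coefficients below are made up purely for illustration.
set.seed(1)
n <- 500
data <- data.frame(
  age = sample(18:70, n, replace = TRUE),
  priors_count = rpois(n, lambda = 2)
)

# Assumed data-generating model, used only to create a 0/1 outcome
p_true <- plogis(1 - 0.05 * data$age + 0.2 * data$priors_count)
data$is_recid <- rbinom(n, size = 1, prob = p_true)

# Logistic regression: models Pr(is_recid = 1 | age, priors_count)
fit <- glm(is_recid ~ age + priors_count, family = binomial, data = data)

# Predicted probabilities; type = "response" returns values on the 0-1 scale
data$p_hat <- predict(fit, type = "response")
range(data$p_hat)

# Predicted probability for a hypothetical new individual
new_person <- data.frame(age = 30, priors_count = 2)
p_new <- predict(fit, newdata = new_person, type = "response")
p_new
```

Without `type = "response"`, `predict()` returns values on the log-odds scale, which are not restricted to the interval between 0 and 1.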
- We can convert our predicted value (= a probability) to a 0/1 variable
  - e.g., predict that individuals will recidivate (`is_recid = Yes`) if Pr(`is_recid=Yes|age`) > 0.5 (p(age) > 0.5)
  - More conservative: Use a lower threshold, e.g., predict that individuals will recidivate (`is_recid = Yes`) if Pr(`is_recid=Yes|age`) > 0.1
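The thresholding step can be illustrated with a short R sketch. The probabilities in `p_hat` are hypothetical values chosen for illustration, not model output, and `classify()` is a helper name introduced here, not a function from any package.

```r
# Hypothetical predicted probabilities p(age) for six individuals
p_hat <- c(0.05, 0.12, 0.35, 0.51, 0.78, 0.09)

# Classify as is_recid = 1 when the probability exceeds the threshold
classify <- function(p, threshold = 0.5) {
  as.integer(p > threshold)
}

pred_05 <- classify(p_hat, threshold = 0.5)  # 0 0 0 1 1 0
pred_01 <- classify(p_hat, threshold = 0.1)  # 0 1 1 1 1 0

# Number of individuals predicted to recidivate under each threshold
sum(pred_05)  # 2
sum(pred_01)  # 4
```

With the same probabilities, lowering the threshold from 0.5 to 0.1 raises the number of individuals classified as recidivating from two to four.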
- Source: James et al. (2013, Chap. 4.3)
References
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.