6.17 Logistic Regression: Recidivism
- Outcome Y: Recidivism (yes = 1 vs. no = 0)
- Logistic regression models the probability that Y belongs to a particular category (0 or 1), rather than modeling the response Y directly
- COMPAS data: Model the probability to recidivate (reoffend)
  - Outcome: Recidivism `is_recid` (0,1,0,0,1,1,...)
  - Predictors: age = `age`, prior offenses = `priors_count`
  - Predicted values: Pr(`is_recid`=Yes|`age`)
    - Values of Pr(`is_recid`=Yes|`age`) (abbr. p(`age`)) will range between 0 and 1
- For a given value of `age`, a prediction can be made for the outcome `is_recid`
- We can convert our predicted value (= a probability) to a 0/1 variable
- e.g., individuals are predicted to recidivate (`is_recid = Yes`) if Pr(`is_recid`=Yes|`age`) > 0.5 (p(`age`) > 0.5)
- More conservative:
  - Use a lower threshold, e.g., individuals are predicted to recidivate (`is_recid = Yes`) if Pr(`is_recid`=Yes|`age`) > 0.1
- Source: James et al. (2013, Chap. 4.3)
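
The conversion from a predicted probability to a 0/1 prediction can be sketched in a few lines of Python. This is a minimal illustration, not a fitted model: it assumes the standard logistic form p(age) = 1 / (1 + exp(-(b0 + b1 * age))), and the coefficients `BETA_0` and `BETA_1` as well as the helper names `p_recid` and `classify` are hypothetical, chosen only for illustration and not estimated from the COMPAS data.

```python
import math

# Hypothetical coefficients for illustration only (NOT estimated from COMPAS).
BETA_0 = 1.2    # intercept
BETA_1 = -0.05  # change in log-odds per additional year of age


def p_recid(age):
    """Logistic function: Pr(is_recid = Yes | age), always strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-(BETA_0 + BETA_1 * age)))


def classify(age, threshold=0.5):
    """Convert the predicted probability into a 0/1 prediction for is_recid."""
    return 1 if p_recid(age) > threshold else 0


ages = [20, 35, 60, 80]
probs = [p_recid(a) for a in ages]

# Default threshold: predict is_recid = 1 only if p(age) > 0.5
pred_50 = [classify(a, threshold=0.5) for a in ages]

# Lower threshold: predict is_recid = 1 if p(age) > 0.1
pred_10 = [classify(a, threshold=0.1) for a in ages]

print([round(p, 3) for p in probs])
print(pred_50)  # [1, 0, 0, 0]
print(pred_10)  # [1, 1, 1, 0]
```

With these made-up coefficients, lowering the threshold from 0.5 to 0.1 moves the 35- and 60-year-olds from predicted "no" to predicted "yes", while the probabilities themselves do not change.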
References
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.