6.18 LR in R: Predicting Recidivism (2)
- Estimate model: `glm(y ~ x1 + x2, family = binomial, data = data.train)`
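# Logistic regression of is_recid on age and priors_count
fit <- glm(as.factor(is_recid) ~ age + priors_count,
           family = binomial,
           data = data.train)
# Print only the coefficient table (lines 11-14 of the summary output)
cat(paste(capture.output(summary(fit))[11:14], collapse = "\n"))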
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)   1.101001   0.097597   11.28   <2e-16 ***
## age          -0.049831   0.002861  -17.42   <2e-16 ***
## priors_count  0.159982   0.008236   19.43   <2e-16 ***
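The estimates printed above are on the log-odds scale. One common way to make them easier to read is to exponentiate them into odds ratios. The sketch below is only an illustration: it copies the rounded coefficients from the output by hand, and the names `b` and `odds_ratio` are made up for this example. With the fitted model available, `exp(coef(fit))` should give the same quantities.

```r
# Coefficients copied (rounded as printed) from the summary output above;
# `b` and `odds_ratio` are illustrative names, not part of the original code
b <- c(age = -0.049831, priors_count = 0.159982)

# Exponentiating a log-odds coefficient gives an odds ratio: the
# multiplicative change in the odds of is_recid = 1 per one-unit increase
odds_ratio <- exp(b)
round(odds_ratio, 3)

# Same information expressed as a percentage change in the odds
round(100 * (odds_ratio - 1), 1)
```

Read this way, each additional year of age goes with roughly 5% lower odds of recidivism, and each additional prior with roughly 17% higher odds, holding the other predictor fixed.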
- R output shows log odds: e.g., a one-unit increase in `age` is associated with a change of -0.05 in the log odds of `is_recid`, i.e., a decrease of about 0.05 (an awkward scale to interpret)
- `predict()`: Predict values
  - Once coefficients have been estimated, it is a simple matter to compute the probability of the outcome for given values of our predictors (James et al. 2013, 134)
  - `predict(fit, newdata = NULL, type = "response")`: Predict the probability for each unit in the data used to fit the model
  - `predict(fit, newdata = data_predict, type = "response")`: Predict the probability for particular Xs (contained in `data_predict`)
  - `type = "response"`: Output probabilities of the form \(P(Y=1|X)\) (as opposed to other information such as the logit)
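To make the difference between the logit and the probability scale concrete, here is a minimal sketch. It does not use the recidivism data: as a stand-in it fits a logistic regression to the built-in `mtcars` dataset, and `fit_demo` and `new_cars` are hypothetical names chosen for this example. It shows that `type = "link"` returns log odds and that applying the inverse logit (`plogis()`) to them reproduces what `type = "response"` returns.

```r
# Stand-in example (assumption): mtcars ships with R; the binary `am`
# plays the role of is_recid and `wt` the role of a predictor
fit_demo <- glm(am ~ wt, family = binomial, data = mtcars)

# Two hypothetical new observations
new_cars <- data.frame(wt = c(2.5, 3.5))

# type = "link": linear predictor, i.e. the log odds (logit)
logit <- predict(fit_demo, newdata = new_cars, type = "link")

# type = "response": predicted probabilities P(Y = 1 | X)
prob <- predict(fit_demo, newdata = new_cars, type = "response")

# The inverse logit of the link-scale prediction equals the probability
all.equal(plogis(logit), prob)
```

Probabilities are usually what we want to report, which is presumably why `type = "response"` is used throughout these slides.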
# Three hypothetical individuals to predict for
data_predict <- data.frame(age = c(30, 30, 50),
                           priors_count = c(2, 4, 2))
data_predict$Pr <- predict(fit, newdata = data_predict, type = "response")
data_predict
age | priors_count | Pr |
---|---|---|
30 | 2 | 0.4815163 |
30 | 4 | 0.5611905 |
50 | 2 | 0.2552909 |
- Q: How would you interpret these values?
- Source: James et al. (2013, Chap. 4.3.3, 4.6.2)
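The predicted probabilities in the table can be checked by hand, which may help with interpreting them. The sketch below plugs the rounded coefficients from the summary output into \(P(Y=1|X) = 1 / (1 + e^{-(\beta_0 + \beta_1 \text{age} + \beta_2 \text{priors\_count})})\). The names `b0`, `b_age`, `b_priors`, and `check` are illustrative, and because the coefficients are rounded the results should match the table only to several decimal places.

```r
# Coefficients copied (rounded as printed) from the summary output;
# variable names here are illustrative, not from the original code
b0       <-  1.101001
b_age    <- -0.049831
b_priors <-  0.159982

# Same three predictor combinations as in data_predict
check <- data.frame(age = c(30, 30, 50),
                    priors_count = c(2, 4, 2))

# Linear predictor (log odds) for each row
check$logit <- b0 + b_age * check$age + b_priors * check$priors_count

# Inverse logit turns log odds into probabilities
check$Pr <- plogis(check$logit)
check
```

Comparing rows shows how the predictors move the probability: going from 2 to 4 priors at age 30 raises it, while going from age 30 to age 50 with 2 priors lowers it.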
References
James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.