## 6.18 LR in R: Predicting Recidivism (2)

• Estimate model: glm(y ~ x1 + x2, family = binomial, data = data.train)
fit <- glm(as.factor(is_recid) ~ age + priors_count,
           family = binomial,
           data = data.train)
cat(paste(capture.output(summary(fit))[11:14], collapse="\n"))
##               Estimate Std. Error z value Pr(>|z|)
## (Intercept)   1.101001   0.097597   11.28   <2e-16 ***
## age          -0.049831   0.002861  -17.42   <2e-16 ***
## priors_count  0.159982   0.008236   19.43   <2e-16 ***
• R output shows log odds: e.g., a one-unit increase in age is associated with a decrease of about 0.05 in the log odds of is_recid, holding priors_count constant (log odds are an awkward scale to interpret)
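
Because log odds are hard to read, a common step is to exponentiate the coefficients to get odds ratios. The sketch below is only illustrative: it copies the rounded estimates from the printed output above instead of calling coef(fit), and the names coefs and odds_ratios are made up for this example.

```r
# Estimates copied (rounded) from the summary output above;
# with the fitted model available one would use coef(fit) instead.
coefs <- c(age = -0.049831, priors_count = 0.159982)

# exp() turns a change in log odds into a multiplicative change in odds
odds_ratios <- exp(coefs)

# Percentage change in the odds for a one-unit increase in each predictor
pct_change <- (odds_ratios - 1) * 100

print(round(odds_ratios, 3))
print(round(pct_change, 1))
```

Under these numbers, each additional year of age multiplies the odds of recidivism by roughly 0.95, and each additional prior by roughly 1.17, holding the other predictor constant.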

• predict(): Predict values

• Once coefficients have been estimated, it is a simple matter to compute the probability of outcome for values of our predictors (James et al. 2013, 134)
• predict(fit, newdata = NULL, type = "response"): Predict probability for each unit in the data used to fit the model
• predict(fit, newdata = data_predict, type = "response"): Predict probability for particular Xs (contained in data_predict)
• type = "response": Output probabilities of the form $$P(Y=1|X)$$ (as opposed to the default type = "link", which returns the log odds)
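
The difference between type = "link" and type = "response" can be checked on a small example. The sketch below does not use data.train; it simulates a hypothetical data set (sim, with made-up coefficients) purely so that the code runs on its own, and fit_sim is an illustrative name.

```r
# Simulated stand-in data (an assumption, NOT the recidivism data)
set.seed(1)
n <- 500
sim <- data.frame(age = round(runif(n, 18, 70)),
                  priors_count = rpois(n, 2))
sim$is_recid <- rbinom(n, 1,
                       plogis(1.1 - 0.05 * sim$age + 0.16 * sim$priors_count))

fit_sim <- glm(is_recid ~ age + priors_count, family = binomial, data = sim)

# Without newdata, predict() returns one value per unit in the fitting data
eta_link   <- predict(fit_sim, type = "link")      # log odds
p_response <- predict(fit_sim, type = "response")  # probabilities

# The inverse logit of the log odds gives the probabilities
print(all.equal(plogis(eta_link), p_response))
# With no newdata, the probabilities equal the fitted values
print(all.equal(p_response, fitted(fit_sim)))
```
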
data_predict <- data.frame(age = c(30, 30, 50),
                           priors_count = c(2, 4, 2))
data_predict$Pr <- predict(fit, newdata = data_predict, type = "response")
data_predict
##  age priors_count        Pr
##   30            2 0.4815163
##   30            4 0.5611905
##   50            2 0.2552909
• Q: How would you interpret these values?
• Source: James et al. (2013, Chap. 4.3.3, 4.6.2)
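
To see what predict() is doing, the predicted probabilities can be reproduced by hand from the coefficients. This is a minimal sketch that assumes the rounded estimates printed in the summary output (so results agree only up to rounding); b, log_odds and prob are illustrative names.

```r
# Estimates copied (rounded) from the summary output above
b <- c(intercept = 1.101001, age = -0.049831, priors_count = 0.159982)

data_predict <- data.frame(age = c(30, 30, 50),
                           priors_count = c(2, 4, 2))

# Linear predictor: log odds for each row
log_odds <- b[["intercept"]] +
  b[["age"]] * data_predict$age +
  b[["priors_count"]] * data_predict$priors_count

# Inverse logit: p = exp(eta) / (1 + exp(eta))
prob <- plogis(log_odds)

print(round(prob, 3))
```

The results match the Pr column above to about three decimals: for example, a 30-year-old with 2 priors has a predicted probability of recidivism of about 0.48, while a 50-year-old with the same number of priors has about 0.26.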

### References

James, Gareth, Daniela Witten, Trevor Hastie, and Robert Tibshirani. 2013. An Introduction to Statistical Learning: With Applications in R. Springer Texts in Statistics. Springer.