Chapter 6 Logistic Regression (FQA)
For this section we will be using the loans data set, which contains information about loan applications. The outcome variable is default, which indicates whether each loan defaulted (default = 1) or was repaid (default = 0).
Logistic regression allows us to model a binary (0 / 1) variable \(Y\) as a function of one or more \(X\) variables. The assumed underlying relationship is:
\[P(Y = 1) = \frac{1}{1 + e^{-(\beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \cdots + \beta_{k}X_{k})}}\]
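The right-hand side is the logistic (inverse-logit) function applied to the linear combination \(\beta_{0} + \beta_{1}X_{1} + \cdots + \beta_{k}X_{k}\), so the result is always a probability between 0 and 1. As a quick illustration with made-up coefficient values (not from any fitted model), R's built-in plogis() computes this function:

b0 <- -1.5                      # hypothetical intercept
b1 <- 0.8                       # hypothetical slope
x1 <- 2                         # hypothetical predictor value
linPred <- b0 + b1 * x1         # linear combination (the log-odds)
1 / (1 + exp(-linPred))         # probability from the formula above, approximately 0.525
plogis(linPred)                 # same result using R's built-in logistic function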
We use the glm() function to fit a logistic regression in R. The syntax is similar to that of lm(), but we need to add the argument family = binomial.
modelLog <- glm(default ~ purpose + int.rate + installment + log.annual.inc + dti + fico,
                data = loans,
                family = binomial)

As with linear regression, the summary() function provides detailed information about our model.

summary(modelLog)
##
## Call:
## glm(formula = default ~ purpose + int.rate + installment + log.annual.inc +
## dti + fico, family = binomial, data = loans)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -1.2136 -0.6389 -0.5216 -0.3748 2.6753
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 8.6986873 1.1960945 7.273 3.53e-13 ***
## purposecredit_card -0.5300076 0.1075344 -4.929 8.28e-07 ***
## purposedebt_consolidation -0.3679648 0.0757552 -4.857 1.19e-06 ***
## purposeeducational 0.1105924 0.1496645 0.739 0.4599
## purposehome_improvement 0.1257786 0.1243317 1.012 0.3117
## purposemajor_purchase -0.3810833 0.1642754 -2.320 0.0204 *
## purposesmall_business 0.5572451 0.1151029 4.841 1.29e-06 ***
## int.rate 3.4872802 1.7282796 2.018 0.0436 *
## installment 0.0011184 0.0001726 6.480 9.19e-11 ***
## log.annual.inc -0.2855770 0.0538629 -5.302 1.15e-07 ***
## dti 0.0059689 0.0043349 1.377 0.1685
## fico -0.0112846 0.0012846 -8.784 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 8424.0 on 9577 degrees of freedom
## Residual deviance: 8003.4 on 9566 degrees of freedom
## AIC: 8027.4
##
## Number of Fisher Scoring iterations: 5
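The coefficients are on the log-odds scale: a positive estimate means higher values of that variable are associated with greater odds of default. A common way to make them easier to interpret is to exponentiate them, which converts each coefficient into an odds ratio. A minimal sketch, using the modelLog object fit above:

exp(coef(modelLog))      # odds ratios: multiplicative change in the odds of default
exp(confint(modelLog))   # 95% confidence intervals on the odds-ratio scale

For example, exp(-0.0113) is roughly 0.989, so each additional FICO point is associated with about a 1.1% decrease in the odds of default, holding the other variables fixed.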
We can also use predict() to apply our model to new observations. For logistic regression we need to add the argument type = "response", which tells predict() to return predicted probabilities rather than values on the log-odds scale.
newData <- data.frame(purpose = "home_improvement", int.rate = 0.10, installment = 400,
log.annual.inc = 11, dti = 14.5, fico = 730)
predict(modelLog, newData, type = "response")

##         1 
## 0.1581578
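The value returned is a predicted probability of default, not a 0/1 classification. If a hard classification is needed, we can apply a cutoff to the predicted probabilities. The sketch below classifies every loan in the original data using a 0.5 cutoff (other cutoffs may be more appropriate depending on the relative costs of the two kinds of error):

predProb <- predict(modelLog, type = "response")        # predicted probability of default for each loan
predClass <- ifelse(predProb > 0.5, 1, 0)               # classify as default if probability exceeds 0.5
table(Predicted = predClass, Actual = loans$default)    # confusion matrix of predictions vs. outcomes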