Chapter 6 Logistic Regression (FQA)

For this section we will use the loans data set, which contains information about loan applications. The outcome variable is default, which indicates whether each loan defaulted (default = 1) or was repaid (default = 0).

Logistic regression allows us to model a binary (0 / 1) variable \(Y\) as a function of one or more \(X\) variables. The assumed underlying relationship is:

\[P(Y = 1) = \frac{1}{1 + e^{-(\beta_{0} + \beta_{1}X_{1} + \beta_{2}X_{2} + \dots + \beta_{k}X_{k})}}\]
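
To make the formula concrete, the following minimal sketch evaluates it in R. The function name logistic and the coefficient values are illustrative assumptions, not part of the loans analysis; plogis() is base R's built-in version of the same function.

```r
# Logistic (inverse-logit) function from the formula above:
# maps any real-valued linear predictor z to a probability in (0, 1).
logistic <- function(z) 1 / (1 + exp(-z))

# Hypothetical coefficients and predictor values, for illustration only.
beta0 <- -2
beta1 <- 0.5
x1 <- c(0, 2, 4, 6, 8)

p <- logistic(beta0 + beta1 * x1)
print(round(p, 3))

# Base R's plogis() computes the same quantity.
print(all.equal(p, plogis(beta0 + beta1 * x1)))
```

With these assumed values, x1 = 4 gives a linear predictor of 0 and therefore a probability of exactly 0.5; larger values of the linear predictor push the probability toward 1 and smaller values push it toward 0.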

We use the glm() function to fit a logistic regression in R. The syntax is similar to that of lm(), but we must also supply the argument family = binomial.
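
The following is a minimal sketch of what such a glm() fit might look like. Because the loans data are not reproduced in this section, it simulates a small stand-in data frame; toy_loans, toy_model, the choice of two predictors, and the coefficients used to generate the outcome are all assumptions for illustration only.

```r
# Simulate a small stand-in for the loans data (hypothetical values).
set.seed(1)
n <- 500
toy_loans <- data.frame(
  fico     = round(rnorm(n, mean = 710, sd = 40)),
  int.rate = runif(n, min = 0.06, max = 0.20)
)

# Assumed "true" relationship, used only to generate the 0/1 outcome.
log_odds <- 5 - 0.01 * toy_loans$fico + 4 * toy_loans$int.rate
toy_loans$default <- rbinom(n, size = 1, prob = plogis(log_odds))

# Same formula syntax as lm(), plus family = binomial for logistic regression.
toy_model <- glm(default ~ fico + int.rate, family = binomial, data = toy_loans)

print(coef(toy_model))
```

On the real data, the Call: lines of the summary output below record the formula that was used: default ~ purpose + int.rate + installment + log.annual.inc + dti + fico, with family = binomial and data = loans.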

As with linear regression, the summary() function provides detailed information about our model.

## 
## Call:
## glm(formula = default ~ purpose + int.rate + installment + log.annual.inc + 
##     dti + fico, family = binomial, data = loans)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.2136  -0.6389  -0.5216  -0.3748   2.6753  
## 
## Coefficients:
##                             Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                8.6986873  1.1960945   7.273 3.53e-13 ***
## purposecredit_card        -0.5300076  0.1075344  -4.929 8.28e-07 ***
## purposedebt_consolidation -0.3679648  0.0757552  -4.857 1.19e-06 ***
## purposeeducational         0.1105924  0.1496645   0.739   0.4599    
## purposehome_improvement    0.1257786  0.1243317   1.012   0.3117    
## purposemajor_purchase     -0.3810833  0.1642754  -2.320   0.0204 *  
## purposesmall_business      0.5572451  0.1151029   4.841 1.29e-06 ***
## int.rate                   3.4872802  1.7282796   2.018   0.0436 *  
## installment                0.0011184  0.0001726   6.480 9.19e-11 ***
## log.annual.inc            -0.2855770  0.0538629  -5.302 1.15e-07 ***
## dti                        0.0059689  0.0043349   1.377   0.1685    
## fico                      -0.0112846  0.0012846  -8.784  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 8424.0  on 9577  degrees of freedom
## Residual deviance: 8003.4  on 9566  degrees of freedom
## AIC: 8027.4
## 
## Number of Fisher Scoring iterations: 5
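
One way to read this output is to convert a coefficient to an odds ratio and to check the overall fit from the reported deviances. The sketch below uses only numbers copied from the summary above; the variable names are illustrative assumptions.

```r
# Values copied from the summary output above.
coef_fico <- -0.0112846
null_dev  <- 8424.0
resid_dev <- 8003.4
null_df   <- 9577
resid_df  <- 9566
n_coef    <- 12   # intercept plus 11 slope terms

# Odds ratio: multiplicative change in the odds of default
# for a one-point increase in fico, other predictors held fixed.
or_fico <- exp(coef_fico)
print(or_fico)

# Likelihood-ratio (deviance) test of the fitted model against
# the intercept-only model.
lr_stat <- null_dev - resid_dev
lr_df   <- null_df - resid_df
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)
print(c(statistic = lr_stat, df = lr_df, p.value = p_value))

# For a 0/1 outcome, AIC equals the residual deviance plus
# twice the number of estimated coefficients.
print(resid_dev + 2 * n_coef)
```

The odds ratio for fico is just under 1 (about 0.989), so each additional FICO point is associated with roughly a 1.1% reduction in the odds of default. The last line reproduces the reported AIC of 8027.4.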

We can also use predict() to apply our model to new observations. For logistic regression we need to add the argument type = "response" to obtain a predicted probability.

##         1 
## 0.1581578
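
By default, predict() on a glm object returns the linear predictor (the log-odds) rather than a probability, which is why type = "response" is needed. The following minimal sketch shows both scales side by side. The loans data and the new observation behind the value above are not shown in this section, so the example fits a model to simulated data; toy_loans, toy_model, and new_applicant are hypothetical names and values.

```r
# Simulated stand-in data (hypothetical values, not the loans data).
set.seed(1)
n <- 500
toy_loans <- data.frame(fico = round(rnorm(n, mean = 710, sd = 40)))
toy_loans$default <- rbinom(n, size = 1,
                            prob = plogis(5.5 - 0.01 * toy_loans$fico))

toy_model <- glm(default ~ fico, family = binomial, data = toy_loans)

# A new observation must supply every predictor used in the model.
new_applicant <- data.frame(fico = 680)

# Default scale: the linear predictor (log-odds).
link_pred <- predict(toy_model, newdata = new_applicant)

# type = "response": the predicted probability of default.
prob_pred <- predict(toy_model, newdata = new_applicant, type = "response")

print(c(log_odds = unname(link_pred), probability = unname(prob_pred)))

# The two scales are related by the logistic function.
print(all.equal(unname(prob_pred), plogis(unname(link_pred))))
```

The final check confirms that applying the logistic function to the log-odds prediction gives the same number as type = "response".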