Chapter 6 Logistic Regression
We'll load the Default dataset used in the notes.
library(ISLR)
data(Default)
#convert default from yes/no to 0/1
Default$default <- as.numeric(Default$default=="Yes")
6.1 Section 6.1: Visualizing the Logistic Curve
Template:
ggplot(data=Dataset_Name, aes(y=Response_Variable, x= Explanatory_Variable)) +
geom_point(alpha=0.2) +
stat_smooth(method="glm", se=FALSE, method.args = list(family=binomial))
6.1.1 Visualizing Logistic Regression
ggplot(data=Default, aes(y=default, x= balance)) + geom_point(alpha=0.2) +
stat_smooth(method="glm", se=FALSE, method.args = list(family=binomial))
6.2 Fitting Logistic Regression Model
6.2.1 Logistic Regression Template
Template:
M <- glm(data=Dataset_Name, Response_Variable ~ Explanatory_Variable,
family = binomial(link = "logit"))
summary(M)
6.2.2 Logistic Regression Example
CCDefault_M <- glm(data=Default, default ~ balance, family = binomial(link = "logit"))
summary(M)
##
## Call:
## lm(formula = Weight ~ Age * Sex, data = Bears_Subset)
##
## Residuals:
## Min 1Q Median 3Q Max
## -207.583 -38.854 -9.574 23.905 174.802
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 70.4322 17.7260 3.973 0.000219 ***
## Age 3.2381 0.3435 9.428 7.65e-13 ***
## Sex2 -31.9574 35.0314 -0.912 0.365848
## Age:Sex2 -1.0350 0.6237 -1.659 0.103037
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 70.18 on 52 degrees of freedom
## (41 observations deleted due to missingness)
## Multiple R-squared: 0.6846, Adjusted R-squared: 0.6664
## F-statistic: 37.62 on 3 and 52 DF, p-value: 4.552e-13
6.2.3 Intervals and Predictions in Logistic Regression
The confint()
command returns the model coefficient.
confint(CCDefault_M, level = 0.95)
## 2.5 % 97.5 %
## (Intercept) -11.383288936 -9.966565064
## balance 0.005078926 0.005943365
Often, we are interested in \(e^{b_j}\). We can calculate this using exp()
exp(confint(CCDefault_M, level = 0.95))
## 2.5 % 97.5 %
## (Intercept) 1.138415e-05 4.694353e-05
## balance 1.005092e+00 1.005961e+00
To obtain predictions as probabilities, use type="response"
.
predict(CCDefault_M, newdata=data.frame((balance=1000)), type="response")
## 1
## 0.005752145