6.2 Interpreting Logistic Regression

We’ve seen how to build a regression with a binary variable as the response, by transforming that variable to the log odds using the logit function, and then fitting a linear relationship between the predictor variable and those log odds. R will happily go ahead and find the best-fit coefficients for that relationship and show them to you. But how do we interpret that output?

6.2.1 Grad school dataset

For this example, we’ll look at a dataset predicting graduate school admissions using various student characteristics. Each case is a single application, so the response is binary: the student is accepted or they aren’t.

There are three predictors available: the student’s score on the GRE (a standardized test), their grade point average (GPA), and their rank, which describes whether a student was in the top, second, third, or bottom quarter of their graduating class in college.

library(dplyr)  # provides the %>% pipe used below
admissions_dat = read.csv("_data/admissions.csv")
admissions_dat %>% head()

##   admit gre  gpa rank
## 1     0 380 3.61    3
## 2     1 660 3.67    3
## 3     1 800 4.00    1
## 4     1 640 3.19    4
## 5     0 520 2.93    4
## 6     1 760 3.00    2

6.2.2 Coefficient interpretation

Let’s fit a logistic regression model using GRE score as the predictor. The equation we’re creating here is:

$\log\left(\frac{\widehat{p}}{1-\widehat{p}}\right) = b_0 + b_1GRE$ …where $\widehat{p}$ is the estimated probability of that student being admitted.
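To make the two directions of this transformation concrete, here is a minimal sketch in R. The helper names `logit` and `inv_logit` are made up for illustration; base R already provides the same transformations as `qlogis()` and `plogis()`.

```r
# Logit: probability -> log odds. Inverse logit: log odds -> probability.
# (logit and inv_logit are illustrative helper names, not part of base R.)
logit <- function(p) log(p / (1 - p))
inv_logit <- function(x) 1 / (1 + exp(-x))

p <- c(0.1, 0.25, 0.5, 0.9)
log_odds <- logit(p)
log_odds

# Base R's qlogis() and plogis() compute the same two transformations
all.equal(log_odds, qlogis(p))
all.equal(inv_logit(log_odds), plogis(log_odds))

# Going to the log odds and back recovers the original probabilities
all.equal(inv_logit(log_odds), p)
```

A probability of 0.5 corresponds to log odds of 0; probabilities below 0.5 give negative log odds, and probabilities above 0.5 give positive log odds.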

The R output looks like this:

admissions_glm1 = glm(admit ~ gre, 
                      data = admissions_dat,
                      family = binomial)
admissions_glm1 %>% summary()

## 
## Call:
## glm(formula = admit ~ gre, family = binomial, data = admissions_dat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.1623  -0.9052  -0.7547   1.3486   1.9879  
## 
## Coefficients:
##              Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -2.901344   0.606038  -4.787 1.69e-06 ***
## gre          0.003582   0.000986   3.633  0.00028 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 499.98  on 399  degrees of freedom
## Residual deviance: 486.06  on 398  degrees of freedom
## AIC: 490.06
## 
## Number of Fisher Scoring iterations: 4

There’s a lot going on here, but for now, let’s focus on interpreting the coefficients. This is a bit weird, because, remember, this model describes the relationship between the predictor and the log odds of admission, not the probability of admission itself. So there’s that link function, the logit transformation, standing between $b_0 + b_1GRE$ and $\widehat{p}$ .

This means we can interpret the coefficient in the R output “as usual” …if we’re thinking in terms of the log odds. A one-unit higher GRE is associated with log odds of admission 0.0036 units higher.

Let’s expand that a bit. Suppose we have a “baseline” student whose log odds of admission are equal to some value $c$ . Then if we have a new student who has a one-unit higher GRE, what can we say about their admission chances?

Don’t get too hung up on $c$ here. This is just an arbitrary way to write down a baseline student’s log odds of admission – whatever they are – so we have something to compare the other student to. What we want to know is, what’s the difference between the admission chances for Some Student and the admission chances for Some Other Student With One More Point On The GRE Than That First Person?

Well, the new student’s log odds of admission are 0.0036 units higher – that’s the kind of slope interpretation we’re used to. So their log odds of admission are $c+ 0.0036$ .

Next, we can walk that back to the odds by doing exponentiation. The new student’s odds of admission are:

$e^{(c + 0.0036)} = e^c \times e^{0.0036} = e^c \times 1.00361$

Note the multiplication here! That’s how exponents work.

This means that the new student’s odds of admission are 1.00361 times the baseline student’s odds of admission. That is, the new student’s odds of admission are higher – which lines up with the fact that the coefficient estimate in that R regression output is positive! But the difference in the odds isn’t additive: their odds are getting multiplied, by a number larger than 1.

Granted, 1.00361 is not a lot larger than 1 – it doesn’t seem to be a very big effect. But then you remember that the GRE is scored on basically the same scheme as the SAT; the max score is 800. A 1-unit increase in GRE score is very small, so it makes sense that it’s associated with a very small change in the response! (In fact the GRE scores aren’t even recorded to the ones place, just the nearest 10.) So the association between GRE and admission can still be strong, even though the estimated “slope” seems small.
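To put some numbers on this, here is a small sketch that takes the GRE coefficient from the `summary()` output above and computes the odds multiplier for GRE differences of 1, 10, and 100 points. The variable names (`b1`, `gre_diff`, `odds_multiplier`) are just illustrative choices.

```r
# GRE coefficient copied from the summary() output for admissions_glm1
b1 <- 0.003582

# Odds multiplier for GRE differences of 1, 10, and 100 points
gre_diff <- c(1, 10, 100)
odds_multiplier <- exp(b1 * gre_diff)
round(odds_multiplier, 3)

# Multiplying the odds by exp(b1) one hundred times in a row is the same
# as multiplying once by exp(100 * b1)
all.equal(exp(b1)^100, exp(100 * b1))
```

With the unrounded coefficient, the one-point multiplier is still about 1.0036, but a 100-point difference in GRE corresponds to multiplying the odds by about 1.43, which looks a lot less negligible.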

Now, a lot of people find the odds pretty intuitive to work with. In fact, this is about as good as it gets in terms of directly interpreting the coefficient of a predictor. If you want to talk about the estimated probability of admission, you have to walk back another step, going from the odds back to $\widehat{p}$ , and there’s no really nice, short way to describe what happens to that if GRE is 1 unit higher. But we will go all the way back to $\widehat{p}$ when we’re talking about making a prediction for an individual student.
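Here is a rough sketch of why the probability scale is harder to summarize. It plugs the coefficients from the output above into the inverse logit (`plogis()` in base R) and compares the same 100-point GRE gap at the low end and the high end of the scale. The helpers `p_hat` and `odds` and the specific GRE values are made up for illustration.

```r
# Coefficients copied from the summary() output for admissions_glm1
b0 <- -2.901344
b1 <- 0.003582

# Illustrative helper: estimated admission probability at a given GRE score
p_hat <- function(gre) plogis(b0 + b1 * gre)

gre_low  <- c(300, 400)
gre_high <- c(700, 800)

# Same 100-point GRE gap, but a different change in estimated probability
p_hat(gre_low)
p_hat(gre_high)
diff(p_hat(gre_low))
diff(p_hat(gre_high))

# The ratio of the odds, on the other hand, is the same in both places
odds <- function(p) p / (1 - p)
odds(p_hat(400)) / odds(p_hat(300))
odds(p_hat(800)) / odds(p_hat(700))
```

The change in $\widehat{p}$ depends on where you start (roughly 5 percentage points for the lower pair versus roughly 9 for the higher pair), while the odds ratio is the same for both pairs. That is why the coefficient has a tidy interpretation for the odds but not for the probability.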

6.2.3 Adding more predictors

While we’re at it, why not take a moment to put some more stuff in the model? Just as in linear regression, you can totally have multiple predictors in the same model – including categorical predictors. You just fit a multiple linear regression that relates all those predictors to the log odds of admission, and then use the logistic transformation to translate back from those log odds to a predicted probability.

For example, here’s a version of the model with all the available predictors included:

admissions_glm2 = glm(admit ~ gre + gpa + as.factor(rank),
                      data = admissions_dat,
                      family = binomial)
admissions_glm2 %>% summary()

## 
## Call:
## glm(formula = admit ~ gre + gpa + as.factor(rank), family = binomial, 
##     data = admissions_dat)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -1.6268  -0.8662  -0.6388   1.1490   2.0790  
## 
## Coefficients:
##                   Estimate Std. Error z value Pr(>|z|)    
## (Intercept)      -3.989979   1.139951  -3.500 0.000465 ***
## gre               0.002264   0.001094   2.070 0.038465 *  
## gpa               0.804038   0.331819   2.423 0.015388 *  
## as.factor(rank)2 -0.675443   0.316490  -2.134 0.032829 *  
## as.factor(rank)3 -1.340204   0.345306  -3.881 0.000104 ***
## as.factor(rank)4 -1.551464   0.417832  -3.713 0.000205 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 499.98  on 399  degrees of freedom
## Residual deviance: 458.52  on 394  degrees of freedom
## AIC: 470.52
## 
## Number of Fisher Scoring iterations: 4

Just like in multiple linear regression, we now have an extended table of coefficients, containing the estimated coefficients for each predictor.

Now that we’ve added some other terms to the model, the estimated coefficient for GRE has changed: it used to be 0.0036, and now it’s 0.0023. This is a reminder that when we interpret coefficients in a multiple regression model – whether it’s linear or logistic – we do so after accounting for other predictors.

So in this case, we could say that according to our model, if we see a student with 1 unit higher GRE score and the same GPA and class rank quartile, we predict that their odds of admission are multiplied by $e^{0.0023} = 1.0023$ .

The interpretation of categorical predictors is also similar to before – these coefficients are associated with the indicator variables for being in the second, third, or fourth quartile of the class. The first quartile is the baseline group! If a student is in one of those quartiles, we adjust their log odds by adding that estimated coefficient value…which is to say, we multiply their odds by “ $e$ to that value.”

So if I have two students with the same GRE and GPA, but one is in the first quartile of their class and the other is in, say, the second quartile of their class, then the lower-ranked student’s odds of admission are the first-quartile student’s odds multiplied by $e^{-0.68} = 0.51$ . That is, a student in the first quartile of their class has about double the odds of admission as compared to a student in the second quartile with the same GRE and GPA – or in other words, after accounting for GRE and GPA. Notice again that the sign of the estimated coefficient makes sense: the estimated coefficient is negative, and being in one of those groups is associated with lower odds of admission, as compared to students in the top quarter of the class.
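The same exponentiation works for every term in the model at once. The sketch below copies the estimates from the output above into a named vector (the short names like `rank2` are my own labels, not what R prints) and converts them to odds multipliers. If you have the fitted model object available, `exp(coef(admissions_glm2))` should give the same numbers, plus the intercept.

```r
# Coefficient estimates copied from the summary() output for admissions_glm2
# (names shortened for readability; the intercept is left out)
coefs <- c(gre = 0.002264, gpa = 0.804038,
           rank2 = -0.675443, rank3 = -1.340204, rank4 = -1.551464)

# Exponentiate to get the multiplicative change in odds for each term
odds_ratios <- exp(coefs)
round(odds_ratios, 3)

# Flip the rank terms to get the odds of a first-quartile student
# relative to a student in each lower quartile
round(1 / odds_ratios[c("rank2", "rank3", "rank4")], 2)
```

Positive coefficients turn into multipliers above 1 and negative coefficients turn into multipliers below 1. The reciprocal for `rank2` comes out just under 2, which is where “about double the odds” comes from.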

Response moment: Just to take a step back for a second: In the code for this model, I used as.factor() to tell R that the student’s class rank quartile should be treated as a categorical predictor, not quantitative. Why might I want to do that? (Think about what would happen if I didn’t!)