5.3 Ordinal Logistic Regression

These notes rely on UVA, PSU STAT 504 class notes, and Laerd Statistics.

The ordinal logistic regression model is

\[logit[P(Y \le j)] = \log \left[ \frac{P(Y \le j)}{P(Y \gt j)} \right] = \alpha_j - \beta X, \hspace{5mm} j \in [1, J-1]\] where \(j \in [1, J-1]\) are the levels of the ordinal outcome variable \(Y\). The proportional odds model assumes there is a common set of slope parameters \(\beta\) for the predictors. The ordinal outcomes are distinguished by the \(J-1\) intercepts \(\alpha_j\). The benchmark level is \(J\).

Technically, the model could be written \(logit[P(Y \le j)] = \alpha_j + \zeta X\), replacing beta with zeta because the model fits \(\alpha_j - \beta X\) instead of \(\alpha_j + \beta X\).

Suppose you want to model the probability a respondent holds a political ideology [“Socialist”, “Liberal”, “Moderate”, “Conservative”, “Libertarian”] given their party affiliation [“Republican”, “Democrat”].

table(ideology)

##      ideo
## party Socialist Liberal Moderate Conservative Libertarian
##   Rep        30      46      148           84          99
##   Dem        80      81      171           41          55

5.3.1 Assumptions

Ordinal regression makes four assumptions about the underlying data. One is that the response variable is ordinal (duh). The second is that the explanatory variables are continuous or categorical. You can include ordinal variables, but you need to treat them either as continous or categorical. Third, there is no multicollinearity. Fourth, the odds are proportional, meaning each independent variable has an identical effect at each cumulative split of the ordinal dependent variable. Test for proportionality with a full likelihood ratio test comparing the fitted location model to a model with varying location parameters. This test can sometimes flag violations that do not exist, so can also run separate binomial logistic regressions on cumulative dichotomous dependent variables to further determine if this assumption is met.

5.3.2 Modeling

Fit a proportional odds logistic regression.

pom <- MASS::polr(ideo ~ party, data = ideology)
summary(pom)

## Call:
## MASS::polr(formula = ideo ~ party, data = ideology)
## 
## Coefficients:
##           Value Std. Error t value
## partyDem -0.975      0.129   -7.54
## 
## Intercepts:
##                          Value   Std. Error t value
## Socialist|Liberal         -2.469   0.132    -18.736
## Liberal|Moderate          -1.475   0.109    -13.531
## Moderate|Conservative      0.237   0.094      2.516
## Conservative|Libertarian   1.070   0.104     10.292
## 
## Residual Deviance: 2474.98 
## AIC: 2484.98

The log-odds a Democrat identifies as Socialist vs >Socialist, or equivalently, the log-odds a Democrat identifies as <=Socialist vs >=Liberal is

\[logit[P(Y \le 1)] = -2.4690 - (-0.9745)(1) = -1.4945\]

which translates into an odds of

\[odds(Y<=1) = exp(logit[P(Y \le 1)]) = \frac{exp(-2.469)}{exp(-0.9745)} = 0.2244\]

It is the same for Republicans, except multiply the slope coefficient by zero.

\[logit[P(Y \le 1)] = -2.4690 - (-0.9745)(0) = -2.4690\]

\[odds(Y<=1) = exp(logit[P(Y \le 1)]) = \frac{exp(-2.469)}{exp(0)} = -2.4690\]

The “proportional odds” part of the proportional odds model is that the ratios of the \(J - 1\) odds are identical for each level of the predictors. The numerators are always the same, and the denominators differ only by the exponent of the slope coefficient, \(-0.9745\). For all \(j \in [1, J-1]\), the odds a Democrat’s ideology is \(\le j\) vs \(>j\) is \(exp(-0.9745) = 2.6498\) times that of a Republican’s odds.

You can translate the cumulative odds to cumulative probabilities by taking the ratio \(\pi = exp(odds) / (1 + exp(odds))\). The probability a Democrat identifies as <=Socialist vs >Socialist is

\[P(Y \le 1) = \frac{exp(-1.4945)} {(1 + exp(-1.4945))} = 0.183.\]

The individual probabilities are just the successive differences in the cumulative probabilities. The log odds a Democrat identifies as <=Liberal vs >Liberal are \(logit[P(Y \lt 2)] = -1.4745 - (-0.9745)(1) = -0.500\), which translates into a probability of \(P(Y \le 2) = exp(-0.5) / (1 + exp(-0.5)) = 0.378\). The probability a Democrat identifies as Liberal is the difference in adjacent cumulative probabilities, \(P(Y \le 2) - P(Y \le 1) = 0.378 = 0.183 = 0.194\). This is how the model to predicts the level probabilities.

x <- predict(pom, newdata = data.frame(party = c("Dem", "Rep")), type = "probs")
rownames(x) <- c("Dem", "Rep")
print(x)

##     Socialist Liberal Moderate Conservative Libertarian
## Dem     0.183    0.19     0.39         0.11        0.11
## Rep     0.078    0.11     0.37         0.19        0.26

Always check the assumption of proportional odds. One way to do this is by comparing the proportional odds model with a multinomial logit model, also called an unconstrained baseline logit model. The multinomial logit model models unordered responses and fits a slope to each level of the \(J – 1\) responses. The proportional odds model is nested in the multinomial model, so you can use a likelihood ratio test to see if the models are statistically different.

mlm <- nnet::multinom(ideo ~ party, data = ideology)

## # weights:  15 (8 variable)
## initial  value 1343.880657 
## iter  10 value 1239.866743
## final  value 1235.648615 
## converged

Calculate the difference in the deviance test statistics \(D = -2 loglik(\beta)\).

G <- -2 * (logLik(pom)[1] - logLik(mlm)[1])
pchisq(G, df = length(pom$zeta) - 1, lower.tail = FALSE)

## [1] 0.3

The p-value is high, so do not reject the null hypothesis that the proportional odds model fits differently than the more complex multinomial logit model.

5.3.3 Case Study

The General Social Survey for year 1972, 1973, and 1974 surveyed caucasian Christians about their attitudes att toward abortion. Respondents were classified by years of education edu and religious group att.

abort %>% pivot_wider(names_from = att, values_from = cnt)

## # A tibble: 27 x 6
##    year  rel   edu     Neg   Mix   Pos
##    <fct> <fct> <fct> <dbl> <dbl> <dbl>
##  1 1972  Prot  Low       9    12    48
##  2 1972  Prot  Med      13    43   197
##  3 1972  Prot  High      4     9   139
##  4 1972  SProt Low       9    17    30
##  5 1972  SProt Med       6    10    97
##  6 1972  SProt High      1     8    68
##  7 1972  Cath  Low      14    12    32
##  8 1972  Cath  Med      18    50   131
##  9 1972  Cath  High      8    13    64
## 10 1973  Prot  Low       4    16    59
## # ... with 17 more rows

Fit a proportional-odds cumulative logit model with just main effects. There are two main effects for year, two for rel, and two for edu, plus two logit equations for the response for a total of eight parameters.

abort_mdl <- MASS::polr(att ~ year + rel + edu, data = abort, weights = cnt)
summary(abort_mdl)

## Call:
## MASS::polr(formula = att ~ year + rel + edu, data = abort, weights = cnt)
## 
## Coefficients:
##           Value Std. Error t value
## year1973  0.221      0.105    2.11
## year1974  0.233      0.106    2.20
## relSProt -0.249      0.113   -2.19
## relCath  -0.796      0.100   -7.93
## eduMed    0.717      0.109    6.59
## eduHigh   1.127      0.128    8.79
## 
## Intercepts:
##         Value  Std. Error t value
## Neg|Mix  -2.33   0.14     -17.27 
## Mix|Pos  -0.79   0.12      -6.49 
## 
## Residual Deviance: 4059.32 
## AIC: 4075.32

All predictors are significant. Now fit the saturated model.

abort_mdl_sat <- MASS::polr(att ~ year*rel*edu, weights = cnt, data = abort)
summary(abort_mdl_sat)

## Call:
## MASS::polr(formula = att ~ year * rel * edu, data = abort, weights = cnt)
## 
## Coefficients:
##                             Value Std. Error t value
## year1973                   0.3422      0.365  0.9370
## year1974                  -0.0144      0.361 -0.0398
## relSProt                  -0.6274      0.366 -1.7141
## relCath                   -0.7238      0.369 -1.9620
## eduMed                     0.5036      0.301  1.6743
## eduHigh                    1.6007      0.390  4.1068
## year1973:relSProt          0.1093      0.524  0.2088
## year1974:relSProt          0.4387      0.521  0.8421
## year1973:relCath           0.3895      0.539  0.7228
## year1974:relCath           0.8696      0.559  1.5543
## year1973:eduMed            0.2805      0.441  0.6365
## year1974:eduMed            0.5487      0.430  1.2761
## year1973:eduHigh          -0.5961      0.540 -1.1031
## year1974:eduHigh          -0.1175      0.541 -0.2171
## relSProt:eduMed            1.1421      0.479  2.3849
## relCath:eduMed             0.1275      0.424  0.3004
## relSProt:eduHigh           0.3039      0.586  0.5186
## relCath:eduHigh           -0.5623      0.532 -1.0572
## year1973:relSProt:eduMed  -1.2710      0.667 -1.9069
## year1974:relSProt:eduMed  -1.3331      0.668 -1.9952
## year1973:relCath:eduMed   -0.8360      0.629 -1.3291
## year1974:relCath:eduMed   -0.8973      0.647 -1.3879
## year1973:relSProt:eduHigh  0.9047      0.878  1.0309
## year1974:relSProt:eduHigh -0.4321      0.805 -0.5369
## year1973:relCath:eduHigh  -0.2265      0.749 -0.3021
## year1974:relCath:eduHigh  -0.9896      0.766 -1.2915
## 
## Intercepts:
##         Value  Std. Error t value
## Neg|Mix -2.311  0.267     -8.647 
## Mix|Pos -0.761  0.261     -2.922 
## 
## Residual Deviance: 4018.27 
## AIC: 4074.27

Compare the two models.

anova(abort_mdl, abort_mdl_sat)

## Likelihood ratio tests of ordinal regression models
## 
## Response: att
##              Model Resid. df Resid. Dev   Test    Df LR stat. Pr(Chi)
## 1 year + rel + edu      3229       4059                              
## 2 year * rel * edu      3209       4018 1 vs 2    20       41  0.0037

The likelihood ratio test indicates the main-effects model fits poorly in comparison to the saturated model (LR = 41.0, df = 20, p < 0.01). From the table of coefficients,the effects of religion and education appear to be much more powerful than the year, so consider modeling an interaction between rel and edu. This is also what the stepwise AIC algorithm recommends.

summary(abort_mdl_step)

## Call:
## MASS::polr(formula = att ~ year + rel + edu + rel:edu, data = abort, 
##     weights = cnt)
## 
## Coefficients:
##                   Value Std. Error t value
## year1973          0.228      0.105   2.171
## year1974          0.241      0.106   2.271
## relSProt         -0.450      0.213  -2.110
## relCath          -0.348      0.223  -1.556
## eduMed            0.750      0.177   4.230
## eduHigh           1.369      0.219   6.244
## relSProt:eduMed   0.253      0.267   0.946
## relCath:eduMed   -0.389      0.260  -1.496
## relSProt:eduHigh  0.385      0.336   1.146
## relCath:eduHigh  -0.944      0.307  -3.079
## 
## Intercepts:
##         Value  Std. Error t value
## Neg|Mix  -2.26   0.17     -13.28 
## Mix|Pos  -0.72   0.16      -4.49 
## 
## Residual Deviance: 4040.44 
## AIC: 4064.44

Compare the model with the rel:edu interaction to the saturated model.

anova(abort_mdl_step, abort_mdl_sat)

## Likelihood ratio tests of ordinal regression models
## 
## Response: att
##                        Model Resid. df Resid. Dev   Test    Df LR stat. Pr(Chi)
## 1 year + rel + edu + rel:edu      3225       4040                              
## 2           year * rel * edu      3209       4018 1 vs 2    16       22    0.14

Great, this time they are not significantly different (LR = 22.2, df = 16, p = 0.138). Interpret the results.

Positive coefficients mean attitudes toward legalizing abortion are more positive relative to the reference year, 1972. The odds of supporting legalization in 1973 compared to 1972 were \(exp(0.2281) = 1.26\). The odds for 1974 vs 1972 were \(exp(0.2410) = 1.27\), so attitudes toward abortion became more positive from 1972 to 1973, but remained nearly unchanged from 1973 to 1974.

Among Protestants (reference religious group), increasing education is associated with more positive attitudes toward abortion legalization. The odds of a Protestant with medium education vs low education supporting legalization are \(exp(0.7504) = 2.12\). Among Southern Protestants, odds are \(exp(0.7504 + 0.2526) = 2.73\). Therefore, the estimated effects of education for Southern Protestants are in the same direction as for Protestants but are somewhat larger. Note, however, that the interaction effect coefficient is not not significantly different from zero, so the effect of education among Protestants and Southern Protestants is not significantly different.

Among Catholics, the medium vs low education odds are \(exp(0.7504- 0.3892) = 1.44\). And the high vs low education odds are \(exp(1.3689 - 0.9442) = 1.53\). Increasing education is still associated with more positive attitudes, but the effects are smaller than they are among Protestants and Southern Protestants.

Example Summarization

We used logistic regression to investigate whether groups with the Christian religion might moderate the effects of education on attitudes toward abortion legalization. For Protestants, higher education education was associated with higher, significant, increase of odds of a more positive attitude toward abortion legalization, b = 0.7504, SE = 0.1774, OR = 2.12, p < .01. There was a significant interaction for Catholics at high levels of education, b = -0.9442, SE = 0.3066, p < .01, relative to the Protestant reference group, but no significant interaction at medium education, and no interaction at all for Southern Protestants relative to the reference group. The figure above graphs the interaction, showing the change in the expected probability of positive attitude by education level for Protestant, Southern Protestant, and Catholic religious groups. Overall, the significant interaction for Catholics at high levels of education suggests that education has a different relationship to attitudes toward abortion depending on the individual’s religious group, but the difference between Protestant and Southern Protestant is minimal.