# Chapter 11 Multinomial Logistic Regression

## 11.1 Introduction to Multinomial Logistic Regression

Logistic regression is a technique used when the dependent variable is categorical (or nominal). In binary logistic regression the dependent variable has two categories, whereas in multinomial logistic regression it has more than two.

Examples: a consumer decides to buy a product or not, a product passes or fails quality control, a loan applicant is a good or poor credit risk, an employee is promoted or not.

## 11.2 Equation

In logistic regression, a logistic transformation of the odds (referred to as the logit) serves as the dependent variable:

\[\log (o d d s)=\operatorname{logit}(P)=\ln \left(\frac{P}{1-P}\right)=a+b_{1} x_{1}+b_{2} x_{2}+b_{3} x_{3}+\ldots\]

or

\[p=\frac{\exp \left(a+b_{1} X_{1}+b_{2} X_{2}+b_{3} X_{3}+\ldots\right)}{1+\exp \left(a+b_{1} X_{1}+b_{2} X_{2}+b_{3} X_{3}+\ldots\right)}\]

where:

p = the probability that a case is in a particular category,

exp = the exponential function (base e, approx. 2.718),

a = the constant of the equation, and

b = the coefficient of the predictor or independent variables.
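To see that the two forms of the equation agree, here is a quick numerical check with made-up values (a = -1, b1 = 0.5, a single predictor x1 = 3):

```
a <- -1; b1 <- 0.5; x1 <- 3
logit <- a + b1 * x1                 # the log odds, 0.5
p <- exp(logit) / (1 + exp(logit))   # the probability, about 0.62
log(p / (1 - p))                     # recovers the logit
```

Applying the logit transformation to p returns exactly the linear predictor we started with.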

Logits or Log Odds

Odds values range from 0 to infinity and tell you how much more likely it is that an observation is a member of the target group rather than of the other group.

- Odds = p/(1-p)

If the probability is 0.80, the odds are 4 to 1 (.80/.20); if the probability is 0.25, the odds are .33 (.25/.75).

The odds ratio (OR) estimates the change in the odds of membership in the target group for a one-unit increase in the predictor. It is calculated by exponentiating the regression coefficient of the predictor, exp(b).

Suppose, in an example predicting accountancy success from a maths competency pre-test, that b = 2.69. The odds ratio is then exp(2.69) = 14.73: the odds of passing are 14.73 times greater for a student with a pre-test score of 5 than for a student with a pre-test score of 4.
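The arithmetic in the last two paragraphs can be reproduced directly:

```
p <- 0.80
odds <- p / (1 - p)    # 4, i.e. 4 to 1
b <- 2.69
OR <- exp(b)           # about 14.73
```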

## 11.3 Hypothesis Test of Coefficients

In logistic regression, two hypotheses are of interest:

The null hypothesis, under which all the coefficients in the regression equation take the value zero, and

The alternative hypothesis, that the model currently under consideration is accurate and differs significantly from the null of zero, i.e. gives predictions significantly better than the chance or random prediction level of the null hypothesis.

Evaluation of Hypothesis

We then work out the likelihood of observing the data we actually did observe under each of these hypotheses. The result is usually a very small number, and to make it easier to handle, the *natural logarithm* is used, producing a *log likelihood (LL)*. Probabilities are always less than one, so LL’s are always negative. Log likelihood is the basis for tests of a logistic model.
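As a toy illustration (not part of the chapter's example): the likelihood of any particular data set is a product of probabilities and so is very small, and its log is always negative:

```
# Likelihood of observing 3 successes in 10 trials when P(success) = 0.5
lik <- dbinom(3, size = 10, prob = 0.5)   # 0.1171875, already a small number
ll  <- log(lik)                           # the log likelihood, negative
```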

## 11.4 Likelihood Ratio Test

The likelihood ratio test is based on −2LL. It tests the significance of the difference between −2LL for the researcher’s model with predictors and −2LL for the baseline model with only a constant in it; this difference is called the model chi-square.

Significance at the .05 level or lower means the researcher’s model with predictors is significantly different from the constant-only model (all ‘b’ coefficients being zero). It measures the improvement in fit that the explanatory variables add over the null model.

A chi-square distribution is used to assess the significance of this difference (see Model Fitting Information in SPSS output).

\(H_0\): There is no difference between the null model and the final model.

\(H_1\): There is a difference between the null model and the final model.

## 11.5 Checking Assumptions: Multicollinearity

To check for multicollinearity, run a linear regression, treating the categorical dependent variable as if it were continuous, and inspect the collinearity diagnostics:

If the largest VIF (Variance Inflation Factor) is greater than 10 then there is cause for concern (Bowerman & O’Connell, 1990).

Tolerance below 0.1 indicates a serious problem.

Tolerance below 0.2 indicates a potential problem (Menard, 1995).

If the condition index is greater than 15, multicollinearity is a concern.
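As an illustration (with simulated data, not the chapter’s data set), VIF and tolerance can be computed by hand: regress each predictor on the remaining predictors, then VIF = 1/(1 − R²) and tolerance = 1 − R²:

```
set.seed(1)
x1 <- rnorm(100)
x2 <- x1 + rnorm(100, sd = 0.1)          # deliberately near-collinear with x1
r2 <- summary(lm(x1 ~ x2))$r.squared     # R-squared of x1 regressed on x2
vif <- 1 / (1 - r2)                      # here far above the cut-off of 10
tolerance <- 1 - r2                      # here far below 0.1
```

In practice, the vif() function from the car package reports these values for a fitted regression model.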

## 11.6 Features of Multinomial logistic regression

Multinomial logistic regression is used to predict membership of more than two categories. It works in (basically) the same way as binary logistic regression: the analysis breaks the outcome variable down into a series of comparisons between two categories.

E.g., if you have three outcome categories (A, B and C), the analysis will consist of two comparisons, which you choose:

Compare everything against your first category (e.g. A vs. B and A vs. C),

Or your last category (e.g. A vs. C and B vs. C),

Or a custom category (e.g. B vs. A and B vs. C).

The important parts of the analysis and output are much the same as we have just seen for binary logistic regression.

## 11.7 R Labs: Running Multinomial Logistic Regression in R

### 11.7.1 Understanding the Data: Choice of Programs

The data set (hsbdemo.sav) contains variables on 200 students. The outcome variable is prog, program type (1 = general, 2 = academic, and 3 = vocational). The predictor variables are ses, socioeconomic status (1 = low, 2 = middle, and 3 = high), math, mathematics score, and science, science score; the last two are continuous variables.

Research question: When high school students choose a program (general, vocational, or academic), how do their math and science scores and their socioeconomic status (SES) affect the decision?

### 11.7.2 Prepare and review the data

```
# Start our example by importing the data into R
library(haven)
hsbdemo <- read_sav("hsbdemo.sav")
hsb <- hsbdemo # Get a new copy of data
summary(hsb)
```

```
## id female ses schtyp
## Min. : 1.00 Min. :0.000 Min. :1.000 Min. :1.00
## 1st Qu.: 50.75 1st Qu.:0.000 1st Qu.:2.000 1st Qu.:1.00
## Median :100.50 Median :1.000 Median :2.000 Median :1.00
## Mean :100.50 Mean :0.545 Mean :2.055 Mean :1.16
## 3rd Qu.:150.25 3rd Qu.:1.000 3rd Qu.:3.000 3rd Qu.:1.00
## Max. :200.00 Max. :1.000 Max. :3.000 Max. :2.00
## prog read write math
## Min. :1.000 Min. :28.00 Min. :31.00 Min. :33.00
## 1st Qu.:2.000 1st Qu.:44.00 1st Qu.:45.75 1st Qu.:45.00
## Median :2.000 Median :50.00 Median :54.00 Median :52.00
## Mean :2.025 Mean :52.23 Mean :52.77 Mean :52.65
## 3rd Qu.:2.250 3rd Qu.:60.00 3rd Qu.:60.00 3rd Qu.:59.00
## Max. :3.000 Max. :76.00 Max. :67.00 Max. :75.00
## science socst honors awards cid
## Min. :26.00 Min. :26.00 Min. :0.000 Min. :0.00 Min. : 1.00
## 1st Qu.:44.00 1st Qu.:46.00 1st Qu.:0.000 1st Qu.:0.00 1st Qu.: 5.00
## Median :53.00 Median :52.00 Median :0.000 Median :1.00 Median :10.50
## Mean :51.85 Mean :52.41 Mean :0.265 Mean :1.67 Mean :10.43
## 3rd Qu.:58.00 3rd Qu.:61.00 3rd Qu.:1.000 3rd Qu.:2.00 3rd Qu.:15.00
## Max. :74.00 Max. :71.00 Max. :1.000 Max. :7.00 Max. :20.00
```

Now let’s do the descriptive analysis

```
# Use the descriptives function from the jmv package to get descriptive statistics
library(jmv)
descriptives(hsb, vars = vars(ses, prog, math, science), freq = TRUE)
```

```
##
## DESCRIPTIVES
##
## Descriptives
## ───────────────────────────────────────────────────────────
## ses prog math science
## ───────────────────────────────────────────────────────────
## N 200 200 200 200
## Missing 0 0 0 0
## Mean 2.055000 2.025000 52.64500 51.85000
## Median 2.000000 2.000000 52.00000 53.00000
## Minimum 1.000000 1.000000 33.00000 26.00000
## Maximum 3.000000 3.000000 75.00000 74.00000
## ───────────────────────────────────────────────────────────
##
##
## FREQUENCIES
##
## Frequencies of ses
## ──────────────────────────────────────────────────
## Levels Counts % of Total Cumulative %
## ──────────────────────────────────────────────────
## 1 47 23.50000 23.50000
## 2 95 47.50000 71.00000
## 3 58 29.00000 100.00000
## ──────────────────────────────────────────────────
##
##
## Frequencies of prog
## ──────────────────────────────────────────────────
## Levels Counts % of Total Cumulative %
## ──────────────────────────────────────────────────
## 1 45 22.50000 22.50000
## 2 105 52.50000 75.00000
## 3 50 25.00000 100.00000
## ──────────────────────────────────────────────────
```

```
# To see the crosstable, we need CrossTable function from gmodels package
library(gmodels)
# Build a crosstable between ses and prog
CrossTable(hsb$ses, hsb$prog)
```

```
##
##
## Cell Contents
## |-------------------------|
## | N |
## | Chi-square contribution |
## | N / Row Total |
## | N / Col Total |
## | N / Table Total |
## |-------------------------|
##
##
## Total Observations in Table: 200
##
##
## | hsb$prog
## hsb$ses | 1 | 2 | 3 | Row Total |
## -------------|-----------|-----------|-----------|-----------|
## 1 | 16 | 19 | 12 | 47 |
## | 2.783 | 1.305 | 0.005 | |
## | 0.340 | 0.404 | 0.255 | 0.235 |
## | 0.356 | 0.181 | 0.240 | |
## | 0.080 | 0.095 | 0.060 | |
## -------------|-----------|-----------|-----------|-----------|
## 2 | 20 | 44 | 31 | 95 |
## | 0.088 | 0.692 | 2.213 | |
## | 0.211 | 0.463 | 0.326 | 0.475 |
## | 0.444 | 0.419 | 0.620 | |
## | 0.100 | 0.220 | 0.155 | |
## -------------|-----------|-----------|-----------|-----------|
## 3 | 9 | 42 | 7 | 58 |
## | 1.257 | 4.381 | 3.879 | |
## | 0.155 | 0.724 | 0.121 | 0.290 |
## | 0.200 | 0.400 | 0.140 | |
## | 0.045 | 0.210 | 0.035 | |
## -------------|-----------|-----------|-----------|-----------|
## Column Total | 45 | 105 | 50 | 200 |
## | 0.225 | 0.525 | 0.250 | |
## -------------|-----------|-----------|-----------|-----------|
##
##
```

### 11.7.3 Run the Multinomial Model using “nnet” package

Below we use the multinom function from the nnet package to estimate a multinomial logistic regression model. There are other functions in other R packages capable of multinomial regression. We chose the multinom function because it does not require the data to be reshaped (as the mlogit package does) and because it mirrors the example code found in Hilbe’s Logistic Regression Models.

First, we choose the level of our outcome that we wish to use as our baseline and specify it in the relevel function. Then, we run our model using multinom. The multinom function does not compute p-values for the regression coefficients, so we calculate them ourselves using Wald tests (here z-tests).

```
# Load the nnet package
library(nnet)
# Since we are going to use Academic as the reference group, we need to relevel the grouping variable.
hsb$prog2 <- relevel(as.factor(hsb$prog), ref = 2)
hsb$ses <- as.factor(hsb$ses)
levels(hsb$prog2)
```

`## [1] "2" "1" "3"`

```
# Give the names to each level
levels(hsb$prog2) <- c("academic","general","vocational")
# Run an intercept-only model
OIM <- multinom(prog2 ~ 1, data = hsb)
```

```
## # weights: 6 (2 variable)
## initial value 219.722458
## final value 204.096674
## converged
```

```
## Call:
## multinom(formula = prog2 ~ 1, data = hsb)
##
## Coefficients:
## (Intercept)
## general -0.8472980
## vocational -0.7419374
##
## Std. Errors:
## (Intercept)
## general 0.1781742
## vocational 0.1718249
##
## Residual Deviance: 408.1933
## AIC: 412.1933
```

```
# Run a multinomial model
multi_mo <- multinom(prog2 ~ ses + math + science + math*science, data = hsb, model = TRUE)
```

```
## # weights: 21 (12 variable)
## initial value 219.722458
## iter 10 value 173.831002
## iter 20 value 167.382760
## final value 166.951813
## converged
```

```
## Call:
## multinom(formula = prog2 ~ ses + math + science + math * science,
## data = hsb, model = TRUE)
##
## Coefficients:
## (Intercept) ses2 ses3 math science
## general 5.897618 -0.4081497 -1.1254491 -0.1852220 0.01323626
## vocational 22.728283 0.8402168 -0.5605656 -0.5036705 -0.28297703
## math:science
## general 0.001025283
## vocational 0.006185571
##
## Std. Errors:
## (Intercept) ses2 ses3 math science math:science
## general 0.002304064 0.2613732 0.2134308 0.02694593 0.02953364 0.0004761369
## vocational 0.003856861 0.2959741 0.1984775 0.02681947 0.03142872 0.0004760567
##
## Residual Deviance: 333.9036
## AIC: 357.9036
```

```
# Check the z-scores for the model (Wald z)
z <- summary(multi_mo)$coefficients/summary(multi_mo)$standard.errors
z
```

```
## (Intercept) ses2 ses3 math science math:science
## general 2559.659 -1.561559 -5.273132 -6.873837 0.4481759 2.153336
## vocational 5892.948 2.838819 -2.824328 -18.780028 -9.0037711 12.993348
```
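The p-value matrix shown next comes from a two-tailed test on these z values, p = (1 - pnorm(abs(z))) * 2. As a self-contained check on one entry (ses2 in the vocational row):

```
# Two-tailed Wald p-value for ses2 (vocational), where z = coefficient / SE
z <- 2.838819
p <- (1 - pnorm(abs(z))) * 2
p   # about 0.004528, matching the output below
```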

```
## (Intercept) ses2 ses3 math science
## general 0 0.118391943 1.341147e-07 6.249667e-12 0.6540263
## vocational 0 0.004528089 4.737981e-03 0.000000e+00 0.0000000
## math:science
## general 0.03129229
## vocational 0.00000000
```

These are the logit coefficients relative to the reference category. For example, under ‘math’, the -0.185 suggests that for a one-unit increase in math score, the logit of being in the general program relative to the academic program decreases by 0.185.

### 11.7.4 Check the model fit information

```
# The anova() function conflicts with jmv's anova function, so detach jmv before using anova().
# detach("package:jmv", unload=TRUE)
# Compare our test model with the intercept-only model
# anova(OIM, multi_mo)
```

Interpretation of the Model Fit information

The log-likelihood is a measure of how much unexplained variability there is in the data. Therefore, the difference or change in log-likelihood indicates how much new variance has been explained by the model.

The chi-square test assesses the decrease in unexplained variance from the baseline model (deviance = 408.1933) to the final model (deviance = 333.9036), a difference of 408.1933 - 333.9036 = 74.29. This change is significant, which means that our final model explains a significant amount of the original variability.

The likelihood ratio chi-square of 74.29 with a p-value < 0.001 tells us that our model as a whole fits significantly better than an empty or null model (i.e., a model with no predictors).
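This test can be reproduced from the deviances reported above; the final model estimates 12 parameters and the intercept-only model 2, giving 10 degrees of freedom:

```
chi_sq <- 408.1933 - 333.9036                 # 74.29
df <- 12 - 2                                  # parameters in final vs. null model
pchisq(chi_sq, df = df, lower.tail = FALSE)   # p well below .001
```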

### 11.7.5 Calculate the Goodness of fit
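No code was shown for this step; output like that below can be produced along these lines (a sketch, assuming the fitted model multi_mo and the data hsb from the previous sections):

```
# Predicted probabilities for the first 30 students
head(fitted(multi_mo), 30)
# Most likely program for the first 30 students
head(predict(multi_mo), 30)
# Pearson chi-square comparing observed and predicted program membership
chisq.test(hsb$prog2, predict(multi_mo))
```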

```
## academic general vocational
## 1 0.18801940 0.17122451 0.6407561
## 2 0.12019189 0.10715542 0.7726527
## 3 0.52212681 0.08123771 0.3966355
## 4 0.23683979 0.23125435 0.5319059
## 5 0.10132130 0.12329032 0.7753884
## 6 0.38079544 0.10780793 0.5113966
## 7 0.32321815 0.16454057 0.5122413
## 8 0.09033932 0.08381233 0.8258484
## 9 0.02336687 0.09050704 0.8861261
## 10 0.32321815 0.16454057 0.5122413
## 11 0.16304678 0.29839918 0.5385540
## 12 0.22842326 0.27539161 0.4961851
## 13 0.32747927 0.28141483 0.3911059
## 14 0.05717483 0.12540921 0.8174160
## 15 0.15003741 0.52649953 0.3234631
## 16 0.12638004 0.20495962 0.6686603
## 17 0.25269654 0.39841475 0.3488887
## 18 0.05771613 0.08029142 0.8619924
## 19 0.27404420 0.16131436 0.5646414
## 20 0.25197679 0.20299587 0.5450273
## 21 0.15870561 0.17100945 0.6702849
## 22 0.27404420 0.16131436 0.5646414
## 23 0.16304678 0.29839918 0.5385540
## 24 0.16340250 0.31572103 0.5208765
## 25 0.26080538 0.36665885 0.3725358
## 26 0.74715288 0.12007209 0.1327750
## 27 0.33135572 0.26860688 0.4000374
## 28 0.32321815 0.16454057 0.5122413
## 29 0.40025162 0.32553566 0.2742127
## 30 0.19518342 0.30892507 0.4958915
```

```
## [1] vocational vocational academic vocational vocational vocational
## [7] vocational vocational vocational vocational vocational vocational
## [13] vocational vocational general vocational general vocational
## [19] vocational vocational vocational vocational vocational vocational
## [25] vocational academic vocational vocational academic vocational
## Levels: academic general vocational
```

```
##
## Pearson's Chi-squared test
##
## data: hsb$prog2 and predict(multi_mo)
## X-squared = 47.841, df = 4, p-value = 1.019e-09
```

### 11.7.6 Calculate the Pseudo R-Square

```
# Remove the "#" signs to run the code
# Load the DescTools package to calculate the pseudo R-squared
# library("DescTools")
# Calculate the pseudo R-squared
# PseudoR2(multi_mo, which = c("CoxSnell","Nagelkerke","McFadden"))
```

Interpretation of the R-Square:

These are three pseudo R-squared values. Logistic regression has no exact equivalent of the R-squared found in OLS regression, although many analogues have been proposed. Because these statistics do not mean exactly what R-squared means in OLS regression (the proportion of variance of the response variable explained by the predictors), we suggest interpreting them with great caution.

Cox and Snell’s R-squared imitates multiple R-squared based on the likelihood, but its maximum can be (and usually is) less than 1.0, making it difficult to interpret. Here its value of 0.31 indicates that roughly 31% of the variation in the dependent variable is explained by the logistic model.

The Nagelkerke modification, which does range from 0 to 1, is a more reliable measure of the relationship. Nagelkerke’s R-squared will normally be higher than the Cox and Snell measure. In our case it is 0.357, indicating a relationship of 35.7% between the predictors and the prediction.

McFadden’s R-squared = {LL(null) − LL(full)} / LL(null). In our case it is 0.182, indicating a relationship of 18.2% between the predictors and the prediction.
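All three values can be recomputed from the log likelihoods in the earlier output (null model LL = -204.0967, final model LL = -166.9518, n = 200):

```
ll_null <- -204.096674
ll_full <- -166.951813
n <- 200
mcfadden   <- 1 - ll_full / ll_null                      # about 0.182
cox_snell  <- 1 - exp((2 / n) * (ll_null - ll_full))     # about 0.310
nagelkerke <- cox_snell / (1 - exp((2 / n) * ll_null))   # about 0.357
```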

### 11.7.7 Likelihood Ratio Tests

```
# Use the lmtest package to run Likelihood Ratio Tests
library(lmtest)
lrtest(multi_mo, "ses") # Chi-Square=12.922,p=0.01166*
```

```
## # weights: 15 (8 variable)
## initial value 219.722458
## iter 10 value 176.645309
## iter 20 value 173.670728
## iter 30 value 173.430200
## final value 173.412997
## converged
```

```
## Likelihood ratio test
##
## Model 1: prog2 ~ ses + math + science + math * science
## Model 2: prog2 ~ math + science + math:science
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 12 -166.95
## 2 8 -173.41 -4 12.922 0.01166 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
## # weights: 18 (10 variable)
## initial value 219.722458
## iter 10 value 172.387155
## final value 172.258318
## converged
```

```
## Likelihood ratio test
##
## Model 1: prog2 ~ ses + math + science + math * science
## Model 2: prog2 ~ ses + science + math:science
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 12 -166.95
## 2 10 -172.26 -2 10.613 0.004959 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
## # weights: 18 (10 variable)
## initial value 219.722458
## iter 10 value 169.972735
## final value 169.645522
## converged
```

```
## Likelihood ratio test
##
## Model 1: prog2 ~ ses + math + science + math * science
## Model 2: prog2 ~ ses + math + math:science
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 12 -166.95
## 2 10 -169.65 -2 5.3874 0.06763 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

```
## # weights: 18 (10 variable)
## initial value 219.722458
## iter 10 value 170.088834
## final value 169.576379
## converged
```

```
## Likelihood ratio test
##
## Model 1: prog2 ~ ses + math + science + math * science
## Model 2: prog2 ~ ses + math + science
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 12 -166.95
## 2 10 -169.58 -2 5.2491 0.07247 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
```

Interpretation of the Likelihood Ratio Tests

The results of the likelihood ratio tests can be used to ascertain the significance of predictors to the model. This table tells us that SES and math score had significant main effects on program selection, \(X^2\)(4) = 12.922, p = .012 for SES and \(X^2\)(2) = 10.613, p = .005 for math.

These likelihood statistics can be seen as sorts of overall statistics that tell us which predictors significantly enable us to predict the outcome category, but they don’t really tell us specifically what the effect is. To see this we have to look at the individual parameter estimates.

### 11.7.8 Parameter Estimates

```
## Call:
## multinom(formula = prog2 ~ ses + math + science + math * science,
## data = hsb, model = TRUE)
##
## Coefficients:
## (Intercept) ses2 ses3 math science
## general 5.897618 -0.4081497 -1.1254491 -0.1852220 0.01323626
## vocational 22.728283 0.8402168 -0.5605656 -0.5036705 -0.28297703
## math:science
## general 0.001025283
## vocational 0.006185571
##
## Std. Errors:
## (Intercept) ses2 ses3 math science math:science
## general 0.002304064 0.2613732 0.2134308 0.02694593 0.02953364 0.0004761369
## vocational 0.003856861 0.2959741 0.1984775 0.02681947 0.03142872 0.0004760567
##
## Residual Deviance: 333.9036
## AIC: 357.9036
```

```
## (Intercept) ses2 ses3 math science math:science
## general 2559.659 -1.561559 -5.273132 -6.873837 0.4481759 2.153336
## vocational 5892.948 2.838819 -2.824328 -18.780028 -9.0037711 12.993348
```

```
## (Intercept) ses2 ses3 math science
## general 0 0.118391943 1.341147e-07 6.249667e-12 0.6540263
## vocational 0 0.004528089 4.737981e-03 0.000000e+00 0.0000000
## math:science
## general 0.03129229
## vocational 0.00000000
```

Note that the table is split into two rows. This is because these parameters compare pairs of outcome categories.

We specified the second category (2 = academic) as our reference category; therefore, the first row of the table, labelled General, compares that category against the ‘Academic’ category, and the second row, labelled Vocational, also compares its category against the ‘Academic’ category.

Because we are just comparing two categories the interpretation is the same as for binary logistic regression:

```
## (Intercept) ses2 ses3 math science math:science
## general 3.641690e+02 0.6648794 0.3245067 0.8309198 1.0133243 1.001026
## vocational 7.426219e+09 2.3168692 0.5708861 0.6043085 0.7535371 1.006205
```

The relative log odds of being in the general program versus the academic program decrease by 1.125 when moving from the lowest level of SES (SES = 1, the reference) to the highest level (SES = 3), b = -1.125, Wald z = -5.27, p < .001.

Exp(-1.1254491) = 0.3245067 means that when students move from the lowest level of SES (SES = 1) to the highest level (SES = 3) the odds are multiplied by 0.325; students with the lowest level of SES therefore tend to choose the general program over the academic program more than students with the highest level of SES.

The relative log odds of being in the vocational program versus the academic program decrease by 0.56 when moving from the lowest level of SES (SES = 1) to the highest level (SES = 3), b = -0.56, Wald z = -2.82, p < .01.

Exp(-0.56) = 0.57 means that when students move from the lowest level of SES (SES = 1) to the highest level (SES = 3) the odds are multiplied by 0.57; students with the lowest level of SES therefore tend to choose the vocational program over the academic program more than students with the highest level of SES.

### 11.7.9 Interpretation of the Predictive Equation

Please check your slides for detailed information. All the required values can be found in the R output above.

### 11.7.10 Build a classification table

```
# Build a classification table of observed vs. predicted program membership
ctable <- table(hsb$prog2, predict(multi_mo))
ctable
```

```
##
## academic general vocational
## academic 88 4 13
## general 24 12 9
## vocational 21 4 25
```
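A simple summary of this table is the overall classification accuracy, the proportion of students on the diagonal. Rebuilding the table from the printed counts:

```
ctable <- matrix(c(88,  4, 13,
                   24, 12,  9,
                   21,  4, 25),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(observed  = c("academic", "general", "vocational"),
                                 predicted = c("academic", "general", "vocational")))
accuracy <- sum(diag(ctable)) / sum(ctable)   # (88 + 12 + 25) / 200 = 0.625
```

So the model classifies 62.5% of students into their observed program.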

## 11.8 Supplementary Learning Materials

Field, A. (2013). Discovering statistics using IBM SPSS statistics (4th ed.). Los Angeles, CA: Sage Publications.

Agresti, A. (1996). An introduction to categorical data analysis. New York, NY: Wiley & Sons.