Chapter 16 Models with Multiple Choices
Examples of multinomial choice (polytomous) situations:
Choice of a laundry detergent: Tide, Cheer, Arm & Hammer, Wisk, etc.
Choice of a major: economics, marketing, management, finance or accounting.
Choices after graduating from high school: not going to college, going to a private 4-year college, a public 4-year college, or a 2-year college.
Firms also face such multinomial choices:
- In which country to operate
- Where to locate a store
- Which CEO to hire
16.1 Multinomial Logit
The explanatory variable \(x_i\) is individual specific but does not change across alternatives; for example, the age of the individual.
The dependent variable is nominal
There are more than 2 choices
There is no meaningful ordering to them.
- Otherwise we would want to use that information (with an ordered probit or ordered logit)
Recall the logit probability in the case of two choices:
\[P_i=\frac{\exp(\beta_{0i}+\beta_{1i}X)}{\exp(\beta_{0i}+\beta_{1i}X)+\exp(\beta_{0j}+\beta_{1j}X)}\] This is the probability that \(y\) is equal to choice \(i\). With \(K\) choices, the probability that \(y\) is equal to choice \(j\) generalizes to \[P_j=\frac{\exp(\beta_{0j}+\beta_{1j}X)}{\sum_{k=1}^{K} \exp(\beta_{0k}+\beta_{1k}X)}\] The relative probability of choices \(i\) and \(j\) is \[P_i/P_j = \frac{\exp(\beta_{0i}+\beta_{1i}X)}{\exp(\beta_{0j}+\beta_{1j}X)}\] Note that \(X\) is the same across alternatives (it is individual specific); only the \(\beta\)'s carry a choice subscript.
Relative Probabilities
We can only identify relative probabilities for each choice.
Similar to our discussion of dummy variables, we need to model our choices as relative to a base.
We set the base by forcing one of the choices to have \(\beta\)’s equal to zero.
If we do this for choice \(j\), then the relative probabilities simplify to \[P_i/P_j = \exp(\beta_{0i}+\beta_{1i}X)\]
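A quick numerical sketch of this normalization (all coefficient values below are made up for illustration): shifting every choice's linear index by the same amount leaves the probabilities unchanged, which is why we are free to set the base choice's \(\beta\)'s to zero.

```r
## Made-up linear indices (beta_0k + beta_1k * X) for three choices:
xb <- c(0.4, 1.1, -0.2)
p  <- exp(xb) / sum(exp(xb))        # multinomial logit probabilities

## Make choice 1 the base by subtracting its index from all three:
xb_base <- xb - xb[1]               # the base index is now 0
p_base  <- exp(xb_base) / sum(exp(xb_base))

all.equal(p, p_base)                # TRUE: only relative indices are identified
p_base[2] / p_base[1]               # relative probability = exp(xb_base[2])
```

This is exactly why only \(G-1\) sets of parameters are estimable with \(G\) options: adding a constant to every index cancels in the ratio.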
16.2 IIA Property
- There is an implicit assumption in logit models that the odds between any pair of alternatives are independent of irrelevant alternatives (IIA)
One way to state the assumption
If choice A is preferred to choice B out of the choice set {A,B}, then introducing a third alternative X, thus expanding that choice set to {A,B,X}, must not make B preferable to A.
which kind of makes sense.
In the case of the multinomial logit model, the IIA implies that adding another alternative or changing the characteristics of a third alternative must not affect the relative odds between the two alternatives considered.
This is not realistic for many real life applications involving similar (substitute) alternatives.
16.2.1 Red Bus/Blue Bus Paradox (McFadden 1974).
Imagine commuters first face a decision between two modes of transportation: car and red bus.
Suppose that a consumer chooses between these two options with equal probability, 0.5, so that the odds ratio equals 1.
Now add a third mode, blue bus. Assuming bus commuters do not care about the color of the bus (they are perfect substitutes), consumers are expected to choose between bus and car still with equal probability, so the probability of car is still 0.5, while the probabilities of each of the two bus types should go down to 0.25
However, this intuitive outcome violates IIA: for the odds ratio between car and red bus to be preserved at 1, the multinomial logit model must instead predict car 0.33, red bus 0.33, blue bus 0.33 — understating the probability of driving.
The IIA axiom does not mix well with perfect substitutes.
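The paradox is easy to see numerically. A minimal sketch (the utilities are illustrative, not estimated): a logit model must give identical bus alternatives identical utilities, forcing equal shares across all three modes.

```r
## Logit with three equal-utility modes: car, red bus, blue bus
u <- c(car = 0, red_bus = 0, blue_bus = 0)  # red and blue bus are identical
p_logit <- exp(u) / sum(exp(u))             # 1/3 each: car/red-bus odds stay at 1

## The intuitive outcome instead splits only the bus share:
p_intuitive <- c(car = 0.5, red_bus = 0.25, blue_bus = 0.25)

## ...which changes the car/red-bus odds from 1 to 2, violating IIA:
p_logit["car"] / p_logit["red_bus"]
p_intuitive["car"] / p_intuitive["red_bus"]
```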
16.3 Alternatives to Multinomial Logit
The advantage of Multinomial Logit (and Logit for that matter) is that the probabilities have a closed form solution (i.e. a simple equation)
An alternative is to use multinomial probit.
Advantage: NO IIA property!
Disadvantage: Computationally intensive once the number of choices is greater than 3.
16.4 Multinomial Logit Example
A relatively common R function that fits multinomial logit models is multinom from package nnet.
Let us use the dataset nels_small for an example of how multinom works.
The variable grades in this dataset is an index, with the best grades represented by the lowest values of grades.
We try to explain the choice of a post-secondary institution (psechoice) using only the high school grade.
The variable psechoice can take one of three values:
Code | Meaning |
---|---|
psechoice=1 | no college |
psechoice=2 | two-year college |
psechoice=3 | four-year college |
R Code for Multinomial Logit
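The stargazer call below refers to a fitted object nels.multinom that this excerpt never defines. Presumably it was estimated with multinom from nnet; the sketch below shows the call on simulated stand-in data (variable names mirror nels_small, and the data-generating coefficients loosely mimic the reported estimates), since the real data ship with the PoEdata package.

```r
library(nnet)  # provides multinom()

## Presumably the model behind the table was fit as:
## nels.multinom <- multinom(psechoice ~ grades, data = nels_small)

## Runnable stand-in with simulated data (all numbers illustrative):
set.seed(1)
n <- 500
grades <- runif(n, 1, 13)            # lower value = better grades
lin2 <- 2.5 - 0.31 * grades          # index: 2-year college vs. no college
lin3 <- 5.8 - 0.71 * grades          # index: 4-year college vs. no college
den  <- 1 + exp(lin2) + exp(lin3)
probs <- cbind(1 / den, exp(lin2) / den, exp(lin3) / den)
psechoice <- apply(probs, 1, function(p) sample(1:3, 1, prob = p))
fit <- multinom(factor(psechoice) ~ grades, trace = FALSE)
coef(fit)  # two equations: levels 2 and 3, each relative to the base (level 1)
```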
stargazer::stargazer(nels.multinom, type="html", dep.var.labels.include = FALSE, column.labels = c("2 year college","4 year college"))
Dependent variable: psechoice

 | 2 year college | 4 year college |
---|---|---|
 | (1) | (2) |
grades | -0.309^{***} | -0.706^{***} |
 | (0.052) | (0.053) |
Constant | 2.505^{***} | 5.770^{***} |
 | (0.418) | (0.404) |
Akaike Inf. Crit. | 1,758.626 | 1,758.626 |

Note: ^{*}p<0.1; ^{**}p<0.05; ^{***}p<0.01
Interpreting Output
The output from the multinom function gives coefficient estimates for each level of the response variable psechoice, except for the first level, which is the benchmark (base category).
We treat the dependent choice variables similar to dummy variables. If there are G options, then we can only identify parameters associated with G-1 options.
As in the probit and logit models, we can only identify relative differences: that is, how much the probability of choosing option A over option B increases or decreases.
Making Predictions

Suppose we wanted to know the probabilities for the median student and for a student in the top 5%.
medGrades <- median(nels_small$grades)
fifthPercentileGrades <- quantile(nels_small$grades, .05)
newdat <- data.frame(grades=c(medGrades, fifthPercentileGrades))
pred <- predict(nels.multinom, newdat, "probs")
knitr::kable(pred, digits = 2)
 | 1 | 2 | 3 |
---|---|---|---|
median | 0.18 | 0.29 | 0.53 |
5% | 0.02 | 0.10 | 0.89 |
We can clearly see that the high-performing student is much more likely to attend a four-year college.
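As a check, the first row of the prediction table can be reproduced (approximately, since the reported coefficients are rounded) directly from the estimates. The median grade value used below is an assumption for illustration; the exact value comes from median(nels_small$grades).

```r
## Reported (rounded) estimates from the table above:
##   2-year college: 2.505 - 0.309 * grades
##   4-year college: 5.770 - 0.706 * grades
grade_med <- 6.64                    # assumed median grade (illustrative)
lin2 <- 2.505 - 0.309 * grade_med    # index for 2-year college
lin3 <- 5.770 - 0.706 * grade_med    # index for 4-year college
den  <- 1 + exp(lin2) + exp(lin3)    # base category (no college) has index 0
p <- c(no_college = 1 / den,
       two_year   = exp(lin2) / den,
       four_year  = exp(lin3) / den)
round(p, 2)   # approximately matches the table's first row
```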
16.5 Conditional Logit
Difference between Conditional and Multinomial Logit
Multinomial logit models a choice as a function of the chooser’s characteristics, whereas conditional logit models the choice as a function of the choice’s characteristics.
It’s really that simple! Note that the two can be combined.
Conditional Logit is a special case of Multinomial Logit
Multinomial logit \[U_{ij}=X_{i}\beta_j+e_{ij}\] Conditional logit \[U_{ij}=X_{ij}\beta+e_{ij}\]
Notice the difference in the subscripts. Conditional logit does not estimate different \(\beta\)'s for each choice. Instead, there is one set of parameters, but the characteristics \(X\) change with each alternative.
Characteristics CANNOT vary only by individual. If they do, they will drop out of the model.
\[Pr(Y_i=j)=\frac{exp(\beta_1 X_{ij}+\beta_2 Z_i)}{\sum_{k=1}^{K}exp(\beta_1 X_{ik}+\beta_2 Z_i) }\]
You can never estimate \(\beta_2\) in this case.
Proof
\[\begin{align*} Pr(Y_i=j) &=\frac{exp(\beta_1 X_{ij}+\beta_2 Z_i)}{\sum_{k=1}^{K}exp(\beta_1 X_{ik}+\beta_2 Z_i) } \\ \\ &=\frac{exp(\beta_1 X_{ij})exp(\beta_2 Z_i)}{\sum_{k=1}^{K}exp(\beta_1 X_{ik})exp(\beta_2 Z_i) } \\ \\ &=\frac{exp(\beta_1 X_{ij})exp(\beta_2 Z_i)}{exp(\beta_2 Z_i) \sum_{k=1}^{K}exp(\beta_1 X_{ik}) } \\ \\ &=\frac{exp(\beta_1 X_{ij})}{ \sum_{k=1}^{K}exp(\beta_1 X_{ik}) }\end{align*}\]
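The cancellation in the proof is easy to verify numerically (all numbers below are made up):

```r
b1 <- 0.8; b2 <- -1.5
X  <- c(2.0, 0.5, 1.2)   # alternative-varying characteristic (one per choice)
Z  <- 3.7                # individual-specific characteristic (same for all choices)

p_with_Z    <- exp(b1 * X + b2 * Z) / sum(exp(b1 * X + b2 * Z))
p_without_Z <- exp(b1 * X) / sum(exp(b1 * X))

all.equal(p_with_Z, p_without_Z)   # TRUE: Z_i (and hence beta_2) drops out
```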
Fixed Effect Logit
The conditional logit model can effectively remove certain fixed effects.
Again, consider our previous example. \[\begin{align*}Pr(Y_i=j) &=\frac{exp(\beta_1 X_{ij}+\beta_2 Z_i)}{\sum_{k=1}^{K}exp(\beta_1 X_{ik}+\beta_2 Z_i) } \\ \\ &=\frac{exp(\beta_1 X_{ij}+\alpha_i)}{\sum_{k=1}^{K}exp(\beta_1 X_{ik}+\alpha_i) }\end{align*}\]
Warning: As N grows large (with a fixed number of observations per individual), the maximum likelihood estimates are inconsistent. This is called the incidental parameter problem and is present in all non-linear fixed-effects models. In R, you can use the package bife to reduce this problem.
16.6 Mixed Logit
A mixed logit combines conditional logit and multinomial logit: it includes both alternative-specific and individual-specific regressors.
If you allow for random coefficients (unobserved heterogeneity also known as random effects), then the mixed logit model can overcome the IIA property.
16.7 Conditional/Multinomial/Mixed Logit in R
We will use the mlogit package in R
library(mlogit)
data("Fishing", package = "mlogit")
Fish <- mlogit.data(Fishing, varying = c(2:9), shape = "wide", choice = "mode")
## a pure "conditional" model
m1<-mlogit(mode ~ price + catch, data = Fish)
## a pure "multinomial model"
m2 <- mlogit(mode ~ 0 | income, data = Fish)
## which can also be estimated using multinom (package nnet)
#library("nnet")
#summary(multinom(mode ~ income, data = Fishing))
## a "mixed" model
m <- mlogit(mode ~ price+ catch | income, data = Fish)
stargazer::stargazer(m1,m2,m,type="html", dep.var.labels.include = FALSE ,column.labels = c("Conditional","Multinomial","Mixed"), font.size = "tiny", single.row = TRUE)
Examples of Mixed Logit in R
Dependent variable: mode

 | Conditional | Multinomial | Mixed |
---|---|---|---|
 | (1) | (2) | (3) |
(Intercept):boat | 0.871^{***} (0.114) | 0.739^{***} (0.197) | 0.527^{**} (0.223) |
(Intercept):charter | 1.499^{***} (0.133) | 1.341^{***} (0.195) | 1.694^{***} (0.224) |
(Intercept):pier | 0.307^{***} (0.115) | 0.814^{***} (0.229) | 0.778^{***} (0.220) |
price | -0.025^{***} (0.002) | | -0.025^{***} (0.002) |
catch | 0.377^{***} (0.110) | | 0.358^{***} (0.110) |
income:boat | | 0.0001^{**} (0.00004) | 0.0001^{*} (0.0001) |
income:charter | | -0.00003 (0.00004) | -0.00003 (0.0001) |
income:pier | | -0.0001^{***} (0.0001) | -0.0001^{**} (0.0001) |
Observations | 1,182 | 1,182 | 1,182 |
R^{2} | 0.178 | 0.014 | 0.189 |
Log Likelihood | -1,230.784 | -1,477.151 | -1,215.138 |
LR Test | 533.878^{***} (df = 5) | 41.145^{***} (df = 6) | 565.171^{***} (df = 8) |

Note: ^{*}p<0.1; ^{**}p<0.05; ^{***}p<0.01
16.8 Summary
- Multinomial choice models are used when the dependent variable represents a choice between several options
- Multinomial probit and multinomial logit are the most popular multinomial choice models
- If choice is between J + 1 options, both will have J equations
- Conditional logit is a special case of multinomial logit with only 1 equation
- Maximum likelihood estimation is the most common way of estimating all these models
- Explanatory variables can be characteristics of the options or the individuals (or both) and can lead to different models, so pay careful attention
- Care must be taken with interpretation of coefficients/marginal effects