Chapter 11 Multinomial Choice Model

11.1 Ordered Choice

  • Self-assessed health satisfaction on a 0-10 scale

  • Course evaluations on a 1-5 scale

There are \(J+1\) options, labeled 0 to \(J\). An individual’s choice depends on the latent variable \(Y_i^*\) such that \[Y_{i}^*=X_{i}^{'}\beta+\epsilon_{i},\] where \(\epsilon_{i}\) has pdf \(f(\cdot)\) and CDF \(F(\cdot)\).

The larger \(Y_i^*\) is, the higher the option number chosen. There are J thresholds \[\mu_{0}<...<\mu_{J-1}\] such that the option chosen by person \(i\) is: \[ Y_{i}=\left\{ \begin{array}{ccc} 0 & if & Y_i^*<\mu_{0}\\ k & if & \mu_{k-1}\leq Y_i^*<\mu_{k}\\ J & if & \mu_{J-1}\leq Y_i^* \end{array}.\right.\] It follows that \[\begin{align} Pr(Y_{i} =y_{i})=\begin{cases} Pr(X_{i}^{'}\beta+\epsilon_{i}<\mu_{0})=F(\mu_{0}-X_{i}^{'}\beta), & y_{i}=0\\ Pr(\mu_{k-1}\leq X_{i}^{'}\beta+\epsilon_{i}<\mu_{k})=F(\mu_{k}-X_{i}^{'}\beta)-F(\mu_{k-1}-X_{i}^{'}\beta), & 1\leq y_{i}=k\leq J-1\\ Pr(X_{i}^{'}\beta+\epsilon_{i}\geq\mu_{J-1})=1-F(\mu_{J-1}-X_{i}^{'}\beta), & y_{i}=J \end{cases} \end{align}\]
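As a numerical illustration of these probabilities, here is a minimal sketch in Python; the values of \(\beta\), the thresholds, and \(X_i\) are made up for illustration, and \(F\) is taken to be the standard normal CDF (an ordered probit).

```python
import numpy as np
from scipy.stats import norm

# Made-up parameters: J + 1 = 3 options, thresholds mu_0 < mu_1
beta = np.array([0.5, -1.0])
mu = np.array([-0.2, 1.1])
x_i = np.array([1.0, 0.3])
xb = x_i @ beta

# Pr(Y=0) = F(mu_0 - X'b); Pr(Y=1) = F(mu_1 - X'b) - F(mu_0 - X'b); Pr(Y=2) = 1 - F(mu_1 - X'b)
F = norm.cdf
p = np.array([F(mu[0] - xb), F(mu[1] - xb) - F(mu[0] - xb), 1 - F(mu[1] - xb)])
print(p, p.sum())   # the three probabilities sum to one
```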

Goodness-of-Fit

  1. Pseudo-\(R^{2}\).

  2. Prediction: Predicted choice \[\hat{y}_{i}=\arg\max_{\{y_{i}=0,1,...,J\}}\Pr(Y_{i}=y_{i}|\hat{\beta},\hat{\mu}).\] Then compute the percentage of correct predictions.
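A minimal sketch of this prediction-based measure; the fitted probabilities below are made-up numbers standing in for \(\Pr(Y_i=y_i|\hat\beta,\hat\mu)\).

```python
import numpy as np

# Given an n x (J+1) matrix of fitted probabilities Pr(Y_i = k | beta_hat, mu_hat),
# the predicted choice is the column with the largest probability.
def percent_correct(prob, y):
    """prob: (n, J+1) fitted probabilities; y: (n,) observed choices 0..J."""
    y_hat = prob.argmax(axis=1)
    return (y_hat == y).mean()

# Tiny made-up example: 3 individuals, 3 options
prob = np.array([[0.7, 0.2, 0.1],
                 [0.1, 0.3, 0.6],
                 [0.2, 0.5, 0.3]])
y = np.array([0, 2, 0])
print(percent_correct(prob, y))   # 2 of 3 predictions are correct -> 0.666...
```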

Suppose the sample has 500 observations with \(y\in\{1,2,3\}\): 30 observations have \(y=1\), 300 have \(y=2\), and 170 have \(y=3\). What is \(\ln L_0\) under the probit and logit models?
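One way to check the answer numerically, assuming \(\ln L_0\) denotes the maximized log-likelihood of the thresholds-only model (no regressors), which reproduces the sample shares exactly under either link:

```python
import numpy as np

# Thresholds-only model: the fitted probabilities equal the sample shares,
# so ln L_0 = sum_j n_j * ln(n_j / n), identical for the probit and logit links.
n_j = np.array([30, 300, 170])
n = n_j.sum()
lnL0 = (n_j * np.log(n_j / n)).sum()
print(lnL0)
```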

The Likelihood Function and MLE

Since \(Y_{i}\) is discrete, \(L(Y_{i})=\Pr(Y_{i}=y_{i})\), where \(\Pr(Y_{i}=y_{i})\) is given by the ordered-choice probabilities above. The sample log-likelihood function is

\[\ln L=\sum\ln L(Y_{i})=\sum\ln\Pr(Y_{i}=y_{i}).\]

The MLE solves for \(\{\beta,\mu_{0},\cdots,\mu_{J-1}\}\).

Write down the explicit likelihood function.
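For concreteness, here is a minimal estimation sketch in Python using statsmodels' OrderedModel; the simulated data and parameter values are assumptions made for illustration, not part of the text.

```python
import numpy as np
from statsmodels.miscmodels.ordinal_model import OrderedModel

# Simulated ordered data with J + 1 = 3 categories (made-up parameter values)
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 2))
beta = np.array([1.0, -0.5])
y_star = X @ beta + rng.logistic(size=n)   # latent Y* with logistic errors
mu = np.array([-0.5, 1.0])                 # thresholds mu_0 < mu_1
y = np.digitize(y_star, mu)                # observed category: 0, 1, or 2

# Ordered logit: maximize ln L = sum_i ln Pr(Y_i = y_i) over beta and the thresholds
res = OrderedModel(y, X, distr='logit').fit(method='bfgs', disp=False)
print(res.summary())
```

Setting `distr='probit'` instead gives the ordered probit under the same likelihood setup.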

Marginal Effect

If \(X_{i}\) is continuous:

• Before (two options):

\[\Pr(Y_{i}=0)+\Pr(Y_{i}=1)=1.\]

\[\frac{\partial Pr(Y_{i}=1)}{\partial X_{i}}+\frac{\partial Pr(Y_{i}=0)}{\partial X_{i}}=0.\]

• Now (multiple ordered options):

\[\Pr(Y_{i}=0)+\dots+\Pr(Y_{i}=J)=1.\]
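Differentiating the choice probabilities above gives the ordered-model marginal effects (a sketch; \(f\) is the density corresponding to \(F\)):

\[\begin{align} \frac{\partial\Pr(Y_{i}=0)}{\partial X_{i}} & =-f(\mu_{0}-X_{i}^{'}\beta)\,\beta,\\ \frac{\partial\Pr(Y_{i}=k)}{\partial X_{i}} & =\left[f(\mu_{k-1}-X_{i}^{'}\beta)-f(\mu_{k}-X_{i}^{'}\beta)\right]\beta,\quad 1\leq k\leq J-1,\\ \frac{\partial\Pr(Y_{i}=J)}{\partial X_{i}} & =f(\mu_{J-1}-X_{i}^{'}\beta)\,\beta, \end{align}\]

and because the probabilities sum to one, these \(J+1\) marginal effects sum to zero, just as in the two-option case.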

11.2 Unordered Choice

Suppose there are three alternatives A, B, and C, and the individual picks one of the three; the alternatives are on the same level, with no natural ordering.

11.2.1 Random Utility

The utility of individual \(i\) from choosing alternative \(j\):

\[ U_{ij}=V_{ij}+\epsilon_{ij}, \] where \(V_{ij}\) is the part explained by observables and \(\epsilon_{ij}\) is the error term.

11.2.2 Multinomial Logit Model

Assumptions:

  1. \(\epsilon\sim\text{Gumbel distribution}\)

    \[\begin{align} f(\epsilon) & = e^{-\epsilon}e^{-e^{-\epsilon}},\\ F(\epsilon) & = e^{-e^{-\epsilon}} \end{align}\]

  2. The errors are independent across alternatives, i.e., \(\epsilon_{ij}\perp\epsilon_{ij'}\) for \(j\neq j'\).

Taking the three alternatives {A,B,C} as an example, we can show that \[\begin{eqnarray*} \Pr(\text{choice =}A) & = & \Pr(U_A>U_B,U_A>U_C)\\ & = & \frac{1}{1+\exp(V_{B}-V_A)+\exp(V_{C}-V_A)}. \end{eqnarray*}\]
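A sketch of the standard derivation, conditioning on \(\epsilon_A\) and using the Gumbel density and CDF above, with the change of variable \(u=e^{-\epsilon}\) in the last step:

\[\begin{align} \Pr(\text{choice =}A) & =\Pr(\epsilon_{B}<V_A-V_B+\epsilon_{A},\ \epsilon_{C}<V_A-V_C+\epsilon_{A})\\ & =\int_{-\infty}^{\infty}e^{-\epsilon}e^{-e^{-\epsilon}}\,e^{-e^{-(V_A-V_B+\epsilon)}}\,e^{-e^{-(V_A-V_C+\epsilon)}}\,d\epsilon\\ & =\int_{0}^{\infty}\exp\left(-u\left[1+e^{-(V_A-V_B)}+e^{-(V_A-V_C)}\right]\right)du\\ & =\frac{1}{1+\exp(V_{B}-V_A)+\exp(V_{C}-V_A)}. \end{align}\]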

11.2.3 Identification

In \(U=V+\epsilon\), \(V\) consists of all the regressors \({\bf X}\), so that \(V={\bf X}\beta\). However, not all of the parameters can be estimated (more precisely, identified), since we can only infer differences in \(V\) across alternatives. Consider the following setup for \(V\):

\[V_{ij}=\alpha_j+\beta x_{ij}+\gamma_j z_i+\delta_j w_{ij}+\tau q_{i}.\]

To what extent can we estimate these parameters?

  1. only the differences \(\alpha_j-\alpha_{k}\) can be estimated, not their separate levels;
  2. \(\beta\) can be estimated;
  3. only the differences \(\gamma_j-\gamma_k\) can be estimated, not their separate levels;
  4. all the \(\delta_j\)'s can be estimated;
  5. \(\tau\) cannot be estimated.
Taking \(V_{ij}-V_{i1}\) as an example, in the end we only have \[V_{ij}-V_{i1}=(\alpha_j-\alpha_1)+\beta (x_{ij}-x_{i1})+(\gamma_j-\gamma_1) z_i+\delta_j w_{ij}-\delta_1 w_{i1}.\] This can be divided into three blocks:
  1. \(x_{ij}\) with a constant coefficient: \(\beta (x_{ij}-x_{i1})\)
  2. \(z_i\) with alternative-varying coefficients: \((\alpha_j-\alpha_1)+(\gamma_j-\gamma_1) z_i\)
  3. \(w_{ij}\) with alternative-varying coefficients: \(\delta_j w_{ij}-\delta_1 w_{i1}\).
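A small numerical illustration of why \(\tau\) and the common level of the \(\alpha_j\)'s drop out; the numbers are made up, and the softmax form of the probabilities follows from the multinomial logit above.

```python
import numpy as np

def mnl_prob(V):
    """Multinomial logit choice probabilities implied by systematic utilities V."""
    e = np.exp(V - V.max())       # subtract the max for numerical stability
    return e / e.sum()

# Made-up values for one individual and three alternatives
alpha = np.array([0.2, -0.1, 0.5])   # alternative-specific intercepts alpha_j
x     = np.array([1.0, 2.0, 0.5])    # alternative-varying regressor x_ij
beta, tau, q_i = 0.8, 1.3, 2.0       # tau * q_i varies over individuals only

V_a = alpha + beta * x + tau * q_i   # full specification
V_b = (alpha + 5.0) + beta * x       # drop tau * q_i and shift every alpha_j by 5

# Identical probabilities: tau and the level of alpha_j are not identified,
# because only utility differences across alternatives matter.
print(mnl_prob(V_a))
print(mnl_prob(V_b))
```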

11.2.4 Multinomial Probit

Assume \[ \left[\begin{array}{c} \epsilon_{i0}\\ \epsilon_{i1}\\ \vdots\\ \epsilon_{iJ} \end{array}\right]\sim N(0,\Sigma) \]

Compared with the multinomial logit, the multinomial probit additionally requires estimating the variance-covariance matrix of the errors across alternatives.

Taking the three alternatives A, B, and C as an example, any observed choice only reveals the outcomes of pairwise utility comparisons, so it suffices to look at \(U_B-U_A\) and \(U_C-U_A\).

Explain why, regardless of which alternative is chosen, the two random variables \(U_B-U_A\) and \(U_C-U_A\) are enough to convey all of the corresponding information.

\[\begin{align} U_B-U_A &= V_B-V_A+(\epsilon_B-\epsilon_A)\\ U_C-U_A &= V_C-V_A+(\epsilon_C-\epsilon_A) \end{align}\]

Because of the normality assumption on \(\epsilon\), the vector

\[ \left[\begin{array}{c} \epsilon_{B}-\epsilon_{A}\\ \epsilon_{C}-\epsilon_{A}\\ \end{array}\right]\sim N(0,\tilde{\Sigma}) \]

is also normally distributed.
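Concretely, the differenced errors are a linear transformation of the original normal vector (a sketch, writing the map explicitly):

\[\left[\begin{array}{c} \epsilon_{B}-\epsilon_{A}\\ \epsilon_{C}-\epsilon_{A} \end{array}\right]=\underbrace{\left[\begin{array}{ccc} -1 & 1 & 0\\ -1 & 0 & 1 \end{array}\right]}_{M}\left[\begin{array}{c} \epsilon_{A}\\ \epsilon_{B}\\ \epsilon_{C} \end{array}\right],\qquad\tilde{\Sigma}=M\Sigma M^{'}.\]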

Multiplying the utility functions by the same factor \(\alpha>0\) does not change the choice outcome; that is, if \(U_{j}\) and \(U'_{j}\) are related by such a scaling, they imply the same choice.

Let \(\Theta\) denote the parameters in \(V_A-V_B\) and \(V_A-V_C\), so the model's likelihood function can be written as \(L(\Theta,\tilde{\Sigma})\). Explain why \((\Theta,\tilde{\Sigma})=(\Theta_0,\tilde{\Sigma}_0)\) and \((\Theta,\tilde{\Sigma})=(\alpha\Theta_0,\alpha^2\tilde{\Sigma}_0)\) give the same likelihood value.

The argument above shows that, without further restrictions on the parameter space, maximum likelihood estimation has infinitely many solutions, i.e., the model is under-identified.

When estimating the multinomial probit, in addition to designating one alternative as the base alternative (as in the multinomial logit), we must also choose a value of \(\alpha\) to satisfy the identification condition. The usual choice is \(\alpha=1/\sigma(\epsilon_B-\epsilon_A)\), so that the first diagonal element (variance) of \(\tilde{\Sigma}\) is normalized: \[\tilde{\Sigma}_{11}=1.\]
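A numerical illustration of this scale problem; the utility differences and covariance below are made-up values, and only the equality of the two printed probabilities matters.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up systematic utility differences and error covariance (illustrative values)
dV = np.array([-0.4, 0.7])                     # [V_B - V_A, V_C - V_A]
Sigma_t = np.array([[1.5, 0.4],
                    [0.4, 2.0]])               # cov of (eps_B - eps_A, eps_C - eps_A)

def pr_choose_A(dV, Sigma_t):
    """Pr(U_B - U_A < 0 and U_C - U_A < 0) under the multinomial probit."""
    return multivariate_normal(mean=dV, cov=Sigma_t).cdf(np.array([0.0, 0.0]))

alpha = 3.0
print(pr_choose_A(dV, Sigma_t))                      # original parameters
print(pr_choose_A(alpha * dV, alpha**2 * Sigma_t))   # scaled: same probability
```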