第 12 章 R for Multinomial Choice
12.1 多元可排序選擇模型(ordered)
資料:German_Health_Care_Utilization.csv
戴入使用套件MASS與資籵
library(MASS)
data.set2<-read.csv("https://raw.githubusercontent.com/tpemartin/github-data/master/German_Health_Care_Utilization.csv")
將應變數改成ordered格式
data.set2$hsat<-as.ordered(data.set2$hsat)
Ordered Logit估計
o.logit<-polr(hsat~age+hhninc+hhkids+educ+married+working,
data.set2,method="logistic")
library(pscl)
pR2(o.logit)
hitmiss(o.logit)
summary(o.logit)
Ordered Probit估計
o.probit<-polr(hsat~age+married,
data.set2,method='probit')
pR2(o.probit)
hitmiss(o.probit)
summary(o.probit)
12.2 多元不可排序
12.2.1 兩種常見資料格式
library(mlogit)
library(AER)
Wide shape
每一行是一筆個人資料,其中一個變數載明選擇結果。
data("Fishing", package = "mlogit")
head(Fishing)
Fish <- mlogit.data(Fishing, shape="wide", varying=2:9, choice="mode")
Fish
Long shape
每一行是一個人在一個選項下的選擇結果及相關特徵。
data("TravelMode", package = "AER")
head(TravelMode)
# long shape, the choice is recoceded in variable "choice", alternative variables is "mode"
TM <- mlogit.data(TravelMode, shape = "long", choice = "choice", alt.var = "mode")
TM
12.2.2 Formula
回顧三大區塊解釋變數:- A: \(x_{ij}\) with constant coefficient: \(\beta (x_{ij}-x_{i1})\)
- B: \(z_i\) with alternative varying coefficient: \((\alpha_j-\alpha_1)+(\gamma_j-\gamma_1) z_i\)
- C: \(w_{ij}\) with alternative varying coefficient: \(\delta_j w_{ij}-\delta_j w_{i1}\).
Formula右邊:
~ A | B | C
以Travel Mode資料為例(TM)
# seven different forumua
f <- mFormula(choice ~ vcost | income + size | travel)
f2 <- mFormula(choice ~ vcost + travel | income + size)
f3 <- mFormula(choice ~ 0 | income)
f4 <- mFormula(choice ~ vcost + travel)
f5 <- mFormula(choice ~ vcost | 0 | travel)
f6 <- mFormula(choice ~ vcost | income + 0 | travel)
f7 <- mFormula(choice ~ 0 | income -1 | travel)
12.2.3 Multinomial logit
\[U_{ij}=V_{ij}+\epsilon_{ij}\] 以f6為例:
f6 <- mFormula(choice ~ vcost | income + 0 | travel)
\[V_{ij}-V_{i,air}=\beta (vcost_{ij}-vcost_{i,air})+(\gamma_j-\gamma_{air}) income_{i}+\delta_j travel_{ij}-\delta_{air} travel_{i,air}\]
以f7為例:
f7 <- mFormula(choice ~ 0 | income -1 | travel)
\[V_{ij}-V_{i,air}=(\gamma_j-\gamma_{air}) income_{i}+\delta_j travel_{ij}-\delta_{air} travel_{i,air}\]
ml.TM <- mlogit(f, TM)
coeftest(ml.TM)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## train:(intercept) 1.12690 0.93492 1.21 0.2295
## bus:(intercept) 0.25583 1.02157 0.25 0.8025
## car:(intercept) -1.82840 0.89024 -2.05 0.0413
## vcost -0.00440 0.00703 -0.63 0.5323
## train:income -0.06397 0.01309 -4.89 2.1e-06
## bus:income -0.04269 0.01381 -3.09 0.0023
## car:income -0.01882 0.01165 -1.62 0.1079
## train:size 0.45588 0.25204 1.81 0.0720
## bus:size -0.07620 0.35353 -0.22 0.8296
## car:size 0.77716 0.22826 3.40 0.0008
## air:travel -0.03647 0.00681 -5.36 2.3e-07
## train:travel -0.00742 0.00119 -6.23 2.8e-09
## bus:travel -0.00669 0.00136 -4.93 1.8e-06
## car:travel -0.00661 0.00115 -5.76 3.2e-08
##
## train:(intercept)
## bus:(intercept)
## car:(intercept) *
## vcost
## train:income ***
## bus:income **
## car:income
## train:size .
## bus:size
## car:size ***
## air:travel ***
## train:travel ***
## bus:travel ***
## car:travel ***
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
由於多一單位air通勤時間效用變化-0.0365, 而多花一塊錢會使效用變化-0.0044,故一單位air通勤時間等值於8.2915元。
有了點估計,我們還會想算它的standard error。
取出共變異矩陣
取出vcost與air:travel係數的variance-covariance matrix(V):
vcov(ml.TM)->vcov.TM #利用vcov()取出所有係數的共變異矩陣
vcov.TM[c("vcost","air:travel"),c("vcost","air:travel")]->V
#再取出其中屬於我們要的係數的共變異矩陣
Delta Method
若 \[\hat{\theta}\stackrel{a}{\sim}N(\theta_0,\textbf{V}),\] 則 \[g(\hat{\theta})\stackrel{a}{\sim}N(g(\theta_0),g'(\theta_0)^T\textbf{V} g'(\theta_0))\] \(\stackrel{a}{\sim}\)代表大樣本下的逼近分配。
這裡
\[ \theta=\left[\begin{array}{c} \beta_{\text{vcost}}\\ \delta_{\text{air:travel}} \end{array}\right] \] 而 \[ g(\theta)=\frac{\delta_{\text{air:travel}}}{\beta_{\text{vcost}}} \]
故 \[ g'(\theta)=\left[\begin{array}{c} \partial g/\partial\beta_{\text{vcost}}\\ \partial g/\partial\delta_{\text{air:travel}} \end{array}\right]=\left[\begin{array}{c} -\frac{\delta_{\text{air:travel}}}{\beta_{\text{vcost}}^{2}}\\ \frac{1}{\beta_{\text{vcost}}} \end{array}\right] \]
下面的dg
即為\(g'(\hat{\theta})\)
delta_air<-ml.TM$coefficients["air:travel"]
beta<-ml.TM$coefficients["vcost"]
g<-delta_air/beta
dg<-matrix(c(-delta_air/(beta^2),1/beta),2,1)
使用Delta method計算Standard Error.
t()
為矩陣transpose的意思;而%*%
是矩陣相乘的運算。
Vnew<-t(dg) %*% V %*% dg
SE_g<-sqrt(Vnew)
SE_g
12.2.4 Multinomial Probit
ml.TM_probit <- mlogit(f7, TM,
probit=TRUE)
summary(ml.TM_probit)
只分析部分選項(除air, 只留下train,bus,car)。
ml.TM_probit2 <- mlogit(f7, TM,
probit=TRUE,
alt.subset = c("train","bus","car"))
summary(ml.TM_probit2)
ml.TM_probit2$omega$train
可查看共變異矩陣估計值:
\[
var(\left[\begin{array}{c}
\epsilon_{bus}-\epsilon_{train}\\
\epsilon_{car}-\epsilon_{train}
\end{array}\right])
\]
ml.TM_probit2$omega$train