第 12 章 R for Multinomial Choice

12.1 多元可排序選擇模型(ordered)

資料:German_Health_Care_Utilization.csv

戴入使用套件MASS與資籵

library(MASS)
data.set2<-read.csv("https://raw.githubusercontent.com/tpemartin/github-data/master/German_Health_Care_Utilization.csv")

將應變數改成ordered格式

data.set2$hsat<-as.ordered(data.set2$hsat)

Ordered Logit估計

o.logit<-polr(hsat~age+hhninc+hhkids+educ+married+working,
     data.set2,method="logistic")
library(pscl)
pR2(o.logit)
hitmiss(o.logit)
summary(o.logit)

Ordered Probit估計

o.probit<-polr(hsat~age+married,
               data.set2,method='probit')
pR2(o.probit)
hitmiss(o.probit)
summary(o.probit)

12.2 多元不可排序

12.2.1 兩種常見資料格式

library(mlogit)
library(AER)

Wide shape

每一行是一筆個人資料,其中一個變數載明選擇結果。

data("Fishing", package = "mlogit")
head(Fishing)
Fish <- mlogit.data(Fishing, shape="wide", varying=2:9, choice="mode")
Fish

Long shape

每一行是一個人在一個選項下的選擇結果及相關特徵。

data("TravelMode", package = "AER")
head(TravelMode)
# long shape, the choice is recoceded in variable "choice", alternative variables is "mode"
TM <- mlogit.data(TravelMode, shape = "long", choice = "choice", alt.var = "mode")
TM

12.2.2 Formula

回顧三大區塊解釋變數:
  1. A: \(x_{ij}\) with constant coefficient: \(\beta (x_{ij}-x_{i1})\)
  2. B: \(z_i\) with alternative varying coefficient: \((\alpha_j-\alpha_1)+(\gamma_j-\gamma_1) z_i\)
  3. C: \(w_{ij}\) with alternative varying coefficient: \(\delta_j w_{ij}-\delta_j w_{i1}\).

Formula右邊: ~ A | B | C

以Travel Mode資料為例(TM)

# seven different forumua
f <- mFormula(choice ~ vcost | income + size | travel)
f2 <- mFormula(choice ~ vcost + travel | income + size)
f3 <- mFormula(choice ~ 0 | income)
f4 <- mFormula(choice ~ vcost + travel)
f5 <- mFormula(choice ~ vcost | 0 | travel)
f6 <- mFormula(choice ~ vcost | income + 0 | travel)
f7 <- mFormula(choice ~ 0 | income -1 | travel)

12.2.3 Multinomial logit

\[U_{ij}=V_{ij}+\epsilon_{ij}\] 以f6為例:

f6 <- mFormula(choice ~ vcost | income + 0 | travel)

\[V_{ij}-V_{i,air}=\beta (vcost_{ij}-vcost_{i,air})+(\gamma_j-\gamma_{air}) income_{i}+\delta_j travel_{ij}-\delta_{air} travel_{i,air}\]

以f7為例:

f7 <- mFormula(choice ~ 0 | income -1 | travel)

\[V_{ij}-V_{i,air}=(\gamma_j-\gamma_{air}) income_{i}+\delta_j travel_{ij}-\delta_{air} travel_{i,air}\]

ml.TM <- mlogit(f, TM)

coeftest(ml.TM)
## 
## t test of coefficients:
## 
##                   Estimate Std. Error t value Pr(>|t|)
## train:(intercept)  1.12690    0.93492    1.21   0.2295
## bus:(intercept)    0.25583    1.02157    0.25   0.8025
## car:(intercept)   -1.82840    0.89024   -2.05   0.0413
## vcost             -0.00440    0.00703   -0.63   0.5323
## train:income      -0.06397    0.01309   -4.89  2.1e-06
## bus:income        -0.04269    0.01381   -3.09   0.0023
## car:income        -0.01882    0.01165   -1.62   0.1079
## train:size         0.45588    0.25204    1.81   0.0720
## bus:size          -0.07620    0.35353   -0.22   0.8296
## car:size           0.77716    0.22826    3.40   0.0008
## air:travel        -0.03647    0.00681   -5.36  2.3e-07
## train:travel      -0.00742    0.00119   -6.23  2.8e-09
## bus:travel        -0.00669    0.00136   -4.93  1.8e-06
## car:travel        -0.00661    0.00115   -5.76  3.2e-08
##                      
## train:(intercept)    
## bus:(intercept)      
## car:(intercept)   *  
## vcost                
## train:income      ***
## bus:income        ** 
## car:income           
## train:size        .  
## bus:size             
## car:size          ***
## air:travel        ***
## train:travel      ***
## bus:travel        ***
## car:travel        ***
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

由於多一單位air通勤時間效用變化-0.0365, 而多花一塊錢會使效用變化-0.0044,故一單位air通勤時間等值於8.2915元。

有了點估計,我們還會想算它的standard error。

取出共變異矩陣

取出vcost與air:travel係數的variance-covariance matrix(V):

vcov(ml.TM)->vcov.TM #利用vcov()取出所有係數的共變異矩陣
vcov.TM[c("vcost","air:travel"),c("vcost","air:travel")]->V
#再取出其中屬於我們要的係數的共變異矩陣

Delta Method

\[\hat{\theta}\stackrel{a}{\sim}N(\theta_0,\textbf{V}),\]\[g(\hat{\theta})\stackrel{a}{\sim}N(g(\theta_0),g'(\theta_0)^T\textbf{V} g'(\theta_0))\] \(\stackrel{a}{\sim}\)代表大樣本下的逼近分配。

這裡

\[ \theta=\left[\begin{array}{c} \beta_{\text{vcost}}\\ \delta_{\text{air:travel}} \end{array}\right] \]\[ g(\theta)=\frac{\delta_{\text{air:travel}}}{\beta_{\text{vcost}}} \]

\[ g'(\theta)=\left[\begin{array}{c} \partial g/\partial\beta_{\text{vcost}}\\ \partial g/\partial\delta_{\text{air:travel}} \end{array}\right]=\left[\begin{array}{c} -\frac{\delta_{\text{air:travel}}}{\beta_{\text{vcost}}^{2}}\\ \frac{1}{\beta_{\text{vcost}}} \end{array}\right] \]

下面的dg即為\(g'(\hat{\theta})\)

delta_air<-ml.TM$coefficients["air:travel"]
beta<-ml.TM$coefficients["vcost"]
g<-delta_air/beta
dg<-matrix(c(-delta_air/(beta^2),1/beta),2,1)

使用Delta method計算Standard Error. t()為矩陣transpose的意思;而%*%是矩陣相乘的運算。

Vnew<-t(dg) %*% V %*% dg
SE_g<-sqrt(Vnew)
SE_g

Delta method介紹

Delta method套件msm::deltamethod()使用說明

12.2.4 Multinomial Probit

ml.TM_probit <- mlogit(f7, TM,
                       probit=TRUE)
summary(ml.TM_probit)

只分析部分選項(除air, 只留下train,bus,car)。

ml.TM_probit2 <- mlogit(f7, TM,
                       probit=TRUE,
                       alt.subset = c("train","bus","car"))
summary(ml.TM_probit2)

ml.TM_probit2$omega$train可查看共變異矩陣估計值: \[ var(\left[\begin{array}{c} \epsilon_{bus}-\epsilon_{train}\\ \epsilon_{car}-\epsilon_{train} \end{array}\right]) \]

ml.TM_probit2$omega$train