第 12 章 R for Multinomial Choice
12.1 多元可排序選擇模型(ordered)
資料:German_Health_Care_Utilization.csv
戴入使用套件MASS與資籵
library(MASS)
data.set2<-read.csv("https://raw.githubusercontent.com/tpemartin/github-data/master/German_Health_Care_Utilization.csv")
將應變數改成ordered格式
data.set2$hsat<-as.ordered(data.set2$hsat)
Ordered Logit估計
o.logit<-polr(hsat~age+hhninc+hhkids+educ+married+working,
data.set2,method="logistic")
library(pscl)
pR2(o.logit)
hitmiss(o.logit)
summary(o.logit)
Ordered Probit估計
o.probit<-polr(hsat~age+married,
data.set2,method='probit')
pR2(o.probit)
hitmiss(o.probit)
summary(o.probit)
12.2 多元不可排序
12.2.1 兩種常見資料格式
library(mlogit)
library(AER)
Wide shape
每一行是一筆個人資料,其中一個變數載明選擇結果。
data("Fishing", package = "mlogit")
head(Fishing)
Fish <- mlogit.data(Fishing, shape="wide", varying=2:9, choice="mode")
Fish
Long shape
每一行是一個人在一個選項下的選擇結果及相關特徵。
data("TravelMode", package = "AER")
head(TravelMode)
# long shape, the choice is recoceded in variable "choice", alternative variables is "mode"
TM <- mlogit.data(TravelMode, shape = "long", choice = "choice", alt.var = "mode")
TM
12.2.2 Formula
回顧三大區塊解釋變數:- A: xij with constant coefficient: β(xij−xi1)
- B: zi with alternative varying coefficient: (αj−α1)+(γj−γ1)zi
- C: wij with alternative varying coefficient: δjwij−δjwi1.
Formula右邊:
~ A | B | C
以Travel Mode資料為例(TM)
# seven different forumua
f <- mFormula(choice ~ vcost | income + size | travel)
f2 <- mFormula(choice ~ vcost + travel | income + size)
f3 <- mFormula(choice ~ 0 | income)
f4 <- mFormula(choice ~ vcost + travel)
f5 <- mFormula(choice ~ vcost | 0 | travel)
f6 <- mFormula(choice ~ vcost | income + 0 | travel)
f7 <- mFormula(choice ~ 0 | income -1 | travel)
12.2.3 Multinomial logit
Uij=Vij+ϵij 以f6為例:
f6 <- mFormula(choice ~ vcost | income + 0 | travel)
Vij−Vi,air=β(vcostij−vcosti,air)+(γj−γair)incomei+δjtravelij−δairtraveli,air
以f7為例:
f7 <- mFormula(choice ~ 0 | income -1 | travel)
Vij−Vi,air=(γj−γair)incomei+δjtravelij−δairtraveli,air
ml.TM <- mlogit(f, TM)
coeftest(ml.TM)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## train:(intercept) 1.12690 0.93492 1.21 0.2295
## bus:(intercept) 0.25583 1.02157 0.25 0.8025
## car:(intercept) -1.82840 0.89024 -2.05 0.0413
## vcost -0.00440 0.00703 -0.63 0.5323
## train:income -0.06397 0.01309 -4.89 2.1e-06
## bus:income -0.04269 0.01381 -3.09 0.0023
## car:income -0.01882 0.01165 -1.62 0.1079
## train:size 0.45588 0.25204 1.81 0.0720
## bus:size -0.07620 0.35353 -0.22 0.8296
## car:size 0.77716 0.22826 3.40 0.0008
## air:travel -0.03647 0.00681 -5.36 2.3e-07
## train:travel -0.00742 0.00119 -6.23 2.8e-09
## bus:travel -0.00669 0.00136 -4.93 1.8e-06
## car:travel -0.00661 0.00115 -5.76 3.2e-08
##
## train:(intercept)
## bus:(intercept)
## car:(intercept) *
## vcost
## train:income ***
## bus:income **
## car:income
## train:size .
## bus:size
## car:size ***
## air:travel ***
## train:travel ***
## bus:travel ***
## car:travel ***
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
由於多一單位air通勤時間效用變化-0.0365, 而多花一塊錢會使效用變化-0.0044,故一單位air通勤時間等值於8.2915元。
有了點估計,我們還會想算它的standard error。
取出共變異矩陣
取出vcost與air:travel係數的variance-covariance matrix(V):
vcov(ml.TM)->vcov.TM #利用vcov()取出所有係數的共變異矩陣
vcov.TM[c("vcost","air:travel"),c("vcost","air:travel")]->V
#再取出其中屬於我們要的係數的共變異矩陣
Delta Method
若 ˆθa∼N(θ0,V), 則 g(ˆθ)a∼N(g(θ0),g′(θ0)TVg′(θ0)) a∼代表大樣本下的逼近分配。
這裡
θ=[βvcostδair:travel] 而 g(θ)=δair:travelβvcost
故 g′(θ)=[∂g/∂βvcost∂g/∂δair:travel]=[−δair:travelβ2vcost1βvcost]
下面的dg
即為g′(ˆθ)
delta_air<-ml.TM$coefficients["air:travel"]
beta<-ml.TM$coefficients["vcost"]
g<-delta_air/beta
dg<-matrix(c(-delta_air/(beta^2),1/beta),2,1)
使用Delta method計算Standard Error.
t()
為矩陣transpose的意思;而%*%
是矩陣相乘的運算。
Vnew<-t(dg) %*% V %*% dg
SE_g<-sqrt(Vnew)
SE_g
12.2.4 Multinomial Probit
ml.TM_probit <- mlogit(f7, TM,
probit=TRUE)
summary(ml.TM_probit)
只分析部分選項(除air, 只留下train,bus,car)。
ml.TM_probit2 <- mlogit(f7, TM,
probit=TRUE,
alt.subset = c("train","bus","car"))
summary(ml.TM_probit2)
ml.TM_probit2$omega$train
可查看共變異矩陣估計值:
var([ϵbus−ϵtrainϵcar−ϵtrain])
ml.TM_probit2$omega$train