第 10 章 R for Binary Choice Models
10.1 二元選擇模型
清除記憶體,並載入資料Spector_and_Mazzeo_1980.csv
Grade - binary variable indicating whether or not a student’s grade
improved. 1 indicates an improvement.
TUCE - Test score on economics test
PSI - participation in program
GPA - Student’s grade point average
rm(list=ls())
library(dplyr)
library(ggplot2)
data.set <- read.csv("https://bookdown.org/tpemartin/econometric_analysis/data/Spector_and_Mazzeo_1980.csv")
10.2 初步資料觀察
先看看有無參加PSI補救課程的人,成績是否改善的次數分配表。首先,將PSI與GRADE改成factor class。
將PSI與GRADE改成factor class
library(magrittr)
data.set$PSI %<>% factor()
data.set$GRADE %<>% factor()
次數分配
data.set %>% dplyr::select(PSI,GRADE) %>%
table() -> tb1; tb1
## GRADE
## PSI 0 1
## 0 15 3
## 1 6 8
增加一個代表加總的Column
tb1 %>% addmargins(margin=2)
## GRADE
## PSI 0 1 Sum
## 0 15 3 18
## 1 6 8 14
增加一個代表加總的Row
tb1 %>% addmargins(margin=1)
## GRADE
## PSI 0 1
## 0 15 3
## 1 6 8
## Sum 21 11
Row及Column加總都要
tb1 %>% addmargins(margin=c(1,2))
## GRADE
## PSI 0 1 Sum
## 0 15 3 18
## 1 6 8 14
## Sum 21 11 32
條件機率
有參加與沒參加PSI的人,成績有進步GRADE=1的比例。
tb1 %>% prop.table(margin=1) %>% # row 為條件狀況
addmargins(margin=2) # 增加一個 column margin
## GRADE
## PSI 0 1 Sum
## 0 0.8333 0.1667 1.0000
## 1 0.4286 0.5714 1.0000
成績有進步的人,有多少比例有參加PSI
tb1 %>% prop.table(margin=2) %>% # column 為條件狀況
addmargins(margin=1) # 增加一個 row margin
## GRADE
## PSI 0 1
## 0 0.7143 0.2727
## 1 0.2857 0.7273
## Sum 1.0000 1.0000
10.3 模型估計
\[\Pr(GRADE=1)=F(\beta_0+\beta_1 GPA+\beta_2 TUCE+\beta_3 PSI)\]
Logit模型
Logit<-glm(GRADE~GPA+TUCE+PSI,data=data.set,family=binomial(link='logit'))
Probit模型
Probit<-glm(GRADE~GPA+TUCE+PSI,data=data.set,family=binomial(link='probit'))
10.4 配適度
接下來我們會用到pscl與mfx套件來記算其他統計量
(McFadden)Pseudo-R2
library(pscl)
pR2(Logit)->R2.result; R2.result #計算Pseudo-R2
## llh llhNull G2 McFadden r2ML r2CU
## -12.8896 -20.5917 15.4042 0.3740 0.3821 0.5278
Pseudo-R2為0.374
pR2(Probit)
## llh llhNull G2 McFadden r2ML r2CU
## -12.8188 -20.5917 15.5459 0.3775 0.3848 0.5316
計算預測準確率
Null Model: 沒有帶解釋變數的模型
hitmiss(Logit) #計算預測準確率
## Classification Threshold = 0.5
## y=0 y=1
## yhat=0 18 3
## yhat=1 3 8
## Percent Correctly Predicted = 81.2%
## Percent Correctly Predicted = 85.7%, for y = 0
## Percent Correctly Predicted = 72.7% for y = 1
## Null Model Correctly Predicts 65.6%
## [1] 81.25 85.71 72.73
hitmiss(Probit)
## Classification Threshold = 0.5
## y=0 y=1
## yhat=0 18 3
## yhat=1 3 8
## Percent Correctly Predicted = 81.2%
## Percent Correctly Predicted = 85.7%, for y = 0
## Percent Correctly Predicted = 72.7% for y = 1
## Null Model Correctly Predicts 65.6%
## [1] 81.25 85.71 72.73
10.5 邊際效果
代表性個人
library(mfx)
logitmfx(Logit,data.set,atmean=TRUE)-> result.atmean; result.atmean #計算'代表性個人'邊際效果
## Call:
## logitmfx(formula = Logit, data = data.set, atmean = TRUE)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## GPA 0.5339 0.2370 2.25 0.024 *
## TUCE 0.0180 0.0262 0.69 0.493
## PSI1 0.4565 0.1811 2.52 0.012 *
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## dF/dx is for discrete change for the following variables:
##
## [1] "PSI1"
平均性代表性個人,(1) 過去GPA每上升0.1時,其成績進步的可能性會上升0.0534; (2) 過去TUCE每上升1分時,其成績進步的可能性會上升0.018; (3) 有參與PSI比沒參與PSI的人,其成績進步的可能性會上升0.4565。
probitmfx(Probit,data.set)
## Call:
## probitmfx(formula = Probit, data = data.set)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## GPA 0.5333 0.2267 2.35 0.0186 *
## TUCE 0.0170 0.0263 0.64 0.5191
## PSI1 0.4644 0.1713 2.71 0.0067 **
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## dF/dx is for discrete change for the following variables:
##
## [1] "PSI1"
全體邊際效果平均
logitmfx(Logit,data.set,atmean=FALSE) #計算全體邊際效果的平均
## Call:
## logitmfx(formula = Logit, data = data.set, atmean = FALSE)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## GPA 0.3626 0.2400 1.51 0.131
## TUCE 0.0122 0.0190 0.64 0.521
## PSI1 0.3575 0.1420 2.52 0.012 *
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## dF/dx is for discrete change for the following variables:
##
## [1] "PSI1"
probitmfx(Probit,data.set,atmean=FALSE)
## Call:
## probitmfx(formula = Probit, data = data.set, atmean = FALSE)
##
## Marginal Effects:
## dF/dx Std. Err. z P>|z|
## GPA 0.3608 0.1112 3.24 0.0012 **
## TUCE 0.0115 0.0178 0.65 0.5182
## PSI1 0.3738 0.1420 2.63 0.0085 **
## ---
## Signif. codes:
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## dF/dx is for discrete change for the following variables:
##
## [1] "PSI1"