第 10 章 R for Binary Choice Models

10.1 二元選擇模型

清除記憶體，並載入資料Spector_and_Mazzeo_1980.csv

Grade - binary variable indicating whether or not a student’s grade improved. 1 indicates an improvement.
TUCE - Test score on economics test
PSI - participation in program
GPA - Student’s grade point average

rm(list=ls())
library(dplyr)
library(ggplot2)

data.set <- read.csv("https://bookdown.org/tpemartin/econometric_analysis/data/Spector_and_Mazzeo_1980.csv")

10.2 初步資料觀察

先看看有無參加PSI補救課程的人，成績是否改善的次數分配表。首先，將PSI與GRADE改成factor class。

將PSI與GRADE改成factor class

library(magrittr)
data.set$PSI %<>% factor()
data.set$GRADE %<>% factor()

次數分配

data.set %>%  dplyr::select(PSI,GRADE) %>%
  table() -> tb1; tb1

##    GRADE
## PSI  0  1
##   0 15  3
##   1  6  8

增加一個代表加總的Column

tb1 %>% addmargins(margin=2)

##    GRADE
## PSI  0  1 Sum
##   0 15  3  18
##   1  6  8  14

增加一個代表加總的Row

tb1 %>% addmargins(margin=1)

##      GRADE
## PSI    0  1
##   0   15  3
##   1    6  8
##   Sum 21 11

Row及Column加總都要

tb1 %>% addmargins(margin=c(1,2))

##      GRADE
## PSI    0  1 Sum
##   0   15  3  18
##   1    6  8  14
##   Sum 21 11  32

條件機率

有參加與沒參加PSI的人，成績有進步GRADE=1的比例。

tb1 %>% prop.table(margin=1) %>% # row 為條件狀況 
  addmargins(margin=2) # 增加一個 column margin

##    GRADE
## PSI      0      1    Sum
##   0 0.8333 0.1667 1.0000
##   1 0.4286 0.5714 1.0000

成績有進步的人，有多少比例有參加PSI

tb1 %>% prop.table(margin=2) %>% # column 為條件狀況
  addmargins(margin=1) # 增加一個 row margin

##      GRADE
## PSI        0      1
##   0   0.7143 0.2727
##   1   0.2857 0.7273
##   Sum 1.0000 1.0000

10.3 模型估計

$\Pr(GRADE=1)=F(\beta_0+\beta_1 GPA+\beta_2 TUCE+\beta_3 PSI)$

Logit模型

Logit<-glm(GRADE~GPA+TUCE+PSI,data=data.set,family=binomial(link='logit'))

Probit模型

Probit<-glm(GRADE~GPA+TUCE+PSI,data=data.set,family=binomial(link='probit'))

10.4 配適度

接下來我們會用到pscl與mfx套件來記算其他統計量

(McFadden)Pseudo-R2

library(pscl)
pR2(Logit)->R2.result; R2.result #計算Pseudo-R2

##      llh  llhNull       G2 McFadden     r2ML     r2CU 
## -12.8896 -20.5917  15.4042   0.3740   0.3821   0.5278

Pseudo-R2為0.374

pR2(Probit)

##      llh  llhNull       G2 McFadden     r2ML     r2CU 
## -12.8188 -20.5917  15.5459   0.3775   0.3848   0.5316

計算預測準確率

Null Model: 沒有帶解釋變數的模型

hitmiss(Logit) #計算預測準確率

## Classification Threshold = 0.5 
##        y=0 y=1
## yhat=0  18   3
## yhat=1   3   8
## Percent Correctly Predicted = 81.2%
## Percent Correctly Predicted = 85.7%, for y = 0
## Percent Correctly Predicted = 72.7%  for y = 1
## Null Model Correctly Predicts 65.6%

## [1] 81.25 85.71 72.73

hitmiss(Probit)

## Classification Threshold = 0.5 
##        y=0 y=1
## yhat=0  18   3
## yhat=1   3   8
## Percent Correctly Predicted = 81.2%
## Percent Correctly Predicted = 85.7%, for y = 0
## Percent Correctly Predicted = 72.7%  for y = 1
## Null Model Correctly Predicts 65.6%

## [1] 81.25 85.71 72.73

10.5 邊際效果

代表性個人

library(mfx)
logitmfx(Logit,data.set,atmean=TRUE)-> result.atmean; result.atmean #計算'代表性個人'邊際效果

## Call:
## logitmfx(formula = Logit, data = data.set, atmean = TRUE)
## 
## Marginal Effects:
##       dF/dx Std. Err.    z P>|z|  
## GPA  0.5339    0.2370 2.25 0.024 *
## TUCE 0.0180    0.0262 0.69 0.493  
## PSI1 0.4565    0.1811 2.52 0.012 *
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## dF/dx is for discrete change for the following variables:
## 
## [1] "PSI1"

平均性代表性個人，(1) 過去GPA每上升0.1時，其成績進步的可能性會上升0.0534; (2) 過去TUCE每上升1分時，其成績進步的可能性會上升0.018； (3) 有參與PSI比沒參與PSI的人，其成績進步的可能性會上升0.4565。

probitmfx(Probit,data.set)

## Call:
## probitmfx(formula = Probit, data = data.set)
## 
## Marginal Effects:
##       dF/dx Std. Err.    z  P>|z|   
## GPA  0.5333    0.2267 2.35 0.0186 * 
## TUCE 0.0170    0.0263 0.64 0.5191   
## PSI1 0.4644    0.1713 2.71 0.0067 **
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## dF/dx is for discrete change for the following variables:
## 
## [1] "PSI1"

全體邊際效果平均

logitmfx(Logit,data.set,atmean=FALSE) #計算全體邊際效果的平均

## Call:
## logitmfx(formula = Logit, data = data.set, atmean = FALSE)
## 
## Marginal Effects:
##       dF/dx Std. Err.    z P>|z|  
## GPA  0.3626    0.2400 1.51 0.131  
## TUCE 0.0122    0.0190 0.64 0.521  
## PSI1 0.3575    0.1420 2.52 0.012 *
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## dF/dx is for discrete change for the following variables:
## 
## [1] "PSI1"

probitmfx(Probit,data.set,atmean=FALSE)

## Call:
## probitmfx(formula = Probit, data = data.set, atmean = FALSE)
## 
## Marginal Effects:
##       dF/dx Std. Err.    z  P>|z|   
## GPA  0.3608    0.1112 3.24 0.0012 **
## TUCE 0.0115    0.0178 0.65 0.5182   
## PSI1 0.3738    0.1420 2.63 0.0085 **
## ---
## Signif. codes:  
## 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## dF/dx is for discrete change for the following variables:
## 
## [1] "PSI1"