第 4 章 R for IV
4.1 setup
引入資料整理套件dplyr及
library(dplyr)
library(magrittr)
引入可做TSLS的套件:
library(AER)
引入AER裡附帶的CigarettesSW資料
data("CigarettesSW")
4.2 資料結構觀察
轉換資料成dplyr下的tibble格式,比較容易看資料狀態。
CigarettesSW %<>% as_tibble
CigarettesSW
## # A tibble: 96 x 9
## state year cpi population packs income tax
## <fct> <fct> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 AL 1985 1.08 3973000 116. 4.60e7 32.5
## 2 AR 1985 1.08 2327000 129. 2.62e7 37
## 3 AZ 1985 1.08 3184000 105. 4.40e7 31
## 4 CA 1985 1.08 26444000 100. 4.47e8 26
## 5 CO 1985 1.08 3209000 113. 4.95e7 31
## 6 CT 1985 1.08 3201000 109. 6.01e7 42
## 7 DE 1985 1.08 618000 144. 9.93e6 30
## 8 FL 1985 1.08 11352000 122. 1.67e8 37
## 9 GA 1985 1.08 5963000 127. 7.84e7 28
## 10 IA 1985 1.08 2830000 114. 3.79e7 34
## # … with 86 more rows, and 2 more variables:
## # price <dbl>, taxs <dbl>
只取1995資料,因為我們還沒學到追蹤資料的處理。
CigarettesSW %<>% filter(year=="1995")
4.3 產生新變數
CigarettesSW %<>%
mutate(
rprice=price/cpi,
rincome=income/population/cpi,
tdiff=(taxs-tax)/cpi
)
4.4 迴歸模型
\[\begin{align} log(packs) &= \beta_0+\beta_1 log(rprice)+\epsilon \tag{4.1}\\ log(packs) &= \beta_0+\beta_1 log(rprice)+\beta_2 log(rincome)+\epsilon \tag{4.2} \end{align}\]
接著以模型(4.2)為基礎,延伸兩個工具變數群:
\[\begin{align} \mbox{one IV} &: \mbox{tdiff} \tag{4.3}\\ \mbox{two IVs} &: \mbox{tdiff, tax/cpi} \tag{4.4} \end{align}\]
4.5 OLS估計
ols1<-lm(model1,CigarettesSW)
ols2<-lm(model2,CigarettesSW)
library(sandwich)
library(lmtest)
library(stargazer)
#使用vcovHC函數來計算HC1型的異質變異(即橫斷面資料下的線性迴歸模型)
coeftest(ols1, vcov. = vcovHC, type="HC1") -> ols1_coeftest
coeftest(ols2, vcov. = vcovHC, type="HC1") -> ols2_coeftest
OLS結果比較
參考資料:https://www.jakeruss.com/cheatsheets/stargazer/
library(stargazer)
stargazer(ols1,ols2,type="html")
Dependent variable: | ||
log(packs) | ||
(1) | (2) | |
log(rprice) | -1.213*** | -1.407*** |
(0.216) | (0.251) | |
log(rincome) | 0.344 | |
(0.235) | ||
Constant | 10.340*** | 10.340*** |
(1.035) | (1.023) | |
Observations | 48 | 48 |
R2 | 0.406 | 0.433 |
Adjusted R2 | 0.393 | 0.408 |
Residual Std. Error | 0.190 (df = 46) | 0.187 (df = 45) |
F Statistic | 31.410*** (df = 1; 46) | 17.160*** (df = 2; 45) |
Note: | p<0.1; p<0.05; p<0.01 |
4.6 TSLS估計
工具變數設定為(4.3):tsls_1iv
工具變數設定為(4.4):tsls_2iv
tsls_1iv <- ivreg(
log(packs) ~ log(rprice) + log(rincome) | log(rincome) + tdiff,
data=CigarettesSW
)
tsls_2iv <- ivreg(
log(packs) ~ log(rprice) + log(rincome) | log(rincome) + tdiff + I(tax/cpi),
data=CigarettesSW
)
在formula裡,每個term只能是一個變數,然而這裡tax/cpi牽扯到兩個變數,此時用I()
告訴formula把tax, cpi依tax/cpi處理完後才形成一個新的變數。
假設檢定
vcov=sandwich
:使用HC0異質變異設定
diagonostic=TRUE
: 同時進行TSLS的三項檢定
df=Inf
: 檢定p值使用z表而非t表
summary(tsls_1iv, vcov = sandwich, diagnostics = TRUE, df=Inf) -> tsls_1iv_tests
summary(tsls_2iv, vcov = sandwich, diagnostics = TRUE, df=Inf) -> tsls_2iv_tests
[Optional]原始sandwich(x,adjust=FALSE)
使用HC0的異質變異設定,若要改成HC1必需使adjust=TRUE
:
sandwich_HC1<-function(x,adjust=TRUE){
sandwich(x,adjust=adjust)
}
summary(tsls_2iv, vcov = sandwich_HC1, diagnostics = TRUE, df=Inf) -> tsls_2iv_tests_hc1
三個檢定的df是看df1, 非df2(在大樣本下,df2是無窮大)。
stargazer(tsls_1iv,tsls_2iv,type="html",
column.labels = c("one IV", "two IVs"),
add.lines = list(c("TSLS tests p-value", "",""),
c("Q1: Sargan","NA","0.5641"),
c("Q2: Weak instruments","1.4e-08","<2e-16"),
c("Q3: Wu-Hausman","0.263","0.0569")
))
Dependent variable: | ||
log(packs) | ||
one IV | two IVs | |
(1) | (2) | |
log(rprice) | -1.143*** | -1.277*** |
(0.359) | (0.263) | |
log(rincome) | 0.215 | 0.280 |
(0.269) | (0.239) | |
Constant | 9.431*** | 9.895*** |
(1.358) | (1.059) | |
TSLS tests p-value | ||
Q1: Sargan | NA | 0.5641 |
Q2: Weak instruments | 1.4e-08 | <2e-16 |
Q3: Wu-Hausman | 0.263 | 0.0569 |
Observations | 48 | 48 |
R2 | 0.419 | 0.429 |
Adjusted R2 | 0.393 | 0.404 |
Residual Std. Error (df = 45) | 0.190 | 0.188 |
Note: | p<0.1; p<0.05; p<0.01 |