Introduction to Econometrics with R


12.4 Application to the Demand for Cigarettes

Are the general sales tax and the cigarette-specific tax valid instruments? If not, TSLS cannot deliver a reliable estimate of the demand elasticity for cigarettes discussed in Chapter 12.2. As discussed in Chapter 12.1, both variables are likely to be relevant, but whether they are exogenous is a different question.

The book argues that cigarette-specific taxes could be endogenous because there might be state-specific historical factors, such as the economic importance of the tobacco farming and cigarette production industry, which lobbies for low cigarette-specific taxes. Since it is plausible that tobacco-growing states have higher rates of smoking than others, this would lead to endogeneity of cigarette-specific taxes. If we had data on the size of the tobacco and cigarette industry, we could solve this potential issue by including the information in the regression. Unfortunately, this is not the case.

However, since the role of the tobacco and cigarette industry is a factor that can be assumed to differ across states but not over time, we may exploit the panel structure of CigarettesSW instead: as shown in Chapter 10.2, a regression using data on changes between two time periods eliminates such state-specific and time-invariant effects. Following the book, we consider changes in variables between 1985 and 1995. That is, we are interested in estimating the long-run elasticity of the demand for cigarettes.
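The differencing argument can be illustrated with a minimal sketch. The three states, the state effects alpha, the slope beta and the regressor values below are made-up illustration values, not taken from CigarettesSW.

```r
# Toy panel: 3 hypothetical states observed in 1985 and 1995.
# alpha is a state-specific, time-invariant effect
# (think: importance of the tobacco industry); all values are made up.
alpha <- c(2.0, -1.0, 0.5)
beta  <- -1.2                      # assumed common slope

x_1985 <- c(0.10, 0.30, 0.20)
x_1995 <- c(0.40, 0.35, 0.60)

# the outcome in levels contains the state effect
y_1985 <- alpha + beta * x_1985
y_1995 <- alpha + beta * x_1995

# differencing removes alpha: y_diff depends only on x_diff
y_diff <- y_1995 - y_1985
x_diff <- x_1995 - x_1985

all.equal(y_diff, beta * x_diff)
```

Since alpha appears in both periods with the same value, it cancels in the difference, so it no longer needs to be observed or controlled for.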

The model to be estimated by TSLS using the general sales tax and the cigarette-specific sales tax as instruments hence is

\[\begin{align} \begin{split} \log(Q_{i,1995}^{cigarettes}) - \log(Q_{i,1985}^{cigarettes}) =& \, \beta_0 + \beta_1 \left[\log(P_{i,1995}^{cigarettes}) - \log(P_{i,1985}^{cigarettes}) \right] \\ &+ \beta_2 \left[\log(income_{i,1995}) - \log(income_{i,1985})\right] + u_i. \end{split}\tag{12.10} \end{align}\]

We first create differences from 1985 to 1995 for the dependent variable, the regressors and both instruments.

# subset data for the years 1985 and 1995
c1985 <- subset(CigarettesSW, year == "1985")
c1995 <- subset(CigarettesSW, year == "1995")

# define differences in variables
packsdiff <- log(c1995$packs) - log(c1985$packs)

pricediff <- log(c1995$price/c1995$cpi) - log(c1985$price/c1985$cpi)

incomediff <- log(c1995$income/c1995$population/c1995$cpi) -
  log(c1985$income/c1985$population/c1985$cpi)

salestaxdiff <- (c1995$taxs - c1995$tax)/c1995$cpi - (c1985$taxs - c1985$tax)/c1985$cpi

cigtaxdiff <- c1995$tax/c1995$cpi - c1985$tax/c1985$cpi
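Element-wise differences of this kind are only meaningful if both subsets list the states in the same order. The following sketch shows one way to check this before differencing. It uses a hypothetical miniature data frame (toy) with made-up values rather than CigarettesSW itself.

```r
# Hypothetical miniature panel in long format (all values are made up).
toy <- data.frame(
  state = c("AL", "AR", "AZ", "AL", "AR", "AZ"),
  year  = factor(c(1985, 1985, 1985, 1995, 1995, 1995)),
  packs = c(116.5, 128.5, 104.5, 101.1, 111.0, 71.6)
)

t1985 <- subset(toy, year == "1985")
t1995 <- subset(toy, year == "1995")

# stop early if the rows of both subsets do not refer to the same states
stopifnot(identical(as.character(t1985$state), as.character(t1995$state)))

# difference in logs, labelled by state
toy_packsdiff <- log(t1995$packs) - log(t1985$packs)
names(toy_packsdiff) <- as.character(t1985$state)
toy_packsdiff
```

The same stopifnot() check could be applied to c1985$state and c1995$state, assuming both objects exist in the workspace.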

We now perform three different IV estimations of (12.10) using ivreg():

1. TSLS using only the difference in the sales taxes between 1985 and 1995 as the instrument.
2. TSLS using only the difference in the cigarette-specific sales taxes between 1985 and 1995 as the instrument.
3. TSLS using both the difference in the sales taxes between 1985 and 1995 and the difference in the cigarette-specific sales taxes between 1985 and 1995 as instruments.

# estimate the three models
cig_ivreg_diff1 <- ivreg(packsdiff ~ pricediff + incomediff | incomediff + 
                         salestaxdiff)

cig_ivreg_diff2 <- ivreg(packsdiff ~ pricediff + incomediff | incomediff + 
                         cigtaxdiff)

cig_ivreg_diff3 <- ivreg(packsdiff ~ pricediff + incomediff | incomediff + 
                         salestaxdiff + cigtaxdiff)
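To make the mechanics behind such an estimation more concrete, here is a minimal base R sketch of TSLS computed by hand for an exactly identified model. It uses simulated data, and all variable names and parameter values are hypothetical. It is not how ivreg() is implemented internally; it only reproduces the point estimates in two different ways.

```r
set.seed(1)
n <- 500

# simulated data (all names and parameter values are hypothetical)
z <- rnorm(n)                 # instrument
w <- rnorm(n)                 # exogenous control
e <- rnorm(n)                 # unobserved factor that makes x endogenous
x <- 0.8 * z + 0.5 * w + e + rnorm(n)
y <- 1 - 1.0 * x + 0.5 * w + 2 * e + rnorm(n)

# first stage: regress the endogenous regressor on instrument and control
x_hat <- fitted(lm(x ~ z + w))

# second stage: replace x by its first-stage fitted values
tsls_manual <- coef(lm(y ~ x_hat + w))

# closed-form IV estimator for the exactly identified case: (Z'X)^{-1} Z'y
X <- cbind(1, x, w)
Z <- cbind(1, z, w)
tsls_formula <- drop(solve(t(Z) %*% X, t(Z) %*% y))

cbind(manual = tsls_manual, formula = tsls_formula)
```

Both columns agree up to numerical precision. Note that the standard errors reported by the manual second-stage lm() fit are not valid, which is one reason to rely on a dedicated routine in practice.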

As usual we use coeftest() in conjunction with vcovHC() to obtain robust coefficient summaries for all models.

# robust coefficient summary for 1.
coeftest(cig_ivreg_diff1, vcov = vcovHC, type = "HC1")

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -0.117962   0.068217 -1.7292   0.09062 .  
## pricediff   -0.938014   0.207502 -4.5205 4.454e-05 ***
## incomediff   0.525970   0.339494  1.5493   0.12832    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# robust coefficient summary for 2.
coeftest(cig_ivreg_diff2, vcov = vcovHC, type = "HC1")

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -0.017049   0.067217 -0.2536    0.8009    
## pricediff   -1.342515   0.228661 -5.8712 4.848e-07 ***
## incomediff   0.428146   0.298718  1.4333    0.1587    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# robust coefficient summary for 3.
coeftest(cig_ivreg_diff3, vcov = vcovHC, type = "HC1")

## 
## t test of coefficients:
## 
##              Estimate Std. Error t value  Pr(>|t|)    
## (Intercept) -0.052003   0.062488 -0.8322    0.4097    
## pricediff   -1.202403   0.196943 -6.1053 2.178e-07 ***
## incomediff   0.462030   0.309341  1.4936    0.1423    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

We proceed by generating a tabulated summary of the estimation results using stargazer().

# gather robust standard errors in a list
rob_se <- list(sqrt(diag(vcovHC(cig_ivreg_diff1, type = "HC1"))),
               sqrt(diag(vcovHC(cig_ivreg_diff2, type = "HC1"))),
               sqrt(diag(vcovHC(cig_ivreg_diff3, type = "HC1"))))

# generate table
stargazer(cig_ivreg_diff1, cig_ivreg_diff2, cig_ivreg_diff3,
  header = FALSE, 
  type = "html",
  omit.table.layout = "n",
  digits = 3, 
  column.labels = c("IV: salestax", "IV: cigtax", "IVs: salestax, cigtax"),
  dep.var.labels.include = FALSE,
  dep.var.caption = "Dependent Variable: 1985-1995 Difference in Log Packs per Capita",
  se = rob_se)

                                Dependent variable: 1985-1995 difference in log packs per capita

                                IV: salestax    IV: cigtax    IVs: salestax, cigtax
                                (1)             (2)           (3)

pricediff                       -0.938***       -1.343***     -1.202***
                                (0.208)         (0.229)       (0.197)

incomediff                      0.526           0.428         0.462
                                (0.339)         (0.299)       (0.309)

Constant                        -0.118*         -0.017        -0.052
                                (0.068)         (0.067)       (0.062)

Observations                    48              48            48
R²                              0.550           0.520         0.547
Adjusted R²                     0.530           0.498         0.526
Residual Std. Error (df = 45)   0.091           0.094         0.091

Note: *p<0.1; **p<0.05; ***p<0.01

Table 12.1: TSLS Estimates of the Long-Term Elasticity of the Demand for Cigarettes using Panel Data

Table 12.1 reports negative estimates of the coefficient on pricediff that are quite different in magnitude. Which one should we trust? This hinges on the validity of the instruments used. To assess this we compute \(F\)-statistics for the first-stage regressions of all three models to check instrument relevance.

# first-stage regressions
mod_relevance1 <- lm(pricediff ~ salestaxdiff + incomediff)
mod_relevance2 <- lm(pricediff ~ cigtaxdiff + incomediff)
mod_relevance3 <- lm(pricediff ~ incomediff + salestaxdiff + cigtaxdiff)

# check instrument relevance for model (1)
linearHypothesis(mod_relevance1, 
                 "salestaxdiff = 0", 
                 vcov = vcovHC, type = "HC1")

## Linear hypothesis test
## 
## Hypothesis:
## salestaxdiff = 0
## 
## Model 1: restricted model
## Model 2: pricediff ~ salestaxdiff + incomediff
## 
## Note: Coefficient covariance matrix supplied.
## 
##   Res.Df Df      F    Pr(>F)    
## 1     46                        
## 2     45  1 28.445 3.009e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# check instrument relevance for model (2)
linearHypothesis(mod_relevance2, 
                 "cigtaxdiff = 0", 
                 vcov = vcovHC, type = "HC1")

## Linear hypothesis test
## 
## Hypothesis:
## cigtaxdiff = 0
## 
## Model 1: restricted model
## Model 2: pricediff ~ cigtaxdiff + incomediff
## 
## Note: Coefficient covariance matrix supplied.
## 
##   Res.Df Df      F   Pr(>F)    
## 1     46                       
## 2     45  1 98.034 7.09e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# check instrument relevance for model (3)
linearHypothesis(mod_relevance3, 
                 c("salestaxdiff = 0", "cigtaxdiff = 0"), 
                 vcov = vcovHC, type = "HC1")

## Linear hypothesis test
## 
## Hypothesis:
## salestaxdiff = 0
## cigtaxdiff = 0
## 
## Model 1: restricted model
## Model 2: pricediff ~ incomediff + salestaxdiff + cigtaxdiff
## 
## Note: Coefficient covariance matrix supplied.
## 
##   Res.Df Df      F    Pr(>F)    
## 1     46                        
## 2     44  2 76.916 4.339e-15 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
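A common rule of thumb, which is not stated in this section and is brought in here as an assumption, treats instruments as weak if the first-stage \(F\)-statistic is below 10. The following sketch applies this threshold to the three \(F\)-statistics, which are copied by hand from the outputs above.

```r
# first-stage F-statistics copied from the linearHypothesis() outputs
first_stage_F <- c(salestax = 28.445, cigtax = 98.034, both = 76.916)

# rule of thumb (an assumption here): F > 10 suggests instruments are not weak
data.frame(F = first_stage_F, exceeds_10 = first_stage_F > 10)
```

By this criterion, none of the three instrument sets appears to be weak, so the remaining concern is instrument exogeneity.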

We also conduct the overidentifying restrictions test for model three, which is the only model where the coefficient on the difference in log prices is overidentified (\(m=2\), \(k=1\)), so that the \(J\)-statistic can be computed. To do this, we take the residuals stored in cig_ivreg_diff3 and regress them on both instruments and the presumably exogenous regressor incomediff. We again use linearHypothesis() to test whether the coefficients on both instruments are zero, which is necessary for the exogeneity assumption to be fulfilled. Note that with test = "Chisq" we obtain a chi-squared distributed test statistic instead of an \(F\)-statistic.

# compute the J-statistic
cig_iv_OR <- lm(residuals(cig_ivreg_diff3) ~ incomediff + salestaxdiff + cigtaxdiff)

cig_OR_test <- linearHypothesis(cig_iv_OR, 
                               c("salestaxdiff = 0", "cigtaxdiff = 0"), 
                               test = "Chisq")
cig_OR_test

## Linear hypothesis test
## 
## Hypothesis:
## salestaxdiff = 0
## cigtaxdiff = 0
## 
## Model 1: restricted model
## Model 2: residuals(cig_ivreg_diff3) ~ incomediff + salestaxdiff + cigtaxdiff
## 
##   Res.Df     RSS Df Sum of Sq Chisq Pr(>Chisq)  
## 1     46 0.37472                                
## 2     44 0.33695  2  0.037769 4.932    0.08492 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Caution: In this case, the \(p\)-value reported by linearHypothesis() is wrong because the degrees of freedom are set to \(2\). This differs from the degree of overidentification (\(m-k=2-1=1\)), so the \(J\)-statistic is \(\chi^2_1\) distributed rather than \(\chi^2_2\) distributed, as linearHypothesis() assumes by default. We may compute the correct \(p\)-value using pchisq().

# compute correct p-value for J-statistic
pchisq(cig_OR_test[2, 5], df = 1, lower.tail = FALSE)

## [1] 0.02636406
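To see how much the choice of degrees of freedom matters, the following sketch evaluates both \(p\)-values side by side. It uses the rounded \(J\)-statistic copied from the output above, so the results match the reported values only up to rounding.

```r
# J-statistic as reported (rounded) in the linearHypothesis() output
J <- 4.932

# p-value under the chi-squared distribution with 2 degrees of freedom,
# which is what linearHypothesis() uses for two restrictions
p_df2 <- pchisq(J, df = 2, lower.tail = FALSE)

# p-value under the degree of overidentification m - k = 1
p_df1 <- pchisq(J, df = 1, lower.tail = FALSE)

round(c(df2 = p_df2, df1 = p_df1), 4)
```

With two degrees of freedom the test would not reject at the \(5\%\) level, whereas with the correct single degree of freedom it does, so the correction changes the conclusion.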

Since this value is smaller than \(0.05\) we reject the hypothesis that both instruments are exogenous at the level of \(5\%\). This means one of the following:

The sales tax is an invalid instrument for the per-pack price.
The cigarette-specific sales tax is an invalid instrument for the per-pack price.
Both instruments are invalid.

The book argues that the assumption of instrument exogeneity is more likely to hold for the general sales tax (see Chapter 12.4 of the book) such that the IV estimate of the long-run elasticity of demand for cigarettes we consider the most trustworthy is \(-0.94\), the TSLS estimate obtained using the general sales tax as the only instrument.

The interpretation of this estimate is that, over a 10-year period, an increase in the average price per pack by one percent is expected to decrease consumption by about \(0.94\) percent. This suggests that, in the long run, price increases can reduce cigarette consumption considerably.
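Because (12.10) is specified in logs, the coefficient on the price difference is an elasticity. As a small worked example, the sketch below translates the estimate from column (1) of Table 12.1 into the implied change in consumption for a hypothetical 10% price increase. It compares the usual linear approximation with the exact change implied by the log-log form.

```r
beta_1 <- -0.938          # TSLS estimate from column (1) of Table 12.1

price_increase <- 0.10    # hypothetical 10% price increase

# linear approximation: relative change in quantity = elasticity * relative change in price
approx_change <- beta_1 * price_increase

# exact relative change implied by the log-log specification
exact_change <- (1 + price_increase)^beta_1 - 1

# both effects in percent
round(100 * c(approximate = approx_change, exact = exact_change), 2)
```

The approximation and the exact value differ somewhat for a change of this size. The approximation becomes more accurate the smaller the price change.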