This tutorial reviews some concepts for the basic linear model, using the econometrics software package R. Specifically, the tutorial reviews:

estimation and interpretation in econometric models using the method of Instrumental Variables (IV) in R
estimation and interpretation in econometric models using the method of Two-Stage Least Squares (2SLS) in R

This tutorial requires one (1) data file:

fultonfish.csv

This file can be obtained from the Canvas subject page.

In addition, the file tut11.R provides the program code (R script file) necessary to complete this tutorial. The R script file uses the following packages, which need to be installed prior to running the program:

stargazer:	for easily generating summary statistics and regression tables for an R data file
ggplot2:	for easily producing graphs in R
car:	for easily conducting hypothesis tests in R
lmtest:	for easily conducting the Ramsey RESET test and coefficient tests (coeftest, waldtest) in R
rio:	for easily importing data into R
sandwich:	for easily calculating robust (Huber-White) heteroskedasticity-consistent standard errors in R
AER:	for easily estimating models by the method of Instrumental Variables (IV) in R

These can be installed directly in RStudio from the packages tab or by using the command install.packages() and inserting the name of the package in the brackets.
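
As a minimal sketch of the second option (the object names `pkgs`, `is_installed`, and `missing_pkgs` are illustrative and not part of tut11.R), the following checks which of the listed packages are not yet available and installs only those. The install step is guarded so that it only runs in an interactive session with internet access.

```r
# Packages used by tut11.R (names taken from the list above)
pkgs <- c("stargazer", "ggplot2", "car", "lmtest", "sandwich", "rio", "AER")

# Identify packages that are not yet available in the current library
is_installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!is_installed]

# Install only the missing ones (run interactively; requires internet access)
if (length(missing_pkgs) > 0 && interactive()) {
  install.packages(missing_pkgs)
}
```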

Question 1

[Hill,Griffiths,Lim 5th Ed., pp 542-544]

This question reviews example 11.2 in the textbook.

Hint: In all regressions, use the White (Huber-White) standard errors.

Consider the following daily demand function for whiting (fish) at the Fulton fish market:

$\ln Q_{t} = α_{0} + α_{1} \ln P_{t} + α_{2} {MON}_{t} + α_{3} {TUES}_{t} + α_{4} {WED}_{t} + α_{5} {THU}_{t} + ε_{d t}$

where $Q_{t}$ represents the quantity sold (in pounds) and $P_{t}$ represents the average daily price per pound.
The remaining variables are indicator variables for the days of the workweek, with Friday as the omitted category. The supply equation is:

$\ln Q_{t} = β_{0} + β_{1} \ln P_{t} + β_{2} {STORMY}_{t} + ε_{s t}$

The variable $STORMY$ is an indicator variable denoting stormy weather in the previous three days. We expect $β_{2}$ to be negative - stormy weather reduces the supply of fish brought to the market.

The data file fultonfish.csv contains daily observations on the price of whiting, the quantity sold, and weather conditions, from December 2, 1991 until May 8, 1992.

(a)

Does the demand equation satisfy the necessary condition for identification? Why or why not?

Solution

There are $M = 2$ endogenous variables ( $Q_{t}$ and $P_{t}$ ), so the necessary condition for identification requires that at least $(M - 1) = 1$ variable be excluded from the demand function. The variable $STORMY$ is omitted from the demand function, so the demand function satisfies this condition for identification.
Stormy conditions shift the supply function relative to a fixed demand function (which does not contain the stormy variable), tracing out the demand curve.

(b)

Does the supply equation satisfy the necessary condition for identification? Why or why not?

Solution

There are $M = 2$ endogenous variables ( $Q_{t}$ and $P_{t}$ ), so the necessary condition for identification requires that at least $(M - 1) = 1$ variable be excluded from the supply function.
The day of the week indicator variables are omitted from the supply function so the necessary condition for identification is satisfied.
The demand function shifts daily around a fixed supply curve (since it does not contain the day of the week controls), tracing out the supply curve.
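
The counting argument in parts (a) and (b) can be sketched in a few lines of R. This is only an illustration of the necessary (order) condition. Because both endogenous variables appear in each equation, only excluded exogenous variables are counted. The helper `order_condition()` and the variable lists are hypothetical names introduced here, not part of tut11.R.

```r
# Exogenous variables in the whole demand-supply system
system_exog <- c("mon", "tue", "wed", "thu", "stormy")

# Exogenous variables that appear in each structural equation
demand_exog <- c("mon", "tue", "wed", "thu")
supply_exog <- c("stormy")

# Necessary (order) condition: at least M - 1 variables must be excluded
order_condition <- function(included, all_exog, M = 2) {
  excluded <- setdiff(all_exog, included)
  list(excluded = excluded, satisfied = length(excluded) >= M - 1)
}

demand_check <- order_condition(demand_exog, system_exog)
supply_check <- order_condition(supply_exog, system_exog)
print(demand_check)
print(supply_check)
```

Passing this check is necessary but not sufficient: the excluded variables must also matter in the reduced form for the endogenous regressor, which is what part (c) tests.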

(c)

The reduced-form equations for the demand-supply system are given by:

$\begin{aligned} \ln Q_{t} & = π_{Q 0} + π_{Q 1} {MON}_{t} + π_{Q 2} {TUES}_{t} + π_{Q 3} {WED}_{t} + π_{Q 4} {THU}_{t} + π_{Q 5} {STORMY}_{t} + υ_{Q t} \\ \ln P_{t} & = π_{P 0} + π_{P 1} {MON}_{t} + π_{P 2} {TUES}_{t} + π_{P 3} {WED}_{t} + π_{P 4} {THU}_{t} + π_{P 5} {STORMY}_{t} + υ_{P t} \end{aligned}$

Estimate these reduced forms by OLS.

Run the following R code, given in tut11.R:

Code

#------------------------------------------
# Reduced Form for lnq, with robust standard errors
#---------------------------------------
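# load the packages used in this tutorial (assumes they are already installed)
library(rio)        # import()
library(sandwich)   # vcovHC()
library(lmtest)     # coeftest(), waldtest()
library(car)        # linearHypothesis()
library(AER)        # ivreg()
library(stargazer)  # stargazer()
# read data file into R
fish <- import("fultonfish.csv")
# endogenous variables: lnprice, lnquan
# exogenous variables: days of the week: {mon, tue, wed, thu}
# exogenous variable: stormy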

reduced_lnq <- lm(lnquan ~ mon + tue + wed + thu +  stormy, data = fish)
# print(summary(reduced_lnq))
demdf10 <-df.residual(reduced_lnq)
# Robust standard errors
# Adjust standard errors using sandwich package
cov5 <- vcovHC(reduced_lnq, type = "HC1")
reduced_lnq_se_r <-coeftest(reduced_lnq, vcov=cov5)
# print(reduced_lnq_se_r)
reduced_lnq_robust    <- sqrt(diag(cov5))                      # robust (HC1) standard errors for the OLS estimates
# Adjust F statistic 
wald_reduced_lnq_r <- waldtest(reduced_lnq, vcov = cov5)       # Sample F test statistic using cov5 varcov matrix
# print(wald_reduced_lnq_r)
fstat10 <- round(wald_reduced_lnq_r$"F"[2], digits=4)          # Sample value of F stat
pvalf10 <- round(wald_reduced_lnq_r$'Pr(>F)'[2], digits=4)     # p value of F test
numdf10 <- abs(wald_reduced_lnq_r$"Df"[2])    
#-----------------------------------------------
# Reduced Form for lnp, with robust standard errors
#--------------------------------------------
reduced_lnp <- lm(lnprice ~ mon + tue + wed + thu + stormy, data = fish)
# print(summary(reduced_lnp))
demdf12 <-df.residual(reduced_lnp)
# Robust standard errors
# Adjust standard errors using sandwich package
cov6         <- vcovHC(reduced_lnp, type = "HC1")
reduced_lnp_se_r <-coeftest(reduced_lnp, vcov=cov6)
# print(reduced_lnp_se_r)
reduced_lnp_robust    <- sqrt(diag(cov6))                      # robust (HC1) standard errors for the OLS estimates
# Adjust F statistic 
wald_reduced_lnp_r <- waldtest(reduced_lnp, vcov = cov6)      # Sample F test statistic using cov6 varcov matrix
# print(wald_reduced_lnp_r)
fstat12 <- round(wald_reduced_lnp_r$"F"[2], digits=4)          # Sample value of F stat
pvalf12 <- round(wald_reduced_lnp_r$'Pr(>F)'[2], digits=4)     # p value of F test
numdf12 <- abs(wald_reduced_lnp_r$"Df"[2])    
#------------------------------------
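
To make the robust standard errors used above less of a black box, here is a minimal sketch of the HC1 (White) covariance formula computed by hand. It uses simulated data, so the seed, coefficients, and object names are assumptions for illustration only. With the real data, the script relies on `vcovHC(model, type = "HC1")` from the sandwich package.

```r
set.seed(123)
n <- 200
x <- rnorm(n)
# heteroskedastic errors: the error standard deviation grows with |x|
y <- 1 + 0.5 * x + rnorm(n, sd = 0.5 + abs(x))
fit <- lm(y ~ x)

X <- model.matrix(fit)          # n x k regressor matrix
e <- residuals(fit)             # OLS residuals
k <- ncol(X)
bread <- solve(crossprod(X))    # (X'X)^{-1}
meat  <- crossprod(X * e)       # X' diag(e^2) X
hc0 <- bread %*% meat %*% bread
hc1 <- hc0 * n / (n - k)        # HC1 = HC0 with a degrees-of-freedom correction

robust_se <- sqrt(diag(hc1))
ols_se    <- sqrt(diag(vcov(fit)))
print(cbind(ols_se, robust_se))
```

The coefficient estimates are unchanged by this adjustment. Only the estimated covariance matrix, and therefore the standard errors and test statistics, differ from the conventional OLS output.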

The estimated reduced-form equations for $\ln Q_{t}$ and $\ln P_{t}$ are reported below:

Code

stargazer(reduced_lnq,reduced_lnp, type = "html", dep.var.labels=c("(Log) Quantity", "(Log) Price"),
          covariate.labels=c("Intercept", "Monday", "Tuesday", "Wednesday", "Thursday", "Stormy"), 
          column.labels = c("(Robust)", "(Robust)"),
          se        = list(reduced_lnq_robust, reduced_lnp_robust),
          omit.stat = "f",
          add.lines = list(c("F Statistic", fstat10, fstat12),
                           c("F p value", pvalf10, pvalf12),
                           c("F num df", numdf10, numdf12),
                           c("F dem df", demdf10, demdf12)),
                   digits=4, align=TRUE,
          intercept.bottom=FALSE,
                 star.cutoffs = c(0.05, 0.01, 0.001))


	Dependent variable:

	(Log) Quantity	(Log) Price
	(Robust)	(Robust)
	(1)	(2)

Intercept	8.8101^***	-0.2717^**
	(0.1174)	(0.0952)

Monday	0.1010	-0.1129
	(0.1978)	(0.1154)

Tuesday	-0.4847^*	-0.0411
	(0.1939)	(0.1164)

Wednesday	-0.5531^**	-0.0118
	(0.1986)	(0.1135)

Thursday	0.0537	0.0496
	(0.1720)	(0.1156)

Stormy	-0.3878^**	0.3464^***
	(0.1426)	(0.0723)


F Statistic	5.9278	5.8996
F p value	0.0001	0.0001
F num df	5	5
F dem df	105	105
Observations	111	111
R²	0.1934	0.1789
Adjusted R²	0.1550	0.1398
Residual Std. Error (df = 105)	0.6818	0.3542

Note:	*p<0.05; **p<0.01; ***p<0.001

Figure 1: Reduced Forms for $\ln Q_{t}$ and $\ln P_{t}$

i)

Test the hypothesis that necessary conditions for identification of the supply function are satisfied.

Solution

Aside: Recall that for the 2SLS estimator we replace the endogenous variable $\ln P_{t}$ with its predicted values $\hat{\ln P_{t}}$ from the reduced form.
Suppose that the coefficients on the day of the week indicator variables were jointly zero, so ${\hat{π}}_{P 1} = {\hat{π}}_{P 2} = {\hat{π}}_{P 3} = {\hat{π}}_{P 4} = 0$ . The predicted values become:

$\hat{\ln P_{t}} = {\hat{π}}_{P 0} + {\hat{π}}_{P 5} {STORMY}_{t}$

But if we replace $\ln P_{t}$ with $\hat{\ln P_{t}}$ in the supply equation there will be exact collinearity. The estimated supply equation would be:

$\begin{aligned} \ln Q_{t} & = β_{0} + β_{1} {\hat{\ln P}}_{t} + β_{2} {STORMY}_{t} + ε_{s t} \\ & = β_{0} + β_{1} ({\hat{π}}_{P 0} + {\hat{π}}_{P 5} {STORMY}_{t}) + β_{2} {STORMY}_{t} + ε_{s t} \\ & = (β_{0} + β_{1} {\hat{π}}_{P 0}) + (β_{1} {\hat{π}}_{P 5} + β_{2}) {STORMY}_{t} + ε_{s t} \\ & = γ_{0} + γ_{1} {STORMY}_{t} + ε_{s t} \end{aligned}$

where $γ_{0} = (β_{0} + β_{1} {\hat{π}}_{P 0})$ and $γ_{1} = (β_{1} {\hat{π}}_{P 5} + β_{2})$ .

So only estimates of $γ_{0}$ and $γ_{1}$ are identified. We are unable to obtain separate estimates of the structural parameters $β_{1}$ and $β_{2}$ .
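
The exact collinearity described above can be seen in a small simulation. This is a sketch on made-up data: the coefficients, seed, and variable names are assumptions, not the Fulton data. When the first stage contains only the stormy indicator, the fitted values are an exact linear function of it, so `lm()` cannot estimate both second-stage coefficients and reports one of them as `NA`.

```r
set.seed(1)
n <- 111
stormy <- rbinom(n, 1, 0.3)
lnp <- -0.3 + 0.35 * stormy + rnorm(n, sd = 0.35)   # simulated log price
lnq <-  8.5 - 0.40 * stormy + rnorm(n, sd = 0.70)   # simulated log quantity

# First stage with no day-of-week variation: only stormy explains log price
first_stage <- lm(lnp ~ stormy)
lnp_hat <- fitted(first_stage)

# Second stage: lnp_hat is an exact linear function of stormy
second_stage <- lm(lnq ~ lnp_hat + stormy)
print(coef(second_stage))     # one coefficient cannot be estimated
print(second_stage$rank)      # rank of the design matrix is 2, not 3
```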

Now if the coefficients on the day of the week variables (that is, $π_{P 1}$ , $π_{P 2}$ , $π_{P 3}$ , and $π_{P 4}$ in the reduced form for $\ln P_{t}$ ) are not identically zero but jointly insignificant, there will be a situation of almost exact collinearity. The estimated 2SLS coefficients ${\hat{b}}_{1}$ and ${\hat{b}}_{2}$ (in the structural supply function) will be imprecisely estimated and only weakly identified.

For the reduced form for $\ln P_{t}$ consider the null hypothesis:

$H_{0} : π_{P 1} = π_{P 2} = π_{P 3} = π_{P 4} = 0$

against the alternative $H_{A}$ that at least one of these parameters is non-zero. The test statistic will follow an F-distribution with (M,N-K-1) degrees of freedom, where M here denotes the number of restrictions.

Here the number of restrictions is $M = 4$ and the model degrees of freedom are $(111 - 6) = 105$ .
At the 5% level of significance the F critical value is $F_{c} \approx 2.46$ .
Decision rule: reject $H_{0}$ if the sample value of the F test statistic exceeds the critical value $F_{c}$ .
Alternatively, reject $H_{0}$ if the p-value for the sample value of the test statistic is less than $α = 0.05$ .

Code

#------------------------------------
# Is Supply Equation Identified?
# Test Days of Week in reduced form for lnp
hnull_1 <- c("mon=0", "tue = 0", "wed=0", "thu=0")
linearHypothesis(reduced_lnp, hnull_1, vcov=cov6)


Linear hypothesis test:
mon = 0
tue = 0
wed = 0
thu = 0

Model 1: restricted model
Model 2: lnprice ~ mon + tue + wed + thu + stormy

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F Pr(>F)
1    109                 
2    105  4 0.7426 0.5651

Figure 2: F test for significance of Day of Week Indicators in Reduced Form: $\ln P_{t}$

The output provides a sample F-statistic of $F = 0.7426$ with a p-value of $0.5651$ .

Since the p-value is larger than the desired level of significance we do not reject the null hypothesis.
Even if we could reject $H_{0}$ our rule of thumb requires a value for the F test statistic of at least 10 in order to avoid the weak identification problem.
In practice, the supply equation is not identified in this example.
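
The critical value and p-value for this F test can be checked directly with base R's `qf()` and `pf()`. The snippet is a sketch that takes the degrees of freedom and the sample F statistic of 0.7426 reported in Figure 2 as given inputs. The object names are illustrative.

```r
alpha  <- 0.05
num_df <- 4      # number of restrictions
den_df <- 105    # N - K - 1 = 111 - 6

f_crit <- qf(1 - alpha, num_df, den_df)                    # 5% critical value
f_stat <- 0.7426                                           # sample F from Figure 2
p_val  <- pf(f_stat, num_df, den_df, lower.tail = FALSE)   # upper-tail probability

cat("F critical value:", round(f_crit, 4), "\n")
cat("p-value:", round(p_val, 4), "\n")
cat("Reject H0:", f_stat > f_crit, "\n")
cat("Exceeds rule-of-thumb value of 10:", f_stat > 10, "\n")
```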

ii)

Test the hypothesis that necessary conditions for identification of the demand function are satisfied.

Solution

Aside: Recall that for the 2SLS estimator we replace the endogenous variable $\ln P_{t}$ with its predicted values $\hat{\ln P_{t}}$ from the reduced form. Suppose that the coefficient on the stormy weather variable was zero, so ${\hat{π}}_{P 5} = 0$ .
The predicted values become:

$\hat{\ln P_{t}} = {\hat{π}}_{P 0} + {\hat{π}}_{P 1} {MON}_{t} + {\hat{π}}_{P 2} {TUES}_{t} + {\hat{π}}_{P 3} {WED}_{t} + {\hat{π}}_{P 4} {THU}_{t}$

But if we replace $\ln P_{t}$ with $\hat{\ln P_{t}}$ in the demand equation there will be exact collinearity. The estimated demand equation would be:

$\begin{aligned} \ln Q_{t} & = α_{0} + α_{1} {\hat{\ln P}}_{t} + α_{2} {MON}_{t} + α_{3} {TUES}_{t} + α_{4} {WED}_{t} + α_{5} {THU}_{t} + ε_{d t} \\ & = α_{0} + α_{1} ({\hat{π}}_{P 0} + {\hat{π}}_{P 1} {MON}_{t} + {\hat{π}}_{P 2} {TUES}_{t} + {\hat{π}}_{P 3} {WED}_{t} + {\hat{π}}_{P 4} {THU}_{t}) \\ & \quad + α_{2} {MON}_{t} + α_{3} {TUES}_{t} + α_{4} {WED}_{t} + α_{5} {THU}_{t} + ε_{d t} \\ & = (α_{0} + α_{1} {\hat{π}}_{P 0}) + (α_{2} + α_{1} {\hat{π}}_{P 1}) {MON}_{t} + (α_{3} + α_{1} {\hat{π}}_{P 2}) {TUES}_{t} \\ & \quad + (α_{4} + α_{1} {\hat{π}}_{P 3}) {WED}_{t} + (α_{5} + α_{1} {\hat{π}}_{P 4}) {THU}_{t} + ε_{d t} \\ & = γ_{0} + γ_{1} {MON}_{t} + γ_{2} {TUES}_{t} + γ_{3} {WED}_{t} + γ_{4} {THU}_{t} + ε_{d t} \end{aligned}$

where:

$\begin{aligned} γ_{0} & = α_{0} + α_{1} {\hat{π}}_{P 0} \\ γ_{1} & = α_{2} + α_{1} {\hat{π}}_{P 1} \\ γ_{2} & = α_{3} + α_{1} {\hat{π}}_{P 2} \\ γ_{3} & = α_{4} + α_{1} {\hat{π}}_{P 3} \\ γ_{4} & = α_{5} + α_{1} {\hat{π}}_{P 4} \end{aligned}$

So only estimates of $γ_{0}$ , $γ_{1}$ , $γ_{2}$ , $γ_{3}$ , and $γ_{4}$ are identified. We are unable to obtain a separate estimate of the structural parameter of interest $α_{1}$ .

Now if the coefficient on the stormy variable (that is, $π_{P 5}$ in the reduced form for $\ln P_{t}$ ) is not identically zero but statistically insignificant, there will be a situation of almost exact collinearity.
The estimated 2SLS coefficient of interest ${\hat{a}}_{1}$ (in the structural demand equation) will be imprecisely estimated and only weakly identified.

The requirement for identification is that the coefficient on the variable stormy be statistically significant in the reduced form for $\ln P_{t}$ . In this case, either a t-test or an F-test is sufficient. For the reduced form for $\ln P_{t}$ consider the null hypothesis:

$H_{0} : π_{P 5} = 0$

against the alternative $H_{A} : π_{P 5} \neq 0$ .

The test statistic will follow an F-distribution with (M,N-K-1) degrees of freedom. Here the number of restrictions is $M = 1$ and the model degrees of freedom are $(111 - 6) = 105$ .
At the 5% level of significance the F critical value is $F_{c} \approx 3.93$ .
Decision rule: reject $H_{0}$ if the sample value of the F test statistic exceeds the critical value $F_{c}$ . Alternatively, reject $H_{0}$ if the p-value for the sample value of the test statistic is less than $α = 0.05$ .

Code

#------------------------------------
# Is Demand Equation Identified?
# Test stormy in reduced form for lnp
# can do simple t-test or F test
# print(summary(reduced_lnp))
hnull_2 <- c("stormy=0")
linearHypothesis(reduced_lnp, hnull_2, vcov=cov6)


Linear hypothesis test:
stormy = 0

Model 1: restricted model
Model 2: lnprice ~ mon + tue + wed + thu + stormy

Note: Coefficient covariance matrix supplied.

  Res.Df Df      F      Pr(>F)    
1    106                          
2    105  1 22.929 0.000005537 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Figure 3: F Test for Significance of Stormy Weather in Reduced Form: $\ln P_{t}$

The R output from the Wald test of $H_{0} : π_{P 5} = 0$ is given in Figure 3.

The output provides a sample F-statistic of $F = 22.929$ with a p-value of $0.0000$ (to four decimal places). Since the p-value is less than the desired level of significance we reject the null hypothesis.
The variable stormy is statistically significant in the reduced form for $\ln P_{t}$ and, since the F statistic also exceeds the rule-of-thumb value of 10, the demand function is 'practically' identified.

Alternatively consider a two-sided t-test of the null hypothesis $H_{0} : π_{P 5} = 0$ .
The test statistic will follow a t-distribution with (N-K-1) degrees of freedom. Here the degrees of freedom are $(111 - 6) = 105$ .
The t critical value is $t_{c} \approx 1.96$ .
Decision rule: reject $H_{0}$ if $t > t_{c}$ or $t < - t_{c}$ .
Alternatively, reject $H_{0}$ if the p-value for the sample value of the test statistic is less than $α = 0.05$ .

The sample value of the test statistic (using the robust standard error) is calculated as:

$t = \frac{{\hat{π}}_{P 5} - 0}{se ({\hat{π}}_{P 5})} = \frac{0.346406}{0.072343} = 4.7884$

Since $t > t_{c}$ we reject the null hypothesis. With a single restriction the two tests are equivalent: $t^{2} = 4.7884^{2} \approx 22.929 = F$ , so the t-test leads to the same conclusion that stormy is statistically significant in the reduced form for $\ln P_{t}$ .
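
The t statistic for $π_{P 5}$ computed above can be reproduced, and compared with the F statistic in Figure 3, using a few lines of base R. This sketch takes the reported coefficient and robust standard error as given inputs. The object names are illustrative.

```r
pi_hat <- 0.346406   # reduced-form coefficient on stormy
se_rob <- 0.072343   # robust (HC1) standard error
res_df <- 105        # N - K - 1 = 111 - 6

t_stat <- (pi_hat - 0) / se_rob
t_crit <- qt(0.975, res_df)                 # two-sided 5% critical value
p_val  <- 2 * pt(-abs(t_stat), res_df)      # two-sided p-value

cat("t =", round(t_stat, 4), " t^2 =", round(t_stat^2, 3), "\n")
cat("critical value =", round(t_crit, 3), " p-value =", signif(p_val, 4), "\n")
```

Squaring the t statistic recovers the F statistic for the single restriction, which is why either test can be used here.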

(d)

Estimate the demand equation by 2SLS. Comment on the magnitude of the estimated price elasticity.

Solution

Run the following code:

Code

demand_ols <- lm(lnquan ~ lnprice + mon + tue + wed+ thu, data=fish)
# print(summary(demand_ols))
demdf8 <-df.residual(demand_ols)
# Robust standard errors
# Adjust standard errors using sandwich package
cov4         <- vcovHC(demand_ols, type = "HC1")
demand_ols_se_r <-coeftest(demand_ols, vcov=cov4)
# print(demand_ols_se_r)
demand_ols_robust    <- sqrt(diag(cov4))                            # robust (HC1) standard errors for the OLS estimates
# Adjust F statistic 
wald_demand_ols_r <- waldtest(demand_ols, vcov = cov4, test="F")    # Sample F test statistic using cov4 varcov matrix
# print(wald_demand_ols_r)
fstat8 <- round(wald_demand_ols_r$"F"[2], digits=4)          # Sample value of F stat
pvalf8 <- round(wald_demand_ols_r$'Pr(>F)'[2], digits=4)     # p value of F test
numdf8 <- abs(wald_demand_ols_r$"Df"[2])   
#####################
# 2SLS for Demand Equation, with robust standard errors
# lnprice is endogenous
# Instrument Set: mon tue wed thu stormy
#--------------------------------------
demand_iv <- ivreg(lnquan ~ lnprice + mon + tue + wed + thu |
                     mon + tue + wed + thu  + stormy, data=fish)
# print(summary(demand_iv))
demdf14 <-df.residual(demand_iv)                                         # residual df of the IV fit
# Robust standard errors
# Adjust standard errors using sandwich package
cov7 <- vcovHC(demand_iv, type = "HC1")
demand_iv_se_r <-coeftest(demand_iv, vcov=cov7)
# print(demand_iv_se_r)
demand_iv_robust    <- sqrt(diag(cov7))                                  # robust (HC1) standard errors for the IV estimates
# Adjust F statistic 
wald_demand_iv_r <- waldtest(demand_iv, vcov = cov7, test="F")           # Sample F test statistic using cov7 varcov matrix
# print(wald_demand_iv_r)
fstat14 <- round(wald_demand_iv_r$"F"[2], digits=4)                      # Sample value of F stat
pvalf14 <- round(wald_demand_iv_r$'Pr(>F)'[2], digits=4)                 # p value of F test
numdf14 <- abs(wald_demand_iv_r$"Df"[2])

Code

stargazer(demand_ols,demand_iv, type = "html", dep.var.labels=c("(Log) Quantity"),
          covariate.labels=c("Intercept", "(Log) Price", "Monday",
                             "Tuesday", "Wednesday", "Thursday"), 
          column.labels = c("(Robust)", "(Robust)"),
          se        = list(demand_ols_robust, demand_iv_robust),
          omit.stat = "f",
          add.lines = list(c("F Statistic", fstat8, fstat14),
                           c("F p value", pvalf8, pvalf14),
                           c("F num df", numdf8, numdf14),
                           c("F dem df", demdf8, demdf14)),
          digits=4, align=TRUE,
          title = "Figure 4: Two Stage Least Squares Results for the Demand Function",
          intercept.bottom=FALSE,
          star.cutoffs = c(0.05, 0.01, 0.001))

**Figure 4: Two Stage Least Squares Results for the Demand Function**

	Dependent variable:

	(Log) Quantity
	OLS	instrumental
		variable
	(Robust)	(Robust)
	(1)	(2)

Intercept	8.6069^***	8.5059^***
	(0.1183)	(0.1521)

(Log) Price	-0.5625^***	-1.1194^*
	(0.1522)	(0.4432)

Monday	0.0143	-0.0254
	(0.2057)	(0.2214)

Tuesday	-0.5162^**	-0.5308^**
	(0.1897)	(0.2021)

Wednesday	-0.5554^**	-0.5664^**
	(0.1937)	(0.2069)

Thursday	0.0816	0.1093
	(0.1620)	(0.1784)


F Statistic	9.4007	4.7195
F p value	0	0.0006
F num df	5	5
F dem df	105	105
Observations	111	111
R²	0.2205	0.1391
Adjusted R²	0.1834	0.0981
Residual Std. Error (df = 105)	0.6702	0.7043

Note:	*p<0.05; **p<0.01; ***p<0.001

The OLS estimation results (with robust standard errors) provide an estimate of the price elasticity of demand of -0.5625, so a 1% increase in price is associated with a 0.56% decrease in quantity demanded. However, in the presence of simultaneity bias, the OLS estimator will be both biased and inconsistent.

Consider the null hypothesis $H_{0} : α_{1} = - 1$ against the alternative $H_{A} : α_{1} \neq - 1$ . The test statistic will follow a t-distribution with (N-K-1) degrees of freedom. Here the degrees of freedom are $(111 - 6) = 105$ . The t critical value is $t_{c} \approx 1.96$ . Decision rule: reject $H_{0}$ if $t > t_{c}$ or $t < - t_{c}$ . Alternatively, reject $H_{0}$ if the p-value for the sample value of the test statistic is less than $α = 0.05$ .

The sample value of the test statistic (using the robust standard error) is calculated as:

$t = \frac{a_{1} + 1}{se (a_{1})} = \frac{- 0.5625 + 1}{0.1522} = 2.8747$

Since $t > t_{c}$ we reject the null hypothesis. Based on the sample value of the test statistic, we would also reject the null hypothesis $H_{0} : α_{1} \leq - 1$ against the alternative $H_{A} : α_{1} > - 1$ . Based on the OLS results, the sample evidence is consistent with demand that is price inelastic (an elasticity with an absolute value between 0 and 1).

The 2SLS estimation results are reported in Figure 4. The estimated coefficient on $\ln P$ represents an elasticity, so a 1% increase in price is associated with a 1.12% decrease in quantity demanded.
The p-value for a two-sided t-test about zero is 0.0130, so the estimate is statistically significant at the 5% level of significance.

Consider the null hypothesis $H_{0} : α_{1} = - 1$ against the alternative $H_{A} : α_{1} \neq - 1$ . The test statistic will follow a t-distribution with (N-K-1) degrees of freedom. Here the degrees of freedom are $(111 - 6) = 105$ . The t critical value is $t_{c} \approx 1.96$ . Decision rule: reject $H_{0}$ if $t > t_{c}$ or $t < - t_{c}$ . Alternatively, reject $H_{0}$ if the p-value for the sample value of the test statistic is less than $α = 0.05$ .

The sample value of the test statistic (using the robust standard error) is calculated as:

$t = \frac{a_{1} + 1}{se (a_{1})} = \frac{- 1.1194 + 1}{0.4432} = - 0.2694$

Since $- t_{c} < t < t_{c}$ we do not reject the null hypothesis. Based on the IV results, the sample evidence is consistent with demand that is unitary elastic (an elasticity with an absolute value of exactly 1). Based on the sample value of the test statistic, we would also not reject the null hypothesis $H_{0} : α_{1} \leq - 1$ against the alternative $H_{A} : α_{1} > - 1$ .
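
The two elasticity tests can be laid side by side in R. This is a sketch that uses the rounded estimates and robust standard errors from Figure 4 as inputs, so the fourth decimal place of the OLS statistic may differ slightly from the value computed with unrounded numbers. The helper `t_for_null()` is a hypothetical name introduced here.

```r
# Hypothetical helper: t statistic for H0: coefficient = null_value
t_for_null <- function(estimate, se, null_value) (estimate - null_value) / se

res_df <- 105
t_ols <- t_for_null(-0.5625, 0.1522, -1)   # OLS estimate and robust SE
t_iv  <- t_for_null(-1.1194, 0.4432, -1)   # 2SLS estimate and robust SE

t_crit_two <- qt(0.975, res_df)            # two-sided 5% critical value
t_crit_one <- qt(0.95,  res_df)            # one-sided 5% critical value

results <- data.frame(
  estimator        = c("OLS", "2SLS"),
  t_stat           = round(c(t_ols, t_iv), 4),
  reject_two_sided = abs(c(t_ols, t_iv)) > t_crit_two,
  reject_one_sided = c(t_ols, t_iv) > t_crit_one   # H0: alpha1 <= -1 vs HA: alpha1 > -1
)
print(results)
```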

Since the OLS estimator of the price elasticity of demand ( $a_{1}$ ) will generally be upward biased (less negative) as a result of simultaneity bias, we would draw different conclusions regarding the elasticity of demand depending on whether we ignore this simultaneity bias (OLS) or account for it using the 2SLS estimator.
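
The direction of the simultaneity bias can be illustrated with a simulated demand-supply system. Everything in this sketch is an assumption chosen for illustration (the true elasticities, intercepts, shock variances, and seed), and the 2SLS estimate is computed by hand with two `lm()` calls rather than with `ivreg()`.

```r
set.seed(42)
n <- 5000
alpha1 <- -1.0   # true demand elasticity (assumed)
beta1  <-  0.5   # true supply elasticity (assumed)
beta2  <- -0.5   # effect of stormy weather on supply (assumed)

stormy <- rbinom(n, 1, 0.3)
e_d <- rnorm(n, sd = 0.3)   # demand shock
e_s <- rnorm(n, sd = 0.3)   # supply shock

# Demand: lnq = 8.0 + alpha1 * lnp + e_d
# Supply: lnq = 8.5 + beta1 * lnp + beta2 * stormy + e_s
# Equilibrium log price solves demand = supply
lnp <- (0.5 + beta2 * stormy + e_s - e_d) / (alpha1 - beta1)
lnq <- 8 + alpha1 * lnp + e_d

# OLS ignores that lnp is correlated with the demand shock
b_ols <- coef(lm(lnq ~ lnp))["lnp"]

# 2SLS by hand: first stage, then regress lnq on the fitted log price
lnp_hat <- fitted(lm(lnp ~ stormy))
b_2sls  <- coef(lm(lnq ~ lnp_hat))["lnp_hat"]

print(c(true = alpha1, OLS = unname(b_ols), TSLS = unname(b_2sls)))
```

In this design the OLS estimate is pulled towards zero while the 2SLS estimate is close to the true elasticity. The standard errors from the hand-rolled second stage are not valid, which is one reason the tutorial uses `ivreg()` for inference.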

Question 2

The labour market outcomes for partnered women are of great interest to governments, economists, financial planners, and many other interested players in the economy. Consider the following labour demand equation for partnered women:

${lnwage}_{i} = α_{0} + α_{1} {hours}_{i} + α_{2} {educ}_{i} + α_{3} {exper}_{i} + α_{4} {exper}_{i}^{2} + α_{5} {union}_{i} + ε_{w i} \qquad (1)$

where:

$\begin{aligned} {lnwage}_{i} & = \text{(log) hourly wage of individual } i \\ {hours}_{i} & = \text{weekly hours of work of individual } i \\ {educ}_{i} & = \text{completed educational attainment of individual } i \\ {exper}_{i} & = \text{years of labour market experience of individual } i \\ {union}_{i} & = \text{1 if individual } i \text{ is a member of a union, 0 otherwise} \end{aligned}$

Consider the following labour supply function for partnered women:

${hours}_{i} = β_{0} + β_{1} {lnwage}_{i} + β_{2} {educ}_{i} + β_{3} {childlt6}_{i} + β_{4} {childge6}_{i} + β_{5} {nlinc}_{i} + ε_{h i} \qquad (2)$

where:

$\begin{aligned} {childlt6}_{i} & = \text{1 if the youngest child in the household of individual } i \text{ is} < 6 \text{, 0 otherwise} \\ {childge6}_{i} & = \text{1 if the youngest child in the household of individual } i \text{ is} \geq 6 \text{, 0 otherwise} \\ {nlinc}_{i} & = \text{household income from all sources, excluding the employment income of individual } i \text{, in thousands of dollars} \end{aligned}$

The reduced form equations for this system are given by:

$\begin{aligned} {lnwage}_{i} & = π_{w 0} + π_{w 1} {educ}_{i} + π_{w 2} {exper}_{i} + π_{w 3} {exper}_{i}^{2} + π_{w 4} {union}_{i} \\ & \quad + π_{w 5} {childlt6}_{i} + π_{w 6} {childge6}_{i} + π_{w 7} {nlinc}_{i} + υ_{w i} \\ {hours}_{i} & = π_{h 0} + π_{h 1} {educ}_{i} + π_{h 2} {exper}_{i} + π_{h 3} {exper}_{i}^{2} + π_{h 4} {union}_{i} \\ & \quad + π_{h 5} {childlt6}_{i} + π_{h 6} {childge6}_{i} + π_{h 7} {nlinc}_{i} + υ_{h i} \end{aligned}$

You have available a sample containing $2,867$ observations that includes data on all the variables defined above.

a)

Consider the labour supply function (2). Do you think the condition $COV [lnwage, ε_{h} \mid educ, childlt6, childge6, nlinc] = 0$ is likely to be satisfied?
Clearly explain why or why not. Outline three possible reasons why this condition might not be satisfied. Explain the consequences for the OLS estimator if this condition is not satisfied.

Solution

The condition might not be satisfied as a result of:

Measurement Error: Measurement error in wages will induce a correlation between (observed) wages and the unobservable determinants of hours. Our prior is that the labour supply curve is upward sloping (when the substitution effect dominates the income effect) so $β_{1} > 0$ . In the case of classical measurement error, the OLS estimates of $β_{1}$ will be biased towards zero (attenuation bias).
Omitted Variable Bias: There are likely (omitted) variables that are correlated with both wages and the unobservable determinants of hours. For example, unobserved variables such as individual ability, motivation, or personality are likely correlated with wages and correlated with the unobserved tastes for work ( $ε_{h}$ ).
Simultaneous Equation Bias: Equilibrium wages and equilibrium hours are likely determined jointly within a system of demand and supply. Wage is an endogenous variable and likely correlated with the error term $ε_{h}$ in model (2). Generally, the direction of the bias in the OLS estimator for the labour supply function will be difficult to determine. However, if the labour supply curve is upward sloping ( $β_{1} > 0$ ) and the labour demand function is downward sloping ( $α_{1} < 0$ ), we would expect the OLS estimate of $β_{1}$ in the labour supply function to be biased downward.

If $COV (lnwage, ε_{h} | X) \neq 0$ , the OLS estimator will be biased and inconsistent: the bias does not disappear even in large samples. The OLS estimators for all of the parameters in model (2) are biased, not just the estimator for the parameter associated with the endogenous variable $lnwage$ .
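
The attenuation result for classical measurement error can be illustrated with a short simulation. This is a generic sketch with assumed values (a true slope of 2, equal variances for the regressor and the measurement error, and an arbitrary seed). It is not based on the labour supply data.

```r
set.seed(2024)
n <- 10000
beta1  <- 2                              # true slope (assumed)
x_true <- rnorm(n, sd = 1)               # correctly measured regressor
u      <- rnorm(n, sd = 1)               # classical measurement error
x_obs  <- x_true + u                     # what the researcher observes
y      <- 1 + beta1 * x_true + rnorm(n)

b_true <- coef(lm(y ~ x_true))["x_true"]
b_obs  <- coef(lm(y ~ x_obs))["x_obs"]

# Attenuation factor var(x) / (var(x) + var(u)) = 0.5 in this design
print(c(with_true_x = unname(b_true), with_noisy_x = unname(b_obs)))
```

With equal variances the slope estimated from the noisy regressor is roughly half the true slope, and increasing the sample size does not remove the gap.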

Figure 5: Wald Test of Hypotheses $H_{0} : π_{h 5} = π_{h 6} = π_{h 7} = 0$

Figure 6: Wald Test of Hypotheses $H_{0} : π_{h 1} = π_{h 5} = π_{h 6} = π_{h 7} = 0$

Figure 7: Wald Test of Hypotheses $H_{0} : π_{h 1} = π_{h 2} = π_{h 3} = 0$

b)

Clearly explain whether the labour demand equation (1) satisfies the necessary condition for identification. Why or why not?
Using the information contained in Figure 5, Figure 6, Figure 7, or Figure 8, test the hypothesis that the necessary condition(s) for identification of the labour demand function (1) are satisfied, at the 5% level of significance.
Your answer should clearly state the null and alternative hypotheses, the distribution of the test statistic, and your conclusion.

Solution

Aside: Recall that the 2SLS estimator for the labour demand function replaces the endogenous variable $hours$ with its predicted values $\hat{hours}$ from the reduced form. Suppose that the coefficients on the variables excluded from the demand function ( $childlt6$ , $childge6$ , and $nlinc$ ) were jointly zero, so $π_{h 5} = π_{h 6} = π_{h 7} = 0$ in the reduced form for $hours$ . The predicted values become:

${\hat{hours}}_{i} = {\hat{π}}_{h 0} + {\hat{π}}_{h 1} {educ}_{i} + {\hat{π}}_{h 2} {exper}_{i} + {\hat{π}}_{h 3} {exper}_{i}^{2} + {\hat{π}}_{h 4} {union}_{i}$

In this case, the estimated structural demand equation (1) would be:

$\begin{aligned} {lnwage}_{i} & = α_{0} + α_{1} {\hat{hours}}_{i} + α_{2} {educ}_{i} + α_{3} {exper}_{i} + α_{4} {exper}_{i}^{2} + α_{5} {union}_{i} + ε_{w i} \\ & = α_{0} + α_{1} ({\hat{π}}_{h 0} + {\hat{π}}_{h 1} {educ}_{i} + {\hat{π}}_{h 2} {exper}_{i} + {\hat{π}}_{h 3} {exper}_{i}^{2} + {\hat{π}}_{h 4} {union}_{i}) \\ & \quad + α_{2} {educ}_{i} + α_{3} {exper}_{i} + α_{4} {exper}_{i}^{2} + α_{5} {union}_{i} + ε_{w i} \\ & = (α_{0} + α_{1} {\hat{π}}_{h 0}) + (α_{2} + α_{1} {\hat{π}}_{h 1}) {educ}_{i} + (α_{3} + α_{1} {\hat{π}}_{h 2}) {exper}_{i} \\ & \quad + (α_{4} + α_{1} {\hat{π}}_{h 3}) {exper}_{i}^{2} + (α_{5} + α_{1} {\hat{π}}_{h 4}) {union}_{i} + ε_{w i} \\ & = γ_{0} + γ_{1} {educ}_{i} + γ_{2} {exper}_{i} + γ_{3} {exper}_{i}^{2} + γ_{4} {union}_{i} + ε_{w i} \end{aligned}$

where: $\begin{aligned} γ_{0} & = α_{0} + α_{1} {\hat{π}}_{h 0} \\ γ_{1} & = α_{2} + α_{1} {\hat{π}}_{h 1} \\ γ_{2} & = α_{3} + α_{1} {\hat{π}}_{h 2} \\ γ_{3} & = α_{4} + α_{1} {\hat{π}}_{h 3} \\ γ_{4} & = α_{5} + α_{1} {\hat{π}}_{h 4} \end{aligned}$

Only the parameters $γ_{0}$ , $γ_{1}$ , $γ_{2}$ , $γ_{3}$ , and $γ_{4}$ are (econometrically) identified. We are unable to obtain separate estimates of the structural parameters of interest $α_{0}$ , $α_{1}$ , $α_{2}$ , $α_{3}$ , $α_{4}$ , and $α_{5}$ . Importantly, an estimate of $α_{1}$ , which is related to the slope of the labour demand function, cannot be obtained.

Now consider the case where the reduced form coefficients on the variables excluded from the demand function (that is $π_{h 5}$ , $π_{h 6}$ , and $π_{h 7}$ ) are not identically zero but jointly insignificant. The estimated 2SLS coefficients in the structural demand function will be imprecisely estimated and only weakly identified.

Consider the joint test of the null hypothesis $H_{0} : π_{h 5} = π_{h 6} = π_{h 7} = 0$ against the alternative hypothesis that at least one of the coefficients on these variables is non-zero.
The test statistic will follow an F distribution with $3$ numerator degrees of freedom and $(2,867 - 8) = 2,859$ denominator degrees of freedom.
The decision rule is to reject $H_{0}$ if the sample value of the test statistic exceeds the critical value $F_{c}$ . Alternatively, at the 5% level of significance, reject $H_{0}$ if the p value of the sample test statistic is less than 0.05.
Figure 5 provides the value of the F test statistic for the null hypothesis that all of the excluded variables are jointly insignificant in the reduced form for $hours$ . The value of the test statistic is 30.247 with a p value of 0.0000. Since the p value is less than 0.05, we reject the null hypothesis.
In practice, the demand equation is identified in this example.
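
The degrees of freedom, critical value, and p-value for this test can be checked with base R. This sketch takes the sample size, the number of reduced-form parameters, and the F statistic of 30.247 reported for Figure 5 as given inputs. The object names are illustrative.

```r
n_obs  <- 2867
k_par  <- 8          # parameters in the reduced form for hours
num_df <- 3          # restrictions: pi_h5 = pi_h6 = pi_h7 = 0
den_df <- n_obs - k_par

f_stat <- 30.247     # sample F statistic reported for Figure 5
f_crit <- qf(0.95, num_df, den_df)                         # 5% critical value
p_val  <- pf(f_stat, num_df, den_df, lower.tail = FALSE)   # upper-tail probability

cat("den df:", den_df, " F critical:", round(f_crit, 3), "\n")
cat("p-value:", signif(p_val, 3), " reject H0:", f_stat > f_crit, "\n")
cat("Exceeds rule-of-thumb value of 10:", f_stat > 10, "\n")
```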

c)

Clearly explain whether the labour supply equation (2) satisfies the necessary condition for identification. Why or why not?
Using the information contained in Figure 5, Figure 6, Figure 7, or Figure 8, test the hypothesis that the necessary condition(s) for identification of the labour supply function (2) are satisfied, at the 5% level of significance. Your answer should clearly state the null and alternative hypotheses, the distribution of the test statistic, and your conclusion.

Solution

Now consider the case where the reduced form coefficients on the variables excluded from the supply function (that is $π_{w 2}$ , $π_{w 3}$ , and $π_{w 4}$ ) are not identically zero but jointly insignificant. In this case, the 2SLS coefficient estimates in the structural supply function will be imprecise and only weakly identified.

Aside : Recall that the 2SLS estimator for the labour supply function replaces the endogenous variable $lnwage$ with the predicted values $\hat{lnwage}$ from the reduced form for $lnwage$ . Suppose that the coefficients on the variables excluded from the supply function ( $exper$ , $expersq$ , and $union$ ) were jointly zero so $π_{w 2} = π_{w 3} = π_{w 4} = 0$ . The predicted values from the reduced form for $lnwage$ become:
${\hat{lnwage}}_{i} = {\hat{π}}_{w 0} + {\hat{π}}_{w 1} {educ}_{i} + {\hat{π}}_{w 5} {childlt6}_{i} + {\hat{π}}_{w 6} {childge6}_{i} + {\hat{π}}_{w 7} {nlinc}_{i}$
In this case, the estimated structural supply equation (2) would be:
$\begin{aligned} {hours}_{i} & = β_{0} + β_{1} {\hat{lnwage}}_{i} + β_{2} {educ}_{i} + β_{3} {childlt6}_{i} + β_{4} {childge6}_{i} + β_{5} {nlinc}_{i} + ε_{h i} \\ & = β_{0} + β_{1} \left( {\hat{π}}_{w 0} + {\hat{π}}_{w 1} {educ}_{i} + {\hat{π}}_{w 5} {childlt6}_{i} + {\hat{π}}_{w 6} {childge6}_{i} + {\hat{π}}_{w 7} {nlinc}_{i} \right) \\ & \quad + β_{2} {educ}_{i} + β_{3} {childlt6}_{i} + β_{4} {childge6}_{i} + β_{5} {nlinc}_{i} + ε_{h i} \\ & = (β_{0} + β_{1} {\hat{π}}_{w 0}) + (β_{2} + β_{1} {\hat{π}}_{w 1}) {educ}_{i} + (β_{3} + β_{1} {\hat{π}}_{w 5}) {childlt6}_{i} \\ & \quad + (β_{4} + β_{1} {\hat{π}}_{w 6}) {childge6}_{i} + (β_{5} + β_{1} {\hat{π}}_{w 7}) {nlinc}_{i} + ε_{h i} \\ & = δ_{0} + δ_{1} {educ}_{i} + δ_{2} {childlt6}_{i} + δ_{3} {childge6}_{i} + δ_{4} {nlinc}_{i} + ε_{h i} \end{aligned}$
where:
$\begin{aligned} δ_{0} & = β_{0} + β_{1} {\hat{π}}_{w 0} \\ δ_{1} & = β_{2} + β_{1} {\hat{π}}_{w 1} \\ δ_{2} & = β_{3} + β_{1} {\hat{π}}_{w 5} \\ δ_{3} & = β_{4} + β_{1} {\hat{π}}_{w 6} \\ δ_{4} & = β_{5} + β_{1} {\hat{π}}_{w 7} \end{aligned}$
Only the parameters $δ_{0}$ , $δ_{1}$ , $δ_{2}$ , $δ_{3}$ , and $δ_{4}$ are (econometrically) identified. We are unable to obtain separate estimates of the structural parameters of interest $β_{0}$ , $β_{1}$ , $β_{2}$ , $β_{3}$ , $β_{4}$ , and $β_{5}$ .
Importantly, an estimate of $β_{1}$ , which is related to the slope of the labour supply function, cannot be obtained.
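The aside can be illustrated with a small simulation. The sketch below uses made-up data (all coefficient values, the sample size, and the object names are assumptions, not the tutorial's data set) in which the excluded variables play no role in the first stage. The fitted values of $lnwage$ are then an exact linear combination of the included exogenous variables, so the second-stage regression is perfectly collinear and R cannot estimate all of the coefficients separately.

```r
set.seed(42)
n <- 1000

# Simulated exogenous variables (illustrative values only)
dat <- data.frame(
  educ     = rnorm(n, mean = 12, sd = 2),
  childlt6 = rbinom(n, 1, 0.3),
  childge6 = rbinom(n, 1, 0.5),
  nlinc    = rnorm(n, mean = 30, sd = 10)
)
v <- rnorm(n, sd = 0.4)

# Wage equation with no excluded instruments, and an hours equation
dat$lnwage <- 1 + 0.08 * dat$educ - 0.05 * dat$childlt6 +
  0.02 * dat$childge6 + 0.005 * dat$nlinc + v
dat$hours <- 30 + 5 * dat$lnwage + 0.2 * dat$educ - 3 * dat$childlt6 -
  1 * dat$childge6 - 0.05 * dat$nlinc + 2 * v + rnorm(n, sd = 3)

# First stage uses only the exogenous variables already in the supply equation
first_stage    <- lm(lnwage ~ educ + childlt6 + childge6 + nlinc, data = dat)
dat$lnwage_hat <- fitted(first_stage)

# Second stage: lnwage_hat is a linear combination of the other regressors
second_stage <- lm(hours ~ lnwage_hat + educ + childlt6 + childge6 + nlinc,
                   data = dat)
print(coef(second_stage))

# Number of coefficients R could not estimate (reported as NA)
n_aliased <- sum(is.na(coef(second_stage)))
print(n_aliased)
```

Only five linear combinations of the six structural parameters can be estimated here, which mirrors the five identified $δ$ parameters in the aside.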

Consider the joint test of the null hypothesis $H_{0} : π_{w 2} = π_{w 3} = π_{w 4} = 0$ against the alternative hypothesis that at least one of the coefficients on these variables is non-zero. The test statistic will follow an F distribution with $3$ numerator degrees of freedom and $2{,}867 - 8 = 2{,}859$ denominator degrees of freedom.
The decision rule is to reject $H_{0}$ if the sample value of the test statistic exceeds the critical value $F_{c}$ . Alternatively, at the 5% level of significance, reject $H_{0}$ if the p value of the sample test statistic is less than 0.05.
Figure 8 provides the value of the F test statistic for the null hypothesis that all of the excluded variables are jointly insignificant in the reduced form for $\ln wage$ . The value of the test statistic is 50.925 with a p value of 0.0000. Since the p value is less than 0.05, reject the null hypothesis. In practice, the labour supply equation is identified in this example.
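A joint F test of this kind can be computed by comparing the reduced form with and without the excluded variables. The sketch below uses simulated data and base R only; the data-generating values, the sample size, and the object names are assumptions for illustration, and the test shown is the classical (non-robust) F test. A heteroskedasticity-robust version would need a robust covariance matrix, for example from the `sandwich` package listed for this tutorial, which this sketch does not attempt.

```r
set.seed(1)
n <- 500

# Simulated data (illustrative values only, not the tutorial's data set)
dat <- data.frame(
  educ     = rnorm(n, mean = 12, sd = 2),
  childlt6 = rbinom(n, 1, 0.3),
  childge6 = rbinom(n, 1, 0.5),
  nlinc    = rnorm(n, mean = 30, sd = 10),
  exper    = runif(n, min = 0, max = 30),
  union    = rbinom(n, 1, 0.3)
)
dat$expersq <- dat$exper^2
dat$lnwage  <- 1 + 0.08 * dat$educ + 0.03 * dat$exper -
  0.0005 * dat$expersq + 0.1 * dat$union + rnorm(n, sd = 0.4)

# Unrestricted reduced form: all exogenous variables in the system
unrestricted <- lm(lnwage ~ educ + exper + expersq + union +
                     childlt6 + childge6 + nlinc, data = dat)

# Restricted reduced form: impose H0 by dropping exper, expersq and union
restricted <- lm(lnwage ~ educ + childlt6 + childge6 + nlinc, data = dat)

# F test of the three exclusion restrictions
ftest  <- anova(restricted, unrestricted)
f_stat <- ftest$F[2]
df_num <- ftest$Df[2]
df_den <- ftest$Res.Df[2]
p_val  <- ftest[["Pr(>F)"]][2]

print(ftest)
```

The denominator degrees of freedom equal the sample size minus the eight parameters of the unrestricted reduced form, which is the same counting used for $2{,}867 - 8$ in the solution.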

d)

The labour supply equation (2) was estimated by the method of Two-Stage Least Squares (2SLS) and the results are reported in Figure 9.

Figure 9: 2SLS Regression Results (with robust standard errors) for Model 2

What is the interpretation of the parameter estimate ${\hat{β}}_{3}$ in Figure 9?

Solution

The variable $childlt6$ is an indicator (dummy) variable such that: $β_{3} = E [hours | childlt6 = 1, X] - E [hours | childlt6 = 0, X]$ so $β_{3}$ is the average difference in hours of work for partnered women with (at least) one child under 6 in their household, relative to partnered women who do not have (at least) one child under 6 in their household, controlling for hourly wage, education, and non-labour income.
The estimate ${\hat{β}}_{3} = - 3.0832$ implies that, on average, partnered women with (at least) one child under 6 in their household work approximately three (3) fewer hours per week, relative to partnered women who do not have (at least) one child under 6 in their household, controlling for hourly wage, education, and non-labour income.
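To see where an estimate such as ${\hat{β}}_{3}$ comes from, the two stages can be run by hand. The sketch below uses simulated data with an assumed true value of $-3$ for the coefficient on $childlt6$ ; every number and object name in it is hypothetical. It reproduces the 2SLS point estimates only: the standard errors printed by the second-stage `lm` are not valid 2SLS standard errors, so in practice a dedicated routine such as `ivreg` from the AER package listed for this tutorial would be used instead.

```r
set.seed(123)
n <- 5000

# Simulated exogenous variables (illustrative values only)
dat <- data.frame(
  educ     = rnorm(n, mean = 12, sd = 2),
  childlt6 = rbinom(n, 1, 0.3),
  childge6 = rbinom(n, 1, 0.5),
  nlinc    = rnorm(n, mean = 30, sd = 10),
  exper    = runif(n, min = 0, max = 30),
  union    = rbinom(n, 1, 0.3)
)
dat$expersq <- dat$exper^2

# v enters both equations, which makes lnwage endogenous in the hours equation
v <- rnorm(n, sd = 0.4)
dat$lnwage <- 1 + 0.08 * dat$educ + 0.03 * dat$exper -
  0.0005 * dat$expersq + 0.2 * dat$union + v

true_beta3 <- -3  # assumed effect of having a child under 6 on hours
dat$hours <- 30 + 5 * dat$lnwage + 0.2 * dat$educ + true_beta3 * dat$childlt6 -
  1 * dat$childge6 - 0.05 * dat$nlinc + 2 * v + rnorm(n, sd = 3)

# Stage 1: reduced form for lnwage on all exogenous variables
stage1 <- lm(lnwage ~ educ + childlt6 + childge6 + nlinc +
               exper + expersq + union, data = dat)
dat$lnwage_hat <- fitted(stage1)

# Stage 2: replace lnwage with its predicted values
stage2 <- lm(hours ~ lnwage_hat + educ + childlt6 + childge6 + nlinc,
             data = dat)

# Coefficient on the indicator variable childlt6
beta3_hat <- unname(coef(stage2)["childlt6"])
print(beta3_hat)
```

The value of `beta3_hat` is read in the same way as the estimate in Figure 9: it is the estimated average difference in hours between the two groups of women, holding the other regressors fixed.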