This tutorial reviews some concepts for the basic linear model, using the econometrics software package R. Specifically, the tutorial reviews:
estimation and interpretation in econometric models using the method of Instrumental Variables (IV) in R
estimation and interpretation in econometric models using the method of Two-Stage Least Squares (2SLS) in R
This tutorial requires one (1) data file:
fultonfish.csv
This file can be obtained from the Canvas subject page.
In addition, the file tut11.R provides the program code (R script file) necessary to complete this tutorial. The R script file uses the following packages, which need to be installed prior to running the program:
stargazer :
for easily generating summary statistics for an R data file
ggplot2 :
for easily producing graphs in R
car:
for easily conducting hypothesis tests in R
lmtest :
for easily conducting the Ramsey RESET test in R
sandwich:
for easily calculating robust (Huber-White) heteroskedasticity-consistent standard errors in R
rio:
for easily importing data into R
AER:
for easily estimating models by the method of Instrumental Variables (IV) in R
These can be installed directly in RStudio from the packages tab or by using the command install.packages() and inserting the name of the package in the brackets.
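As a convenience, the check below lists which of the packages named above are not yet installed. It is a minimal sketch using base R only; the object names `pkgs`, `installed` and `missing_pkgs` are illustrative and do not appear in tut11.R.

```r
# Packages named in the list above
pkgs <- c("stargazer", "ggplot2", "car", "lmtest", "sandwich", "rio", "AER")

# requireNamespace() reports whether a package can be loaded, without attaching it
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
missing_pkgs <- pkgs[!installed]
print(missing_pkgs)

# Uncomment to install whatever is missing:
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```

The install call is left commented out so that the check can be run without changing the local library.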
Question 1
[Hill,Griffiths,Lim 5th Ed., pp 542-544]
This question reviews example 11.2 in the textbook.
Hint: In all regressions, use the White (Huber-White) standard errors.
Consider the following daily demand function for whiting (fish) at the Fulton fish market:
$$\ln(QUAN_t) = \alpha_1 + \alpha_2 \ln(PRICE_t) + \alpha_3 MON_t + \alpha_4 TUE_t + \alpha_5 WED_t + \alpha_6 THU_t + e_{dt}$$
where $QUAN_t$ represents the quantity sold (in pounds) and $PRICE_t$ represents the average daily price per pound.
The remaining variables are indicator variables for the days of the workweek, with Friday as the omitted category. The supply equation is:
$$\ln(QUAN_t) = \beta_1 + \beta_2 \ln(PRICE_t) + \beta_3 STORMY_t + e_{st}$$
The variable $STORMY_t$ is an indicator variable denoting stormy weather in the previous three days. We expect $\beta_3$ to be negative: stormy weather reduces the supply of fish brought to the market.
The data file fultonfish.csv contains daily observations on the price of whiting, the quantity sold, and weather conditions, from December 2, 1991 until May 8, 1992.
(a)
Does the demand equation satisfy the necessary condition for identification? Why or why not?
Solution
There are $M = 2$ endogenous variables ($\ln(QUAN)$ and $\ln(PRICE)$), so the necessary condition for identification requires that at least $M - 1 = 1$ variable be excluded from the demand function. The variable stormy is omitted from the demand function, so the demand function satisfies this condition for identification.
Stormy conditions shift the supply function, relative to a fixed demand function (since it does not contain the stormy variable), tracing out the demand curve.
(b)
Does the supply equation satisfy the necessary condition for identification? Why or why not?
Solution
There are $M = 2$ endogenous variables ($\ln(QUAN)$ and $\ln(PRICE)$), so the necessary condition for identification requires that at least $M - 1 = 1$ variable be excluded from the supply function.
The day of the week indicator variables are omitted from the supply function so the necessary condition for identification is satisfied.
The demand function shifts daily around a fixed supply curve (since it does not contain the day of the week controls), tracing out the supply curve.
(c)
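The reduced-form equations for the demand-supply system are given by:
$$\ln(QUAN_t) = \pi_{11} + \pi_{21} MON_t + \pi_{31} TUE_t + \pi_{41} WED_t + \pi_{51} THU_t + \pi_{61} STORMY_t + v_{t1}$$
$$\ln(PRICE_t) = \pi_{12} + \pi_{22} MON_t + \pi_{32} TUE_t + \pi_{42} WED_t + \pi_{52} THU_t + \pi_{62} STORMY_t + v_{t2}$$
Estimate these reduced forms by OLS.
Run the following R code given in tut11.R: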
Code
#------------------------------------------
# Reduced Form for lnq, with robust standard errors
#------------------------------------------
library(rio)       # import()
library(sandwich)  # vcovHC()
library(lmtest)    # coeftest(), waldtest()

# read data file into R
fish <- import("fultonfish.csv")

# endogenous variables: lnprice, lnquan
# exogenous variables: days of the week: {mon, tue, wed, thu}
# exogenous variable: stormy
reduced_lnq <- lm(lnquan ~ mon + tue + wed + thu + stormy, data = fish)
# print(summary(reduced_lnq))
demdf10 <- df.residual(reduced_lnq)

# Robust standard errors
# Adjust standard errors using sandwich package
cov5 <- vcovHC(reduced_lnq, type = "HC1")
reduced_lnq_se_r <- coeftest(reduced_lnq, vcov = cov5)
# print(reduced_lnq_se_r)
reduced_lnq_robust <- sqrt(diag(cov5))  # robust standard errors for the OLS estimates

# Adjust F statistic
wald_reduced_lnq_r <- waldtest(reduced_lnq, vcov = cov5)  # F test using cov5 varcov matrix
# print(wald_reduced_lnq_r)
fstat10 <- round(wald_reduced_lnq_r$"F"[2], digits = 4)       # Sample value of F stat
pvalf10 <- round(wald_reduced_lnq_r$'Pr(>F)'[2], digits = 4)  # p value of F test
numdf10 <- abs(wald_reduced_lnq_r$"Df"[2])

#------------------------------------------
# Reduced Form for lnp, with robust standard errors
#------------------------------------------
reduced_lnp <- lm(lnprice ~ mon + tue + wed + thu + stormy, data = fish)
# print(summary(reduced_lnp))
demdf12 <- df.residual(reduced_lnp)

# Robust standard errors
# Adjust standard errors using sandwich package
cov6 <- vcovHC(reduced_lnp, type = "HC1")
reduced_lnp_se_r <- coeftest(reduced_lnp, vcov = cov6)
# print(reduced_lnp_se_r)
reduced_lnp_robust <- sqrt(diag(cov6))  # robust standard errors for the OLS estimates

# Adjust F statistic
wald_reduced_lnp_r <- waldtest(reduced_lnp, vcov = cov6)  # F test using cov6 varcov matrix
# print(wald_reduced_lnp_r)
fstat12 <- round(wald_reduced_lnp_r$"F"[2], digits = 4)       # Sample value of F stat
pvalf12 <- round(wald_reduced_lnp_r$'Pr(>F)'[2], digits = 4)  # p value of F test
numdf12 <- abs(wald_reduced_lnp_r$"Df"[2])
#------------------------------------------
The reduced form supply and demand equations are reported below:
Code
library(stargazer)

stargazer(reduced_lnq, reduced_lnp,
          type = "html",
          dep.var.labels = c("(Log) Quantity", "(Log) Price"),
          covariate.labels = c("Intercept", "Monday", "Tuesday", "Wednesday", "Thursday", "Stormy"),
          column.labels = c("(Robust)", "(Robust)"),
          se = list(reduced_lnq_robust, reduced_lnp_robust),
          omit.stat = "f",
          add.lines = list(c("F Statistic", fstat10, fstat12),
                           c("F p value", pvalf10, pvalf12),
                           c("F num df", numdf10, numdf12),
                           c("F dem df", demdf10, demdf12)),
          digits = 4, align = TRUE,
          intercept.bottom = FALSE,
          star.cutoffs = c(0.05, 0.01, 0.001))
Dependent variable (by column):

| | (Log) Quantity, (Robust), (1) | (Log) Price, (Robust), (2) |
|---|---|---|
| Intercept | 8.8101*** (0.1174) | -0.2717** (0.0952) |
| Monday | 0.1010 (0.1978) | -0.1129 (0.1154) |
| Tuesday | -0.4847* (0.1939) | -0.0411 (0.1164) |
| Wednesday | -0.5531** (0.1986) | -0.0118 (0.1135) |
| Thursday | 0.0537 (0.1720) | 0.0496 (0.1156) |
| Stormy | -0.3878** (0.1426) | 0.3464*** (0.0723) |
| F Statistic | 5.9278 | 5.8996 |
| F p value | 0.0001 | 0.0001 |
| F num df | 5 | 5 |
| F dem df | 105 | 105 |
| Observations | 111 | 111 |
| R2 | 0.1934 | 0.1789 |
| Adjusted R2 | 0.1550 | 0.1398 |
| Residual Std. Error (df = 105) | 0.6818 | 0.3542 |

Note: robust standard errors in parentheses; * p<0.05; ** p<0.01; *** p<0.001
Figure 1: Reduced Forms for $\ln(QUAN)$ and $\ln(PRICE)$
i)
Test the hypothesis that necessary conditions for identification of the supply function are satisfied.
Solution
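Aside: Recall that for the 2SLS estimator we replace the endogenous variable $\ln(PRICE)$ with the predicted values from the reduced form for $\ln(PRICE)$.
Suppose that the coefficients on the day of the week indicator variables in that reduced form were jointly zero, so $\pi_{22} = \pi_{32} = \pi_{42} = \pi_{52} = 0$. The predicted values become:
$$\widehat{\ln(PRICE_t)} = \hat\pi_{12} + \hat\pi_{62} STORMY_t$$
But if we replace $\ln(PRICE)$ with $\widehat{\ln(PRICE)}$ in the supply equation there will be exact collinearity, because the predicted values are an exact linear function of $STORMY$. The estimated supply equation would be:
$$\ln(QUAN_t) = \beta_1 + \beta_2\left(\hat\pi_{12} + \hat\pi_{62} STORMY_t\right) + \beta_3 STORMY_t + e_t = \theta_1 + \theta_2 STORMY_t + e_t$$
where $\theta_1 = \beta_1 + \beta_2\hat\pi_{12}$ and $\theta_2 = \beta_2\hat\pi_{62} + \beta_3$.
So only estimates of $\theta_1$ and $\theta_2$ are identified. We are unable to obtain separate estimates of the structural parameters $\beta_2$ and $\beta_3$.
Now if the coefficients on the day of the week variables (that is, $\pi_{22}$, $\pi_{32}$, $\pi_{42}$, and $\pi_{52}$ in the reduced form for $\ln(PRICE)$) are not identically zero but jointly insignificant, there will be a situation of almost exact collinearity. The estimated 2SLS coefficients $\hat\beta_2$ and $\hat\beta_3$ (in the structural supply function) will be imprecisely estimated and only weakly identified.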
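The exact collinearity described in the aside can be seen in a small simulation. This is a minimal sketch on simulated data (not the Fulton data); the parameter values and object names are assumptions chosen for illustration. The first stage uses only the stormy indicator, which mimics the case where the day of the week coefficients are all zero.

```r
set.seed(1)
n <- 111

# Simulated stand-ins for the tutorial variables (illustrative values only)
stormy  <- rbinom(n, 1, 0.3)
lnprice <- -0.3 + 0.35 * stormy + rnorm(n, sd = 0.35)
lnquan  <-  8.5 - 0.40 * stormy + rnorm(n, sd = 0.70)

# First stage with no excluded instruments: price is predicted from stormy alone
first_stage <- lm(lnprice ~ stormy)
phat <- fitted(first_stage)

# Second stage of the supply equation: phat is an exact linear function of
# stormy, so lm() cannot separate the two coefficients
second_stage <- lm(lnquan ~ phat + stormy)
print(coef(second_stage))
```

R reports `NA` for the coefficient on `stormy`: the regression can only estimate one intercept and one slope, which is the statement that only the two composite parameters are identified.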
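For the reduced form for $\ln(PRICE)$ consider the null hypothesis:
$$H_0: \pi_{22} = \pi_{32} = \pi_{42} = \pi_{52} = 0$$
(the coefficients on the four day of the week indicators) against the alternative that at least one of these parameters is non-zero. The test statistic will follow an F-distribution with $(M, N-K-1)$ degrees of freedom.
Here the number of restrictions is $M = 4$ and the model degrees of freedom is $N - K - 1 = 111 - 5 - 1 = 105$.
The F critical value at the 5% level of significance is $F_{(0.95;\,4,\,105)} \approx 2.46$.
The decision rule: reject $H_0$ if the sample value of the F test statistic exceeds the critical value.
Alternatively, reject $H_0$ if the p-value for the sample value of the test statistic is less than 0.05.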
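The critical value and p-value for this test can be checked with base R. The sketch below assumes a 5% level of significance and uses the sample F-statistic of 0.7426 reported in Figure 2; the object names are illustrative.

```r
M     <- 4               # number of restrictions (mon, tue, wed, thu)
df    <- 111 - 5 - 1     # N - K - 1
f_obs <- 0.7426          # sample F-statistic reported in Figure 2

f_crit <- qf(0.95, df1 = M, df2 = df)                      # 5% critical value
p_val  <- pf(f_obs, df1 = M, df2 = df, lower.tail = FALSE)  # upper-tail p-value

print(c(f_crit = f_crit, p_value = p_val))
reject_h0 <- f_obs > f_crit
```

Both forms of the decision rule agree: the statistic is below the critical value and the p-value is above 0.05.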
Code
#------------------------------------
# Is Supply Equation Identified?
# Test Days of Week in reduced form for lnp
library(car)  # linearHypothesis()
hnull_1 <- c("mon=0", "tue = 0", "wed=0", "thu=0")
linearHypothesis(reduced_lnp, hnull_1, vcov = cov6)
Linear hypothesis test:
mon = 0
tue = 0
wed = 0
thu = 0
Model 1: restricted model
Model 2: lnprice ~ mon + tue + wed + thu + stormy
Note: Coefficient covariance matrix supplied.
Res.Df Df F Pr(>F)
1 109
2 105 4 0.7426 0.5651
Figure 2: F test for significance of Day of Week Indicators in Reduced Form for $\ln(PRICE)$
The output provides a sample F-statistic of 0.7426 with a p-value of 0.5651.
Since the p-value is larger than the desired level of significance, we do not reject the null hypothesis.
Even if we could reject $H_0$, our rule of thumb requires a value for the F test statistic of at least 10 in order to avoid the weak identification problem.
In practice, the supply equation is not identified in this example.
ii)
Test the hypothesis that necessary conditions for identification of the demand function are satisfied.
Solution
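Aside: Recall that for the 2SLS estimator we replace the endogenous variable $\ln(PRICE)$ with the predicted values from the reduced form for $\ln(PRICE)$. Suppose that the coefficient on the stormy weather variable in that reduced form was zero, so $\pi_{62} = 0$.
The predicted values become:
$$\widehat{\ln(PRICE_t)} = \hat\pi_{12} + \hat\pi_{22} MON_t + \hat\pi_{32} TUE_t + \hat\pi_{42} WED_t + \hat\pi_{52} THU_t$$
But if we replace $\ln(PRICE)$ with $\widehat{\ln(PRICE)}$ in the demand equation there will be exact collinearity. The estimated demand equation would be:
$$\ln(QUAN_t) = \gamma_1 + \gamma_2 MON_t + \gamma_3 TUE_t + \gamma_4 WED_t + \gamma_5 THU_t + e_t$$
where:
$$\gamma_1 = \alpha_1 + \alpha_2\hat\pi_{12}, \quad \gamma_2 = \alpha_3 + \alpha_2\hat\pi_{22}, \quad \gamma_3 = \alpha_4 + \alpha_2\hat\pi_{32}, \quad \gamma_4 = \alpha_5 + \alpha_2\hat\pi_{42}, \quad \gamma_5 = \alpha_6 + \alpha_2\hat\pi_{52}$$
So only estimates of $\gamma_1$, $\gamma_2$, $\gamma_3$, $\gamma_4$, and $\gamma_5$ are identified. We are unable to obtain a separate estimate of the structural parameter of interest, $\alpha_2$.
Now if the coefficient on the stormy variable (that is, $\pi_{62}$ in the reduced form for $\ln(PRICE)$) is not identically zero but statistically insignificant, there will be a situation of almost exact collinearity.
The estimated 2SLS coefficient of interest $\hat\alpha_2$ (in the structural demand equation) will be imprecisely estimated and only weakly identified.
The requirement for identification is that the coefficient on the variable stormy be statistically significant in the reduced form for $\ln(PRICE)$. In this case, either a t-test or an F-test is sufficient. For the reduced form for $\ln(PRICE)$ consider the null hypothesis:
$$H_0: \pi_{62} = 0$$
against the alternative $H_1: \pi_{62} \neq 0$.
The test statistic will follow an F-distribution with $(M, N-K-1)$ degrees of freedom. Here the number of restrictions is $M = 1$ and the model degrees of freedom is $N - K - 1 = 111 - 5 - 1 = 105$.
The F critical value at the 5% level of significance is $F_{(0.95;\,1,\,105)} \approx 3.93$.
The decision rule: reject $H_0$ if the sample value of the F test statistic exceeds the critical value. Alternatively, reject $H_0$ if the p-value for the sample value of the test statistic is less than 0.05.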
Code
#------------------------------------
# Is Demand Equation Identified?
# Test stormy in reduced form for lnp
# can do simple t-test or F test
# print(summary(reduced_lnp))
library(car)  # linearHypothesis()
hnull_2 <- c("stormy=0")
linearHypothesis(reduced_lnp, hnull_2, vcov = cov6)
Figure 3: F Test for Significance of Stormy Weather in Reduced Form for $\ln(PRICE)$
The R output from the Wald test of:
$$H_0: \pi_{62} = 0$$
is given in Figure 3.
Using the robust estimates reported in Figure 1, the sample F-statistic is approximately $(0.3464/0.0723)^2 \approx 22.95$, with a p-value below 0.001. Since the p-value is less than the desired level of significance, we reject the null hypothesis.
The variable $STORMY$ is statistically significant in the reduced form for $\ln(PRICE)$ and the demand function is 'practically' identified.
Alternatively, consider a two-sided t-test of the null hypothesis $H_0: \pi_{62} = 0$.
The test statistic will follow a t-distribution with $(N-K-1)$ degrees of freedom. Here the degrees of freedom are $111 - 5 - 1 = 105$.
The t critical value is $t_{(0.975;\,105)} \approx 1.983$.
The decision rule: reject $H_0$ if $t < -1.983$ or $t > 1.983$.
Alternatively, reject $H_0$ if the p-value for the sample value of the test statistic is less than 0.05.
The sample value of the test statistic (using the robust standard error) is calculated as:
$$t = \frac{0.3464}{0.0723} \approx 4.79$$
Since $4.79 > 1.983$ we reject the null hypothesis. Based on the sample value of the test statistic, we would also reject the null hypothesis against the one-sided alternative $H_1: \pi_{62} > 0$.
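The t-test and F-test for a single restriction are equivalent, which can be verified with base R from the robust estimate and standard error for Stormy reported in Figure 1. This is a sketch at an assumed 5% level of significance; the object names are illustrative.

```r
b_stormy  <- 0.3464   # coefficient on Stormy, (Log) Price column of Figure 1
se_stormy <- 0.0723   # robust standard error from Figure 1
df        <- 105      # N - K - 1

t_stat <- b_stormy / se_stormy
t_crit <- qt(0.975, df = df)            # two-sided 5% critical value

f_stat <- t_stat^2                      # F-statistic for one restriction
f_crit <- qf(0.95, df1 = 1, df2 = df)   # equals t_crit^2

print(c(t_stat = t_stat, t_crit = t_crit, f_stat = f_stat, f_crit = f_crit))
```

Because the inputs are rounded to four decimals, the F-statistic computed this way may differ slightly in the last digits from the value printed by `linearHypothesis()`.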
(d)
Estimate the demand equation by 2SLS. Comment on the magnitude of the estimated price elasticity.
Solution
Run the following code
Code
library(AER)       # ivreg()
library(sandwich)  # vcovHC()
library(lmtest)    # coeftest(), waldtest()

#--------------------------------------
# OLS for Demand Equation, with robust standard errors
#--------------------------------------
demand_ols <- lm(lnquan ~ lnprice + mon + tue + wed + thu, data = fish)
# print(summary(demand_ols))
demdf8 <- df.residual(demand_ols)

# Robust standard errors
# Adjust standard errors using sandwich package
cov4 <- vcovHC(demand_ols, type = "HC1")
demand_ols_se_r <- coeftest(demand_ols, vcov = cov4)
# print(demand_ols_se_r)
demand_ols_robust <- sqrt(diag(cov4))  # robust standard errors for the OLS estimates

# Adjust F statistic
wald_demand_ols_r <- waldtest(demand_ols, vcov = cov4, test = "F")  # F test using cov4 varcov matrix
# print(wald_demand_ols_r)
fstat8 <- round(wald_demand_ols_r$"F"[2], digits = 4)       # Sample value of F stat
pvalf8 <- round(wald_demand_ols_r$'Pr(>F)'[2], digits = 4)  # p value of F test
numdf8 <- abs(wald_demand_ols_r$"Df"[2])

#--------------------------------------
# 2SLS for Demand Equation, with robust standard errors
# lnprice is endogenous
# Instrument Set: mon tue wed thu stormy
#--------------------------------------
demand_iv <- ivreg(lnquan ~ lnprice + mon + tue + wed + thu |
                     mon + tue + wed + thu + stormy, data = fish)
# print(summary(demand_iv))
demdf14 <- df.residual(demand_iv)

# Robust standard errors
# Adjust standard errors using sandwich package
cov7 <- vcovHC(demand_iv, type = "HC1")
demand_iv_se_r <- coeftest(demand_iv, vcov = cov7)
# print(demand_iv_se_r)
demand_iv_robust <- sqrt(diag(cov7))  # robust standard errors for the IV estimates

# Adjust F statistic
wald_demand_iv_r <- waldtest(demand_iv, vcov = cov7, test = "F")  # F test using cov7 varcov matrix
# print(wald_demand_iv_r)
fstat14 <- round(wald_demand_iv_r$"F"[2], digits = 4)       # Sample value of F stat
pvalf14 <- round(wald_demand_iv_r$'Pr(>F)'[2], digits = 4)  # p value of F test
numdf14 <- abs(wald_demand_iv_r$"Df"[2])
Code
library(stargazer)

stargazer(demand_ols, demand_iv,
          type = "html",
          dep.var.labels = c("(Log) Quantity"),
          covariate.labels = c("Intercept", "(Log) Price", "Monday", "Tuesday", "Wednesday", "Thursday"),
          column.labels = c("(Robust)", "(Robust)"),
          se = list(demand_ols_robust, demand_iv_robust),
          omit.stat = "f",
          add.lines = list(c("F Statistic", fstat8, fstat14),
                           c("F p value", pvalf8, pvalf14),
                           c("F num df", numdf8, numdf14),
                           c("F dem df", demdf8, demdf14)),
          digits = 4, align = TRUE,
          title = "Figure 4: Two Stage Least Squares Results for the Demand Function",
          intercept.bottom = FALSE,
          star.cutoffs = c(0.05, 0.01, 0.001))
Figure 4: Two Stage Least Squares Results for the Demand Function

Dependent variable: (Log) Quantity

| | OLS, (Robust), (1) | instrumental variable, (Robust), (2) |
|---|---|---|
| Intercept | 8.6069*** (0.1183) | 8.5059*** (0.1521) |
| (Log) Price | -0.5625*** (0.1522) | -1.1194* (0.4432) |
| Monday | 0.0143 (0.2057) | -0.0254 (0.2214) |
| Tuesday | -0.5162** (0.1897) | -0.5308** (0.2021) |
| Wednesday | -0.5554** (0.1937) | -0.5664** (0.2069) |
| Thursday | 0.0816 (0.1620) | 0.1093 (0.1784) |
| F Statistic | 9.4007 | 4.7195 |
| F p value | 0 | 0.0006 |
| F num df | 5 | 5 |
| F dem df | 105 | 105 |
| Observations | 111 | 111 |
| R2 | 0.2205 | 0.1391 |
| Adjusted R2 | 0.1834 | 0.0981 |
| Residual Std. Error (df = 105) | 0.6702 | 0.7043 |

Note: robust standard errors in parentheses; * p<0.05; ** p<0.01; *** p<0.001
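To make the mechanics behind the instrumental variable column concrete, the sketch below computes 2SLS point estimates in two ways using base R only: as two explicit OLS stages, and with the closed form $(X'P_Z X)^{-1}X'P_Z y$. The data are simulated from an assumed demand and supply system with a single day indicator; all parameter values and object names are illustrative and are not taken from tut11.R or the Fulton data.

```r
set.seed(42)
n <- 500

# Hypothetical exogenous shifters: one for demand and one for supply
mon    <- rbinom(n, 1, 0.2)   # shifts demand only
stormy <- rbinom(n, 1, 0.3)   # shifts supply only
e_d <- rnorm(n)               # demand shock
e_s <- rnorm(n)               # supply shock

# Assumed structural parameters (illustrative values only)
a1 <- 8.5; a2 <- -1.0; a3 <- -0.5   # demand: q = a1 + a2*p + a3*mon + e_d
b1 <- 8.0; b2 <-  0.5; b3 <- -1.0   # supply: q = b1 + b2*p + b3*stormy + e_s

# Equilibrium price and quantity solve demand = supply
lnprice <- (b1 - a1 + b3 * stormy - a3 * mon + e_s - e_d) / (a2 - b2)
lnquan  <- a1 + a2 * lnprice + a3 * mon + e_d

# (1) Two explicit stages for the demand equation
first_stage  <- lm(lnprice ~ mon + stormy)   # reduced form for price
phat         <- fitted(first_stage)
second_stage <- lm(lnquan ~ phat + mon)      # price replaced by its prediction

# (2) Closed form: (X' P_Z X)^{-1} X' P_Z y
X <- cbind(1, lnprice, mon)   # regressors of the demand equation
Z <- cbind(1, mon, stormy)    # instrument set
Xhat <- Z %*% solve(crossprod(Z), crossprod(Z, X))   # P_Z X
beta_2sls <- solve(crossprod(Xhat, X), crossprod(Xhat, lnquan))

print(cbind(two_step = coef(second_stage), closed_form = as.vector(beta_2sls)))
```

The two sets of point estimates coincide. The standard errors printed by `summary(second_stage)` are not valid for 2SLS, because they are based on residuals computed with the predicted price; this is one reason to use a dedicated routine such as `ivreg()` together with a robust covariance matrix.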
The OLS estimation results (with robust standard errors) provide an estimate of the price elasticity of demand of -0.5625, so a 1% increase in price is associated with a 0.56% decrease in quantity demanded. However, in the presence of simultaneity bias, the OLS estimator will be both biased and inconsistent.
Consider the null hypothesis $H_0: \alpha_2 = -1$ against the alternative $H_1: \alpha_2 \neq -1$, where $\alpha_2$ is the coefficient on $\ln(PRICE)$ in the demand equation. The test statistic will follow a t-distribution with $(N-K-1)$ degrees of freedom. Here the degrees of freedom are $111 - 5 - 1 = 105$. The t critical value is $t_{(0.975;\,105)} \approx 1.983$. The decision rule: reject $H_0$ if $t < -1.983$ or $t > 1.983$. Alternatively, reject $H_0$ if the p-value for the sample value of the test statistic is less than 0.05.
The sample value of the test statistic (using the robust standard error) is calculated as:
$$t = \frac{-0.5625 - (-1)}{0.1522} \approx 2.87$$
Since $2.87 > 1.983$ we reject the null hypothesis. Based on the sample value of the test statistic, we would also reject the null hypothesis against the one-sided alternative $H_1: \alpha_2 > -1$. Based on the OLS results, the sample evidence is consistent with an estimated elasticity of demand that is inelastic (an absolute value between 0 and 1).
The 2SLS estimation results are reported in Figure 4. The estimated coefficient on $\ln(PRICE)$ represents an elasticity, so a 1% increase in price is associated with a 1.12% decrease in quantity demanded.
The p-value for a two-sided t-test about zero is 0.0130, so the estimate is statistically significant at the 5% level of significance. Consider again the null hypothesis $H_0: \alpha_2 = -1$ against the alternative $H_1: \alpha_2 \neq -1$, with the same degrees of freedom (105), critical value (1.983) and decision rule as for the OLS test.
The sample value of the test statistic (using the robust standard error) is calculated as:
$$t = \frac{-1.1194 - (-1)}{0.4432} \approx -0.27$$
Since $-1.983 < -0.27 < 1.983$ we do not reject the null hypothesis. Based on the IV results, the sample evidence is consistent with an estimated elasticity of demand that is unitary elastic (an absolute value of exactly 1). Based on the sample value of the test statistic, we would also not reject the null hypothesis against a one-sided alternative (for example, $H_1: \alpha_2 < -1$).
Since the OLS estimator of the price elasticity of demand ($\hat\alpha_2$) will generally be upward biased (less negative) as a result of simultaneity bias, we would reach different conclusions regarding the elasticity of demand depending on whether we ignore this simultaneity bias (OLS) or use the 2SLS estimator.
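The two tests of a unit price elasticity can be reproduced from the estimates and robust standard errors in Figure 4. This is a base R sketch at an assumed 5% level of significance; the object names are illustrative.

```r
df     <- 105                  # N - K - 1
t_crit <- qt(0.975, df = df)   # two-sided 5% critical value

# Test H0: elasticity = -1 using the (Log) Price row of Figure 4
t_ols <- (-0.5625 - (-1)) / 0.1522   # OLS column
t_iv  <- (-1.1194 - (-1)) / 0.4432   # instrumental variable column

reject_ols <- abs(t_ols) > t_crit
reject_iv  <- abs(t_iv)  > t_crit

print(c(t_ols = t_ols, t_iv = t_iv, t_crit = t_crit))
print(c(reject_ols = reject_ols, reject_iv = reject_iv))
```

The OLS estimate rejects a unit elasticity while the 2SLS estimate does not, which is the contrast drawn in the discussion above.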
Question 2
The labour market outcomes for partnered women are of great interest to governments, economists, financial planners, and many other interested players in the economy. Consider the following labour demand equation for partnered women: where: Consider the following labour supply function for partnered women: where: The reduced form equations for this system are given by: You have available a sample containing observations that includes data on all the variables defined above.
a)
Consider the labour supply function (2). Do you think the condition is likely to be satisfied?
Clearly explain why or why not. Outline three possible reasons why this condition might not be satisfied. Explain the consequences for the OLS estimator if this condition is not satisfied.
Solution
The condition might not be satisfied as a result of:
Measurement Error: Measurement error in wages will induce a correlation between (observed) wages and the unobservable determinants of hours. Our prior is that the labour supply curve is upward sloping (when the substitution effect dominates the income effect), so the coefficient on wages is positive. In the case of classical measurement error, the OLS estimate of the wage coefficient will be biased towards zero (attenuation bias).
Omitted Variable Bias: There are likely (omitted) variables that are correlated with both wages and the unobservable determinants of hours. For example, unobserved variables such as individual ability, motivation, or personality are likely correlated with wages and with the unobserved tastes for work (captured by the error term).
Simultaneous Equation Bias: Equilibrium wages and equilibrium hours are likely determined jointly within a system of demand and supply. Wage is an endogenous variable and likely correlated with the error term in model (2). Generally, the direction of the bias in the OLS estimator for the labour supply function will be difficult to determine. However, if the labour supply curve is upward sloping and the labour demand function is downward sloping, we would expect the OLS estimate of the wage coefficient in the labour supply function to be biased downward.
If the condition is not satisfied, the OLS estimator will be biased and inconsistent: the bias does not disappear even in large samples. The OLS estimators for all of the parameters in model (2) are biased, not just the estimator for the parameter associated with the endogenous variable.
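The attenuation result for classical measurement error can be written out for the simplest case. The derivation below is a sketch for a simple regression with no other covariates; the symbols ($x^*$ for the true wage, $x$ for the observed wage, $u$ for the measurement error, $\beta$ for the wage coefficient) are introduced here for illustration and are not the notation of model (2).

```latex
\begin{aligned}
y &= \beta x^{*} + e, \qquad x = x^{*} + u, \qquad
\operatorname{cov}(u, x^{*}) = \operatorname{cov}(u, e) = \operatorname{cov}(x^{*}, e) = 0 \\[4pt]
\operatorname{plim}\,\hat\beta_{OLS}
  &= \frac{\operatorname{cov}(x, y)}{\operatorname{var}(x)}
   = \frac{\beta\,\sigma^{2}_{x^{*}}}{\sigma^{2}_{x^{*}} + \sigma^{2}_{u}}
   = \beta \cdot \frac{\sigma^{2}_{x^{*}}}{\sigma^{2}_{x^{*}} + \sigma^{2}_{u}}
\end{aligned}
```

The multiplying ratio lies strictly between 0 and 1 whenever $\sigma^2_u > 0$, so with $\beta > 0$ the OLS estimate is pulled towards zero, and the bias does not shrink as the sample grows.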
Figure 5: Wald Test of Hypotheses
Figure 6: Wald Test of Hypotheses
Figure 7: Wald Test of Hypotheses
b)
Clearly explain whether the labour demand equation (1) satisfies the necessary condition for identification. Why or why not?
Using the information contained in Figure 5, Figure 6, Figure 7, or Figure 8, test the hypothesis that the necessary condition(s) for identification of the labour demand function (1) are satisfied, at the 5% level of significance.
Your answer should clearly state the null and alternative hypotheses, the distribution of the test statistic, and your conclusion.
Solution
Aside: Recall that the 2SLS estimator for the labour demand function replaces the endogenous variable with the predicted values from the reduced form for . Suppose that the variables excluded from the demand function , , and ) were jointly zero so in the reduced form for . The predicted values become: In this case, the estimated structural demand equation (1) would be:
where:
Only the parameters , , , , and are (econometrically) identified. We are unable to obtain separate estimates of the structural parameters of interest , , , , , and . Importantly, an estimate of , which is related to the slope of the labour demand function, cannot be obtained.
Now consider the case where the reduced form coefficients on the variables excluded from the demand function (that is , , and ) are not identically zero but jointly insignificant. The estimated 2SLS coefficients in the structural demand function will be imprecisely estimated and only weakly identified
Consider the joint test of the null hypothesis against the alternative hypothesis that at least one of the coefficients on these variables is non-zero.
The test statistic will follow a F distribution with numerator degrees of freedom and denominator degrees of freedom.
The decision rule is to reject $H_0$ if the sample value of the test statistic exceeds the critical value. Alternatively, at the 5% level of significance, reject $H_0$ if the p-value of the sample test statistic is less than 0.05.
Figure 5 provides the value of the F test statistic for the null hypothesis that all of the excluded variables are jointly insignificant in the reduced form for . The value of the test statistic is 30.247 with a p value of 0.0000. Reject the null hypothesis.
In practice, the demand equation is identified in this example.
c)
Clearly explain whether the labour supply equation (2) satisfies the necessary condition for identification. Why or why not?
Using the information contained in Figure 5, Figure 6, Figure 7, or Figure 8, test the hypothesis that the necessary condition(s) for identification of the labour supply function (2) are satisfied, at the 5% level of significance. Your answer should clearly state the null and alternative hypotheses, the distribution of the test statistic, and your conclusion.
Solution
Now consider the case where the reduced form coefficients on the variables excluded from the supply function are not identically zero but jointly insignificant. The estimated 2SLS coefficients in the structural supply function will be imprecisely estimated and only weakly identified.
Aside : Recall that the 2SLS estimator for the labour supply function replaces the endogenous variable with the predicted values from the reduced form for . Suppose that the variables excluded from the supply function (, , and ) were jointly zero so . The predicted values for the reduced form for become: In this case, the estimated structural supply equation (2) would be: where: Only the parameters , , , , and are (econometrically) identified. We are unable to obtain separate estimates of the structural parameters of interest , , , , , and .
Importantly, an estimate of , which is related to the slope of the labour supply function, cannot be obtained.
Consider the joint test of the null hypothesis against the alternative hypothesis that at least one of the coefficients on these variables is non-zero. The test statistic will follow a F distribution with numerator degrees of freedom and denominator degrees of freedom.
The decision rule is to reject $H_0$ if the sample value of the test statistic exceeds the critical value. Alternatively, at the 5% level of significance, reject $H_0$ if the p-value of the sample test statistic is less than 0.05.
Figure 8 provides the value of the F test statistic for the null hypothesis that all of the excluded variables are jointly insignificant in the reduced form for the endogenous regressor. The value of the test statistic is 50.925 with a p-value of 0.0000. Reject the null hypothesis. In practice, the labour supply equation is identified in this example.
d)
The labour supply equation (2) was estimated by the method of Two-Stage Least Squares (2SLS) and the results are reported in Figure 9.
Figure 9: 2SLS Regression Results (with robust standard errors) for Model 2
What is the interpretation of the parameter estimate in Figure 9.
Solution
The variable is an indicator (dummy) variable such that: so is the average difference in hours of work for partnered women with (at least) one child under 6 in their household, relative to partnered women who do not have (at least) one child under 6 in their household, controlling for hourly wage, education, and non-labour income.
The estimate implies that, on average, partnered women with (at least) one child under 6 in their household work approximately three (3) hours less per week, relative to partnered women who do not have (at least) one child under 6 in their household, controlling for hourly wage, education, and non-labour income.