7 Utilizing dummy variables
A single dummy variable on the RHS was introduced in the simple example (2.1).
If a qualitative variable has two categories (levels), only one dummy variable should be included in a model with a constant term:
-> 0 indicates the absence of the category
-> 1 indicates the presence of the category
If a qualitative variable has more than two categories (\(m>2\)), the number of dummy variables should be one less than the number of categories (\(m-1\)).
Otherwise the system of normal equations obtained by the least squares method (5.14) has no unique solution (due to linear dependence of the RHS variables, the so-called perfect multicollinearity problem).
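The perfect multicollinearity problem can be reproduced in a few lines of R. The sketch below uses simulated data; the names region, d1, d2, d3 and all numbers are made up for illustration and are not part of the course data. Including all three dummies together with the constant term makes the design matrix rank deficient, and lm() reports NA for the redundant coefficient.

```r
# Simulated illustration (hypothetical data): a qualitative variable with m = 3 categories
set.seed(1)
region <- rep(c("A", "B", "C"), each = 10)
y <- rnorm(30, mean = 10)

# One dummy per category: d1 + d2 + d3 equals 1 for every observation
d1 <- as.numeric(region == "A")
d2 <- as.numeric(region == "B")
d3 <- as.numeric(region == "C")

# All m dummies plus the constant term: perfect multicollinearity
trap <- lm(y ~ d1 + d2 + d3)
X <- model.matrix(trap)
ncol(X)      # number of columns in the design matrix
qr(X)$rank   # number of linearly independent columns (one fewer)
coef(trap)   # the coefficient on d3 is reported as NA

# Only m - 1 = 2 dummies plus the constant term: all coefficients are estimated
ok <- lm(y ~ d2 + d3)
coef(ok)
```

In the second model category A acts as the reference group, so the constant term is its mean and each dummy coefficient is a difference from that mean.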
When the dependent variable \(y\) is a dummy, a linear probability model (LPM) should be applied (these models will not be considered in this course).
In econometric analysis dummy variables on the RHS are most commonly used to describe:
- Changes in the constant term only
- Changes in the slope coefficient only
- Changes in the constant term and the slope coefficient
- Interaction effect
- Seasonal component of a time-series (seasonal dummy variables)
Such changes in parameters (e.g. a slope coefficient that is not the same for all observations) may be helpful in capturing nonlinear dependence between variables.
Nonlinear dependence usually indicates that the parameters are not constant, and the inclusion of dummy variables makes it easier to interpret nonlinear dependence with respect to characteristic groups of observations.
The dependence of income (variable \(y\), in USD) on gender (variable \(d\)) and working experience (variable \(x\), in years) is analyzed. Explain the differences between the given models!
\[a)~~\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1d_i\]
\[b)~~\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1d_i+\hat{\beta}_2x_i\]
\[c)~~\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1(d_ix_i)+\hat{\beta}_2x_i\]
\[d)~~\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1d_i+\hat{\beta}_2x_i+\hat{\beta}_3(d_ix_i)\]
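As a hint for reading these models, it may help to write out what a model implies separately for \(d_i=0\) and \(d_i=1\). A sketch for model d), using only the symbols defined above:

\[d_i=0:~~\hat{y}_i=\hat{\beta}_0+\hat{\beta}_2x_i\]
\[d_i=1:~~\hat{y}_i=(\hat{\beta}_0+\hat{\beta}_1)+(\hat{\beta}_2+\hat{\beta}_3)x_i\]

Here \(\hat{\beta}_1\) is the difference in the constant term and \(\hat{\beta}_3\) the difference in the slope coefficient between the two groups. Dropping the term \(d_ix_i\) leaves the change in the constant term only, as in model b). Dropping the term \(d_i\) leaves the change in the slope only, as in model c), where the coefficients are numbered differently.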
The dependence of income (variable \(y\), in USD) on gender (variable \(d_1\)) and working sector (variable \(d_2\)) is analyzed, where \[d_1=\left\{\begin{array}{cl} 1,& \text{if male}\\0,& \text{if female}\end{array}\right.~~,~~~~~~~~~d_2=\left\{\begin{array}{cl} 1,& \text{if working in the private sector}\\0,& \text{if working in the public sector}\end{array}\right.\]
What changes can be explained by the given models?
\[a)~~\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1d_{i,1}+\hat{\beta}_2d_{i,2}+\hat{\beta}_3(d_{i,1}d_{i,2})\]
\[b)~~\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1x_i+\hat{\beta}_2(x_id_{i,1})+\hat{\beta}_3(x_id_{i,2})+\hat{\beta}_4(d_{i,1}d_{i,2})\]
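One way to approach model a) is to substitute the four possible combinations of the two dummies. A sketch of the implied expected incomes, using the coding of \(d_1\) and \(d_2\) given above:

\[\begin{array}{lll} \text{female, public sector} & (d_{i,1}=0,~d_{i,2}=0): & \hat{y}_i=\hat{\beta}_0\\ \text{male, public sector} & (d_{i,1}=1,~d_{i,2}=0): & \hat{y}_i=\hat{\beta}_0+\hat{\beta}_1\\ \text{female, private sector} & (d_{i,1}=0,~d_{i,2}=1): & \hat{y}_i=\hat{\beta}_0+\hat{\beta}_2\\ \text{male, private sector} & (d_{i,1}=1,~d_{i,2}=1): & \hat{y}_i=\hat{\beta}_0+\hat{\beta}_1+\hat{\beta}_2+\hat{\beta}_3 \end{array}\]

The interaction coefficient \(\hat{\beta}_3\) therefore measures how much the gender difference in the private sector deviates from the gender difference in the public sector.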
Import the file wages.txt into RStudio (either import the text file directly using its link address, or download it from the course site and load it from your working directory). Save the data frame as an object wagedata. The data frame contains observations for 394 workers (\(n=394\)) on: wage (monthly wage in USD), gender (1 if male and 0 if female), experience (1 if years of experience are at least 20 and 0 if years of experience are less than 20) and iq score.
- Estimate a model to examine the dependence of iq score on gender. Save the estimated model as a new object model_1.
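# Read the data directly from the course site (the first row holds the variable names)
wagedata <- read.table(file = "http://www.efzg.hr/userdocsimages/sta/jarneric/wages.txt", header = TRUE)
head(wagedata)
# Regress iq score on the gender dummy
model_1 <- lm(iq ~ gender, data = wagedata)
summary(model_1)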
What is the expected iq score for female workers and for male workers? Obtain the results from the model_1 object. Is the difference in expected (average) iq score between males and females statistically significant (significance level \(\alpha=0.05\))?
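The link between the coefficients of a regression on a single dummy and the two group means can be checked numerically. The sketch below uses simulated data rather than wages.txt; the variables d and y and all numbers are hypothetical.

```r
# Simulated illustration (hypothetical data, not the wages.txt file)
set.seed(42)
d <- rep(c(0, 1), times = c(40, 60))      # dummy: 0 = reference group, 1 = other group
y <- 100 + 5 * d + rnorm(100, sd = 15)    # outcome with a true group difference of 5

fit <- lm(y ~ d)
b <- coef(fit)

mean_0 <- mean(y[d == 0])   # sample mean of the reference group
mean_1 <- mean(y[d == 1])   # sample mean of the other group

# Constant term = mean of group 0; constant term + dummy coefficient = mean of group 1
c(intercept = unname(b[1]), mean_0 = mean_0)
c(sum_of_coefs = unname(b[1] + b[2]), mean_1 = mean_1)

# The t-test on the dummy coefficient tests whether the two means differ
coef(summary(fit))["d", ]
```

Applied to model_1, the same reading would give the expected iq score of females from the constant term and that of males from the sum of the two coefficients.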
Using the same data as in exercise 7.3, estimate a model to examine the dependence of monthly wage on gender. Save the estimated model as a new object model_2.
# Regress monthly wage on the gender dummy
model_2 <- lm(wage ~ gender, data = wagedata)
summary(model_2)
- Is the difference in average (expected) monthly wage between males and females equal to 500 USD (\(\alpha=0.05\))? Use a Wald test (requires installation of the car package).
install.packages("car")
library(car)
linearHypothesis(model_2,c("gender=500"),test="Chisq")
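For a single restriction, the chi-square statistic of this test should correspond to the squared distance between the estimate and the hypothesized value, measured in standard errors. The following base R sketch computes it by hand on simulated data; the variables d and y and all numbers are hypothetical, and the car package is not needed to run it.

```r
# Simulated illustration (hypothetical data, not the wages.txt file)
set.seed(7)
d <- rep(c(0, 1), each = 50)
y <- 2000 + 450 * d + rnorm(100, sd = 300)

fit <- lm(y ~ d)
est <- coef(summary(fit))["d", "Estimate"]
se  <- coef(summary(fit))["d", "Std. Error"]

# Wald statistic for H0: coefficient on d equals 500 (one restriction, so df = 1)
W <- ((est - 500) / se)^2
p_value <- pchisq(W, df = 1, lower.tail = FALSE)
c(W = W, p_value = p_value)   # reject H0 at alpha = 0.05 if p_value < 0.05
```

A p-value above 0.05 means the data do not contradict the hypothesized difference of 500 at the 5% significance level.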
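- Estimate a model to examine the dependence of wage on gender, experience and their interaction. Save the estimated model as a new object model_3. Use a Wald test to check whether the coefficients on experience and gender are equal.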
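# Two dummies and their interaction
model_3 <- lm(wage ~ gender + experience + gender:experience, data = wagedata)
summary(model_3)
# H0: the coefficients on experience and gender are equal
linearHypothesis(model_3, c("experience=gender"), test = "Chisq")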
- Is the interaction between the two qualitative variables statistically significant (significance level \(\alpha=0.05\))?
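The interaction coefficient between two dummies can be read as a difference in differences of group means. The sketch below illustrates this on simulated data; the variables d1, d2 and y and all numbers are hypothetical and unrelated to wages.txt.

```r
# Simulated illustration (hypothetical data): two dummies and their interaction
set.seed(3)
d1 <- rep(c(0, 1), each = 100)    # first dummy
d2 <- rep(c(0, 1), times = 100)   # second dummy, so each of the 4 groups has 50 observations
y  <- 2000 + 300 * d1 + 200 * d2 + 150 * d1 * d2 + rnorm(200, sd = 250)

fit <- lm(y ~ d1 + d2 + d1:d2)
b <- coef(fit)

# Sample means of the four groups
m00 <- mean(y[d1 == 0 & d2 == 0])
m10 <- mean(y[d1 == 1 & d2 == 0])
m01 <- mean(y[d1 == 0 & d2 == 1])
m11 <- mean(y[d1 == 1 & d2 == 1])

# Interaction coefficient = effect of d1 when d2 = 1 minus effect of d1 when d2 = 0
diff_in_diff <- (m11 - m01) - (m10 - m00)
c(interaction = unname(b["d1:d2"]), diff_in_diff = diff_in_diff)

# The t-test on the interaction term
coef(summary(fit))["d1:d2", ]
```

The p-value in the last row is what would be compared with the significance level when judging the interaction.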
Using the same data as in the previous exercises, estimate a model to allow for the possibility that the slope coefficient on iq differs between males and females. Save the estimated model as a new object model_4.
# The slope on iq is allowed to differ by gender; the constant term is common
model_4 <- lm(wage ~ iq + iq:gender, data = wagedata)
summary(model_4)
- Is the difference in slope coefficients statistically significant (\(\alpha=0.05\))?
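In a model of this form the slope for the reference group is the coefficient on the quantitative variable, and the slope for the other group adds the coefficient on the product term. A sketch on simulated data follows; the variables d, x and y and all numbers are hypothetical.

```r
# Simulated illustration (hypothetical data): the slope on x differs between two groups
set.seed(11)
n <- 200
d <- rep(c(0, 1), each = n / 2)
x <- rnorm(n, mean = 100, sd = 15)
y <- 500 + 10 * x + 4 * x * d + rnorm(n, sd = 100)

fit <- lm(y ~ x + x:d)
b <- coef(fit)

slope_0 <- unname(b["x"])              # slope when d = 0
slope_1 <- unname(b["x"] + b["x:d"])   # slope when d = 1
c(slope_0 = slope_0, slope_1 = slope_1)

# The t-test on x:d tests whether the two slopes differ
coef(summary(fit))["x:d", ]
```

The difference in slopes is the coefficient on the product term itself, so its t-test answers the significance question directly.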
- Summarize the results of the 4 estimated models in a single table using the stargazer() command (requires installation of the stargazer package).
install.packages("stargazer")
library(stargazer)
stargazer(model_1,model_2,model_3,model_4,type="text",
intercept.top=TRUE,intercept.bottom=FALSE,df=FALSE,digits=4)