7 Utilizing dummy variables

A single dummy variable on the right-hand side (RHS) was introduced in the simple example (2.1).
If a qualitative variable has two categories (levels), only one dummy variable should be included in a model with a constant term:

-> 0 indicates the absence of the category

-> 1 indicates the presence of the category
If a qualitative variable has more than two categories (\(m>2\)), the number of dummy variables should be one less than the number of categories (\(m-1\)).
Otherwise the system of normal equations obtained by the least squares method (5.14) has no unique solution, due to linear dependence of the RHS variables (the so-called perfect multicollinearity problem).
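The perfect multicollinearity problem can be reproduced in R with a minimal sketch on simulated data. The variable `region`, its three levels and the group means are hypothetical and serve only to illustrate what happens when all \(m\) dummies are included together with a constant term, compared with the default \(m-1\) coding R applies to a factor.

```r
# Hypothetical data: a qualitative variable with m = 3 categories
set.seed(1)
region <- factor(rep(c("north", "centre", "south"), each = 10))
group_mean <- c(north = 10, centre = 12, south = 15)
y <- rnorm(30, mean = group_mean[as.character(region)])

# Wrong: all m = 3 dummies plus a constant term (perfect multicollinearity)
d_north  <- as.numeric(region == "north")
d_centre <- as.numeric(region == "centre")
d_south  <- as.numeric(region == "south")
trap <- lm(y ~ d_north + d_centre + d_south)
qr(model.matrix(trap))$rank   # 3, although the model matrix has 4 columns
coef(trap)                    # lm() reports NA for the aliased dummy d_south

# Right: m - 1 = 2 dummies; a factor on the RHS is coded this way by default
ok <- lm(y ~ region)
coef(ok)                      # intercept refers to the omitted (reference) level "centre"
```

Instead of stopping with an error, `lm()` detects the linear dependence and drops one of the dummies, so the NA in the output is the practical symptom of the singular normal equations.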
When the dependent variable \(y\) is a dummy variable, linear probability models (LPM) should be applied (these models are not considered in this course).
In econometric analysis, dummy variables on the RHS are most commonly used to describe:

Changes in the constant term only
Changes in the slope coefficient only
Changes in the constant term and the slope coefficient
Interaction effect
Seasonal component of a time-series (seasonal dummy variables)
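For the last item, seasonal dummy variables can be generated from the position of each observation within the year. The sketch below assumes a hypothetical quarterly series with simulated values; `cycle()` returns the quarter of each observation, and `model.matrix()` shows that a constant term plus three quarterly dummies (\(m-1\) with \(m=4\)) is produced.

```r
# Hypothetical quarterly series (3 years, frequency = 4) with a seasonal pattern
set.seed(2)
seasonal <- rep(c(0, 5, -3, 8), times = 3)
y <- ts(100 + seasonal + rnorm(12), start = c(2020, 1), frequency = 4)

quarter <- factor(as.numeric(cycle(y)))   # quarter of each observation: 1, 2, 3, 4, ...
X <- model.matrix(~ quarter)              # constant + 3 seasonal dummies, Q1 is the reference
colnames(X)                               # "(Intercept)" "quarter2" "quarter3" "quarter4"

fit <- lm(as.numeric(y) ~ quarter)        # seasonal dummies on the RHS
coef(fit)                                 # quarterN = difference from the Q1 level
```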

Such changes in parameters (e.g. a slope coefficient that is not the same for all observations) can help capture nonlinear dependence between variables.
Nonlinear dependence often indicates that the parameters are not constant across observations, and including dummy variables makes it easier to interpret such dependence with respect to characteristic groups of observations.

Dependence of income (variable \(y\) in USD) on gender (variable \(d\)) and working experience (variable \(x\) in years) is analyzed. Explain the differences between the given models. \[a)~~~\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1d_i~~~~~~~~~~~~~~~~~~~~~~~~~b)~~\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1d_i+\hat{\beta}_2x_i~~~~~~~~~~~~~~~~~~\] \[c)~~\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1(d_ix_i)+\hat{\beta}_2x_i~~~~~~~d)~~\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1d_i+\hat{\beta}_2x_i+\hat{\beta}_3(d_ix_i)\]
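The four models correspond to the following R formulas. This is a sketch on simulated data (the coefficients used to generate `y` are hypothetical, not estimates from any real sample), assuming \(d=1\) for males. It also checks that in model d) the implied intercept and slope for each gender coincide with separate regressions fitted to each group.

```r
# Hypothetical simulated data: d = 1 for male, x = years of experience
set.seed(3)
n <- 200
d <- rbinom(n, 1, 0.5)
x <- runif(n, 0, 30)
y <- 1500 + 200 * d + 40 * x + 10 * d * x + rnorm(n, sd = 100)

m_a <- lm(y ~ d)              # a) only the constant term differs, experience ignored
m_b <- lm(y ~ d + x)          # b) different constant terms, common slope
m_c <- lm(y ~ x + d:x)        # c) common constant term, different slopes
m_d <- lm(y ~ d + x + d:x)    # d) different constant terms and slopes (same as y ~ d * x)

# Lines implied by model d) for each group
b <- coef(m_d)
female <- c(intercept = b[["(Intercept)"]], slope = b[["x"]])
male   <- c(intercept = b[["(Intercept)"]] + b[["d"]], slope = b[["x"]] + b[["d:x"]])
rbind(female, male)
```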

Dependence of income (variable \(y\) in USD) on gender (variable \(d_1\)) and working sector (variable \(d_2\)) is analyzed, where \[d_1=\left\{\begin{array}{cl} 1,& if~male\\0,& if~female\end{array}\right.~~,~~~~~~~~~d_2=\left\{\begin{array}{cl} 1,& if~working~in~private~sector\\0,& if~working~in~public~sector\end{array}\right.\]

What changes can be explained by the given models (with \(x\) denoting working experience in years, as in the previous exercise)?

\[a)~~~\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1d_{i,1}+\hat{\beta}_2d_{i,2}+\hat{\beta}_3(d_{i,1}d_{i,2})~~~~~~~~~~~~~~~~~~~\] \[b)~~\hat{y}_i=\hat{\beta}_0+\hat{\beta}_1x_i+\hat{\beta}_2(x_id_{i,1})+\hat{\beta}_3(x_id_{i,2})+\hat{\beta}_4(d_{i,1}d_{i,2})\]
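In model a) the two dummies and their interaction define four groups, and each group's expected income is a sum of coefficients: \(\hat{\beta}_0\) (female, public), \(\hat{\beta}_0+\hat{\beta}_1\) (male, public), \(\hat{\beta}_0+\hat{\beta}_2\) (female, private) and \(\hat{\beta}_0+\hat{\beta}_1+\hat{\beta}_2+\hat{\beta}_3\) (male, private). A minimal sketch on hypothetical simulated data shows that, because the model is saturated, these sums equal the four group sample means.

```r
# Hypothetical simulated data: d1 = 1 male, d2 = 1 private sector
set.seed(4)
n  <- 400
d1 <- rbinom(n, 1, 0.5)
d2 <- rbinom(n, 1, 0.4)
y  <- 1800 + 150 * d1 + 100 * d2 + 80 * d1 * d2 + rnorm(n, sd = 120)

fit <- lm(y ~ d1 + d2 + d1:d2)   # model a)
b <- coef(fit)

# Expected income of each group implied by model a)
implied <- c(
  female_public  = b[["(Intercept)"]],
  male_public    = b[["(Intercept)"]] + b[["d1"]],
  female_private = b[["(Intercept)"]] + b[["d2"]],
  male_private   = b[["(Intercept)"]] + b[["d1"]] + b[["d2"]] + b[["d1:d2"]]
)

# Sample means of the same four groups
cell_means <- c(
  female_public  = mean(y[d1 == 0 & d2 == 0]),
  male_public    = mean(y[d1 == 1 & d2 == 0]),
  female_private = mean(y[d1 == 0 & d2 == 1]),
  male_private   = mean(y[d1 == 1 & d2 == 1])
)
all.equal(implied, cell_means)
```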

Import the file wages.txt into RStudio (either directly from the link address or by downloading it from the course site and loading it from your working directory). Save the data frame as an object wagedata. The data frame contains data on 394 workers (\(n=394\)): wage (monthly wage in USD), gender (1 if male and 0 if female), experience (1 if the worker has at least 20 years of experience and 0 if less than 20 years) and iq (IQ score).

Estimate a model to examine whether the expected iq score differs between male and female workers. Save the estimated model as a new object model_1.

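# Read the data directly from the course site (header=TRUE: first row holds variable names)
wagedata=read.table(file="http://www.efzg.hr/userdocsimages/sta/jarneric/wages.txt",header=TRUE)
head(wagedata)  # first six rows
# Regression of iq on the gender dummy (1 = male, 0 = female)
model_1=lm(iq~gender,wagedata)
summary(model_1)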


What is the expected iq score for female workers and for male workers? Obtain the results from the model_1 object.
Is the difference in expected (average) iq score between males and females statistically significant (significance level \(\alpha=0.05\))?
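Both expectations follow from the coefficients: for females (\(d=0\)) it is \(\hat{\beta}_0\), for males (\(d=1\)) it is \(\hat{\beta}_0+\hat{\beta}_1\), and the significance question is the t test of \(H_0:\beta_1=0\). The sketch below runs on a hypothetical simulated stand-in called `sim` (not the wages.txt data) so that it is self-contained; with the real data, `fit` would be replaced by `model_1`.

```r
# Hypothetical stand-in for wagedata (simulated, NOT the course data)
set.seed(5)
sim <- data.frame(gender = rbinom(394, 1, 0.5))
sim$iq <- 100 + 2 * sim$gender + rnorm(394, sd = 15)
fit <- lm(iq ~ gender, data = sim)

# Expected iq score: beta_0 for females, beta_0 + beta_1 for males
b <- coef(fit)
expected <- c(female = b[["(Intercept)"]],
              male   = b[["(Intercept)"]] + b[["gender"]])
expected

# The same expectations obtained with predict()
predict(fit, newdata = data.frame(gender = c(0, 1)))

# Two-sided t test of H0: beta_1 = 0 (no difference between males and females)
p_value <- summary(fit)$coefficients["gender", "Pr(>|t|)"]
p_value < 0.05   # TRUE would mean rejecting H0 at alpha = 0.05
```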

Using the same data as in exercise 7.3, estimate a model to examine whether the expected monthly wage differs between male and female workers. Save the estimated model as a new object model_2.

model_2=lm(wage~gender,wagedata)
summary(model_2)

Is the difference in average (expected) monthly wage between males and females equal to 500 USD (\(\alpha=0.05\))? Use a Wald test (requires installation of the car package).


install.packages("car")  # only needed once
library(car)
# Wald (chi-square) test of H0: coefficient of gender = 500
linearHypothesis(model_2,c("gender=500"),test="Chisq")
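For a single restriction \(H_0:\beta_1=500\), the Wald statistic is \(\left((\hat{\beta}_1-500)/se(\hat{\beta}_1)\right)^2\), which follows a \(\chi^2\) distribution with 1 degree of freedom. A base-R sketch of this calculation, on hypothetical simulated data named `sim`, is shown below; we expect it to match the Chisq statistic that `linearHypothesis()` reports for an `lm` model with its default covariance matrix, but treat that correspondence as an assumption to verify on your own output.

```r
# Hypothetical simulated stand-in (not wages.txt)
set.seed(6)
sim <- data.frame(gender = rbinom(394, 1, 0.5))
sim$wage <- 900 + 450 * sim$gender + rnorm(394, sd = 300)
fit <- lm(wage ~ gender, data = sim)

r    <- 500                                   # hypothesised value of beta_gender
b_g  <- coef(fit)[["gender"]]
se_g <- sqrt(vcov(fit)["gender", "gender"])   # standard error from the OLS covariance matrix
wald <- ((b_g - r) / se_g)^2                  # Wald chi-square statistic, 1 df
p    <- pchisq(wald, df = 1, lower.tail = FALSE)
c(Chisq = wald, p.value = p)
```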

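Estimate a model to examine the effects of gender, experience and their interaction on monthly wage. Save the estimated model as a new object model_3. Use a Wald test to check whether the effects of experience and gender are equal.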

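# gender:experience is the interaction term (equivalently: wage~gender*experience)
model_3=lm(wage~gender+experience+gender:experience,wagedata)
summary(model_3)
# Wald (chi-square) test of H0: coefficient of experience = coefficient of gender
linearHypothesis(model_3,c("experience=gender"),test="Chisq")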
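The restriction \(\beta_{experience}=\beta_{gender}\) can be written as \(L\beta=0\) with \(L=(0,-1,1,0)\), and the Wald statistic is \((L\hat{\beta})^\top(L\hat{V}L^\top)^{-1}(L\hat{\beta})\). A minimal base-R sketch on hypothetical simulated data (`sim` is an assumed stand-in, not the course file) makes the role of the covariance between the two estimates explicit.

```r
# Hypothetical simulated stand-in with two binary regressors
set.seed(7)
sim <- data.frame(gender = rbinom(394, 1, 0.5), experience = rbinom(394, 1, 0.3))
sim$wage <- 800 + 300 * sim$gender + 250 * sim$experience +
  100 * sim$gender * sim$experience + rnorm(394, sd = 250)
fit <- lm(wage ~ gender + experience + gender:experience, data = sim)

b <- coef(fit)
V <- vcov(fit)
# Restriction matrix for H0: beta_experience - beta_gender = 0
L <- matrix(0, nrow = 1, ncol = length(b), dimnames = list(NULL, names(b)))
L[1, "experience"] <-  1
L[1, "gender"]     <- -1
lhs  <- L %*% b
wald <- as.numeric(t(lhs) %*% solve(L %*% V %*% t(L)) %*% lhs)   # chi-square, 1 df
p    <- pchisq(wald, df = 1, lower.tail = FALSE)
c(Chisq = wald, p.value = p)
```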

Is the interaction effect between the two qualitative variables statistically significant (significance level \(\alpha=0.05\))?

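Using the same data as in the previous exercises, estimate a model that allows for the possibility that the effect of iq score on monthly wage (the slope coefficient) differs between males and females. Save the estimated model as a new object model_4.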

# Common constant term; the slope on iq may differ by gender through iq:gender
model_4=lm(wage~iq+iq:gender,wagedata)
summary(model_4)

Is the difference in slope coefficients statistically significant (\(\alpha=0.05\))?
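In a model of this form the slope on iq is \(\hat{\beta}_{iq}\) for females and \(\hat{\beta}_{iq}+\hat{\beta}_{iq:gender}\) for males, so the question is answered by the t test on the iq:gender coefficient. The sketch below uses a hypothetical simulated data frame `sim` in place of wagedata and confirms the male slope by comparing predictions one iq point apart.

```r
# Hypothetical simulated stand-in (not wages.txt)
set.seed(8)
sim <- data.frame(gender = rbinom(394, 1, 0.5), iq = rnorm(394, mean = 100, sd = 15))
sim$wage <- 300 + 6 * sim$iq + 1.5 * sim$iq * sim$gender + rnorm(394, sd = 200)
fit <- lm(wage ~ iq + iq:gender, data = sim)

b <- coef(fit)
slopes <- c(female = b[["iq"]], male = b[["iq"]] + b[["iq:gender"]])
slopes

# t test of H0: no difference in slopes (coefficient of iq:gender = 0)
summary(fit)$coefficients["iq:gender", c("t value", "Pr(>|t|)")]

# Check: predicted wage change for a male worker when iq rises by one point
pr <- predict(fit, newdata = data.frame(iq = c(100, 101), gender = c(1, 1)))
unname(pr[2] - pr[1])   # equals slopes["male"]
```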
Summarize the results of the 4 estimated models in a single table using the stargazer() command (requires installation of the stargazer package).

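install.packages("stargazer")  # only needed once
library(stargazer)
# Text table of all four models: intercept in the first row, no degrees of freedom, 4 decimals
stargazer(model_1,model_2,model_3,model_4,type="text",
intercept.top=TRUE,intercept.bottom=FALSE,df=FALSE,digits=4)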