2.1 Variable selection
- Assuming unidirectional causality, where x causes y, several alternative terms are used for the variables y and x:
Variable y | Variable x |
---|---|
dependent | independent |
outcome | predictor |
response | explanatory |
regressand | regressor |
endogenous | exogenous |
Variables on the right-hand side (RHS) can play different roles: some serve as control variables, while others are multiplied together to form an interaction term.
When dealing with time-series data (data observed over time), it is common for the dependent variable y to appear as an independent variable as well, which makes it endogenous.
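The two constructions just described, an interaction term and a dependent variable reused on the right-hand side, can be sketched in Python with NumPy. The arrays and their values are hypothetical toy data, used only to show the mechanics.

```python
import numpy as np

# Hypothetical toy data (illustrative values only, not taken from the text)
y = np.array([10.0, 12.0, 11.0, 13.0, 15.0])   # y observed at t = 1, ..., 5
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])        # a continuous RHS variable
z = np.array([1.0, 0.0, 1.0, 0.0, 1.0])        # a 0/1 dummy RHS variable

# Interaction term: element-wise product of two RHS variables
xz = x * z

# Dependent variable reused on the RHS: pair each y_t with its previous value.
# The first observation has no predecessor, so it is dropped.
y_t = y[1:]       # y_2, ..., y_T     (left-hand side)
y_prev = y[:-1]   # y_1, ..., y_{T-1} (right-hand side)

print(xz)       # product is zero wherever the dummy is zero
print(y_prev)   # previous-period values aligned with y_t
```

Aligning y with its own past costs one observation: from T = 5 values of y, only 4 usable pairs remain.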
Exercise 1. Which variable is endogenous and which one is exogenous in the following equation? What does subscript t represent? Which variable is lagged?
$$y_t = \beta_0 + \beta_1 y_{t-1} + \beta_2 x_t + u_t, \qquad t = 1, 2, \dots, T$$
Solution
Variable x is exogenous because x causes y, but not the other way around. Variable y is endogenous because it appears on both sides of the equation, meaning that y is both the consequence and the cause. This is common with time-series data, so subscript t represents the time unit (such as hour, day, week, month or year). Variable $y_{t-1}$ is lagged because it is observed in the previous time period (subscript $t-1$); e.g. lagged consumption might be used as an RHS variable to account for how past consumption affects present consumption. Likewise, a variable lagged by two periods is denoted $y_{t-2}$, a variable lagged by three periods is denoted $y_{t-3}$, etc.
Exercise 2. Which variable is endogenous and which one is exogenous in the following system of equations? How many parameters do we need to estimate? Write the system of two equations in matrix form.
$$y_t = \beta_{1,0} + \beta_{1,1} y_{t-1} + \beta_{1,2} x_{t-1} + u_{1,t}$$
$$x_t = \beta_{2,0} + \beta_{2,1} y_{t-1} + \beta_{2,2} x_{t-1} + u_{2,t}$$
Solution
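In this system of equations both variables are endogenous, meaning that x causes y and y causes x. From this standpoint, neither variable is strictly exogenous. There are six coefficients to estimate, three in each equation. The matrix form of the system is:
$$\begin{bmatrix} y_t \\ x_t \end{bmatrix} = \begin{bmatrix} \beta_{1,0} \\ \beta_{2,0} \end{bmatrix} + \begin{bmatrix} \beta_{1,1} & \beta_{1,2} \\ \beta_{2,1} & \beta_{2,2} \end{bmatrix} \begin{bmatrix} y_{t-1} \\ x_{t-1} \end{bmatrix} + \begin{bmatrix} u_{1,t} \\ u_{2,t} \end{bmatrix}$$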
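As a quick numerical check, the sketch below verifies that the matrix form gives the same values as the two equations written out one by one. The coefficient values, the previous-period values and the zero errors are all hypothetical, chosen only so that the arithmetic is easy to follow.

```python
import numpy as np

# Hypothetical coefficient values, for illustration only
c = np.array([0.5, 1.0])        # intercepts: beta_{1,0}, beta_{2,0}
A = np.array([[0.6, 0.2],       # beta_{1,1}, beta_{1,2}
              [0.1, 0.3]])      # beta_{2,1}, beta_{2,2}

prev = np.array([2.0, 4.0])     # (y_{t-1}, x_{t-1}), hypothetical values
u = np.zeros(2)                 # errors set to zero to keep the check deterministic

# Matrix form: (y_t, x_t)' = c + A (y_{t-1}, x_{t-1})' + u
current = c + A @ prev + u

# The same system written equation by equation
y_t = c[0] + A[0, 0] * prev[0] + A[0, 1] * prev[1] + u[0]
x_t = c[1] + A[1, 0] * prev[0] + A[1, 1] * prev[1] + u[1]

# Count the coefficients: 2 intercepts + 4 slopes
n_coefficients = c.size + A.size

print(current, (y_t, x_t), n_coefficients)
```

Both forms give y_t = 2.5 and x_t = 2.4 for these inputs, and the count confirms six coefficients in total.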
Exercise 3. Is the following model bivariate or multivariate? What does subscript i represent? Which terms are variables and which are parameters? Which variables are known (observed) and which are unknown?
$$y_i = \alpha + \beta x_i + \gamma z_i + u_i, \qquad i = 1, 2, \dots, n$$
Solution
It is a multivariate model because it has more than one observed RHS variable. Subscript i represents the cross-sectional unit. The variables are y, x, z and u, while α, β and γ are parameters. The known (observed) variables are y, x and z. The error term u is an unknown (unobserved) variable.
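The split between observed variables, unknown parameters and the unobserved error can be illustrated with simulated data. The sketch below assumes ordinary least squares as the estimator, which the text does not prescribe at this point. The sample size, the "true" parameter values and the normal distributions are hypothetical choices.

```python
import numpy as np

rng = np.random.default_rng(0)   # fixed seed for reproducibility
n = 200                          # hypothetical number of cross-sectional units

# Observed (known) RHS variables, simulated here purely for illustration
x = rng.normal(size=n)
z = rng.normal(size=n)

# Error term and hypothetical "true" parameters; both are unknown in practice
u = rng.normal(scale=0.5, size=n)
alpha, beta, gamma = 1.0, 2.0, -0.5
y = alpha + beta * x + gamma * z + u   # observed LHS variable

# Estimate (alpha, beta, gamma) by least squares on a constant, x and z
X = np.column_stack([np.ones(n), x, z])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)

# Residuals are the estimated counterpart of the unobserved errors u
residuals = y - X @ coef

print(coef)
```

Only y, x and z enter the estimation step. The parameters and the errors are recovered as estimates (`coef` and `residuals`), never observed directly.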
Exercise 4. Which parameter represents the constant term, and which one represents the interaction term? Explain the interaction term, assuming that y = income, x = years of working experience and z = gender (1 for males and 0 for females).
$$y_i = \alpha + \beta x_i + \gamma z_i + \lambda (x_i z_i) + u_i$$