1.1 Basic econometrics reminder
An econometric model can be represented as a single equation or as a system of equations. It includes either two variables (bivariate model) or more than two variables (multivariate model). Not all variables need to be numerical, and variables can play different roles in the model.
Basic steps of econometric analysis:
- Model specification (model should be correctly specified according to financial theory)
- Data collection and preparation (generate new variables or transform existing ones)
- Descriptive statistics of the sample data and examination of their properties
- Parameter estimation with the chosen estimator, e.g. OLS, WLS, GLS, ML, GMM, etc.
- Significance testing of the estimated parameters
- Diagnostic checking (whether all assumptions are met and how well the model fits the data)
- Interpretation and forecasting (to explain and predict changes in financial phenomena)
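The steps above can be sketched in a few lines of Python. This is only an illustrative sketch on simulated data, assuming a bivariate linear model estimated by OLS; the helper name `ols`, the true parameter values and the sample size are hypothetical choices, not part of the original notes.

```python
import numpy as np

def ols(y, X):
    """OLS estimates, standard errors, t-statistics, R-squared and residuals.

    X must already contain a column of ones for the intercept.
    """
    n, k = X.shape
    beta = np.linalg.solve(X.T @ X, X.T @ y)        # parameter estimation
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)                # estimated error variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    t_stats = beta / se                             # tests H0: coefficient = 0
    r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)  # goodness of fit
    return beta, se, t_stats, r2, resid

# Data preparation: simulated sample (assumed true values alpha = 1, beta = 2)
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
u = rng.normal(scale=0.5, size=n)
y = 1.0 + 2.0 * x + u

# Descriptive statistics of the sample
print("mean(y), std(y):", y.mean(), y.std())

# Estimation, significance testing and a simple fit measure
X = np.column_stack([np.ones(n), x])
beta, se, t_stats, r2, resid = ols(y, X)
print("estimates:", beta, "t-statistics:", t_stats, "R2:", r2)

# Forecast of y for a new value x = 1.5
y_forecast = beta[0] + beta[1] * 1.5
print("forecast:", y_forecast)
```

Diagnostic checking (e.g. tests for heteroskedasticity or autocorrelation of `resid`) is left out of the sketch to keep it short.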
Model specification refers to: (1) selecting appropriate variables, (2) assuming the direction of causality, and (3) selecting an appropriate functional form.
Variables on the right-hand side (RHS) can have different roles: some serve as control variables, while others can be multiplied together to form an interaction term (capturing variables that moderate the relationship between y and x).
When dealing with time-series data (data observed over time), it is common for the dependent variable y to also appear, in lagged form, as an independent variable, making it endogenous.
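As a minimal sketch of how a lagged dependent variable enters the regression, the snippet below simulates a series and builds the lag by shifting the array by one period. The model form $y_t = \alpha + \beta x_t + \gamma y_{t-1} + u_t$ and all numerical values are assumptions made for illustration only.

```python
import numpy as np

rng = np.random.default_rng(1)
T = 500
x = rng.normal(size=T)
u = rng.normal(scale=0.5, size=T)

# Simulate y_t = 0.5 + 0.8 * x_t + 0.6 * y_{t-1} + u_t (assumed values)
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.5 + 0.8 * x[t] + 0.6 * y[t - 1] + u[t]

# Build the lag: the first observation is lost because y_0 has no predecessor
y_t = y[1:]       # y_1, ..., y_{T-1}
y_lag1 = y[:-1]   # y_0, ..., y_{T-2}
x_t = x[1:]

# Regress y_t on a constant, x_t and y_{t-1}
X = np.column_stack([np.ones(T - 1), x_t, y_lag1])
coef, *_ = np.linalg.lstsq(X, y_t, rcond=None)
print("alpha, beta, gamma estimates:", coef)
```

Each additional lag shortens the usable sample by one more observation.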
Solution
Variable x is exogenous because x causes y, but not the other way around. Variable y is endogenous because it appears on both sides of the equation, meaning that y is both the consequence and the cause simultaneously. This is common with time-series data, where the subscript t denotes the time unit ($t = 1, 2, \dots, T$), such as hour, day, week, month, or year. Variable $y_{t-1}$ is lagged because it is observed in the previous time period (subscript $t-1$); e.g. lagged inflation might be used as an RHS variable to account for how past inflation affects present inflation. Likewise, a variable lagged by two periods is denoted $y_{t-2}$, a variable lagged by three periods is denoted $y_{t-3}$, etc.
Solution
It is a multivariate model because there is more than one observed RHS variable ($k \ge 2$). The variables are y, x, z and u, while α, β, γ and λ are parameters. The known (observed) variables are y, x and z. The error term u is an unknown (unobserved) random variable.
Parameter λ is the coefficient of the interaction term, i.e. of the product of the two variables x and z. In the given example, λ represents the difference between the two periods in the change of inflation with respect to a 1-unit change in the interest rate. For instance, λ<0 indicates that the impact of the interest rate on inflation was weaker in the COVID pandemic period than in the non-pandemic period.
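The role of λ can be illustrated with a small simulation. The sketch below assumes the model has the form $y = \alpha + \beta x + \gamma z + \lambda xz + u$, with y as inflation, x as the interest rate and z as a 0/1 pandemic dummy; this functional form, the variable coding and all numerical values are assumptions for illustration, not taken from the original example.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400
x = rng.normal(loc=3.0, scale=1.0, size=n)     # interest rate (simulated)
z = (np.arange(n) >= n // 2).astype(float)     # dummy: 1 in the pandemic period
u = rng.normal(scale=0.3, size=n)

# Assumed true parameters; lam < 0 weakens the effect of x when z = 1
alpha, beta, gamma, lam = 2.0, 0.5, 1.0, -0.3
y = alpha + beta * x + gamma * z + lam * x * z + u

# The interaction term is simply the product of the two regressors
X = np.column_stack([np.ones(n), x, z, x * z])
a_hat, b_hat, g_hat, l_hat = np.linalg.lstsq(X, y, rcond=None)[0]

effect_normal = b_hat            # effect of x on y when z = 0
effect_pandemic = b_hat + l_hat  # effect of x on y when z = 1
print("effect of x (z=0):", effect_normal)
print("effect of x (z=1):", effect_pandemic)
```

Under this assumed form, the effect of a 1-unit change in x is β when z = 0 and β + λ when z = 1, so λ is exactly the difference between the two periods.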
Example 3. Which variable is endogenous and which one is exogenous in the following system of equations? How many parameters need to be estimated? Write the system of two equations in matrix form.
$$y_t = \beta_{1,0} + \beta_{1,1} y_{t-1} + \beta_{1,2} x_{t-1} + u_{1,t}$$
$$x_t = \beta_{2,0} + \beta_{2,1} y_{t-1} + \beta_{2,2} x_{t-1} + u_{2,t}$$
Solution
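Considering the system of equations, both variables are endogenous, meaning that x causes y and y causes x. From this point of view, neither variable is strictly exogenous. There are six coefficients to estimate: two intercepts and four slope coefficients (three parameters per equation). The matrix form of the system is:
$$\begin{bmatrix} y_t \\ x_t \end{bmatrix} = \begin{bmatrix} \beta_{1,0} \\ \beta_{2,0} \end{bmatrix} + \begin{bmatrix} \beta_{1,1} & \beta_{1,2} \\ \beta_{2,1} & \beta_{2,2} \end{bmatrix} \begin{bmatrix} y_{t-1} \\ x_{t-1} \end{bmatrix} + \begin{bmatrix} u_{1,t} \\ u_{2,t} \end{bmatrix}$$
- Keep in mind that in the pre-estimation phase raw data are typically transformed: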
  - Taking logs, squares, inverse values, square roots, …
  - Seasonal and/or calendar adjustment
  - Taking first differences and/or lagged values, when required
  - Deflating nominal values
- Most common data issues:
  - Missing values (NA)
  - Measurement errors (collected data may not always represent the true values)
  - Outliers (extreme values far above or below the mean)
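Several of the transformations and data checks listed above can be sketched as follows. The series `nominal` and `price_index` are made-up numbers, and the outlier rule (more than 2 standard deviations from the mean) is an assumed rule of thumb rather than a recommendation from the notes.

```python
import numpy as np

# Hypothetical nominal series with one missing value and one suspicious spike
nominal = np.array([100.0, 104.0, 109.0, np.nan, 121.0, 127.0, 300.0, 140.0])
price_index = np.array([1.00, 1.02, 1.04, 1.06, 1.08, 1.10, 1.12, 1.14])

# Data issue: locate missing values (NA)
missing = np.isnan(nominal)

# Transformation: deflate nominal values to real values
real = nominal / price_index

# Transformations: logs, first differences of logs, and a one-period lag
log_real = np.log(real)
diff_log = np.diff(log_real)                    # one observation shorter
lag1 = np.concatenate([[np.nan], real[:-1]])    # first lag is undefined

# Data issue: flag outliers by distance from the mean (assumed threshold of 2)
z_score = (real - np.nanmean(real)) / np.nanstd(real)
outlier = np.abs(z_score) > 2.0

print("missing:", missing)
print("outliers:", outlier)
```

Missing values propagate through the transformations as NaN, so they should be handled (removed or imputed) before estimation.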
Regardless of the functional form and data type, you should always consider the parsimony principle with respect to the number of variables on the right-hand side (fewer is better).
This principle balances the model's goodness of fit against its simplicity to avoid overfitting.
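One way to see the trade-off is to compare a small model with a larger one that adds irrelevant regressors. The sketch below uses simulated data and reports R², adjusted R² and a Gaussian BIC (up to an additive constant); the choice of these criteria, the helper name `fit_stats` and all numbers are assumptions for illustration.

```python
import numpy as np

def fit_stats(y, X):
    """R-squared, adjusted R-squared and BIC (up to a constant) for an OLS fit."""
    n, k = X.shape                       # k counts the intercept as well
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    tss = float(np.sum((y - y.mean()) ** 2))
    r2 = 1.0 - rss / tss
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k)
    bic = n * np.log(rss / n) + k * np.log(n)   # penalises extra parameters
    return r2, adj_r2, bic

rng = np.random.default_rng(3)
n = 200
x1 = rng.normal(size=n)
noise_vars = rng.normal(size=(n, 5))     # five regressors unrelated to y
y = 1.0 + 2.0 * x1 + rng.normal(size=n)

X_small = np.column_stack([np.ones(n), x1])
X_big = np.column_stack([np.ones(n), x1, noise_vars])

r2_s, adj_s, bic_s = fit_stats(y, X_small)
r2_b, adj_b, bic_b = fit_stats(y, X_big)
print("small model: R2, adj R2, BIC =", r2_s, adj_s, bic_s)
print("big model:   R2, adj R2, BIC =", r2_b, adj_b, bic_b)
```

R² can never decrease when regressors are added, which is why it cannot be used on its own to choose between models; criteria with a penalty for the number of parameters (lower BIC is better) favour the simpler model unless the extra variables improve the fit enough.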