2.2 Functional form selection

  • So far we have learned that a model can be expressed as a single equation or as a system of equations, that it can include only two variables (bivariate model) or more than two variables (multivariate model), that not all variables have to be numerical, and that variables can play different roles.

  • An econometric model is misspecified if we have chosen an incorrect functional form \(f(x)\) or the wrong variables. In this case, the estimated parameters may not be statistically significant!

The functional form \(f(x)\) can be linear or nonlinear. However, the model is required to be linear in the parameters; the variables themselves may enter either linearly or nonlinearly.

  • Whenever possible, nonlinear models should be transformed into models that are linear in the parameters!

  • The following bivariate models are linear in the parameters but nonlinear in the variables: \[\begin{equation} \begin{aligned} y_i=&\beta_0+\beta_1x_i^{2}+u_i \\ y_i=&\beta_0+\beta_1\log x_i+u_i \\ y_i=&\beta_0+\beta_1\cos x_i+u_i \end{aligned} \tag{2.2} \end{equation}\]
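Because each model in (2.2) is linear in \(\beta_0\) and \(\beta_1\), it can be estimated by ordinary least squares once the regressor has been transformed. The sketch below simulates data from each model and does exactly that. It assumes NumPy; the "true" values \(\beta_0 = 1\) and \(\beta_1 = 2\), the noise level, the range of \(x\), and the helper name `fit_transformed` are hypothetical choices made only for illustration.

```python
import numpy as np


def fit_transformed(x, y, g):
    """OLS of y on a constant and g(x); returns the array (b0_hat, b1_hat)."""
    X = np.column_stack([np.ones_like(x), g(x)])  # design matrix: intercept and g(x)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef


rng = np.random.default_rng(0)
n = 1000
x = rng.uniform(0.5, 10.0, size=n)   # strictly positive, so log(x) is defined
u = rng.normal(0.0, 0.1, size=n)     # error term

# Nonlinear transformations of x from (2.2); the parameters still enter linearly
transforms = {"x^2": np.square, "log x": np.log, "cos x": np.cos}
beta0, beta1 = 1.0, 2.0  # hypothetical true parameter values

results = {}
for name, g in transforms.items():
    y = beta0 + beta1 * g(x) + u          # simulate y from the model
    results[name] = fit_transformed(x, y, g)
    print(name, results[name].round(3))
```

The estimates should land close to the assumed values in every case: once \(g(x)\) has been computed, the regression is an ordinary linear one in \(\beta_0\) and \(\beta_1\).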

Exercise 5. Which of the following models can be transformed into a model that is linear in the parameters, and how? \[a)~~~y_i=e^{\beta_0+\beta_1x_i+u_i}~~~~~~~~~~b)~~~y_i=\beta_0+\beta_1^{3}x_i+u_i~~~~~~~~~~c)~~~y_i=\frac{1}{\beta_0+\beta_1x_i+u_i}\]

Exercise 6. Which of the following models are linear in the parameters, and which are linear in the variables? \[~~~~a)~~~~~y_i=\beta_0+\beta_1\frac{1}{x_i}+u_i~~~~~~~~~~b)~~~y_i=\beta_0+\log\beta_1x_i+u_i\] \[c)~~~~y_i=\beta_0+\beta_1\sqrt{x_i}+u_i~~~~~~~~~~d)~~~y_i=\beta_0+\beta_1x_i+u_i\]

Exercise 7. Transform the Cobb-Douglas production function \[y_i=\beta_0x_i^{\beta_1}z_i^{\beta_2}e^{u_i}\] into a model that is linear in the parameters. Interpret the parameters \(\beta_1\) and \(\beta_2\) if \(y=\) production (thousands of tons), \(x=\) number of employees and \(z=\) capital (millions of USD).

  • Because the variables may be transformed, take care when interpreting the slope coefficients!
TABLE 2.3: Models, equations and slope coefficients

| Model | Equation | Slope coefficient |
|---|---|---|
| lin-lin | \(y_i = \beta_0 + \beta_1 x_i + u_i\) | \(\beta_1 = \Delta y / \Delta x\) |
| log-log | \(\log y_i = \beta_0 + \beta_1 \log x_i + u_i\) | \(\beta_1 = \%\Delta y / \%\Delta x\) |
| log-lin | \(\log y_i = \beta_0 + \beta_1 x_i + u_i\) | \(\beta_1 \times 100 \approx \%\Delta y / \Delta x\) |
| lin-log | \(y_i = \beta_0 + \beta_1 \log x_i + u_i\) | \(\beta_1 / 100 \approx \Delta y / \%\Delta x\) |
| polynomial | \(y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^{2} + u_i\) | \(\beta_1 + 2\beta_2 x_0 \approx \Delta y / \Delta x\) at \(x = x_0\) |
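The interpretations in Table 2.3 can be checked numerically: evaluate the deterministic part of each model (error term set to zero) at \(x_0\) and at \(x_0+\Delta x\) for a small \(\Delta x\), form the ratio named in the "Slope coefficient" column, and compare it with the table's expression. A minimal sketch in Python with NumPy; the coefficients `b0`, `b1`, `b2`, the point `x0` and the step `dx` are hypothetical values, not estimates from any data.

```python
import numpy as np

# Hypothetical coefficients and evaluation point, chosen only for illustration
b0, b1, b2 = 1.5, 0.8, -0.05
x0 = 4.0     # point at which the effect is evaluated
dx = 1e-4    # a small change in x
x1 = x0 + dx


def pct(new, old):
    """Percentage change from old to new."""
    return 100.0 * (new - old) / old


# Deterministic part of each model in Table 2.3 (u_i = 0), returned on the y scale
def lin_lin(x):
    return b0 + b1 * x

def log_log(x):
    return np.exp(b0 + b1 * np.log(x))   # y = exp(log y)

def log_lin(x):
    return np.exp(b0 + b1 * x)           # y = exp(log y)

def lin_log(x):
    return b0 + b1 * np.log(x)

def poly(x):
    return b0 + b1 * x + b2 * x**2


# model -> (ratio computed from the model, expression from Table 2.3)
checks = {
    "lin-lin": ((lin_lin(x1) - lin_lin(x0)) / dx, b1),
    "log-log": (pct(log_log(x1), log_log(x0)) / pct(x1, x0), b1),
    "log-lin": (pct(log_lin(x1), log_lin(x0)) / dx, 100 * b1),
    "lin-log": ((lin_log(x1) - lin_log(x0)) / pct(x1, x0), b1 / 100),
    "polynomial": ((poly(x1) - poly(x0)) / dx, b1 + 2 * b2 * x0),
}

for model, (numeric, formula) in checks.items():
    print(f"{model:<10} numeric = {numeric:.5f}   table = {formula:.5f}")
```

The two columns agree closely here because \(\Delta x\) is tiny; the approximations marked \(\approx\) in the table become less accurate as the change in \(x\) grows.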

Exercise 8. Is the second-order polynomial linear in the parameters? From which models can you read off an elasticity and a semi-elasticity? Compute the exact (not approximate) percentage change in \(y\) implied by the log-lin model for a \(1\)-unit change in \(x\).

Exercise 9. Interpret the slope coefficient in each model if \(y=\) monthly gas consumption (in liters) and \(x=\) gas price per liter (in USD). \[a)~~y_i=158.49-13.48x_i+u_i~~~~~~~b)~~\log y_i=5.44-0.87\log x_i+u_i\] \[~~c)~~\log y_i=3.27-0.14x_i+u_i~~~~~~~~~d)~~y_i=265.51-103.62\log x_i+u_i\]

  • Regardless of the functional form, always keep in mind the principle of parsimony regarding the number of explanatory variables on the right-hand side (fewer is better)!

  • This principle balances a model's goodness of fit against its simplicity in order to avoid overfitting.
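To illustrate why fewer regressors can be better, the sketch simulates a sample in which only one variable truly affects \(y\) and then adds irrelevant regressors one at a time. The ordinary \(R^2\) can never fall when a regressor is added to an OLS model with an intercept, whereas the adjusted \(R^2\), \(1-(1-R^2)\frac{n-1}{n-k-1}\) with \(k\) regressors besides the constant, penalizes each extra variable. Using adjusted \(R^2\) as the fit-versus-simplicity yardstick, the sample size, the data-generating process and the helper name `r2_and_adj_r2` are all assumptions made for this illustration (Python with NumPy).

```python
import numpy as np


def r2_and_adj_r2(X, y):
    """OLS of y on a constant and the columns of X; returns (R^2, adjusted R^2)."""
    n, k = X.shape                       # k = number of regressors besides the constant
    Z = np.column_stack([np.ones(n), X])
    coef, *_ = np.linalg.lstsq(Z, y, rcond=None)
    resid = y - Z @ coef
    ssr = resid @ resid                  # sum of squared residuals
    sst = ((y - y.mean()) ** 2).sum()    # total sum of squares
    r2 = 1.0 - ssr / sst
    adj = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj


rng = np.random.default_rng(1)
n = 40
x = rng.normal(size=n)
y = 1.0 + 2.0 * x + rng.normal(size=n)   # only x truly affects y
noise = rng.normal(size=(n, 10))         # 10 irrelevant candidate regressors

results = []
for extra in range(11):
    X = np.column_stack([x, noise[:, :extra]])   # x plus `extra` irrelevant columns
    r2, adj = r2_and_adj_r2(X, y)
    results.append((r2, adj))
    print(f"{1 + extra:2d} regressors: R2 = {r2:.3f}   adj. R2 = {adj:.3f}")
```

With real data, a rising \(R^2\) alone is therefore not evidence that an added variable belongs in the model.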