2.2 Functional form selection

  • So far we have learned that a model can be expressed as a single equation or as a system of equations, and that it can include only two variables (bivariate model) or more than two variables (multivariate model). Not all variables have to be numerical, and they can play different roles.

  • An econometric model is misspecified if we have chosen an incorrect functional form \(f(x)\) or the wrong variables. In this case, the estimated parameters may be biased and may not be statistically significant!

The functional form \(f(x)\) can be linear or nonlinear. However, the model must be linear in the parameters, regardless of how the variables enter (the variables themselves may appear in linear or nonlinear form).

  • We should always transform nonlinear models into linear ones if possible!

  • The following bivariate models are linear in the parameters but nonlinear in the variables: \[\begin{equation} \begin{aligned} y_i=&\beta_0+\beta_1x_i^{2}+u_i \\ y_i=&\beta_0+\beta_1\log x_i+u_i \\ y_i=&\beta_0+\beta_1\cos x_i+u_i \end{aligned} \tag{2.2} \end{equation}\]
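Because such models are linear in the parameters, they can be estimated by ordinary least squares once the regressor is transformed. The following is a minimal sketch for the first model in (2.2), assuming simulated data; the helper `ols_bivariate`, the sample size and the "true" values 2.0 and 0.7 are hypothetical choices made only for illustration.

```python
import random


def ols_bivariate(z, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * z."""
    n = len(z)
    z_bar = sum(z) / n
    y_bar = sum(y) / n
    s_zy = sum((zi - z_bar) * (yi - y_bar) for zi, yi in zip(z, y))
    s_zz = sum((zi - z_bar) ** 2 for zi in z)
    b1 = s_zy / s_zz
    b0 = y_bar - b1 * z_bar
    return b0, b1


# Simulate y_i = 2.0 + 0.7 * x_i^2 + u_i (hypothetical parameter values).
random.seed(0)
x = [random.uniform(0.0, 5.0) for _ in range(500)]
u = [random.gauss(0.0, 0.5) for _ in range(500)]
y = [2.0 + 0.7 * xi ** 2 + ui for xi, ui in zip(x, u)]

# The model is nonlinear in x but linear in the parameters,
# so we regress y on the transformed variable z = x^2.
z = [xi ** 2 for xi in x]
b0_hat, b1_hat = ols_bivariate(z, y)
print(b0_hat, b1_hat)
```

The same helper would work for the other two models in (2.2) by replacing `z` with `math.log(x)` or `math.cos(x)` values.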

Exercise 5. Which of the following models can be transformed into a linear model in parameters and how? \[a)~~~y_i=e^{\beta_0+\beta_1x_i+u_i}~~~~~~~~~~b)~~~y_i=\beta_0+\beta_1^{3}x_i+u_i~~~~~~~~~~c)~~~y_i=\frac{1}{\beta_0+\beta_1x_i+u_i}\]
Solution Models (a) and (c) can be transformed into models that are linear in the parameters, while model (b) cannot. Model (a) requires taking logs of both sides, and model (c) requires an inverse (reciprocal) transformation of the dependent variable: \[a)~~~\log(y_i)=\beta_0+\beta_1x_i+u_i~~~~~~~~~~c)~~~\frac{1}{y_i}=\beta_0+\beta_1x_i+u_i\]
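The two transformations can be checked numerically. The sketch below assumes simulated data with hypothetical values \(\beta_0=2\) and \(\beta_1=0.5\); it generates \(y\) from models (a) and (c) using the same \(x_i\) and \(u_i\), transforms the dependent variable, and fits a straight line. The helper `ols_bivariate` is an illustrative name, not part of any library.

```python
import math
import random


def ols_bivariate(x, y):
    """Return (intercept, slope) of the least-squares line y = b0 + b1 * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx
    return y_bar - b1 * x_bar, b1


random.seed(1)
n = 400
beta0, beta1 = 2.0, 0.5  # hypothetical true parameters
x = [random.uniform(0.0, 5.0) for _ in range(n)]
u = [random.gauss(0.0, 0.1) for _ in range(n)]

# Model (a): y = exp(beta0 + beta1 * x + u)
y_a = [math.exp(beta0 + beta1 * xi + ui) for xi, ui in zip(x, u)]
# Model (c): y = 1 / (beta0 + beta1 * x + u)
y_c = [1.0 / (beta0 + beta1 * xi + ui) for xi, ui in zip(x, u)]

# (a) take logs of y; (c) take the reciprocal of y; then fit a line in x.
fit_a = ols_bivariate(x, [math.log(yi) for yi in y_a])
fit_c = ols_bivariate(x, [1.0 / yi for yi in y_c])
print(fit_a, fit_c)
```

Both fits recover essentially the same intercept and slope, because after the transformation both models reduce to \(\beta_0+\beta_1x_i+u_i\) on the right-hand side.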
Exercise 6. Which model is linear in the parameters and which one is linear in the variables? \[~~~~a)~~~~~y_i=\beta_0+\beta_1\frac{1}{x_i}+u_i~~~~~~~~~~b)~~~y_i=\beta_0+\log(\beta_1)x_i+u_i\] \[c)~~~~y_i=\beta_0+\beta_1\sqrt{x_i}+u_i~~~~~~~~~~d)~~~y_i=\beta_0+\beta_1x_i+u_i\]
Solution Models (a), (c) and (d) are linear in the parameters. Models (b) and (d) are linear in the variables. Thus, model (d) is linear in both the parameters and the variables (the so-called lin-lin model). Specifically, model (b) cannot be transformed into a model that is linear in the parameters, although it is linear in the variables!
Exercise 7. Transform the Cobb-Douglas production function \[y_i=\beta_0x_i^{\beta_1}z_i^{\beta_2}e^{u_i}\] into a linear model. Explain the parameters \(\beta_1\) and \(\beta_2\) if \(y=\) production (\(000\) tons), \(x=\) number of employees and \(z=\) capital (millions USD).
Solution A log transformation is required: \[\log(y_i)=\log(\beta_0)+\beta_1 \log(x_i)+\beta_2 \log(z_i)+u_i\] Parameter \(\beta_1\) gives the percentage change in production associated with a \(1\%\) change in the number of employees, holding capital constant. Parameter \(\beta_2\) gives the percentage change in production associated with a \(1\%\) change in capital, holding the number of employees constant. Since all observed variables are transformed into logs, all changes are interpreted as percentages, free from any measurement units (elasticities).
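The log transformation above can be tried on simulated data. The following is a sketch under stated assumptions: the values \(\beta_0=1.5\), \(\beta_1=0.6\), \(\beta_2=0.3\), the ranges of \(x\) and \(z\), and the helper names `solve_linear_system` and `ols` are all hypothetical and chosen only for illustration. The coefficients are obtained from the normal equations of the log-log regression.

```python
import math
import random


def solve_linear_system(a, b):
    """Solve a * coef = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * coef[c] for c in range(r + 1, n))
        coef[r] = (m[r][n] - tail) / m[r][r]
    return coef


def ols(rows, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


random.seed(2)
n = 500
beta0, beta1, beta2 = 1.5, 0.6, 0.3  # hypothetical true parameters
x = [random.uniform(10.0, 200.0) for _ in range(n)]  # number of employees
z = [random.uniform(1.0, 50.0) for _ in range(n)]    # capital
u = [random.gauss(0.0, 0.1) for _ in range(n)]
y = [beta0 * xi ** beta1 * zi ** beta2 * math.exp(ui)
     for xi, zi, ui in zip(x, z, u)]

# Regress log(y) on a constant, log(x) and log(z).
rows = [[1.0, math.log(xi), math.log(zi)] for xi, zi in zip(x, z)]
log_y = [math.log(yi) for yi in y]
c0, b1_hat, b2_hat = ols(rows, log_y)

# The fitted intercept estimates log(beta0), so beta0 is recovered with exp().
b0_hat = math.exp(c0)
print(b0_hat, b1_hat, b2_hat)
```

Note that the intercept of the transformed model is \(\log(\beta_0)\), not \(\beta_0\) itself, while the two slopes are read directly as elasticities.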
  • Because the variables are transformed, you should be careful when interpreting the slope coefficient!
TABLE 2.3: Models, equations and slope coefficients

| Model | Equation | Slope coefficient |
|---|---|---|
| lin-lin | \(y_i = \beta_0 + \beta_1 x_i + u_i\) | \(\beta_1 = \Delta y / \Delta x\) |
| log-log | \(\log (y_i) = \beta_0 + \beta_1 \log (x_i) + u_i\) | \(\beta_1 = \%\Delta y / \%\Delta x\) |
| log-lin | \(\log (y_i) = \beta_0 + \beta_1 x_i + u_i\) | \(\beta_1 \times 100 \approx \%\Delta y / \Delta x\) |
| lin-log | \(y_i = \beta_0 + \beta_1 \log (x_i) + u_i\) | \(\beta_1 / 100 \approx \Delta y / \%\Delta x\) |
| polynomial | \(y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^{2} + u_i\) | \(\beta_1 + 2\beta_2 x_0 \approx \Delta y / \Delta x_0\) |

Exercise 8. Is the second-order polynomial linear in the parameters? From which model can you extract elasticity and semi-elasticity? Compute the exact (not approximate) percentage change in the variable \(y\) from the log-lin model with respect to a \(1\) unit change in the variable \(x\).
Solution The second-order polynomial is linear in the parameters, although it includes the quadratic term \(x^2\). The elasticity is \(\beta_1\) from the log-log model, while the semi-elasticity is \(\beta_1\) from the log-lin model. The exact percentage change in the variable \(y\) with respect to a \(1\) unit change in the variable \(x\) in the log-lin model is computed as \((e^{\beta_1}-1)\times 100\%\).
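The exact formula follows in a few steps from the log-lin equation. As a short sketch, write \(y\) and \(y'\) for the values of the dependent variable before and after \(x\) increases by \(1\) unit, holding the error term fixed (the prime notation is introduced here only for illustration): \[\begin{aligned} \log(y')-\log(y)&=\beta_1 \\ \frac{y'}{y}&=e^{\beta_1} \\ \frac{y'-y}{y}\times 100\%&=(e^{\beta_1}-1)\times 100\% \end{aligned}\] For small \(|\beta_1|\) we have \(e^{\beta_1}\approx 1+\beta_1\), which gives the approximation \(\beta_1\times 100\%\) used for the log-lin model.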
Exercise 9. Explain the meaning of the slope coefficient in each model if \(y=\) monthly gas consumption (in liters) and \(x=\) gas price per liter (in USD). \[ a)~~~~~~~~y_i=158.49-13.48x_i+u_i~\] \[ b)~\log(y_i)=5.44-0.87\log(x_i)+u_i\] \[ c)~\log(y_i)=3.27-0.14x_i+u_i~~~~~~~~\]
Solution Case a): if the gas price increases by \(1\) USD, monthly gas consumption is expected to decrease by \(13.48\) liters. Case b): if the gas price increases by \(1\%\), monthly gas consumption is expected to decrease by \(0.87\%\). Case c): if the gas price increases by \(1\) USD, monthly gas consumption is expected to decrease by approximately \(14\%\).
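The approximation in case c) can be compared with the exact percentage change from Exercise 8. The snippet below is a small sketch using the slope coefficients given in Exercise 9; the function names are hypothetical, and the \(10\%\) price increase in the log-log example is an illustrative assumption, not part of the exercise.

```python
import math


def pct_change_log_lin(beta1, delta_x=1.0):
    """Exact % change in y when x changes by delta_x units (log-lin model)."""
    return (math.exp(beta1 * delta_x) - 1.0) * 100.0


def pct_change_log_log(beta1, pct_x):
    """Exact % change in y when x changes by pct_x percent (log-log model)."""
    return ((1.0 + pct_x / 100.0) ** beta1 - 1.0) * 100.0


# Case c): log-lin model with slope -0.14 and a 1 USD price increase.
approx_c = -0.14 * 100.0
exact_c = pct_change_log_lin(-0.14)
print(round(approx_c, 2), round(exact_c, 2))  # -14.0 -13.06

# Case b): log-log model with slope -0.87 and a (hypothetical) 10% price increase.
approx_b = -0.87 * 10.0
exact_b = pct_change_log_log(-0.87, 10.0)
print(round(approx_b, 2), round(exact_b, 2))
```

The gap between the approximate and exact values grows with the size of the coefficient or of the change in \(x\), which is why the table entries for the log-lin and lin-log models use \(\approx\).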
  • Regardless of the functional form, you should always keep in mind the parsimony principle with respect to the number of variables on the right-hand side (fewer is better)!

  • This principle balances the model's goodness-of-fit against its simplicity to avoid overfitting.
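One common way to make this trade-off concrete is the adjusted \(R^2\), which penalizes \(R^2\) for every additional regressor. This measure is not introduced in the text above, so the following is only an illustrative sketch; the sample size and the two \(R^2\) values are hypothetical numbers, not estimation results.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k regressors (excluding the intercept)."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)


n = 50  # hypothetical sample size

# Hypothetical small model: 2 regressors, R^2 = 0.900.
adj_small = adjusted_r2(0.900, n, k=2)

# Hypothetical large model: 10 regressors, R^2 = 0.902 (only a marginal gain in fit).
adj_large = adjusted_r2(0.902, n, k=10)

print(round(adj_small, 4), round(adj_large, 4))
```

In this made-up example the larger model has the higher \(R^2\) but the lower adjusted \(R^2\), so the simpler model would be preferred under this criterion.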