5.3 OLS assumptions

  • In total, seven assumptions may apply:
  1. Linearity in the parameters
  2. RHS variables are fixed (matrix \(x\) has the same values in repeated sampling)
  3. Constant variance of error terms (error term variances are equal across all observations, so-called homoskedasticity)
  4. Independence of error terms (no correlation between error terms, i.e. no autocorrelation)
  5. Normality of error terms (error terms are normally distributed with zero mean)
  6. Independence of RHS variables (matrix \(x\) has full column rank, i.e. no RHS variable is an exact linear combination of the others)
  7. Independence between RHS variables and error terms (exogeneity): when the RHS variables are not fixed but random, as with time-series data, this additional assumption is required
  • Not all assumptions are required in every case; which ones apply depends on the data type and model specification

  • When dealing with cross-sectional data, assumption (2) is fulfilled by default and thus assumption (7) is not required

  • When dealing with time-series data, assumption (2) is not fulfilled and thus assumption (7) is required

  • In a bivariate econometric model, assumption (6) is not required because there is just one variable on the right-hand side

  • By contrast, a multivariate econometric model requires assumption (6) to be fulfilled
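
Assumption (6) can be checked numerically by comparing the rank of the matrix \(x\) with its number of columns. The following is a minimal sketch, assuming NumPy is available; the helper name `has_full_column_rank` and the small data vectors are made up for illustration and are not part of the original text.

```python
import numpy as np

def has_full_column_rank(x):
    """Return True when the columns of x are linearly independent."""
    x = np.asarray(x, dtype=float)
    return bool(np.linalg.matrix_rank(x) == x.shape[1])

# Hypothetical data: an intercept column plus two RHS variables.
ones = np.ones(5)
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
z_ok = np.array([2.0, 1.0, 4.0, 3.0, 6.0])  # correlated with x1, but not an exact linear function
z_bad = 2.0 * x1                            # exact multiple of x1 (perfect collinearity)

x_ok = np.column_stack([ones, x1, z_ok])
x_bad = np.column_stack([ones, x1, z_bad])

print(has_full_column_rank(x_ok))   # full rank: assumption (6) holds
print(has_full_column_rank(x_bad))  # rank deficient: assumption (6) fails
```

In the second case \(x^{T}x\) is singular, so it cannot be inverted and the OLS estimator is not uniquely defined.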

  • Assumptions (3) and (4) can be embedded into the covariance matrix of error terms \(\Omega\)

\[\begin{equation} Var(u|x)=E(uu^{T}|x)=\Omega=\begin{bmatrix} \sigma_u^{2} & 0 & 0 & \cdots & 0 \\ 0 & \sigma_u^{2} & 0 & \cdots & 0 \\ 0 & 0 & \sigma_u^{2} & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \sigma_u^{2} \end{bmatrix} \tag{5.17} \end{equation}\]

  • Moreover, assumptions (3), (4) and (5) can be summarized in the following notation

\[\begin{equation}u|x\sim N~(0,~\Omega)~;~~\Omega=\sigma_u^{2}I \tag{5.18} \end{equation}\]

  • The variance of the error terms \(\sigma_u^{2}\) is unknown, but it can be estimated from the squared residuals. An unbiased estimate of the error term variance is given by

\[\begin{equation}\hat{\sigma}_u^{2}=\frac{\hat{u}^{T}\hat{u}}{n-(k+1)}=\frac{\displaystyle\sum_{i=1}^n \hat{u}^2_i}{n-k-1} \tag{5.19} \end{equation}\]

  • The square root of (5.19) is called the regression standard error

\[\begin{equation}\hat{\sigma}_u=\sqrt{\frac{\displaystyle\sum_{i=1}^n \hat{u}^2_i}{n-k-1}} \tag{5.20} \end{equation}\]
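
Formulas (5.19) and (5.20) can be traced on a small numerical example. This is a sketch under stated assumptions: the function name `regression_standard_error` and the four data points are invented for illustration, and `np.linalg.lstsq` is used as a numerically stable way to obtain the OLS coefficients.

```python
import numpy as np

def regression_standard_error(x, y):
    """Return (sigma2_hat, sigma_hat) as in (5.19) and (5.20).

    x is the n-by-(k+1) design matrix, including the intercept column.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k_plus_1 = x.shape
    beta_hat, *_ = np.linalg.lstsq(x, y, rcond=None)  # OLS coefficients
    residuals = y - x @ beta_hat                      # u-hat
    sigma2_hat = residuals @ residuals / (n - k_plus_1)
    return sigma2_hat, np.sqrt(sigma2_hat)

# Made-up bivariate data: n = 4 observations, k = 1 RHS variable.
x_var = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([1.0, 3.0, 2.0, 4.0])
x = np.column_stack([np.ones_like(x_var), x_var])

sigma2_hat, sigma_hat = regression_standard_error(x, y)
print(round(sigma2_hat, 4), round(sigma_hat, 4))  # 0.9 0.9487
```

Here the residuals are \(-0.3,~0.9,~-0.9,~0.3\), so the sum of squared residuals is \(1.8\) and dividing by \(n-k-1=2\) gives \(\hat{\sigma}_u^{2}=0.9\).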

  • Under the normality assumption (5), the unbiased estimator \(\hat{\sigma}_u^{2}\) is a random variable distributed independently of the estimator \(\hat{\beta}\), and its scaled version follows a chi-square distribution with \(n-k-1\) degrees of freedom \[\begin{equation}\frac{\hat{\sigma}_u^{2}(n-k-1)}{\sigma_u^{2}}\sim~\chi^2_{(df=n-k-1)} \tag{5.21} \end{equation}\]

  • The ratio of the \(\chi^2\) variable to its degrees of freedom \(df\) equals \[\begin{equation} \frac{\chi^2}{df}=\frac{\frac{\hat{\sigma}_u^{2}(n-k-1)}{\sigma_u^{2}}}{n-k-1}=\frac{\hat{\sigma}_u^{2}}{\sigma_u^{2}} \tag{5.22} \end{equation}\]

  • The equality in (5.22) is important for determining test statistics in significance testing of estimated parameters (section 6.1)
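
The distributional result (5.21) can be illustrated with a small Monte Carlo experiment. The sketch below is only an illustration: the sample size, the true coefficients, the error variance and the random seed are all arbitrary choices, not values from the text. A \(\chi^2\) variable with \(df\) degrees of freedom has mean \(df\) and variance \(2\,df\), so the simulated statistic should come close to both.

```python
import numpy as np

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility
n, k = 30, 2                     # assumed sample size and number of RHS variables
sigma2 = 4.0                     # assumed true error variance
df = n - k - 1
reps = 5000

# Fixed design matrix (assumption 2): intercept plus k regressors.
x = np.column_stack([np.ones(n), rng.normal(size=(n, k))])
beta = np.array([1.0, 0.5, -2.0])

# Residual-maker matrix M = I - x (x'x)^{-1} x', so that u-hat = M y.
m = np.eye(n) - x @ np.linalg.solve(x.T @ x, x.T)

scaled = np.empty(reps)
for r in range(reps):
    u = rng.normal(scale=np.sqrt(sigma2), size=n)  # u ~ N(0, sigma2 * I)
    y = x @ beta + u
    resid = m @ y
    sigma2_hat = resid @ resid / df                # estimator (5.19)
    scaled[r] = sigma2_hat * df / sigma2           # left-hand side of (5.21)

print(scaled.mean(), df)           # sample mean should be close to df
print(scaled.var(), 2 * df)        # sample variance should be close to 2 * df
print((scaled / df).mean())        # ratio (5.22) averages close to 1
```

The last line reflects unbiasedness: on average \(\hat{\sigma}_u^{2}/\sigma_u^{2}\) is close to one.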

Exercise 24. Answer the following questions:

  1. What are the assumptions of a bivariate econometric model based on time-series data?
    Solution A bivariate model has only one RHS variable, and thus, assumption (6) is not an issue, while assumption (7) should be checked as assumption (2) is not fulfilled by default when dealing with time-series data. Therefore, assumptions (1), (3), (4), (5) and (7) are required.
  2. What are the assumptions of a multivariate econometric model based on cross-sectional data?
    Solution A multivariate model has more than one RHS variable, and thus, assumption (6) should be checked, while assumption (7) is not an issue as assumption (2) is fulfilled by default when dealing with cross-sectional data. Therefore, assumptions (1), (2), (3), (4), (5) and (6) are required.
  3. What are the assumptions of a multivariate econometric model based on time-series data?
    Solution Assumption (2) is not fulfilled when dealing with time-series data, so all remaining assumptions should be checked, i.e. assumptions (1), (3), (4), (5), (6) and (7).
  4. What are the dimensions of the matrix \(x\)?
    Solution The matrix \(x\) has \(n\) rows corresponding to the number of observations (sample size), and \(k+1\) columns (with one additional column for the intercept and \(k\) independent variables).
  5. What is the formula for the OLS estimator?
    Solution The OLS estimator is given by the formula \(\widehat{\beta}_{OLS} = (x^{T}x)^{-1}x^{T}y\)
  6. Which distribution do the error terms follow?
    Solution It is typically assumed that error terms follow a normal distribution with a zero mean and constant variance. However, normality is an assumption rather than a guarantee: error terms can also follow other statistical distributions, such as the heavier-tailed t-distribution, depending on the data type and model specification.
  7. What are the elements of the matrix \(\Omega\)?
    Solution The diagonal elements of the matrix \(\Omega\) are variances, while the off-diagonal elements are covariances. The matrix \(\Omega\) is always symmetric, but not necessarily diagonal (a symmetric matrix is diagonal if all off-diagonal elements are zero).
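
The answers to questions 4 and 5 can be verified on a toy data set. This is a minimal sketch with made-up numbers; it solves the normal equations \(x^{T}x\,\beta = x^{T}y\) instead of forming the inverse explicitly, which gives the same estimator as \((x^{T}x)^{-1}x^{T}y\), and it cross-checks the result against `np.linalg.lstsq`.

```python
import numpy as np

# Made-up data: n = 5 observations, k = 2 RHS variables.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0])
y = np.array([3.0, 4.0, 8.0, 9.0, 13.0])

# Design matrix: a column of ones for the intercept, then the k variables.
x = np.column_stack([np.ones_like(x1), x1, x2])
n, k_plus_1 = x.shape
print(n, k_plus_1)  # 5 3, i.e. n rows and k + 1 columns

# OLS estimator: solve (x'x) beta = x'y.
beta_hat = np.linalg.solve(x.T @ x, x.T @ y)

# Cross-check with NumPy's least-squares routine.
beta_check, *_ = np.linalg.lstsq(x, y, rcond=None)
print(np.allclose(beta_hat, beta_check))

# By construction, OLS residuals are orthogonal to every column of x.
residuals = y - x @ beta_hat
print(np.allclose(x.T @ residuals, 0.0, atol=1e-8))
```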

Exercise 25. Consider the following multivariate time-series econometric model:

\[y_t=\beta_0+\beta_1x_t+\beta_2z_t+u_t~;~~~u_t\sim N(0,~\Omega)\]

  1. Which assumption does not hold if \(Cov(x_t,z_t)\ne0\)?
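    Solution A non-zero covariance means that the observed RHS variables \(x_t\) and \(z_t\) are not independent, although they should be according to assumption (6). Therefore, the independence of RHS variables does not hold. Strictly speaking, the matrix \(x\) loses full rank only when the linear relationship is exact (perfect collinearity); a weaker correlation still allows the OLS estimator to be computed, but it inflates the variance of the estimates.
  2. Is an endogeneity problem present if \(Cov(x_t,u_t)=0\) and \(Cov(z_t,u_t)=0\)?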
    Solution Both observed RHS variables \(x_t\) and \(z_t\) are independent of the error terms \(u_t\) (as implied by zero covariances), and thus, the exogeneity assumption (7) holds, meaning that the endogeneity problem is not present.
  3. What kind of problem exists if \(Cov(u_t,u_{t-1})\ne0\)?
    Solution If covariance between error terms (shifted/lagged by \(1\) or more steps) is not zero, then the autocorrelation problem exists, i.e. the assumption (4) does not hold.
  4. Which assumption does not hold if the matrix \(\Omega\) is diagonal but its diagonal elements are not all equal?
    Solution If the diagonal elements of the matrix \(\Omega\) are not equal, the variance of error terms is not constant across observations, so the homoskedasticity assumption (3) does not hold.
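
Questions 3 and 4 both come down to inspecting the structure of \(\Omega\): unequal diagonal elements violate assumption (3), while non-zero off-diagonal elements violate assumption (4). The sketch below encodes this reading; the helper `diagnose_omega` and the three example matrices are hypothetical and serve only as an illustration. The autocorrelated case uses a stylized pattern \(Cov(u_t,u_s)=\sigma_u^{2}\rho^{|t-s|}\), which is an assumption of this example.

```python
import numpy as np

def diagnose_omega(omega, tol=1e-12):
    """Report which error-term assumptions a covariance matrix satisfies."""
    omega = np.asarray(omega, dtype=float)
    diag = np.diag(omega)
    off_diag = omega - np.diag(diag)
    return {
        # Assumption (3): all variances on the diagonal are equal.
        "homoskedastic": bool(np.allclose(diag, diag[0], atol=tol)),
        # Assumption (4): all covariances off the diagonal are zero.
        "no_autocorrelation": bool(np.allclose(off_diag, 0.0, atol=tol)),
    }

n = 4
sigma2 = 2.0   # assumed error variance
rho = 0.5      # assumed correlation between neighbouring error terms

spherical = sigma2 * np.eye(n)          # Omega = sigma2 * I, as in (5.17)
hetero = np.diag([1.0, 2.0, 3.0, 4.0])  # diagonal, but unequal variances
idx = np.arange(n)
autocorr = sigma2 * rho ** np.abs(idx[:, None] - idx[None, :])

for name, omega in [("spherical", spherical), ("hetero", hetero), ("autocorr", autocorr)]:
    print(name, diagnose_omega(omega))
```

Only the first matrix satisfies both assumptions; the second fails homoskedasticity and the third fails the no-autocorrelation condition.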