6.3 Diagnostic checking

  • Diagnostic tests should be applied to check whether all assumptions, i.e. the Gauss-Markov conditions, are satisfied

  • If some of the assumptions are NOT satisfied, the OLS estimator is no longer BLUE: coefficients \(\hat\beta_j\) may be biased (perhaps even with the wrong sign), standard errors \(se(\hat\beta_j)\) may be wrong (usually overestimated, i.e. inflated), significance tests are no longer valid if the normality assumption does not hold, etc.

  • How can violations of the assumptions be detected?

  • What are the most likely causes of the violations in practice?

  • Can the data be transformed in a certain way so that the assumptions are met?

  • Which alternative, still valid, estimation method can be used?

1. Linearity in the parameters

  • If we have assumed a linear model that does not fit the data well, then the proposed model is misspecified

  • The misspecification problem occurs when the wrong functional form is chosen, when irrelevant variables are included in the model, or when important variables are omitted.

  • In that case the OLS estimator is biased and inconsistent

  • Some nonlinear models can still be estimated by OLS if they can be transformed into linear models. Otherwise, OLS cannot be applied and the “true” nonlinear model should be estimated by NLS (Nonlinear Least Squares), which requires an iterative numerical optimization algorithm, e.g. from the class of quasi-Newton methods (a short sketch is given after this list).

If non-linearity is the source of the misspecification problem, the estimated coefficients might change between groups of observations, i.e. the effect of \(x\) on \(y\) is not the same across all observations; dummy variables can then be introduced to capture these changes (see, e.g., Exercise 4).
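
The following is a minimal R sketch of both cases on simulated data (all variable names are illustrative, not taken from the text). A multiplicative model is linearized by taking logs and estimated by OLS, while the same model is also estimated directly by NLS with `nls()` (which by default uses a Gauss-Newton type algorithm; quasi-Newton routines are an alternative):

```r
# Simulated data for the multiplicative model y = b0 * x^b1 * exp(u)
set.seed(123)
x <- runif(200, 1, 10)
y <- 2 * x^0.7 * exp(rnorm(200, sd = 0.1))

# (a) Linearizable model: take logs and apply OLS
fit_ols <- lm(log(y) ~ log(x))
summary(fit_ols)

# (b) Direct estimation of the nonlinear model by NLS (iterative optimization)
fit_nls <- nls(y ~ b0 * x^b1, start = list(b0 = 1, b1 = 1))
summary(fit_nls)
```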

2. Fixed values of RHS variables in repeated sampling

  • This assumption is automatically satisfied when dealing with cross-sectional data

  • If the sample data are time-series, then the exogeneity assumption should be checked (RHS variables and error terms should be independent). This assumption will be discussed in the sections that cover time-series analysis.

3. Error terms have constant variance

  • The term homoscedasticity denotes a constant variance of the error terms \(\sigma^2_u\)

  • If the variance of the error terms is not the same for each observation, then we have a problem called heteroscedasticity

  • Heteroscedasticity means that diagonal elements of the covariance matrix \(\Omega\) are different!

  • OLS estimates are no longer efficient but still unbiased in the presence of heteroscedasticity

  • Inefficient OLS estimates will have large standard errors \(se(\hat\beta_j)\) and consequently small \(t-\)statistics \(t_j\), and thus, we often mistakenly conclude that some variables are “not significant” because of very high \(p-\)values.

Typically, two approaches can be used to address the heteroscedasticity problem:

a) robust standard errors, which are not “sensitive” to heteroscedasticity because the covariance matrix \(\hat\Gamma\) is adjusted (corrected) in a certain way

b) Weighted Least Squares (WLS)

  • The most commonly used is White’s robust covariance matrix \(\tilde\Gamma\), from which White’s robust standard errors are obtained

  • White’s robust covariance matrix of the estimators \(\hat\beta\) is \[\begin{equation}\tilde\Gamma=(x^{T}x)^{-1}x^{T}\hat\Sigma~x(x^{T}x)^{-1} \tag{6.17} \end{equation}\]

  • The matrix \(\hat\Sigma\) is a diagonal matrix of squared residuals \[\begin{equation}\hat\Sigma=\begin{bmatrix} \hat{u}^{2}_1 & 0 & 0 & \dots & 0 \\ 0 & \hat{u}^{2}_2 & 0 & \dots & 0 \\ 0 & 0 & \hat{u}^{2}_3 & \dots & 0 \\ \vdots \\ 0 & 0 & 0 & \dots & \hat{u}^{2}_n \end{bmatrix} \tag{6.18} \end{equation}\]

  • The WLS method can be applied when the source of the heteroscedasticity problem is known. For example, if the error term variance increases proportionally with the RHS variable \(x\), then the inverse values of variable \(x\) can be taken as weights \(w_{ii}\) \[\begin{equation}\hat{\beta}_{WLS}=(x^{T}Wx)^{-1}x^{T}Wy \tag{6.19} \end{equation}\]

\[\begin{equation}W=\begin{bmatrix} \frac{1}{x_1} & 0 & 0 & \dots & 0 \\ 0 & \frac{1}{x_2} & 0 & \dots & 0 \\ 0 & 0 & \frac{1}{x_3} & \dots & 0 \\ \vdots \\ 0 & 0 & 0 & \dots & \frac{1}{x_n} \end{bmatrix}~~;~~~w_{ii}=\frac{1}{x_i} \tag{6.20} \end{equation}\]
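
Both approaches can be illustrated with a minimal R sketch on simulated data (variable names are illustrative; the sandwich and lmtest packages are assumed to be installed). The option `type = "HC0"` in `vcovHC()` corresponds to White's estimator (6.17)-(6.18), and the weights argument of `lm()` implements the WLS estimator (6.19)-(6.20):

```r
library(sandwich)  # robust covariance matrices
library(lmtest)    # coeftest()

# Simulated data in which the error variance grows with x
set.seed(1)
x <- runif(200, 1, 10)
y <- 1 + 0.5 * x + rnorm(200, sd = sqrt(x))
fit <- lm(y ~ x)

# (a) White's robust standard errors, see (6.17)-(6.18)
coeftest(fit, vcov. = vcovHC(fit, type = "HC0"))

# (b) WLS with known weights w_ii = 1/x_i, see (6.19)-(6.20)
fit_wls <- lm(y ~ x, weights = 1 / x)
summary(fit_wls)
```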

  • The source of heteroscedasticity is usually unknown, so the WLS method cannot be applied as described above.

  • Still, the functional form of heteroscedasticity can be estimated: squared residuals obtained in the first step are regressed on the RHS variables by OLS in the second step, and the inverse values of the fitted squared residuals \(\hat u_i^2\) are then taken as weights \(w_{ii}\) within WLS in the third step (a sketch of this procedure is given below).

  • Thus smaller weights are assigned to observations with a higher error variance

  • Additionally, the off-diagonal elements of the matrix \(\Omega\) may be non-zero! In that case a more general weighting matrix is applied and the method is called Generalized Least Squares (GLS); WLS is a special case of GLS.
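
A minimal R sketch of the three-step feasible WLS procedure described above, again on simulated data (names are illustrative). The `pmax()` guard and the remark about modelling \(\log\hat u_i^2\) are practical refinements, not part of the text:

```r
set.seed(1)
x <- runif(200, 1, 10)
y <- 1 + 0.5 * x + rnorm(200, sd = sqrt(x))

# Step 1: OLS and squared residuals
fit <- lm(y ~ x)
u2  <- residuals(fit)^2

# Step 2: regress the squared residuals on the RHS variables
aux <- lm(u2 ~ x)

# Step 3: inverse fitted squared residuals as weights
# (pmax() guards against non-positive fitted values; a common refinement
#  is to model log(u2) instead, which guarantees positive fitted variances)
w <- 1 / pmax(fitted(aux), 1e-6)
fit_fwls <- lm(y ~ x, weights = w)
summary(fit_fwls)
```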

  • To detect heteroscedasticity, the Breusch-Pagan (BP) test is commonly applied

  • The BP test uses the coefficient of determination \(\tilde{R}^2\) from the test equation (squared residuals from the initial model are regressed on all RHS variables). The test equation of the BP test is \[\begin{equation} \hat{u}_i^2=\alpha_0+\alpha_1x_{i,1}+\alpha_2x_{i,2}+...+\alpha_k x_{i,k}+e_i \tag{6.21} \end{equation}\]

  • The null hypothesis of the BP test assumes that the error terms are homoscedastic, i.e. all coefficients (except \(\alpha_0\)) from test equation (6.21) are zero

\[\begin{equation}\begin{matrix} H_0:~\alpha_1=\alpha_2=...=\alpha_k=0 \\ BP=n\tilde{R}^2\sim\chi^2_{(df=k)}\end{matrix} \tag{6.22} \end{equation}\]

  • The null hypothesis of the BP test is rejected if the \(p\)-value from the \(\chi^2\) distribution with \(k\) degrees of freedom is less than the significance level \(\alpha\) (\(1\%\), \(5\%\) or \(10\%\))

  • If the null hypothesis is rejected, then the problem of heteroscedasticity exists!

  • The test equation of the BP test (6.21) can be extended with squared terms and interaction terms. This extended test is called the White test (see the sketch below)

  • Heteroscedasticity can be avoided if the variables are transformed into logs beforehand. Logarithmic transformations (whenever possible) stabilize the variance of the error terms, especially in time-series data.
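
A minimal R sketch of both tests on simulated data (illustrative names; the lmtest package is assumed to be installed). The default, studentized version of `bptest()` returns the \(n\tilde{R}^2\) statistic of (6.22); the same statistic is also computed manually from the auxiliary regression (6.21):

```r
library(lmtest)  # bptest()

# Simulated data with heteroscedasticity driven by x1
set.seed(1)
dat <- data.frame(x1 = runif(200, 1, 10), x2 = runif(200, 1, 10))
dat$y <- 1 + 0.5 * dat$x1 - 0.3 * dat$x2 + rnorm(200, sd = dat$x1)
fit <- lm(y ~ x1 + x2, data = dat)

# Breusch-Pagan test, statistic n * R^2 ~ chi^2(k) as in (6.22)
bptest(fit)

# White test: auxiliary regression extended with squares and an interaction term
bptest(fit, ~ x1 + x2 + I(x1^2) + I(x2^2) + x1:x2, data = dat)

# Manual BP statistic from the auxiliary regression (6.21)
aux <- lm(residuals(fit)^2 ~ x1 + x2, data = dat)
BP  <- nrow(dat) * summary(aux)$r.squared
c(statistic = BP, p.value = pchisq(BP, df = 2, lower.tail = FALSE))
```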

4. Independence of the error terms

  • If the error terms are not independent, then we have a problem called autocorrelation

  • Autocorrelation means that matrix \(\Omega\) is not a diagonal matrix!

  • Off-diagonal elements are (auto)covariances. All (auto)covariances should be zero if the error terms are independent

\[\begin{equation}Cov(u_i,u_j)=0~~~\forall~i\ne j \tag{6.23} \end{equation}\]

  • OLS estimates are no longer efficient and standard errors are often smaller than they should be

  • The autocorrelation coefficient is estimated by standardizing the autocovariance of the residuals

  • Autocovariance is the measure of linear dependence between residuals shifted by a given step \(j\) \[\begin{equation}\hat\rho_j=\frac{Cov(\hat{u}_i,\hat{u}_{i-j})}{Var(\hat{u}_i)}=\frac{\displaystyle\sum_{i=1+j}^n \hat{u}_i \hat{u}_{i-j}}{\displaystyle\sum_{i=1}^n \hat{u}_i^2}~;~~~-1\leq \hat\rho_j\leq 1 \tag{6.24} \end{equation}\]

  • The autocorrelation coefficient at step \(j=0\) is exactly \(1\). For steps \(j\geq 1\) the autocorrelation typically decreases as \(j\) increases.

  • All autocorrelation coefficients \(\hat\rho_j~,~j=1,~2,...,p\) should be zero if the assumption of error term independence holds

  • It is common to test the significance of autocorrelation at the first few steps \((j=1,~2,~3)\)

  • In practice, the Breusch-Godfrey (BG) test is commonly used to test the significance of autocorrelation up to and including step \(p\). The test equation of the BG test is \[\begin{equation}\hat{u}_i=\beta_0+\beta_1x_{i,1}+\beta_2x_{i,2}+...+\beta_kx_{i,k}+\phi_1\hat{u}_{i-1}+...+\phi_p\hat{u}_{i-p}+e_i \tag{6.25} \end{equation}\]

  • The null hypothesis of the BG test and its test statistic are given as \[\begin{equation}\begin{matrix} H_0:~\phi_1=\phi_2=...=\phi_p=0 \\ BG=n\tilde{R}^2\sim\chi^2_{(df=p)}\end{matrix} \tag{6.26} \end{equation}\]

  • The problem of autocorrelation up to and including step (lag) \(p\) exists if the null hypothesis of the BG test is rejected

  • The null hypothesis of the BG test is rejected if the \(p\)-value from the \(\chi^2\) distribution with \(p\) degrees of freedom is less than the significance level \(\alpha\) (\(1\%\), \(5\%\) or \(10\%\))

The problem of autocorrelation means that some of the dynamic properties of the data are not captured by the model when dealing with time-series (the model should be adjusted by including lagged values of the dependent variable on the RHS). When dealing with cross-sectional data, the autocorrelation problem can be solved by using Newey-West HAC (Heteroscedasticity and Autocorrelation Consistent) standard errors.
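
The autocorrelation diagnostics above can be sketched in R as follows (simulated data with AR(1) errors; names are illustrative; the lmtest and sandwich packages are assumed to be installed):

```r
library(lmtest)    # bgtest(), coeftest()
library(sandwich)  # NeweyWest()

# Simulated regression with AR(1) errors
set.seed(1)
n <- 200
x <- rnorm(n)
u <- as.numeric(arima.sim(model = list(ar = 0.6), n = n))
y <- 1 + 0.5 * x + u
fit <- lm(y ~ x)

# Residual autocorrelation coefficients (6.24) at the first few steps
acf(residuals(fit), lag.max = 3, plot = FALSE)

# Breusch-Godfrey test up to lag p = 3, statistic n * R^2 ~ chi^2(3) as in (6.26)
bgtest(fit, order = 3)

# Newey-West HAC standard errors
coeftest(fit, vcov. = NeweyWest(fit))
```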

5. Error terms are normally distributed with zero mean

  • Under the normality assumption, OLS estimates are consistent and efficient

  • If the error terms are not normally distributed, significance tests are no longer valid (\(t\)-statistic, \(F\)-statistic or \(\chi^2\)-statistic) because an inappropriate distribution was assumed

  • If the null hypothesis of error terms normality is true \[\begin{equation}H_0:~u_i\sim N(0,~\sigma_u^2)~~~\forall~i \tag{6.27} \end{equation}\] then residuals should have a skewness close to zero (\(\alpha_3\approx 0\)) and a kurtosis close to three (\(\alpha_4\approx 3\)).

  • These two parameters (skewness and kurtosis) can be used jointly to check for normality according to the Jarque-Bera test \[\begin{equation}JB=n\times \bigg(\frac{\alpha_3^2}{6}+\frac{(\alpha_4-3)^2}{24}\bigg)\sim\chi^2_{(df=2)} \tag{6.28} \end{equation}\]

  • The null hypothesis of the JB test will be rejected if the \(p\)-value from the \(\chi^2\) distribution with \(2\) degrees of freedom is less than the significance level \(\alpha\) (\(1\%\), \(5\%\) or \(10\%\)).

  • A histogram of the residuals can be used as graphical evidence of (non)normality (see, e.g., subsection 3.2)

Logarithmic transformation may reduce skewness (asymmetry of distribution)!
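
A minimal R sketch of the normality check (simulated residuals, illustrative names; the tseries package is assumed to be installed for the Jarque-Bera test):

```r
library(tseries)  # jarque.bera.test()

# Simulated regression; residuals should be approximately normal here
set.seed(1)
x <- rnorm(200)
y <- 1 + 0.5 * x + rnorm(200)
fit <- lm(y ~ x)

# Jarque-Bera test (6.28) based on skewness and kurtosis of the residuals
jarque.bera.test(residuals(fit))

# Graphical check: histogram of the residuals
hist(residuals(fit), breaks = 20, main = "Histogram of residuals")
```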

6. Independence of the RHS variables

  • If the matrix \(x\) does not have full rank, then the columns of matrix \(x\) are linearly dependent and the system of normal equations obtained by OLS has no unique solution (the problem of perfect multicollinearity)

  • Multicollinearity means that the RHS variables (\(x_1,~x_2,...,x_k\)) are highly correlated; in practical applications it usually appears not as perfect but as serious (near) multicollinearity

  • Consequences of the serious multicollinearity are:

  1. the matrix \(x^{T}x\) is nearly singular, i.e. its determinant is approximately zero
  2. overestimated standard errors \(se(\hat\beta_j)\), i.e. variances of the estimates are higher than they should be
  3. \(t-\)statistics are low (the significance tests can be improved by getting more data if possible)
  4. perhaps the wrong signs of estimated coefficients
  5. the ceteris paribus assumption does not hold
  • If the RHS variables are time-series, then the problem of multicollinearity may exist if the variables have a similar or common trend (e.g. a common tendency to grow over time)

The easiest way to identify multicollinearity is to examine the correlation matrix of the RHS variables, which should be close to the identity matrix if there is no correlation between them. The variance inflation factor (VIF) is another indicator that can be used for detecting multicollinearity; a serious multicollinearity problem exists if \(VIF>5\).

\[\begin{equation}VIF_j=\frac{1}{1-R^2_j}~~;~~~VIF_j>5~;~~~R^2_j>0.8 \tag{6.29} \end{equation}\]

where \(R^2_j\) denotes the coefficient of determination of the test equation in which the \(j\)-th RHS variable is regressed on the remaining \((k-1)\) RHS variables.
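
A minimal R sketch of both indicators on simulated data (illustrative names; the car package is assumed to be installed for `vif()`):

```r
library(car)  # vif()

# Simulated data with two strongly correlated regressors
set.seed(1)
x1 <- rnorm(200)
x2 <- 0.95 * x1 + rnorm(200, sd = 0.2)   # highly correlated with x1
x3 <- rnorm(200)
y  <- 1 + x1 + x2 + x3 + rnorm(200)
fit <- lm(y ~ x1 + x2 + x3)

# Correlation matrix of the RHS variables (far from the identity matrix here)
cor(cbind(x1, x2, x3))

# Variance inflation factors (6.29); values above 5 indicate serious multicollinearity
vif(fit)
```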

Possible solutions to the problem:

  1. Omitting some RHS variables (if we exclude relevant variables, this may cause an omitted variable bias problem; see, e.g., Exercise 26)
  2. If the variables are time-series, the problem can be solved by removing the trend from the data using first differences (see the sketch below)
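
A minimal R sketch of the first-differencing remedy on simulated trending series (illustrative names):

```r
# Two regressors sharing a common linear trend
set.seed(1)
n  <- 200
t  <- 1:n
x1 <- 0.5 * t + rnorm(n)
x2 <- 0.5 * t + rnorm(n)
y  <- 1 + 0.3 * x1 + 0.2 * x2 + rnorm(n)

cor(x1, x2)              # very high correlation in levels
cor(diff(x1), diff(x2))  # much lower after first differencing

# Regression in first differences removes the common trend
fit_diff <- lm(diff(y) ~ diff(x1) + diff(x2))
summary(fit_diff)
```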