6.3 Diagnostic checking
Diagnostic tests should be applied to check whether all assumptions, i.e. the Gauss-Markov conditions, are satisfied.
If some of the assumptions are NOT satisfied, the OLS estimator is no longer BLUE: coefficients \(\hat\beta_j\) may be biased (perhaps even with the wrong sign), standard errors \(se(\hat\beta_j)\) are wrong (usually overestimated), significance tests are no longer valid if the normality assumption does not hold, etc.
How can violations of the assumptions be detected?
What are the most likely causes of the violations in practice?
Can data be transformed in a certain way such that assumptions are satisfied?
Which alternative estimation method that is still valid can be used?
- If we have assumed a linear functional form that does not fit the data well, then the proposed model is misspecified.
- The misspecification problem occurs when the functional form is incorrectly chosen. It may also occur if some irrelevant variables are included or some important variables are omitted.
- Some nonlinear models can still be estimated using the OLS method if they can be transformed into linear models. Otherwise, nonlinear least squares must be used (it requires an iterative optimization algorithm from the class of quasi-Newton numerical approximations).
- This assumption is automatically satisfied when dealing with cross-sectional data
- If the sample data is a time series then this assumption is required (regressors and error terms should be independent, i.e. variable \(x\) does not give us any information about variable \(u\), so \(Cov(x,u)=0\))
The term \(homoscedasticity\) denotes a constant variance of the error terms \(\sigma^2_u\). If the variance of the error terms is not the same for each observation, we have a problem called heteroscedasticity.
Heteroscedasticity exists when the diagonal elements of the covariance matrix \(\Omega\) are different!
Inefficient OLS estimators will have large standard errors \(se(\hat\beta_j)\) and consequently very small test statistics \(t_j\). Therefore, we often mistakenly conclude that variables are “not significant” because of very high \(p-values\).
There are two approaches to addressing this problem: (a) using robust standard errors or (b) using the WLS method
Robust standard errors are not “sensitive” to heteroscedasticity because the matrix \(\hat\Gamma\) is adjusted (corrected). The most commonly used is \(White's\) robust covariance matrix \(\tilde\Gamma\), from which \(White's\) standard errors are obtained.
\(White's\) robust covariance matrix of the estimators \(\hat\beta_j\) is \[\begin{equation}\tilde\Gamma=(x^{T}x)^{-1}x^{T}\hat\Sigma~x(x^{T}x)^{-1}\end{equation}\]
The matrix \(\hat\Sigma\) is a diagonal matrix of squared residuals \[\begin{equation}\hat\Sigma=\begin{bmatrix} \hat{u}^{2}_1 & 0 & 0 & \hdots & 0 \\ 0 & \hat{u}^{2}_2 & 0 & \hdots & 0 \\ 0 & 0 & \hat{u}^{2}_3 & \hdots & 0 \\ \vdots \\ 0 & 0 & 0 & \hdots & \hat{u}^{2}_n \end{bmatrix}\end{equation}\]
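As a rough illustration, the sketch below computes \(\tilde\Gamma\) directly from the formula above and compares the resulting standard errors with the HC0 option in statsmodels; the data, variable names and sample size are simulated assumptions used only for the example.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 200
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(scale=x)      # error variance grows with x

X = sm.add_constant(x)                       # design matrix with intercept
ols = sm.OLS(y, X).fit()
u_hat = ols.resid

# White's robust covariance: (X'X)^{-1} X' Sigma_hat X (X'X)^{-1},
# where Sigma_hat = diag(u_hat^2)
XtX_inv = np.linalg.inv(X.T @ X)
gamma_tilde = XtX_inv @ (X.T @ np.diag(u_hat**2) @ X) @ XtX_inv

print("manual White SE:", np.sqrt(np.diag(gamma_tilde)))
print("statsmodels HC0:", ols.HC0_se)        # should agree with the manual result
```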
The WLS estimator can be applied when the form of heteroscedasticity is known. For example, if the variance of the error terms increases proportionally with the regressor \(x\), then inverse values of variable \(x\) are taken as weights \(w_{ii}\) \[\begin{equation}\hat{\beta}_{WLS}=(x^{T}Wx)^{-1}x^{T}Wy\end{equation}\] \[\begin{equation}W=\begin{bmatrix} \frac{1}{x_1} & 0 & 0 & \hdots & 0 \\ 0 & \frac{1}{x_2} & 0 & \hdots & 0 \\ 0 & 0 & \frac{1}{x_3} & \hdots & 0 \\ \vdots \\ 0 & 0 & 0 & \hdots & \frac{1}{x_n} \end{bmatrix}~~;~~~w_{ii}=\frac{1}{x_i}~\end{equation}\]
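A minimal WLS sketch under the assumption stated above (error variance proportional to \(x\)); the simulated data and the use of statsmodels' `WLS` (whose `weights` argument is the inverse of the error variance) are illustrative choices, not part of the original text.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 200
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(scale=np.sqrt(x))   # Var(u_i) proportional to x_i

X = sm.add_constant(x)
wls = sm.WLS(y, X, weights=1.0 / x).fit()          # w_ii = 1/x_i
print(wls.params)                                  # coefficient estimates
print(wls.bse)                                     # WLS standard errors
```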
The cause of heteroscedasticity is usually unknown, so the WLS method cannot be applied directly.
Still, a feasible (two-step) version of WLS can be used: squared residuals are regressed on a set of independent variables using the OLS method in the second step, and then inverse values of the estimated squared residuals \(\hat u_i^2\) are taken as weights \(w_{ii}\)
The weights obtained in this way yield the so-called feasible WLS estimator.
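A tentative sketch of this two-step (feasible) procedure; the simulated data and the clipping of fitted variances to positive values are assumptions made only so the example runs.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(scale=x)

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                         # step 1: initial OLS fit

aux = sm.OLS(ols.resid**2, X).fit()              # step 2: regress squared residuals on X
u2_hat = np.clip(aux.fittedvalues, 1e-6, None)   # keep the fitted variances positive

fwls = sm.WLS(y, X, weights=1.0 / u2_hat).fit()  # weights = inverse of fitted variance
print(fwls.params, fwls.bse)
```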
In addition there may be a problem that off-diagonal elements of the matrix \(\Omega\) are not zero!
In that case, a more general weighting matrix is applied and the method is called GLS
The most commonly used is the \(Breusch-Pagan\) test, based on the coefficient of determination from the auxiliary equation \(\tilde{R}^2\) (after estimation of the initial model in the first step, squared residuals are regressed on all independent variables in the second step). The auxiliary equation of the BP test is \[\begin{equation} \hat{u}_i^2=\alpha_0+\alpha_1x_{i,1}+\alpha_2x_{i,2}+...+\alpha_k x_{i,k}+e_i\end{equation}\]
The \(BP\) test is a type of \(Lagrange~multiplier\) test. Under the null hypothesis it is assumed that the error terms are homoscedastic, i.e. all coefficients from the auxiliary equation (except the constant term) are zero. \[\begin{equation}\begin{matrix} H_0:~\alpha_1=\alpha_2=...=\alpha_k=0 \\ BP=n\tilde{R}^2\sim\chi^2_{(df=k)}\end{matrix}\end{equation}\]
Null hypothesis will be rejected if \(p-value\) from \(\chi^2\) distribution with \(k\) degrees of freedom is less than significance level \(\alpha\) (1%, 5% or 10%)
Auxiliary equation of \(BP\) test can be estimated using standardized or \(studentized\) residuals
The auxiliary equation of the \(BP\) test can be extended with squares and cross-products of the regressors. This kind of test is called the \(White\) test.
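Both tests are available in statsmodels; the sketch below is a minimal illustration on simulated heteroscedastic data (all names and numbers are hypothetical).

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan, het_white

rng = np.random.default_rng(3)
n = 200
x = rng.uniform(1, 10, n)
y = 1.0 + 2.0 * x + rng.normal(scale=x)        # error variance grows with x

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()

bp_stat, bp_pvalue, _, _ = het_breuschpagan(ols.resid, X)   # BP = n * R2 of auxiliary equation
w_stat, w_pvalue, _, _ = het_white(ols.resid, X)            # adds squares and cross-products
print(f"BP    = {bp_stat:.2f}, p-value = {bp_pvalue:.4f}")
print(f"White = {w_stat:.2f}, p-value = {w_pvalue:.4f}")    # small p-values => reject H0
```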
Heteroscedasticity can be mitigated if the variables are previously transformed into logs. Logarithmic transformations (if possible) stabilize the variance of the error terms, especially in time-series data.
If the error terms are not independent then we have a problem called autocorrelation.
Autocorrelation means that matrix \(\Omega\) is not a diagonal matrix!
Off-diagonal elements are (auto)covariances.
All (auto)covariances should be zero if the error terms are independent \[\begin{equation}Cov(u_i,u_j)=0~~~\forall~i\ne j\end{equation}\]
The coefficient of autocorrelation is estimated for a given sample by standardizing the autocovariance.
The autocovariance measures the linear dependence between residuals shifted by a given step \(j\); standardized by the residual variance it gives the autocorrelation coefficient \[\begin{equation}\hat\rho_j=\frac{Cov(\hat{u}_i,\hat{u}_{i-j})}{Var(\hat{u}_i)}=\frac{\sum_{i=1+j}^n \hat{u}_i \hat{u}_{i-j}}{\sum_{i=1}^n \hat{u}_i^2}~;~~~-1\leq \hat\rho_j\leq 1~\end{equation}\]
Autocorrelation coefficient of zero order \(j=0\) is exactly 1. For every step \(j>1\) autocorrelation is a decreasing function.
All autocorrelation coefficients \(\hat\rho_j~,~j=1,~2,...\) should be zero if assumption of independent error terms holds.
It is common to examine only the first few autocorrelation coefficients \((j=1,~2,~3)\)
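A small sketch that computes \(\hat\rho_j\) for \(j=1,2,3\) directly from the formula above; the regression with AR(1)-type errors is simulated purely for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(0, 10, n)
u = np.zeros(n)
for t in range(1, n):                  # AR(1)-type errors: u_t = 0.6 u_{t-1} + e_t
    u[t] = 0.6 * u[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + u

u_hat = sm.OLS(y, sm.add_constant(x)).fit().resid

def rho(res, j):
    """Sample autocorrelation at lag j: sum(u_i * u_{i-j}) / sum(u_i^2)."""
    return np.sum(res[j:] * res[:-j]) / np.sum(res**2)

print([round(rho(u_hat, j), 3) for j in (1, 2, 3)])
```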
\(Durbin-Watson\) test is used to check the null hypothesis that there is no first-order autocorrelation
One-sided alternative hypothesis implies the existence of significant positive (negative) autocorrelation \[\begin{equation}H_0:~\rho_1=0~;~~DW\approx2(1-\hat\rho_1)~;~~0\leq DW\leq 4\end{equation}\]
The null hypothesis will not be rejected if the test statistic \(DW\) is close to 2. Otherwise, it will be rejected (\(DW\) close to 0 indicates positive autocorrelation, while \(DW\) close to 4 indicates negative autocorrelation)
The \(Durbin-Watson\) test is nowadays rarely used because it cannot always be applied (lagged values of the dependent variable are not allowed, nor are differenced values, and the model must include a constant term)
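A minimal check of the relation \(DW\approx2(1-\hat\rho_1)\) on a simulated AR(1) residual series, using the `durbin_watson` helper from statsmodels; the persistence parameter 0.6 is an arbitrary illustrative choice.

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
u_hat = np.zeros(200)
for t in range(1, 200):                       # simulated AR(1) residuals
    u_hat[t] = 0.6 * u_hat[t - 1] + rng.normal()

rho1 = np.sum(u_hat[1:] * u_hat[:-1]) / np.sum(u_hat**2)
print(durbin_watson(u_hat), 2 * (1 - rho1))   # the two values should be close
```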
In practice the \(Breusch-Godfrey\) test is often used to test the significance of autocorrelation coefficients up to and including step \(p\). The auxiliary equation of a BG test is \[\begin{equation}\hat{u}_i=\beta_0+\beta_1x_{i,1}+\beta_2x_{i,2}+...+\beta_kx_{i,k}+\phi_1\hat{u}_{i-1}+...+\phi_p\hat{u}_{i-p}+e_i\end{equation}\]
Null hypothesis of a BG test and test statistic are given as \[\begin{equation}\begin{matrix} H_0:~\phi_1=\phi_2=...=\phi_p=0 \\ BG=n\tilde{R}^2\sim\chi^2_{(df=p)}\end{matrix}\end{equation}\]
Null hypothesis of a BG test will be rejected if \(p-value\) from \(\chi^2\) distribution with \(p\) degrees of freedom is less than significance level \(\alpha\) (1%, 5% or 10%)
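A hedged sketch of the BG test via statsmodels' `acorr_breusch_godfrey`, testing up to lag \(p=3\) on simulated data with autocorrelated errors; the lag choice and the data are illustrative assumptions.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(6)
n = 200
x = rng.uniform(0, 10, n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()       # autocorrelated errors
y = 1.0 + 2.0 * x + u

ols = sm.OLS(y, sm.add_constant(x)).fit()
bg_stat, bg_pvalue, _, _ = acorr_breusch_godfrey(ols, nlags=3)   # BG = n * R2 of auxiliary equation
print(f"BG = {bg_stat:.2f}, p-value = {bg_pvalue:.4f}")          # small p-value => reject H0
```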
The problem of autocorrelation means that some of the dynamic properties of the data are not captured by the model when dealing with time-series data (the model should be adjusted by including lagged values of the dependent variable as additional regressors)
The autocorrelation problem can also be mitigated by using \(Newey-West\) standard errors, i.e. HAC (heteroscedasticity and autocorrelation consistent) standard errors
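A minimal sketch of Newey-West (HAC) standard errors with statsmodels; the truncation lag (`maxlags=4`) and the simulated data are arbitrary illustrative choices.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 200
x = rng.uniform(0, 10, n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = 0.6 * u[t - 1] + rng.normal()       # autocorrelated errors
y = 1.0 + 2.0 * x + u

X = sm.add_constant(x)
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print("Newey-West SE:", hac.bse)
print("ordinary SE  :", sm.OLS(y, X).fit().bse)   # for comparison
```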
- Under the normality assumption the OLS estimator is consistent and efficient
- If the error terms are not normally distributed, the significance tests are no longer reliable (\(t-statistic\), \(F-statistic\) and \(\chi^2-statistic\)) because an inappropriate distribution was assumed
- If the null hypothesis of error terms normality is true \[\begin{equation}H_0:~u_i\sim N(0,~\sigma_u^2)~~~\forall~i\end{equation}\] then residuals should have a skewness close to zero (\(\alpha_3\approx 0\)) and a kurtosis close to three (\(\alpha_4\approx 3\))
- Skewness and kurtosis can be used jointly according to the \(Jarque-Bera\) test (both parameters are considered)
\[\begin{equation}JB=n\times \left(\frac{\alpha_3^2}{6}+\frac{(\alpha_4-3)^2}{24}\right)\sim\chi^2_{(df=2)}\end{equation}\]
- The null hypothesis of a JB test will be rejected if \(p-value\) from \(\chi^2\) distribution with \(2\) degrees of freedom is less than significance level \(\alpha\) (1%, 5% or 10%)
- Histogram can be used as graphical evidence of (non)normality
- Logarithmic transformation may reduce skewness (asymmetry of the distribution)
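A minimal JB sketch using the `jarque_bera` helper from statsmodels; the fat-tailed residuals are simulated here only to show a case where normality is rejected.

```python
import numpy as np
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(8)
resid = rng.standard_t(df=4, size=500)       # fat-tailed (non-normal) residuals

jb_stat, jb_pvalue, skew, kurt = jarque_bera(resid)
print(f"JB = {jb_stat:.2f}, p = {jb_pvalue:.4f}, skew = {skew:.2f}, kurt = {kurt:.2f}")
```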
6) - If matrix \(x\) has no full rank then the matrix \(x^{T}x\) cannot be inverted and the system of normal equations obtained by the least squares method has no unique solution (problem of perfect multicollinearity)
Multicollinearity means that independent variables (\(x_1,~x_2,...,x_k\)) are highly correlated, and in practical applications it often appears as a problem of not perfect but strong (near) multicollinearity
Consequences of multicollinearity:
- matrix \(x^{T}x\) is nearly singular, i.e. its determinant is approximately zero!
- overestimated standard errors \(se(\hat\beta_j)\), i.e. variances of the estimators are higher than they should be
- \(t-statistics~~t_j\) are low (the significance tests can be improved by getting more data if possible)
- perhaps the wrong signs of regression coefficients
- assumption \(ceteris~~paribus\) does not hold
- If the regressors (independent variables) are time-series then the problem of multicollinearity may exist if variables have similar or common trend (they have a common tendency of growing over time)
- The easiest way to identify multicollinearity is to examine the correlation matrix of regressors
- The variance inflation factor (VIF) may also be used as an indicator of multicollinearity
- Serious problem of multicollinearity exists if \(VIF>5\)
\[\begin{equation}VIF_j=\frac{1}{1-R^2_j}~~;~~~VIF_j>5~;~~~R^2_j>0,8\end{equation}\]
where \(R^2_j\) denotes the coefficient of determination of the auxiliary equation in which the \(j-th\) independent variable is regressed on the remaining \((k-1)\) independent variables.
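A short VIF sketch with statsmodels' `variance_inflation_factor`; the three regressors are simulated, with `x2` deliberately constructed to be nearly collinear with `x1`.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(9)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.1, size=n)      # nearly collinear with x1
x3 = rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2, x3]))
vifs = [variance_inflation_factor(X, j) for j in range(1, X.shape[1])]   # skip the constant
print(vifs)                                  # VIF > 5 for x1 and x2 signals a problem
```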
Possible solutions to the multicollinearity problem:
1) Omitting some independent variables (if we exclude relevant variables this may cause omitted variable bias problem)
2) If the variables are time-series this problem can be solved by removing trend component from the data using first differences
Two estimated models are given as: \[(1)~~\hat y_i=3,54+2,87x_i~~~~~~~~~~~~\] \[(2)~~\hat y_i=4,11+1,96x_i+0,53z_i\]
- Would it be appropriate to use adjusted R-squared to decide which model fits better?
- Which problem appears in equation (1) when variable \(z\) is omitted although it is relevant?
- Which problem appears in equation (2) when variable \(z\) is not omitted although it is irrelevant?
- Why does the slope coefficient with respect to variable \(x\) change in equation (2) after including variable \(z\)?
- In which case should the slope coefficient with respect to variable \(x\) not change?
- This assumption is required if regressors are not fixed (when dealing with time-series data)
- If variable \(x_t\) is not strictly exogenous, i.e. \(Cov(x_t,u_t)\ne0\), the endogeneity problem exists.
- This problem is usually caused by omitting some relevant variables or by autocorrelation of the error terms
- It can be solved using the two-stage least squares (2SLS) method. The 2SLS method requires application of instrumental variables, which are correlated with the (endogenous) regressors but uncorrelated with the error terms, i.e. they affect the dependent variable only through the regressors.
- Endogenous regressors are regressed on the instrumental variables using the OLS method in the first step. In the second step, the dependent variable \(y_t\) is regressed on the estimated regressors from the first step using the same method.
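A rough sketch of these two steps with plain NumPy (not a full IV routine); the instrument `z`, the degree of endogeneity and all data are simulated assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(10)
n = 500
z = rng.normal(size=n)                        # instrument: correlated with x, not with u
u = rng.normal(size=n)
x = 0.8 * z + 0.5 * u + rng.normal(size=n)    # endogenous regressor: Cov(x, u) != 0
y = 1.0 + 2.0 * x + u

X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), z])

# Stage 1: regress the endogenous regressor on the instrument, keep fitted values
x_hat = Z @ np.linalg.lstsq(Z, x, rcond=None)[0]

# Stage 2: regress y on the fitted regressor from stage 1
X_hat = np.column_stack([np.ones(n), x_hat])
beta_2sls = np.linalg.lstsq(X_hat, y, rcond=None)[0]

beta_ols = np.linalg.lstsq(X, y, rcond=None)[0]
print("OLS :", beta_ols)                      # slope biased away from the true value 2.0
print("2SLS:", beta_2sls)                     # close to the true slope 2.0
```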
Answer the following questions … Considering multivariate time-series econometric model \[y_t=\beta_0+\beta_1x_t+\beta_2z_t+u_t~~;~~u\sim N(0,~\Omega)\]
- Which assumption does not hold if \(Cov(x_t,z_t)\ne0\)?
- Does endogeneity problem exist if \(Cov(x_t,u_t)=0\) and \(Cov(z_t,u_t)=0\)?
- What kind of problem exists if \(Cov(u_t,u_{t-1})\ne0\)?
- Which assumption does not hold if matrix \(\Omega\) is diagonal but its diagonal elements are not all equal?
- Which estimation method can you use if matrix \(\Omega\) is not diagonal and its diagonal elements are not all equal?
- What can you conclude if the null hypothesis of a BP test is rejected?
- Compute \(White's\) robust covariance matrix of the estimators \(\tilde\Gamma\)!
- What is WLS and when should it be applied?