6.1 Significance testing

  • Keep in mind that estimator \hat{\beta} is a random vector with (k+1) elements \hat{\beta}_0,~\hat{\beta}_1,~\hat{\beta}_2,\dots,~\hat{\beta}_k

  • Significance testing of each parameter individually requires standardization of its estimator \hat{\beta}_j (by standardizing an estimator we get a random variable z with a standard normal distribution, i.e. zero mean and unit variance)

\begin{equation} z=\frac{\hat{\beta}_j-\beta_j}{se(\hat{\beta}_j)}=\frac{\hat{\beta}_j-\beta_j}{\sigma_u\sqrt{diag_j(x^Tx)^{-1}}}\sim N(0,~1) \tag{6.1} \end{equation}

  • Standard deviation of error terms \sigma_u is unknown, so expression (6.1) is reformulated: standardized variable z is divided by the square root of the ratio between a \chi^2 variable and its degrees of freedom df [see EQUATION (5.22)] \begin{equation} \frac{\hat{\beta}_j-\beta_j}{se(\hat{\beta}_j)}\Bigg/\sqrt{\frac{\chi^2_{df}}{df}}=\frac{\hat{\beta}_j-\beta_j}{\sigma_u\sqrt{diag_j(x^Tx)^{-1}}}\Bigg/\sqrt{\frac{\hat{\sigma}^2_u}{\sigma^2_u}}=\frac{\hat{\beta}_j-\beta_j}{\hat{\sigma}_u\sqrt{diag_j(x^Tx)^{-1}}}\sim t_{(df=n-k-1)} \tag{6.2} \end{equation}

  • Expression (6.2) defines the t-statistic which is used to test the significance of each parameter individually

Parameter \beta_j is statistically significant if its estimated value is different from zero, i.e. whenever the null hypothesis ~H_0:~\beta_j=0~ is rejected

  • If the null hypothesis is true, the t-statistic is a variable of Student's t-distribution with degrees of freedom df=n-k-1 \begin{equation} t_j=\frac{\hat{\beta}_j-\beta_j}{se(\hat{\beta}_j)}=\frac{\hat{\beta}_j-0}{se(\hat{\beta}_j)}=\frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \tag{6.3} \end{equation}

  • Depending on the alternative hypothesis, 3 types of tests can be performed [TABLE 6.1]

TABLE 6.1: Three types of alternative hypothesis
  Alternative        P-value           Test type
  H_1: β_j ≠ 0       2P(t > |t_j|)     two-sided
  H_1: β_j < 0       P(t < t_j)        lower-sided
  H_1: β_j > 0       P(t > t_j)        upper-sided

The null hypothesis ~H_0:~\beta_j=0~ will be rejected whenever the p-value is less than the significance level (1%, 5% or 10%)

  • For a given parameter we can assume not only a value of zero but any real number -> H_0:~\beta_j=a
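A minimal sketch of these computations in R, using hypothetical data (the objects fit, b and se below are illustrative, not part of the exercises): it recomputes the t-statistic (6.3), the two-sided p-values from TABLE 6.1 and a test of H_0:~\beta_j=a.
# Hypothetical data and fitted model (reused in the sketches throughout this section)
set.seed(1)
x1=rnorm(50); x2=rnorm(50)
y=1+0.5*x1-0.3*x2+rnorm(50)
fit=lm(y~x1+x2)
b=coef(fit)                               # estimated parameters
se=sqrt(diag(vcov(fit)))                  # standard errors se(beta_j)
t_j=b/se                                  # t-statistics (6.3) under H0: beta_j=0
df=fit$df.residual                        # degrees of freedom n-k-1
p_two=2*pt(abs(t_j),df,lower.tail=FALSE)  # two-sided p-values [TABLE 6.1]
cbind(b,se,t_j,p_two)                     # matches summary(fit)$coefficients
# Testing H0: beta_1=a for some real number a, e.g. a=0.4 (two-sided)
a=0.4
t_a=(b["x1"]-a)/se["x1"]
2*pt(abs(t_a),df,lower.tail=FALSE)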

  • We can also assume some linear restrictions on more than one parameter or possibly all parameters, e.g. considering a multivariate model without restrictions (the so-called full model) y_i=\beta_0+\beta_1x_{i,1}+\beta_2x_{i,2}+\beta_3x_{i,3}+u_i we can test whether variables x_1 and x_2 have the same effect on the variable y and whether x_3 has no effect on y!

  • If the null hypothesis ~H_0:~\beta_1=\beta_2,~\beta_3=0~ is true, the model with restrictions becomes y_i=\beta_0+\beta_1(x_{i,1}+x_{i,2})+u_i

  • After estimating both models using the OLS method, residual sums of squares (RSS) are obtained and the F-statistic is calculated \begin{equation}F^\prime=\frac{RSS_R-RSS_U}{RSS_U}\times \frac{n-k-1}{q}, \tag{6.4}\end{equation} where RSS_R is the residual sum of squares from the restricted model and RSS_U is the residual sum of squares from the unrestricted model, while q is the number of restrictions (in the given example q=2).

  • Test statistic from F-distribution in (6.4) is defined with numerator degrees of freedom df_1=q and denominator degrees of freedom df_2=n-k-1

  • Number of restrictions q is equal to the parameters difference between unrestricted and restricted model
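As a cross-check of (6.4), the sketch below reuses the hypothetical data from the previous sketch; here the restriction is simply \beta_1=\beta_2=0 (q=2), so the restricted model contains the constant term only.
# F-statistic (6.4) computed from the two residual sums of squares
u=lm(y~x1+x2)           # unrestricted model (same as fit above)
r=lm(y~1)               # restricted model under H0: beta_1=beta_2=0
RSS_U=sum(resid(u)^2)
RSS_R=sum(resid(r)^2)
q=2; df2=u$df.residual  # q restrictions and denominator df n-k-1
F_stat=((RSS_R-RSS_U)/q)/(RSS_U/df2)
F_stat
pf(F_stat,q,df2,lower.tail=FALSE)  # p-value P(F>F')
anova(r,u)                         # reports the same F-statistic and p-value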

  • The null hypothesis can be written in matrix form \begin{equation}H_0:~R\beta=r, \tag{6.5}\end{equation} where R is the restriction matrix, \beta is the vector of parameters from the unrestricted model and r is a vector of assumed values. In the given example the null hypothesis is defined as

H_0:~\underbrace{\begin{bmatrix} ~0 & ~1 & -1 & ~0 \\ ~0 & ~0 & ~~0 & ~1 \end{bmatrix}}_{R} \underbrace{\begin{bmatrix}\beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}}_{\beta}=\underbrace{\begin{bmatrix}0 \\ 0 \end{bmatrix}}_{r}

  • Testing the null hypothesis that all independent variables are not statistically significant ~H_0:~\beta_1=\beta_2=\beta_3=0~ can also be written in matrix form H_0:~\underbrace{\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{R} \underbrace{\begin{bmatrix}\beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}}_{\beta}=\underbrace{\begin{bmatrix}0 \\ 0 \\ 0 \end{bmatrix}}_{r}
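Both restriction matrices are easy to write down in R; a small sketch (the names Ra, ra, Rb and rb are hypothetical):
# H0: beta_1=beta_2, beta_3=0 (q=2)
Ra=matrix(c(0,1,-1,0,
            0,0,0,1),nrow=2,byrow=TRUE)
ra=c(0,0)
# H0: beta_1=beta_2=beta_3=0 (q=3)
Rb=cbind(0,diag(3))
rb=c(0,0,0)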

  • If the null hypothesis is true the restricted model includes the constant term only y_i=\beta_0+u_i, where the estimated value \hat{\beta}_0 represents the mean of the dependent variable \bar{y}
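A one-line check of this fact, using the hypothetical y from the earlier sketch:
fit0=lm(y~1)                           # restricted model with constant term only
all.equal(unname(coef(fit0)),mean(y))  # TRUE: estimated constant equals mean of y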

  • The joint test of significance refers to testing the hypothesis that all RHS variables have no significant effect on the dependent variable, which is usually performed within the analysis of variance

  • The analysis of variance, along with the F-statistic and the associated p-value, is presented in an ANOVA table

TABLE 6.2: ANOVA table
  Source of variation                   Sum of squares                        df         Test statistic                            P-value
  ESS (explained by the model)          \hat{\beta}^T X^T y - (1/n) y^T J y   k          F^{\prime}=\frac{ESS/k}{RSS/(n-k-1)}      P(F>F^{\prime})
  RSS (unexplained variations)          y^T y - \hat{\beta}^T X^T y           n - k - 1
  TSS (total variation of variable y)   y^T y - (1/n) y^T J y                 n - 1
  • Sums of squares scaled by the error variance, \frac{ESS}{\sigma^2_u} and \frac{RSS}{\sigma^2_u}, are \chi^2 variables with degrees of freedom k and n-k-1, respectively, so the ratio of mean squares \frac{ESS/k}{RSS/(n-k-1)} follows the F-distribution.

Analysis of variance implies that TSS=ESS+RSS, while the fraction \frac{ESS}{TSS} provides information on how well the estimated model fits the data. The proportion of explained variation of the dependent variable y is known as the coefficient of determination \begin{equation} R^2=\frac{ESS}{TSS};~~~~~~0 \leq R^2 \leq 1 \tag{6.6} \end{equation}
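Both the decomposition and (6.6) can be verified numerically; a sketch reusing the hypothetical fit from above:
TSS=sum((y-mean(y))^2)                # total variation
ESS=sum((fitted(fit)-mean(y))^2)      # variation explained by the model
RSS=sum(resid(fit)^2)                 # unexplained variation
all.equal(TSS,ESS+RSS)                # TRUE: TSS=ESS+RSS
R2=ESS/TSS                            # coefficient of determination (6.6)
all.equal(R2,summary(fit)$r.squared)  # TRUE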

  • If the null hypothesis in general ~H_0:~\beta_1=\beta_2=\dots=\beta_k=0~ is true, the coefficient of determination from the restricted model is R^2_R=0 and the number of restrictions equals the number of RHS variables from the unrestricted model (q=k), and thus \begin{equation} F^{\prime}=\frac{R^2_U-0}{1-R^2_U}\times\frac{n-k-1}{k}=\frac{R^2/k}{(1-R^2)/(n-k-1)} \tag{6.7} \end{equation}

Testing the hypothesis that none of the RHS variables significantly affects the dependent variable y comes down to testing the significance of the coefficient of determination by the F-statistic \begin{equation} H_0:~R^2=0;~~~~~~~~~F^{\prime}=\frac{R^2/k}{(1-R^2)/(n-k-1)} \tag{6.8} \end{equation}
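Formula (6.8) matches the F-statistic reported by summary(); a quick check with the hypothetical fit (k=2 regressors):
k=2; df2=fit$df.residual
F_R2=(R2/k)/((1-R2)/df2)            # F-statistic (6.8) from R-squared
unname(summary(fit)$fstatistic[1])  # the same value
pf(F_R2,k,df2,lower.tail=FALSE)     # p-value of the joint significance test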

  • Alternative tests related to the F-statistic are also used to test certain linear restrictions on parameters:

    1. Wald test – W
    2. Likelihood ratio test – LR
    3. Lagrange multiplier test – LM

~~~

  • Wald test W requires estimating only the unrestricted model using the OLS method

\begin{equation} W=(R\hat{\beta}-r)^{T}(R\hat{\Gamma}R^{T})^{-1}(R\hat{\beta}-r)\sim\chi^2_{(df=q)}, \tag{6.9} \end{equation} where \hat{\Gamma} is the estimated covariance matrix of \hat{\beta}.

  • Test statistic W in (6.9) is a \chi^2 variable with q degrees of freedom
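Statistic (6.9) is straightforward to compute by hand from the coefficient vector and its estimated covariance matrix; a sketch with the hypothetical fit, testing \beta_1=\beta_2=0 (q=2):
b=coef(fit)                     # beta hat
V=vcov(fit)                     # Gamma hat: estimated covariance matrix of beta hat
Rm=cbind(0,diag(2)); rv=c(0,0)  # restriction matrix R and vector r
W=t(Rm%*%b-rv)%*%solve(Rm%*%V%*%t(Rm))%*%(Rm%*%b-rv)
drop(W)                                # W-statistic (6.9)
pchisq(drop(W),df=2,lower.tail=FALSE)  # p-value from chi-square with q df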

  • Likelihood ratio test LR requires estimating both models (unrestricted and restricted) using the MLE method. The maximum value of the likelihood function is usually taken in logs and is called the log-likelihood \begin{equation} LR=2(logL_U-logL_R)=-2log\frac{L_R}{L_U}\sim\chi^2_{(df=q)} \tag{6.10} \end{equation} where L_U is the likelihood of the unrestricted model and L_R is the likelihood of the restricted model.
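For lm objects the log-likelihoods in (6.10) are available via logLik(); a sketch reusing the unrestricted fit u and restricted fit r from the earlier F-test sketch:
LR=2*(as.numeric(logLik(u))-as.numeric(logLik(r)))  # LR-statistic (6.10)
LR
pchisq(LR,df=2,lower.tail=FALSE)  # compare with lrtest(r,u) from "lmtest" package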

  • LR-statistic is asymptotically equivalent to W-statistic if the assumption of normality holds

  • Lagrange multiplier test LM requires estimating only the restricted model using the OLS method in the first step. In the second step, residuals from the restricted model are regressed on all independent variables and the coefficient of determination from this test regression R^2_{test} is multiplied by the sample size n \begin{equation} LM=n R^2_{test} \sim\chi^2_{(df=q)} \tag{6.11} \end{equation}
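A sketch of the two-step LM procedure (6.11), again with the restricted fit r and the hypothetical regressors x1 and x2:
e_r=resid(r)                                # step 1: residuals from restricted model
test_reg=lm(e_r~x1+x2)                      # step 2: regress them on all regressors
LM=length(e_r)*summary(test_reg)$r.squared  # LM-statistic (6.11): n times R2_test
LM
pchisq(LM,df=2,lower.tail=FALSE)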

  • LM-statistic is also asymptotically equivalent to W-statistic. All three test statistics approach the same \chi^2 value, so it is appropriate to use these tests in large samples.

\begin{equation}\begin{matrix}qF^{\prime}=W \\ LM\leq LR\leq W\end{matrix} \tag{6.12} \end{equation}

Exercise 27. Using the lm() command in RStudio and sample data from the newdata object (already loaded from the text file eu_countries.txt) estimate a multivariate model: y_i=\beta_0+\beta_1x_{i,1}+\beta_2x_{i,2}+\beta_3x_{i,3}+u_i, where y=gdp, x_1=population, x_2=unemployment and x_3=education. Compute test statistics F, W, and LR to check if variables x_2 and x_3 are redundant (significance level \alpha=0.05). Present results of the unrestricted and restricted model using the modelsummary() command.
Solution Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. Redundancy of variables x_2 and x_3 can be checked by testing the null hypothesis H_0:\beta_2=\beta_3=0 using the F-statistic, W-statistic or LR-statistic. All three types of tests reject the null hypothesis (p-value \lt \alpha), meaning that unemployment and education are not redundant variables.
# Unrestricted model and restricted model estimation
unrestricted=lm(gdp~population+unemployment+education,data=newdata)
restricted=lm(gdp~population,data=newdata)
# F-statistic within ANOVA table
anova(restricted,unrestricted)
# W-statistic requires "car" package to be installed and loaded from the library
install.packages("car")
library(car)
# Specifying the type of Wald test (F or Chi-square statistic)
linearHypothesis(unrestricted,c("unemployment=0","education=0"),test="Chisq")
# LR-statistic requires "lmtest" package to be installed and loaded from the library
install.packages("lmtest")
library(lmtest)
lrtest(restricted,unrestricted)
# Displaying results of both models in a single table requires "modelsummary" package
install.packages("modelsummary")
library(modelsummary)
modelsummary(list(unrestricted,restricted),stars=TRUE,fmt=4)

~~~

Exercise 28. Considering the log-log model (object model2) from Exercise 23, perform appropriate tests to check if: (a) the elasticity coefficient is significantly greater than zero (population has a positive effect on GDP) and (b) the elasticity coefficient is significantly different from 1.
Solution Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. The null hypothesis H_0:\beta_1=0 can be tested against the alternative H_1:\beta_1 \gt 0 to check if population has a positive effect on gdp (upper-sided test). The upper-sided test requires no extra computation: when the estimated coefficient is positive, its p-value is half of the two-sided p-value already reported in the model summary output. However, testing the null hypothesis H_0:\beta_1=1 against the alternative H_1:\beta_1 \ne 1 requires the linearHypothesis() command.
coeftest(model2) # Reports t-statistics and p-values of two-sided alternatives
linearHypothesis(model2,c("log(population)=1"),test="Chisq") # Reports Wald test results

~~~

  • In a multivariate model we may be interested in determining which RHS variable affects the dependent variable y more!

  • Are estimated coefficients comparable when RHS variables are given in different measurement units?

  • How can we express RHS variables in the same units of measurement?

  • Variables should be standardized -> estimating the model using standardized variables

  • Any standardized variable has a nice property, i.e. its mean value is always zero and its standard deviation is always 1.

  • Standardized econometric model has no constant term

  • Interpretation of a standardized coefficient: if an independent variable increases by one standard deviation, the expected value of the dependent variable changes by the given number of standard deviations (see the sketch below)
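A standardized coefficient can also be recovered from an unstandardized one, since b^{std}_j=b_j\times s_{x_j}/s_y; a minimal sketch with the hypothetical fit from the earlier examples:
b1=coef(fit)["x1"]  # unstandardized slope of x1
b1*sd(x1)/sd(y)     # standardized coefficient of x1
# The same value from the regression on scaled variables (no constant term)
coef(lm(scale(y)~0+scale(x1)+scale(x2)))["scale(x1)"]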

Exercise 29. Considering the unrestricted model from Exercise 27 determine which of the RHS variables affects GDP the most: population, unemployment or education? Save the estimated model as an object standardized.
Solution Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. Only standardized coefficients can be compared to determine the relative importance of variables: the variable with the largest absolute standardized coefficient has the greatest effect on gdp. In this sample that variable is population.
# Estimating standardized regression (without constant term)
standardized=lm(scale(gdp)~0+scale(population)+scale(unemployment)+scale(education),data=newdata)
# Reports coefficients from standardized regression, along with standard errors, t-statistics and p-values
coeftest(standardized) 

~~~