6.1 Significance testing
Keep in mind that the estimator \hat{\beta} is a random vector with (k+1) elements \hat{\beta}_0,~\hat{\beta}_1,~\hat{\beta}_2,\dots,\hat{\beta}_k
Significance testing of each parameter individually requires standardization of its estimator \hat{\beta}_j (by standardizing an estimator we get a random variable z with standard normal distribution, i.e. zero mean and unit variance)
\begin{equation} z=\frac{\hat{\beta}_j-\beta_j}{se(\hat{\beta}_j)}=\frac{\hat{\beta}_j-\beta_j}{\sigma_u\sqrt{diag_j(X^TX)^{-1}}}\sim N(0,~1) \tag{6.1}\end{equation}
The standard deviation of error terms \sigma_u is unknown, so expression (6.1) is reformulated: the standardized variable z is divided by the square root of the ratio of a \chi^2 variable to its degrees of freedom df [see EQUATION (5.22)] \begin{equation} \frac{\frac{\hat{\beta}_j-\beta_j}{se(\hat{\beta}_j)}}{\sqrt{\frac{\chi^2_{df}}{df}}}=\frac{\frac{\hat{\beta}_j-\beta_j}{\sigma_u\sqrt{diag_j(X^TX)^{-1}}}}{\sqrt{\frac{\hat{\sigma}_u^2}{\sigma_u^2}}}=\frac{\hat{\beta}_j-\beta_j}{\hat{\sigma}_u\sqrt{diag_j(X^TX)^{-1}}}\sim t(df=n-k-1) \tag{6.2}\end{equation}
Expression (6.2) defines the t-statistic, which is used to test the significance of each parameter individually
Parameter \beta_j is statistically significant if its estimated value is different from zero, i.e. whenever the null hypothesis H_0:~\beta_j=0 is rejected
If the null hypothesis is true, the t-statistic is a variable of Student's t-distribution with degrees of freedom df=n-k-1 \begin{equation} t_j=\frac{\hat{\beta}_j-\beta_j}{se(\hat{\beta}_j)}=\frac{\hat{\beta}_j-0}{se(\hat{\beta}_j)}=\frac{\hat{\beta}_j}{se(\hat{\beta}_j)} \tag{6.3}\end{equation}
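For illustration, the t-statistic (6.3) can be reproduced by hand from any fitted model; a minimal sketch, assuming the `newdata` object and variables introduced in the exercises below:
# Hypothetical sketch: t-statistics computed by hand from a fitted lm object
fit=lm(gdp~population+unemployment+education,data=newdata)
b=coef(fit)                # estimated parameters
se=sqrt(diag(vcov(fit)))   # standard errors se(beta_j)
b/se                       # t-statistics under H0: beta_j=0, compare with summary(fit)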
Against the null hypothesis, three types of tests can be performed [TABLE 6.1]
Alternative | P-value | Test type |
---|---|---|
H_1:~\beta_j \ne 0 | 2P(t \gt \lvert t_j \rvert) | two-sided |
H_1:~\beta_j \lt 0 | P(t \lt t_j) | lower-sided |
H_1:~\beta_j \gt 0 | P(t \gt t_j) | upper-sided |
The null hypothesis ~H_0:~\beta_j=0~ will be rejected whenever the p-value is less than the significance level (1%, 5% or 10%)
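The three p-values from Table 6.1 can be computed with the `pt()` function; a minimal sketch with illustrative values only:
# Hypothetical values of t_j and df, used only to illustrate the p-value formulas
t_j=2.35; df=24
2*pt(abs(t_j),df,lower.tail=FALSE)   # two-sided: 2P(t>|t_j|)
pt(t_j,df)                           # lower-sided: P(t<t_j)
pt(t_j,df,lower.tail=FALSE)          # upper-sided: P(t>t_j)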
For a given parameter we can assume not only a value of zero but any real number -> H_0:~\beta_j=a
We can also assume some linear restrictions on more than one parameter or possibly all parameters, e.g. considering a multivariate model without restrictions (the so-called full model) y_i=\beta_0+\beta_1x_{i,1}+\beta_2x_{i,2}+\beta_3x_{i,3}+u_i we can test whether variables x_1 and x_2 have the same effect on variable y and whether x_3 has no effect on y!
If the null hypothesis ~H_0:~\beta_1=\beta_2,~\beta_3=0~ is true, the model with restrictions becomes y_i=\beta_0+\beta_1(x_{i,1}+x_{i,2})+u_i
After estimating both models using the OLS method, residual sums of squares (RSS) are obtained and the F-statistic is calculated \begin{equation}F^\prime=\frac{RSS_R-RSS_U}{RSS_U}\times \frac{n-k-1}{q}, \tag{6.4}\end{equation} where RSS_R is the residual sum of squares from the restricted model and RSS_U is the residual sum of squares from the unrestricted model, while q is the number of restrictions (in the given example q=2).
The test statistic in (6.4) follows the F-distribution with numerator degrees of freedom df_1=q and denominator degrees of freedom df_2=n-k-1
The number of restrictions q equals the difference in the number of parameters between the unrestricted and the restricted model
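The F-statistic (6.4) can be computed by hand; a minimal sketch, assuming the fitted objects `unrestricted` and `restricted` from the exercise below:
# Hypothetical sketch of (6.4) from residual sums of squares
RSS_U=sum(resid(unrestricted)^2)   # residual sum of squares, unrestricted model
RSS_R=sum(resid(restricted)^2)     # residual sum of squares, restricted model
n=nobs(unrestricted); k=3; q=2     # sample size, RHS variables, restrictions
F_stat=((RSS_R-RSS_U)/RSS_U)*((n-k-1)/q)
pf(F_stat,q,n-k-1,lower.tail=FALSE)   # p-value, compare with anova() output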
The null hypothesis can be written in matrix form \begin{equation}H_0:~R\beta=r, \tag{6.5}\end{equation} where R is the restriction matrix, \beta is the vector of parameters from the unrestricted model and r is a vector of assumed values. In the given example the null hypothesis is defined as
H_0:~\underbrace{\begin{bmatrix} ~0 & ~1 & -1 & ~0 \\ ~0 & ~0 & ~~0 & ~1 \end{bmatrix}}_{R} \underbrace{\begin{bmatrix}\beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}}_{\beta}=\underbrace{\begin{bmatrix}0 \\ 0 \end{bmatrix}}_{r}
Testing the null hypothesis that all independent variables are not statistically significant ~H_0:~\beta_1=\beta_2=\beta_3=0~ can also be written in matrix form H_0:~\underbrace{\begin{bmatrix} 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}}_{R} \underbrace{\begin{bmatrix}\beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}}_{\beta}=\underbrace{\begin{bmatrix}0 \\ 0 \\ 0 \end{bmatrix}}_{r}
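The matrix form (6.5) can be passed directly to the `linearHypothesis()` command from the "car" package; a minimal sketch, assuming a fitted `unrestricted` model with parameters \beta_0,\dots,\beta_3:
# Restriction matrix R and vector r for H0: beta_1=beta_2, beta_3=0
R=rbind(c(0,1,-1,0),   # first restriction:  beta_1-beta_2=0
        c(0,0,0,1))    # second restriction: beta_3=0
r=c(0,0)
library(car)
linearHypothesis(unrestricted,hypothesis.matrix=R,rhs=r,test="Chisq")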
If the null hypothesis is true, the restricted model includes the constant term only y_i=\beta_0+u_i, where the estimated value \hat{\beta}_0 represents the mean of the dependent variable \bar{y}
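This can be verified in a single line; a quick sketch, assuming the `newdata` object from the exercise below:
# The OLS estimate of the constant-only model equals the mean of the dependent variable
coef(lm(gdp~1,data=newdata))   # beta_0 hat
mean(newdata$gdp)              # the same value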
The joint test of significance refers to testing the hypothesis that all RHS variables have no significant effect on the dependent variable, which is usually performed within the analysis of variance
The analysis of variance, along with the F-statistic and associated p-value, is presented in the ANOVA table
Source of variation | Sum of squares | df | Test statistic | P-value |
---|---|---|---|---|
ESS (explained by the model) | \hat{\beta}^TX^Ty-\frac{1}{n}y^TJy | k | F^{\prime}=\frac{ESS/k}{RSS/(n-k-1)} | P(F>F^{\prime}) |
RSS (unexplained variation) | y^Ty-\hat{\beta}^TX^Ty | n-k-1 | | |
TSS (total variation of variable y) | y^Ty-\frac{1}{n}y^TJy | n-1 | | |
- Under the null hypothesis, the sums ESS and RSS, each divided by \sigma_u^2, are \chi^2 variables with degrees of freedom k and n-k-1, respectively.
The analysis of variance implies that TSS=ESS+RSS, while the fraction \frac{ESS}{TSS} provides information on how well the estimated model fits the data. The proportion of explained variation of the dependent variable y is known as the coefficient of determination \begin{equation} R^2=\frac{ESS}{TSS};~~~~~~0 \leq R^2 \leq 1 \tag{6.6} \end{equation}
- If the null hypothesis is true in general ~H_0:~\beta_1=\beta_2=\dots=\beta_k=0~, the coefficient of determination from the restricted model is R^2_R=0 and the number of restrictions equals the number of RHS variables from the unrestricted model (q=k), and thus \begin{equation} F^{\prime}=\frac{R^2_U-0}{1-R^2_U}\times\frac{n-k-1}{k}=\frac{R^2/k}{(1-R^2)/(n-k-1)} \tag{6.7} \end{equation}
Testing the hypothesis that none of the RHS variables significantly affects the dependent variable y comes down to testing the significance of the coefficient of determination by the F-statistic \begin{equation} H_0:~R^2=0;~~~~~~~~~F^{\prime}=\frac{R^2/k}{(1-R^2)/(n-k-1)} \tag{6.8} \end{equation}
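The F-statistic (6.8) can be computed directly from R^2; a minimal sketch, assuming a fitted lm object `fit` with k=3 RHS variables:
# Hypothetical sketch of (6.8) from the coefficient of determination
R2=summary(fit)$r.squared
n=nobs(fit); k=3
F_stat=(R2/k)/((1-R2)/(n-k-1))
pf(F_stat,k,n-k-1,lower.tail=FALSE)   # p-value, matches the F-test reported by summary(fit)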
Alternative tests related to the F-statistic are also used to test certain linear restrictions on parameters:
- Wald test – W
- Likelihood ratio test – LR
- Lagrange multiplier test – LM
~~~
- Wald test W requires estimating only the unrestricted model using OLS method
\begin{equation} W=(R\hat{\beta}-r)^{T}(R\hat{\Gamma}R^{T})^{-1}(R\hat{\beta}-r)\sim\chi^2_{(df=q)} \tag{6.9} \end{equation}
Test statistic W in (6.9) is a \chi^2 variable with q degrees of freedom, where \hat{\Gamma}=\hat{\sigma}_u^2(X^TX)^{-1} is the estimated covariance matrix of \hat{\beta}
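The W-statistic (6.9) can be reproduced by hand; a minimal sketch, assuming a fitted `unrestricted` model and the matrices R and r from (6.5):
# Hypothetical sketch of (6.9)
b=coef(unrestricted)   # estimated parameter vector
G=vcov(unrestricted)   # estimated covariance matrix Gamma hat
R=rbind(c(0,1,-1,0),c(0,0,0,1)); r=c(0,0)
W=t(R%*%b-r)%*%solve(R%*%G%*%t(R))%*%(R%*%b-r)
pchisq(as.numeric(W),df=nrow(R),lower.tail=FALSE)   # p-value with q=2 degrees of freedom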
Likelihood ratio test LR requires estimating both models (unrestricted and restricted) using the MLE method. The maximum value of the likelihood function is usually taken in logs and is called the log-likelihood \begin{equation} LR=2(logL_U-logL_R)=-2log\frac{L_R}{L_U}\sim\chi^2_{(df=q)} \tag{6.10} \end{equation} where L_U is the likelihood of the unrestricted model and L_R is the likelihood of the restricted model.
The LR-statistic is asymptotically equivalent to the W-statistic if the assumption of normality holds
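The LR-statistic (6.10) can be computed from the log-likelihoods of both fitted models; a minimal sketch, assuming the objects from the exercise below:
# Hypothetical sketch of (6.10) using logLik()
LR=2*(as.numeric(logLik(unrestricted))-as.numeric(logLik(restricted)))
pchisq(LR,df=2,lower.tail=FALSE)   # p-value with q=2 restrictions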
Lagrange multiplier test LM requires estimating only the restricted model using the OLS method in the first step. In the second step, residuals from the restricted model are regressed on all independent variables, and the coefficient of determination from the test regression R^2_{test} is multiplied by the sample size n \begin{equation} LM=n R^2_{test} \sim\chi^2_{(df=q)} \tag{6.11} \end{equation}
The LM-statistic is also asymptotically equivalent to the W-statistic. All three test statistics approach the same \chi^2 value, so it is appropriate to use these tests in large samples.
\begin{equation}\begin{matrix}qF=W \\ LM\leq LR\leq W\end{matrix} \tag{6.12} \end{equation}
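The two-step LM procedure (6.11) can also be verified by hand; a minimal sketch, assuming the `restricted` model and `newdata` object from the exercise below:
# Hypothetical sketch of (6.11): test regression of restricted residuals
step2=lm(resid(restricted)~population+unemployment+education,data=newdata)
LM=nobs(step2)*summary(step2)$r.squared   # n times R^2 from the test regression
pchisq(LM,df=2,lower.tail=FALSE)          # p-value with q=2 restrictions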
Using the `lm()` command in RStudio and sample data from the `newdata` object (the text file `eu_countries.txt` is already loaded), estimate a multivariate model y_i=\beta_0+\beta_1x_{i,1}+\beta_2x_{i,2}+\beta_3x_{i,3}+u_i, where y=`gdp`, x_1=`population`, x_2=`unemployment` and x_3=`education`. Compute test statistics F, W, and LR to check if variables x_2 and x_3 are redundant (significance level \alpha=0.05). Present results of the unrestricted and restricted model using the `modelsummary()` command.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. Redundancy of variables x_2 and x_3 can be checked by testing the null hypothesis H_0:~\beta_2=\beta_3=0 within the F-statistic, W-statistic, or LR-statistic. All three types of tests reject the null hypothesis (p-value \lt \alpha), meaning that `unemployment` and `education` are not redundant variables.
# Unrestricted model and restricted model estimation
unrestricted=lm(gdp~population+unemployment+education,data=newdata)
restricted=lm(gdp~population,data=newdata)
# F-statistic within ANOVA table
anova(restricted,unrestricted)
# W-statistic requires "car" package to be installed and loaded from the library
install.packages("car")
library(car)
# Specifying the type of Wald test (F or Chi-square statistic)
linearHypothesis(unrestricted,c("unemployment=0","education=0"),test="Chisq")
# LR-statistic requires "lmtest" package to be installed and loaded from the library
install.packages("lmtest")
library(lmtest)
lrtest(restricted,unrestricted)
# Displaying results of both models in a single table ("modelsummary" package must be loaded)
library(modelsummary)
modelsummary(list(unrestricted,restricted),stars=TRUE,fmt=4)
~~~
Using the estimated model (object `model2`) from Exercise 23, perform appropriate tests to check if: (a) the elasticity coefficient is significantly greater than zero (population has a positive effect on GDP) and (b) the elasticity coefficient is significantly different from 1.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. The null hypothesis H_0:~\beta_1=0 can be tested against the alternative H_1:~\beta_1 \gt 0 to check if `population` has a positive effect on `gdp` (upper-sided test). Namely, the upper-sided test does not require any action because its p-value is half of the p-value for the two-sided test (already reported in the model summary output). However, testing the null hypothesis H_0:~\beta_1=1 against the alternative H_1:~\beta_1 \ne 1 requires the `linearHypothesis()` command.
coeftest(model2) # Reports t-statistics and p-values of two-sided alternatives
linearHypothesis(model2,c("log(population)=1"),test="Chisq") # Reports Wald test results
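The halving rule for part (a) can also be checked by hand; a minimal sketch, assuming the elasticity is the coefficient on log(population) in `model2`:
# Upper-sided p-value computed directly (halving is valid when t_j>0)
t_j=coeftest(model2)["log(population)","t value"]   # t-statistic of the elasticity
pt(t_j,df=model2$df.residual,lower.tail=FALSE)      # P(t>t_j), half of the two-sided p-value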
~~~
In a multivariate model we can be interested in determining which RHS variable affects the dependent variable y more!
Are estimated coefficients comparable when RHS variables are given in different measurement units?
How can we express RHS variables in the same units of measurement?
Variables should be standardized -> the model is estimated using standardized variables
Any standardized variable has a nice property, i.e. its mean value is always zero and its standard deviation is always 1.
Standardized econometric model has no constant term
Interpretation of a standardized coefficient: if an independent variable increases by one standard deviation, the expected value of the dependent variable changes by the estimated coefficient in standard deviations
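As a quick check (a sketch, assuming the `unrestricted` model and `newdata` object from Exercise 27), a standardized coefficient equals the original coefficient multiplied by the ratio of standard deviations:
# Standardized coefficient of population computed without re-estimating the model
coef(unrestricted)["population"]*sd(newdata$population)/sd(newdata$gdp)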
Using standardized variables of the `unrestricted` model from Exercise 27, determine which of the RHS variables affects GDP more: population, unemployment or education? Save the estimated model as an object `standardized`.
Solution
Copy the code lines below to the clipboard, paste them into an R Script file opened in RStudio, and run them. Only standardized coefficients can be compared to determine the relative importance of variables. Specifically, the largest absolute standardized coefficient indicates that `population` has the greatest effect on `gdp`.
# Estimating standardized regression (without constant term)
standardized=lm(scale(gdp)~0+scale(population)+scale(unemployment)+scale(education),newdata)
# Reports coefficients from standardized regression, along with standard errors, t-statistics and p-values
coeftest(standardized)
~~~