1.3 Statistical Inference

1.3.1 Hypothesis Tests

The null hypothesis, denoted by \(\mathrm{H}_{0}\), is a statement about a population parameter. The alternative hypothesis is denoted by \(\mathrm{H}_{1}\). The null hypothesis will be rejected if it appears to be inconsistent with the sample data and will not be rejected otherwise. (Ross 2017)

(a) Test Statistic and Critical Region

A test statistic is a statistic whose value is determined from the sample data. Depending on the value of this test statistic, the null hypothesis will be rejected or not. The critical region, also called the rejection region, is that set of values of the test statistic for which the null hypothesis is rejected. (Ross 2017)

The classical procedure for testing a null hypothesis is to fix a small level of significance \(\alpha\) and then require that the probability of rejecting \(\mathrm{H}_{0}\) when \(\mathrm{H}_{0}\) is true is less than or equal to \(\alpha\). (Ross 2017)

Critical values are calculated by:

c <- qchisq(1 - alpha, df)  # upper alpha quantile of the chi-squared distribution with df degrees of freedom

The hypothesis should be rejected if stat >= c is true.

Alternatively, p_value can be computed first:

p_value <- 1 - pchisq(stat, df)

and compared to alpha. The hypothesis should be rejected if p_value <= alpha is true.
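
As a minimal sketch of both decision rules, with hypothetical values for alpha, df and the test statistic stat:

alpha <- 0.05
df    <- 1
stat  <- 3.90                      # hypothetical chi-squared test statistic

c <- qchisq(1 - alpha, df)         # critical value, about 3.84
stat >= c                          # TRUE -> reject H0

p_value <- 1 - pchisq(stat, df)    # about 0.048
p_value <= alpha                   # TRUE -> reject H0 (same decision)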

(b) Design of Hypothesis Tests

If you are trying to establish a certain hypothesis, then that hypothesis should be designated as the alternative hypothesis. Similarly, if you are trying to discredit a hypothesis, that hypothesis should be designated the null hypothesis. (Ross 2017)

Example 1.4 (Nicotine Test) If, for instance, a tobacco company is running an experiment to prove that the mean nicotine level of its cigarettes is less than \(1.5\) milligrams, then it should choose for the null hypothesis

\[ \mathrm{H}_{0}: \mu \geq 1.5 \]

and for the alternative hypothesis

\[ \mathrm{H}_{1}: \mu<1.5 \]

Then the company could use a rejection of the null hypothesis as “proof” of its claim that the mean nicotine content was less than 1.5 milligrams.

(c) Interpretation of Test Result

The rejection of the null hypothesis \(\mathrm{H}_{0}\) is a strong statement that \(\mathrm{H}_{0}\) does not appear to be consistent with the observed data. The result that \(\mathrm{H}_{0}\) is not rejected is a weak statement that should be interpreted to mean that \(\mathrm{H}_{0}\) is consistent with the data. (Ross 2017)

When conducting a statistical test, the thought experiment is that our sample is drawn from some hypothetical population distribution that could have generated the data. Our sample is then compared with hypothetical samples drawn from that hypothetical population distribution. (Hendry and Nielsen 2007, section 4.3.2, interpreting the test result)

1.3.3 Log-Likelihood Ratio Test (test-LLR)

Likelihood ratio tests are well suited for making inferences about restrictions on a well-specified model, where we are able, and willing, to maximize the likelihood function in the unrestricted model as well as the restricted model. (Hendry and Nielsen 2007)

\[ \mathrm{Q} = \frac{\max _{\theta \in \Theta_{R}} \mathrm{L}_{Y_{1}, \ldots, Y_{n}}(\theta)}{\max _{\theta \in \Theta_{U}} \mathrm{L}_{Y_{1}, \ldots, Y_{n}}(\theta)} \]

\[ \mathrm{LR} = -2 \log \mathrm{Q} = 2 \left\{\max _{\theta \in \Theta_{U}} \ell_{Y_{1}, \ldots, Y_{n}}(\theta)-\max _{\theta \in \Theta_{R}} \ell_{Y_{1}, \ldots, Y_{n}}(\theta)\right\} \]

where the closer \(\mathrm{LR}\) is to zero, the more plausible it is that \(\theta\) satisfies the restriction.

A statistical test can now be constructed as a decision rule. If \(\mathrm{Q}\) is (close to) unity, and correspondingly \(\mathrm{LR}\) is small, the restricted maximum likelihood estimate is (nearly) as likely as the unrestricted estimate, so in that case we fail to reject the hypothesis. Under the null hypothesis, \(\mathrm{LR}\) is approximately \(\chi^{2}\)-distributed with degrees of freedom equal to the number of restrictions, which provides the critical value for the decision rule.
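
A minimal sketch of this construction with lm(), using simulated data rather than any of the examples in this section:

set.seed(1)
dat <- data.frame(x1 = rnorm(100), x2 = rnorm(100))
dat$y <- 1 + 0.5 * dat$x1 + rnorm(100)

m_u <- lm(y ~ x1 + x2, data = dat)   # unrestricted model
m_r <- lm(y ~ x1, data = dat)        # restricted model: coefficient of x2 set to 0

lr <- 2 * (as.numeric(logLik(m_u)) - as.numeric(logLik(m_r)))
1 - pchisq(lr, df = 1)               # p-value for the single restriction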

(a) Signed test-LLR

test_llr(mods_part[[1]], mods_part[[2]]) %>% tab_ti()
whi stat df1 df2 p_value prob if_reject
logLik 309.8 1 7183 0 0.05 TRUE

This signed likelihood ratio statistic is approximately normally distributed when the hypothesis is true:

\[ \omega=\operatorname{sign}\left(\widehat{\beta} \right) \sqrt{\mathrm{LR}} \stackrel{\mathrm{D}}{\approx} \mathrm{N}[0,1] \]

where the sign function is given by:

\[ \operatorname{sign}(x) = \left\{\begin{array}{ll} +1 & \text { if } x \geq 0 \\ -1 & \text { if } x<0 \end{array} \right. \]

Then we construct a test by comparing the test statistic \(\omega\) to a critical value \(c\).
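
A small sketch of this decision rule, reusing the LR value from the test above and taking the sign of the estimate as positive purely for illustration:

lr       <- 309.8                        # LR statistic from test_llr() above
beta_hat <- 1                            # hypothetical estimate; only its sign is used
omega    <- sign(beta_hat) * sqrt(lr)    # about 17.6
abs(omega) >= qnorm(1 - 0.05 / 2)        # TRUE -> reject at the 5% level (two-sided)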

test_llr_sign(mods_part[[1]], mods_part[[2]], T) %>% tab_ti()
whi stat df1 df2 p_value prob if_reject
logLik-sign 17.6 1 7183 2.724e-05 0.05 TRUE

(b) Analysis of Variance (ANOVA)

When there is only one regressor, the squared sample correlation is given by r.squared, which is a measure of goodness-of-fit of the unrestricted model relative to the restricted model. (Hendry and Nielsen 2007) Take the census data of Example 1.2 as an example:

mods_census[[1]] %>% glance() %>% tab_ti()
r.squared adj.r.squared sigma statistic p.value df logLik AIC BIC deviance df.residual
0.07298 0.07274 0.7252 305.1 8.115e-66 2 -4254 8515 8533 2038 3875

The deviance is the residual sum of squares.

When there are multiple regressors, the squared partial sample correlations can be obtained from the 6th column (p.r.squared) of the tab_tidy() output:

term estimate std.error statistic p.value p.r.squared
(Intercept) 4.789021 0.1247933 38.376 2.251e-273 0.275439
educ -0.049657 0.0197261 -2.517 1.187e-02 0.001633
I(educ^2) 0.005147 0.0007887 6.526 7.611e-11 0.010875

The log-likelihood ratio test statistic for the hypothesis that any single parameter is 0 can be calculated from its squared partial sample correlation \(r^{2}\) and the sample size \(n\) with the following equation: \[ \mathrm{LR}=-n \log \left(1-r^{2} \right). \]

mods_census[[2]] %>%
  tab_tidy(T) %>%
  {.[3, 6]} %>%              # squared partial correlation of I(educ^2)
  as.numeric() %>%
  {- 3877 * log(1 - .)}      # LR = -n log(1 - r^2), with n = 3877 observations
#> [1] 42.39353

or using test_llr() to compare the two models (the restricted model is the one with fewer regressors):

test_llr(mods_census[[2]], mods_census[[1]]) %>% tab_ti()
whi stat df1 df2 p_value prob if_reject
logLik 42.39 1 3876 7.464e-11 0.05 TRUE

or using stats::anova():

term df sumsq meansq statistic p.value
educ 1 160.42 160.4165 308.34 1.774e-66
I(educ^2) 1 22.16 22.1596 42.59 7.611e-11
Residuals 3874 2015.49 0.5203 NA NA
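
The exact call behind this table is not shown here; one plausible way to produce a table of this shape is to tidy the stats::anova() output with broom:

mods_census[[2]] %>% anova() %>% broom::tidy()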

As for the RECS data of Example 1.3, to test the hypothesis that the coefficient for x5 is 0 in mods_recs[[1]], tab_tidy() is used:

term estimate std.error statistic p.value p.r.squared
(Intercept) 8.615613 0.200829 42.9002 2.434e-128 0.8626622
x2 -0.259971 0.029144 -8.9203 5.153e-17 0.2135731
x3 -0.081468 0.037994 -2.1442 3.284e-02 0.0154493
x4 0.064473 0.019196 3.3586 8.869e-04 0.0370726
x5 -0.034200 0.072129 -0.4742 6.357e-01 0.0007667
x6 0.007251 0.002403 3.0173 2.774e-03 0.0301358
x7 0.013147 0.017798 0.7387 4.607e-01 0.0018588
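
The formula \(\mathrm{LR}=-n \log(1-r^{2})\) applies here as well; a minimal sketch, where the sample size n below is a placeholder rather than the actual number of RECS observations:

n  <- 305                       # placeholder; replace with the actual RECS sample size
r2 <- 0.0007667                 # p.r.squared for x5 from the table above
lr <- -n * log(1 - r2)          # about 0.23 with this placeholder n
lr >= qchisq(0.95, df = 1)      # FALSE -> cannot reject that the coefficient is 0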

(c) LLR-Test for More Parameters

Likelihood ratio tests restricting more than one parameter can only be performed using the log-likelihood values of the unrestricted and restricted models. For example, to test the hypothesis that the coefficients for x5 and x7 are both 0 in mods_recs[[1]], the full model can be compared with the model in which both regressors are dropped. According to the function output below, we cannot reject the hypothesis.

whi stat df1 df2 p_value prob if_reject
logLik 5.178 2 298 0.07509 0.05 FALSE

The log-likelihood ratio statistics of nested models are related in an additive manner (as the check below verifies), so models with multiple regressors can be reduced in a step-wise procedure. At each step, the partial correlations of the regressors can indicate the next term to be reduced.

# additivity check: LR([[2]] vs [[3]]) + LR([[1]] vs [[2]]) equals LR([[1]] vs [[3]])
test_llr(mods_recs[[2]], mods_recs[[3]])$stat +
  test_llr(mods_recs[[1]], mods_recs[[2]])$stat -
  test_llr(mods_recs[[1]], mods_recs[[3]])$stat <= 1e-5
#> [1] TRUE

References

Hendry, David F., and Bent Nielsen. 2007. Econometric Modeling: A Likelihood Approach. Princeton University Press.

Ross, Sheldon M. 2017. Introductory Statistics. Academic Press.