1.3 Statistical Inference
1.3.1 Hypothesis Tests
The null hypothesis, denoted by \(\mathrm{H}_{0}\), is a statement about a population parameter. The alternative hypothesis is denoted by \(\mathrm{H}_{1}\). The null hypothesis will be rejected if it appears to be inconsistent with the sample data and will not be rejected otherwise. (Ross 2017)
(a) Test Statistic and Critical Region
A test statistic is a statistic whose value is determined from the sample data. Depending on the value of this test statistic, the null hypothesis will be rejected or not. The critical region, also called the rejection region, is that set of values of the test statistic for which the null hypothesis is rejected. (Ross 2017)
The classical procedure for testing a null hypothesis is to fix a small level of significance \(\alpha\) and then require that the probability of rejecting \(\mathrm{H}_{0}\) when \(\mathrm{H}_{0}\) is true is less than or equal to \(\alpha\). (Ross 2017)
Critical values are calculated from the null distribution of the test statistic: \(c\) is chosen so that the probability of rejection under \(\mathrm{H}_{0}\), \(\Pr(\text{stat} \geq c \mid \mathrm{H}_{0})\), is at most \(\alpha\). The hypothesis should be rejected if `stat >= c` is true. Alternatively, the `p_value` can be computed first and compared to `alpha`; the hypothesis should be rejected if `p_value <= alpha` is true.
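A minimal sketch of both decision rules, assuming a one-sided test whose statistic is standard normal under \(\mathrm{H}_{0}\) (all values here are hypothetical):

```r
alpha <- 0.05
stat  <- 1.8                   # hypothetical observed test statistic
c     <- qnorm(1 - alpha)      # critical value: P(stat >= c | H0) = alpha
stat >= c                      # reject H0?
#> [1] TRUE

p_value <- 1 - pnorm(stat)     # probability of a statistic at least this extreme
p_value <= alpha               # equivalent decision
#> [1] TRUE
```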
(b) Design of Hypothesis Tests
If you are trying to establish a certain hypothesis, then that hypothesis should be designated as the alternative hypothesis. Similarly, if you are trying to discredit a hypothesis, that hypothesis should be designated the null hypothesis. (Ross 2017)
Example 1.4 (Nicotine Test) Thus, for instance, if the tobacco company is running the experiment to prove that the mean nicotine level of its cigarettes is less than \(1.5,\) then it should choose for the null hypothesis
\[ \mathrm{H}_{0}: \mu \geq 1.5 \]
and for the alternative hypothesis
\[ \mathrm{H}_{1}: \mu<1.5 \]
Then the company could use a rejection of the null hypothesis as “proof” of its claim that the mean nicotine content was less than 1.5 milligrams.
(c) Interpretation of Test Result
The rejection of the null hypothesis \(\mathrm{H}_{0}\) is a strong statement that \(\mathrm{H}_{0}\) does not appear to be consistent with the observed data. The result that \(\mathrm{H}_{0}\) is not rejected is a weak statement that should be interpreted to mean that \(\mathrm{H}_{0}\) is consistent with the data. (Ross 2017)
When conducting a statistical test, the thought experiment is that our sample is drawn from some hypothetical population distribution that could have generated the data. Our sample is then compared with hypothetical samples drawn from that hypothetical population distribution. (Hendry and Nielsen 2007, section 4.3.2, interpreting the test result)
1.3.2 Student’s t-Test (test-t)
1.3.3 Log-Likelihood Ratio Test (test-LLR)
Likelihood ratio tests are well suited for making inferences about restrictions on a well-specified model, where we are able, and willing, to maximize the likelihood function in the unrestricted model as well as the restricted model. (Hendry and Nielsen 2007)
\[ \mathrm{Q} = \frac{\max _{\theta \in \Theta_{R}} \mathrm{L}_{Y_{1}, \ldots, Y_{n}}(\theta)}{\max _{\theta \in \Theta_{U}} \mathrm{L}_{Y_{1}, \ldots, Y_{n}}(\theta)} \]
\[ \mathrm{LR} = -2 \log \mathrm{Q} = 2 \left\{\max _{\theta \in \Theta_{U}} \ell_{Y_{1}, \ldots, Y_{n}}(\theta)-\max _{\theta \in \Theta_{R}} \ell_{Y_{1}, \ldots, Y_{n}}(\theta)\right\} \]
where the closer \(\mathrm{LR}\) is to zero, the more likely it is that \(\theta\) could satisfy the restriction.
A statistical test can now be constructed as a decision rule. If \(\mathrm{Q}\) is (close to) unity, and correspondingly \(\mathrm{LR}\) is small, the restricted maximum likelihood estimate would be (nearly) as likely as the unrestricted estimate, so in that case, we would fail to reject the hypothesis.
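As a minimal sketch, assuming a data frame `dat` with response `y` and regressors `x1` and `x2` (the \(\chi^{2}\) comparison reflects the usual asymptotic reference distribution for \(\mathrm{LR}\)):

```r
fit_u <- lm(y ~ x1 + x2, data = dat)   # unrestricted model
fit_r <- lm(y ~ x1,      data = dat)   # restricted model: coefficient on x2 set to 0
LR <- 2 * (as.numeric(logLik(fit_u)) - as.numeric(logLik(fit_r)))
LR >= qchisq(0.95, df = 1)             # reject the restriction at the 5% level?
```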
(a) Signed test-LLR
whi | stat | df1 | df2 | p_value | prob | if_reject |
---|---|---|---|---|---|---|
logLik | 309.8 | 1 | 7183 | 0 | 0.05 | TRUE |
This signed likelihood ratio statistic is approximately normally distributed when the hypothesis is true:
\[ \omega=\operatorname{sign}\left(\widehat{\beta} \right) \sqrt{\mathrm{LR}} \stackrel{\mathrm{D}}{\approx} \mathrm{N}[0,1] \]
where the sign function is given by
\[ \operatorname{sign}(x) = \left\{\begin{array}{ll} +1 & \text { if } x \geq 0 \\ -1 & \text { if } x<0 \end{array} \right. \]
Then we construct a test by comparing the test statistic \(\omega\) to a critical value \(c\).
whi | stat | df1 | df2 | p_value | prob | if_reject |
---|---|---|---|---|---|---|
logLik-sign | 17.6 | 1 | 7183 | 2.724e-05 | 0.05 | TRUE |
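Note that the signed statistic in the table is the square root of the unsigned \(\mathrm{LR}\) reported earlier (\(\sqrt{309.8} \approx 17.6\)) with the sign of \(\widehat{\beta}\) attached. A minimal sketch of the decision rule, with hypothetical inputs `beta_hat` and `LR`:

```r
# Signed LLR statistic compared with a standard normal critical value.
# Base R's sign() returns 0 at exactly 0, which differs immaterially
# from the definition above.
omega <- sign(beta_hat) * sqrt(LR)
abs(omega) >= qnorm(1 - 0.05 / 2)   # reject at the 5% level (two-sided)?
```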
(b) Analysis of Variance (ANOVA)
When there is only one regressor, the squared sample correlation is reported as `r.squared`, which is a measure of goodness-of-fit of the unrestricted model relative to the restricted model. (Hendry and Nielsen 2007) Take example 1.2 (`census`) as an illustration:
r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
---|---|---|---|---|---|---|---|---|---|---|
0.07298 | 0.07274 | 0.7252 | 305.1 | 8.115e-66 | 2 | -4254 | 8515 | 8533 | 2038 | 3875 |
The `deviance` is the residual sum of squares.
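For a fitted linear model this can be checked directly (with `fit` standing in for any hypothetical `lm` object):

```r
# deviance() of an lm equals the residual sum of squares
all.equal(deviance(fit), sum(residuals(fit)^2))
#> [1] TRUE
```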
When there are multiple regressors, the squared partial sample correlations can be obtained from the sixth column (`p.r.squared`) of the `tab_tidy()` output:
term | estimate | std.error | statistic | p.value | p.r.squared |
---|---|---|---|---|---|
(Intercept) | 4.789021 | 0.1247933 | 38.376 | 2.251e-273 | 0.275439 |
educ | -0.049657 | 0.0197261 | -2.517 | 1.187e-02 | 0.001633 |
I(educ^2) | 0.005147 | 0.0007887 | 6.526 | 7.611e-11 | 0.010875 |
The log-likelihood ratio test statistic for the hypothesis that a single parameter equals 0 can be calculated from the partial sample correlation using the following equation:
\[ \mathrm{LR}=-n \log \left(1-r^{2} \right). \]
#> [1] 42.39353
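This value can be reproduced from the tables above; a minimal sketch, taking \(n\) as the residual degrees of freedom plus the number of coefficients and \(r^{2}\) as the `p.r.squared` of `I(educ^2)`:

```r
n    <- 3874 + 3      # residual df plus number of coefficients (census example)
p_r2 <- 0.010875      # p.r.squared of I(educ^2) from the tab_tidy() table
-n * log(1 - p_r2)    # reproduces the statistic reported above
```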
Alternatively, `test_llr()` can be used to compare two models (the restricted model is the one with fewer regressors):
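The call might look like the following sketch; the census fits are not named in the text, so `mod_u` (with `educ` and `I(educ^2)`) and `mod_r` (with `educ` only) are assumed names:

```r
# Hypothetical model objects for the census example
test_llr(mod_u, mod_r)
```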
whi | stat | df1 | df2 | p_value | prob | if_reject |
---|---|---|---|---|---|---|
logLik | 42.39 | 1 | 3876 | 7.464e-11 | 0.05 | TRUE |
or using `stats::anova()`:
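For instance, continuing with the hypothetical `mod_u` (tidying the result with `broom::tidy()` is an assumption about how the table below was formatted):

```r
broom::tidy(stats::anova(mod_u))
```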
term | df | sumsq | meansq | statistic | p.value |
---|---|---|---|---|---|
educ | 1 | 160.42 | 160.4165 | 308.34 | 1.774e-66 |
I(educ^2) | 1 | 22.16 | 22.1596 | 42.59 | 7.611e-11 |
Residuals | 3874 | 2015.49 | 0.5203 | NA | NA |
As for example 1.3 (`RECS`), to test the hypothesis that the coefficient for `x5` is 0 in `mods_recs[[1]]`, `tab_tidy()` is used:
term | estimate | std.error | statistic | p.value | p.r.squared |
---|---|---|---|---|---|
(Intercept) | 8.615613 | 0.200829 | 42.9002 | 2.434e-128 | 0.8626622 |
x2 | -0.259971 | 0.029144 | -8.9203 | 5.153e-17 | 0.2135731 |
x3 | -0.081468 | 0.037994 | -2.1442 | 3.284e-02 | 0.0154493 |
x4 | 0.064473 | 0.019196 | 3.3586 | 8.869e-04 | 0.0370726 |
x5 | -0.034200 | 0.072129 | -0.4742 | 6.357e-01 | 0.0007667 |
x6 | 0.007251 | 0.002403 | 3.0173 | 2.774e-03 | 0.0301358 |
x7 | 0.013147 | 0.017798 | 0.7387 | 4.607e-01 | 0.0018588 |
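Since the p-value for `x5` is 0.6357, well above 0.05, the hypothesis that its coefficient is 0 cannot be rejected.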
(c) LLR-Test for More Parameters
Likelihood tests for restricting more than one parameter can only be performed by using the values of the log-likelihood in the original and restricted models. For example, to test the hypothesis that the coefficients for `x5` and `x7` are both 0 in `mods_recs[[1]]`, the following calculation can be conducted. We cannot reject the hypothesis according to the function output.
whi | stat | df1 | df2 | p_value | prob | if_reject |
---|---|---|---|---|---|---|
logLik | 5.178 | 2 | 298 | 0.07509 | 0.05 | FALSE |
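One way this comparison might be set up is sketched below; the use of `update()` to build the restricted fit is an assumption, and `mod_r57` is a hypothetical name:

```r
# Refit the unrestricted RECS model without x5 and x7, then compare
mod_r57 <- update(mods_recs[[1]], . ~ . - x5 - x7)
test_llr(mods_recs[[1]], mod_r57)
```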
Log-likelihood ratio test statistics for a sequence of nested models are related in an additive manner, so models with multiple regressors can be reduced in a step-wise procedure, as the check below illustrates. At every step, the partial correlations of the remaining regressors indicate which term to reduce next.
```r
test_llr(mods_recs[[2]], mods_recs[[3]])$stat +
  test_llr(mods_recs[[1]], mods_recs[[2]])$stat -
  test_llr(mods_recs[[1]], mods_recs[[3]])$stat <= 1e-5
#> [1] TRUE
```
References
Hendry, David F., and Bent Nielsen. 2007. Econometric Modeling: A Likelihood Approach. Princeton University Press.
Ross, Sheldon M. 2017. Introductory Statistics. Academic Press.