1.3 Statistical Inference
1.3.1 Hypothesis Tests
The null hypothesis, denoted by \(\mathrm{H}_{0}\), is a statement about a population parameter. The alternative hypothesis is denoted by \(\mathrm{H}_{1}\). The null hypothesis will be rejected if it appears to be inconsistent with the sample data and will not be rejected otherwise. (Ross 2017)
(a) Test Statistic and Critical Region
A test statistic is a statistic whose value is determined from the sample data. Depending on the value of this test statistic, the null hypothesis will be rejected or not. The critical region, also called the rejection region, is that set of values of the test statistic for which the null hypothesis is rejected. (Ross 2017)
The classical procedure for testing a null hypothesis is to fix a small level of significance \(\alpha\) and then require that the probability of rejecting \(\mathrm{H}_{0}\) when \(\mathrm{H}_{0}\) is true is less than or equal to \(\alpha\). (Ross 2017)
Critical values are calculated from the null distribution of the test statistic: \(c\) is chosen so that the probability of rejection under \(\mathrm{H}_{0}\), \(\Pr(\text{stat} \geq c \mid \mathrm{H}_{0})\), is at most \(\alpha\). The hypothesis should be rejected if `stat >= c` is true. Alternatively, the `p_value` can be computed first and compared to `alpha`; the hypothesis should be rejected if `p_value <= alpha` is true.
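A minimal sketch of both decision rules, assuming a one-sided test whose statistic is standard normal under \(\mathrm{H}_{0}\) (all values here are hypothetical):

```r
alpha <- 0.05
stat  <- 1.8                   # hypothetical observed test statistic
c     <- qnorm(1 - alpha)      # critical value: P(stat >= c | H0) = alpha
stat >= c                      # reject H0?
#> [1] TRUE

p_value <- 1 - pnorm(stat)     # probability of a statistic at least this extreme
p_value <= alpha               # equivalent decision
#> [1] TRUE
```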
(b) Design of Hypothesis Tests
If you are trying to establish a certain hypothesis, then that hypothesis should be designated as the alternative hypothesis. Similarly, if you are trying to discredit a hypothesis, that hypothesis should be designated the null hypothesis. (Ross 2017)
Example 1.4 (Nicotine Test) Thus, for instance, if the tobacco company is running the experiment to prove that the mean nicotine level of its cigarettes is less than \(1.5,\) then it should choose for the null hypothesis
\[ \mathrm{H}_{0}: \mu \geq 1.5 \]
and for the alternative hypothesis
\[ \mathrm{H}_{1}: \mu<1.5 \]
Then the company could use a rejection of the null hypothesis as “proof” of its claim that the mean nicotine content was less than 1.5 milligrams.
(c) Interpretation of Test Result
The rejection of the null hypothesis \(\mathrm{H}_{0}\) is a strong statement that \(\mathrm{H}_{0}\) does not appear to be consistent with the observed data. The result that \(\mathrm{H}_{0}\) is not rejected is a weak statement that should be interpreted to mean that \(\mathrm{H}_{0}\) is consistent with the data. (Ross 2017)
When conducting a statistical test, the thought experiment is that our sample is drawn from some hypothetical population distribution that could have generated the data. Our sample is then compared with hypothetical samples drawn from that hypothetical population distribution. (Hendry and Nielsen 2007, section 4.3.2, interpreting the test result)
1.3.2 Student’s t-Test (test-t)
1.3.3 Log-Likelihood Ratio Test (test-LLR)
Likelihood ratio tests are well suited for making inferences about restrictions on a well-specified model, where we are able, and willing, to maximize the likelihood function in the unrestricted model as well as the restricted model. (Hendry and Nielsen 2007)
\[ \mathrm{Q} = \frac{\max _{\theta \in \Theta_{R}} \mathrm{L}_{Y_{1}, \ldots, Y_{n}}(\theta)}{\max _{\theta \in \Theta_{U}} \mathrm{L}_{Y_{1}, \ldots, Y_{n}}(\theta)} \]
\[ \mathrm{LR} = -2 \log \mathrm{Q} = 2 \left\{\max _{\theta \in \Theta_{U}} \ell_{Y_{1}, \ldots, Y_{n}}(\theta)-\max _{\theta \in \Theta_{R}} \ell_{Y_{1}, \ldots, Y_{n}}(\theta)\right\} \]
where the closer \(\mathrm{LR}\) is to zero, the more likely it is that \(\theta\) could satisfy the restriction.
A statistical test can now be constructed as a decision rule. If \(\mathrm{Q}\) is (close to) unity, and correspondingly \(\mathrm{LR}\) is small, the restricted maximum likelihood estimate would be (nearly) as likely as the unrestricted estimate, so in that case, we would fail to reject the hypothesis.
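As a minimal sketch, assuming a data frame `dat` with response `y` and regressors `x1` and `x2` (the \(\chi^{2}\) comparison reflects the usual asymptotic reference distribution for \(\mathrm{LR}\)):

```r
fit_u <- lm(y ~ x1 + x2, data = dat)   # unrestricted model
fit_r <- lm(y ~ x1,      data = dat)   # restricted model: coefficient on x2 set to 0
LR <- 2 * (as.numeric(logLik(fit_u)) - as.numeric(logLik(fit_r)))
LR >= qchisq(0.95, df = 1)             # reject the restriction at the 5% level?
```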
(a) Signed test-LLR
whi | stat | df1 | df2 | p_value | prob | if_reject |
---|---|---|---|---|---|---|
logLik | 309.8 | 1 | 7183 | 0 | 0.05 | TRUE |
This signed likelihood ratio statistic is approximately normally distributed when the hypothesis is true:
\[ \omega=\operatorname{sign}\left(\widehat{\beta} \right) \sqrt{\mathrm{LR}} \stackrel{\mathrm{D}}{\approx} \mathrm{N}[0,1] \]
where the sign function is given by
\[ \operatorname{sign}(x) = \left\{\begin{array}{ll} +1 & \text { if } x \geq 0 \\ -1 & \text { if } x<0 \end{array} \right. \]
Then we construct a test by comparing the test statistic \(\omega\) to a critical value \(c\).
whi | stat | df1 | df2 | p_value | prob | if_reject |
---|---|---|---|---|---|---|
logLik-sign | 17.6 | 1 | 7183 | 2.724e-05 | 0.05 | TRUE |
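Note that the signed statistic in the table is the square root of the unsigned \(\mathrm{LR}\) reported earlier (\(\sqrt{309.8} \approx 17.6\)) with the sign of \(\widehat{\beta}\) attached. A minimal sketch of the decision rule, with hypothetical inputs `beta_hat` and `LR`:

```r
# Signed LLR statistic compared with a standard normal critical value.
# Base R's sign() returns 0 at exactly 0, which differs immaterially
# from the definition above.
omega <- sign(beta_hat) * sqrt(LR)
abs(omega) >= qnorm(1 - 0.05 / 2)   # reject at the 5% level (two-sided)?
```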
(b) Analysis of Variance (ANOVA)
When there is only one regressor, the squared sample correlation is reported as `r.squared`, which is a measure of goodness-of-fit of the unrestricted model relative to the restricted model. (Hendry and Nielsen 2007) Take example 1.2 (`census`) as an illustration:
r.squared | adj.r.squared | sigma | statistic | p.value | df | logLik | AIC | BIC | deviance | df.residual |
---|---|---|---|---|---|---|---|---|---|---|
0.07298 | 0.07274 | 0.7252 | 305.1 | 8.115e-66 | 2 | -4254 | 8515 | 8533 | 2038 | 3875 |
The `deviance` is the residual sum of squares.
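For a fitted linear model this can be checked directly (with `fit` standing in for any hypothetical `lm` object):

```r
# deviance() of an lm equals the residual sum of squares
all.equal(deviance(fit), sum(residuals(fit)^2))
#> [1] TRUE
```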
When there are multiple regressors, the squared partial sample correlations can be obtained from the sixth column (`p.r.squared`) of the `tab_tidy()` output:
term | estimate | std.error | statistic | p.value | p.r.squared |
---|---|---|---|---|---|
(Intercept) | 4.789021 | 0.1247933 | 38.376 | 2.251e-273 | 0.275439 |
educ | -0.049657 | 0.0197261 | -2.517 | 1.187e-02 | 0.001633 |
I(educ^2) | 0.005147 | 0.0007887 | 6.526 | 7.611e-11 | 0.010875 |
The log-likelihood ratio test statistic for the hypothesis that a single parameter equals 0 can be calculated from the partial sample correlation using the following equation:
\[ \mathrm{LR}=-n \log \left(1-r^{2} \right). \]
#> [1] 42.39353
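This value can be reproduced from the tables above; a minimal sketch, taking \(n\) as the residual degrees of freedom plus the number of coefficients and \(r^{2}\) as the `p.r.squared` of `I(educ^2)`:

```r
n    <- 3874 + 3      # residual df plus number of coefficients (census example)
p_r2 <- 0.010875      # p.r.squared of I(educ^2) from the tab_tidy() table
-n * log(1 - p_r2)    # reproduces the statistic reported above
```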
Alternatively, `test_llr()` can be used to compare two models (the restricted model is the one with fewer regressors):
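The call might look like the following sketch; the census fits are not named in the text, so `mod_u` (with `educ` and `I(educ^2)`) and `mod_r` (with `educ` only) are assumed names:

```r
# Hypothetical model objects for the census example
test_llr(mod_u, mod_r)
```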
whi | stat | df1 | df2 | p_value | prob | if_reject |
---|---|---|---|---|---|---|
logLik | 42.39 | 1 | 3876 | 7.464e-11 | 0.05 | TRUE |
or using `stats::anova()`:
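For instance, continuing with the hypothetical `mod_u` (tidying the result with `broom::tidy()` is an assumption about how the table below was formatted):

```r
broom::tidy(stats::anova(mod_u))
```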
term | df | sumsq | meansq | statistic | p.value |
---|---|---|---|---|---|
educ | 1 | 160.42 | 160.4165 | 308.34 | 1.774e-66 |
I(educ^2) | 1 | 22.16 | 22.1596 | 42.59 | 7.611e-11 |
Residuals | 3874 | 2015.49 | 0.5203 | NA | NA |
As for example 1.3 (`RECS`), to test the hypothesis that the coefficient for `x5` is 0 in `mods_recs[[1]]`, `tab_tidy()` is used:
term | estimate | std.error | statistic | p.value | p.r.squared |
---|---|---|---|---|---|
(Intercept) | 8.615613 | 0.200829 | 42.9002 | 2.434e-128 | 0.8626622 |
x2 | -0.259971 | 0.029144 | -8.9203 | 5.153e-17 | 0.2135731 |
x3 | -0.081468 | 0.037994 | -2.1442 | 3.284e-02 | 0.0154493 |
x4 | 0.064473 | 0.019196 | 3.3586 | 8.869e-04 | 0.0370726 |
x5 | -0.034200 | 0.072129 | -0.4742 | 6.357e-01 | 0.0007667 |
x6 | 0.007251 | 0.002403 | 3.0173 | 2.774e-03 | 0.0301358 |
x7 | 0.013147 | 0.017798 | 0.7387 | 4.607e-01 | 0.0018588 |
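Since the p-value for `x5` is 0.6357, well above 0.05, the hypothesis that its coefficient is 0 cannot be rejected.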
(c) LLR-Test for More Parameters
Likelihood tests for restricting more than one parameter can only be performed by using the values of the log-likelihood in the original and restricted models. For example, to test the hypothesis that the coefficients for `x5` and `x7` are both 0 in `mods_recs[[1]]`, the following calculation can be conducted. We cannot reject the hypothesis according to the function output.
whi | stat | df1 | df2 | p_value | prob | if_reject |
---|---|---|---|---|---|---|
logLik | 5.178 | 2 | 298 | 0.07509 | 0.05 | FALSE |
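One way this comparison might be set up is sketched below; the use of `update()` to build the restricted fit is an assumption, and `mod_r57` is a hypothetical name:

```r
# Refit the unrestricted RECS model without x5 and x7, then compare
mod_r57 <- update(mods_recs[[1]], . ~ . - x5 - x7)
test_llr(mods_recs[[1]], mod_r57)
```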
Log-likelihood ratio test statistics for a sequence of nested models are related in an additive manner, so models with multiple regressors can be reduced in a step-wise procedure, as the check below illustrates. At every step, the partial correlations of the remaining regressors indicate which term to reduce next.
```r
test_llr(mods_recs[[2]], mods_recs[[3]])$stat +
  test_llr(mods_recs[[1]], mods_recs[[2]])$stat -
  test_llr(mods_recs[[1]], mods_recs[[3]])$stat <= 1e-5
#> [1] TRUE
```
References
Hendry, David F., and Bent Nielsen. 2007. Econometric Modeling: A Likelihood Approach. Princeton University Press.
Ross, Sheldon M. 2017. Introductory Statistics. Academic Press.