Chapter 21 Basic Hypothesis Tests for Linear Models

21.1 Introduction

In this section we consider the application of hypothesis testing to linear models. Suppose that we are given the linear model,
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_{p-1} X_{(p-1)i} + \epsilon_i, \]

where \(\epsilon_i \sim N(0,\sigma^2)\) are independent and identically distributed. We are interested in testing the hypothesis that a coefficient \(\beta_j\) is equal to some value \(b\). The case \(b=0\) is of most interest, since \(\beta_j=0\) means that \(X_{ji}\) is not important in predicting \(Y_i\) once the other variables are in the model; see Section 21.2. We can also construct confidence intervals for \(\beta_j\) (Section 21.3) and, in Section 21.4, extend hypothesis testing to multiple (all) parameters to test whether or not a linear model is useful in a given modelling scenario.

21.2 Tests on a single parameter

Given the linear model,
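\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_{p-1} X_{(p-1)i} + \epsilon_i, \]

where \(\epsilon_i \sim N(0,\sigma^2)\), we want to test \(H_0: \beta_j = b\) vs. \(H_1: \beta_j \neq b\) at significance level \(\alpha\), where \(b\) is some constant. Typically, we might choose \(\alpha=0.05\) (common alternatives are \(\alpha=0.01\) or \(\alpha=0.1\)).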

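The decision rule is to reject \(H_0\) if
\[ |T| = \left| \frac{\hat{\beta}_j - b}{SE(\hat{\beta}_j)} \right| > t_{n-p,\alpha/2}, \]
where \(SE(\hat{\beta}_j) = \sqrt{\mathrm{Var}(\hat{\beta}_j)}\) is the standard error of the parameter. Recall from Section 17 that \(\mathrm{Var}(\hat{\beta}_j) = s^2 \left( (Z^T Z)^{-1} \right)_{jj}\).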

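A special case of the above test occurs when we choose \(b=0\). The test \(H_0: \beta_j = 0\) vs. \(H_1: \beta_j \neq 0\) at level \(\alpha\) has the decision rule to reject \(H_0\) if
\[ |T| = \left| \frac{\hat{\beta}_j}{SE(\hat{\beta}_j)} \right| > t_{n-p,\alpha/2}. \]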

Note that if we reject \(H_0: \beta_j = 0\) we are claiming that the explanatory variable \(X_j\) is useful in predicting the response variable \(Y\) when all the other variables are included in the model.

The statistic \(T = \hat{\beta}_j / SE(\hat{\beta}_j)\), whose absolute value is used in the test, is often reported in the output from statistical software such as R.
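The standard error formula and the decision rule can be checked numerically. The following is a minimal sketch in R, assuming simulated data: the variables x1, x2 and y, the sample size and the true coefficients are all made up for illustration and do not come from the text. It computes \(s^2\), the standard errors from \(s^2((Z^TZ)^{-1})_{jj}\) and the test of \(H_0:\beta_j=0\) by hand, then places the hand-computed standard errors beside those reported by summary().

```r
# Simulated data purely for illustration (assumption, not from the text).
set.seed(1)
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 + 0 * x2 + rnorm(n)

fit <- lm(y ~ x1 + x2)

Z        <- model.matrix(fit)            # n x p design matrix (intercept column included)
p        <- ncol(Z)                      # number of parameters
beta_hat <- coef(fit)
s2       <- sum(resid(fit)^2) / (n - p)  # estimate of sigma^2
se       <- sqrt(s2 * diag(solve(crossprod(Z))))  # sqrt of s^2 * ((Z^T Z)^{-1})_jj

b      <- 0                              # hypothesised value under H0
alpha  <- 0.05
t_stat <- (beta_hat - b) / se
t_crit <- qt(1 - alpha / 2, df = n - p)  # t_{n-p, alpha/2}
reject <- abs(t_stat) > t_crit           # decision rule |T| > t_{n-p, alpha/2}

# Hand-computed values next to those in the lm summary table.
cbind(manual_se = se,
      lm_se     = summary(fit)$coefficients[, "Std. Error"],
      t_stat    = t_stat,
      reject    = reject)
```

In the summary table of a fitted lm object the statistic appears with its sign in the "t value" column, so the absolute value has to be taken before comparing with the critical value.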

Fuel consumption
A dataset considers fuel consumption for the 50 US states plus Washington DC, that is, \(n=51\) observations. The response fuel is fuel consumption measured in gallons per person. The predictors considered are dlic, the percentage of licensed drivers; tax, the motor fuel tax in US cents per gallon; inc, income per person in $1,000s; and road, the log of the number of miles of federal highway. Fitting a linear model of the form
\[ \text{fuel} = \beta_0 + \beta_1\,\text{dlic} + \beta_2\,\text{tax} + \beta_3\,\text{inc} + \beta_4\,\text{road} \]
using R, the output is

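| Coefficient | Estimate | Standard Error |
|---|---|---|
| \(\beta_0\) | 154.19 | 194.906 |
| \(\beta_1\) | 4.719 | 1.285 |
| \(\beta_2\) | -4.228 | 2.030 |
| \(\beta_3\) | -6.135 | 2.194 |
| \(\beta_4\) | 26.755 | 9.337 |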

Test \(H_0: \beta_2 = 0\) vs. \(H_1: \beta_2 \neq 0\) at significance level \(\alpha=0.05\).


Watch Video 31 for a walkthrough in R of testing the null hypothesis.

Video 31: Fuel consumption example.

Hypothesis test for \(\beta_2\).

The decision rule is to reject \(H_0\) if

\[ |T| = \left| \frac{\hat{\beta}_2}{SE(\hat{\beta}_2)} \right| = \left| \frac{-4.228}{2.030} \right| = |-2.083| > t_{46,0.025} = 2.013. \]

So we reject \(H_0\) and conclude that the tax variable is useful for predicting fuel after the other variables have been included.

We note that the p-value is \(P(|t_{46}| > 2.083) = 0.0428\), so we would not reject the null hypothesis \(\beta_2 = 0\) at significance level \(\alpha=0.01\).
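The arithmetic of this test can be reproduced from the reported estimate and standard error alone. The sketch below is in R and assumes only the summary values quoted in the example (the raw data are not used); the variable names are chosen here for illustration.

```r
# Values taken from the fuel consumption output quoted in the example.
beta2_hat <- -4.228
se_beta2  <- 2.030
n     <- 51    # observations
p     <- 5     # parameters: intercept plus four predictors
alpha <- 0.05

t_stat  <- beta2_hat / se_beta2                  # signed test statistic
t_crit  <- qt(1 - alpha / 2, df = n - p)         # t_{46, 0.025}
p_value <- 2 * pt(abs(t_stat), df = n - p, lower.tail = FALSE)  # two-sided p-value

round(c(t_stat = t_stat, t_crit = t_crit, p_value = p_value), 4)
abs(t_stat) > t_crit   # TRUE means H0 is rejected at level alpha
p_value < 0.01         # FALSE means H0 is not rejected at level 0.01
```

Comparing the p-value with \(\alpha\) and comparing \(|T|\) with the critical value are two forms of the same decision rule, so the two checks always agree for a given \(\alpha\).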

21.3 Confidence intervals for parameters

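Recall that
\[ T = \frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)} \sim t_{n-p}. \]
It follows that a \(100(1-\alpha)\%\) confidence interval for \(\beta_j\) is
\[ \left( \hat{\beta}_j - t_{n-p,\alpha/2}\,SE(\hat{\beta}_j),\ \hat{\beta}_j + t_{n-p,\alpha/2}\,SE(\hat{\beta}_j) \right), \]
where \(SE(\hat{\beta}_j) = s\sqrt{\left( (Z^T Z)^{-1} \right)_{jj}}\).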
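One way to see where this interval comes from is to rearrange the probability statement for the pivot. The short derivation below is a sketch that assumes \((\hat{\beta}_j - \beta_j)/SE(\hat{\beta}_j)\) has a \(t_{n-p}\) distribution and that \(t_{n-p,\alpha/2}\) denotes its upper \(\alpha/2\) critical value.

```latex
\begin{aligned}
1-\alpha
  &= P\left( -t_{n-p,\alpha/2} \le \frac{\hat{\beta}_j - \beta_j}{SE(\hat{\beta}_j)} \le t_{n-p,\alpha/2} \right) \\
  &= P\left( -t_{n-p,\alpha/2}\,SE(\hat{\beta}_j) \le \hat{\beta}_j - \beta_j \le t_{n-p,\alpha/2}\,SE(\hat{\beta}_j) \right) \\
  &= P\left( \hat{\beta}_j - t_{n-p,\alpha/2}\,SE(\hat{\beta}_j) \le \beta_j \le \hat{\beta}_j + t_{n-p,\alpha/2}\,SE(\hat{\beta}_j) \right).
\end{aligned}
```

The same rearrangement shows that a value \(b\) lies outside the interval exactly when the test of \(H_0:\beta_j=b\) rejects at level \(\alpha\).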

Fuel consumption (continued)
For Example 21.2.1 (Fuel consumption), construct a 95% confidence interval for \(\beta_2\).


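A 95% confidence interval for \(\beta_2\) is
\[ \left( \hat{\beta}_2 - t_{46,0.025}\,SE(\hat{\beta}_2),\ \hat{\beta}_2 + t_{46,0.025}\,SE(\hat{\beta}_2) \right) = (-4.228 - 2.013 \times 2.030,\ -4.228 + 2.013 \times 2.030) = (-8.31, -0.14). \]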

This confidence interval (just) excludes 0, as we would expect from the p-value of 0.0428 calculated in Example 21.2.1 (Fuel consumption) above.
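The interval can be reproduced in R from the reported estimate and standard error. This is a minimal sketch using only the summary values quoted in the example; the variable names are illustrative.

```r
# Values taken from the fuel consumption output quoted in the example.
beta2_hat <- -4.228
se_beta2  <- 2.030
df        <- 51 - 5          # n - p
level     <- 0.95

t_crit <- qt(1 - (1 - level) / 2, df)              # t_{46, 0.025}
ci     <- beta2_hat + c(-1, 1) * t_crit * se_beta2 # lower and upper limits

round(ci, 2)
ci[1] < 0 && ci[2] > 0   # FALSE means the interval excludes 0
```

With the full dataset and a fitted lm object (not available here), confint() applied to the fit would be expected to give matching intervals for every coefficient, up to rounding of the quoted estimates.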

21.4 Tests for the existence of regression

We want to test

\[ H_0: \beta_1 = \beta_2 = \cdots = \beta_{p-1} = 0 \]

versus

\[ H_1: \beta_j \neq 0 \]

for some \(j\) at significance level \(\alpha\).

Note that if we reject \(H_0\) we are saying that the model
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_{p-1} X_{(p-1)i} \]

has some ability to explain the variance that we are observing in \(Y\). That is, there exists a linear relationship between the explanatory variables and the response variable.

If \(D_0\) is the model deviance under the null hypothesis and \(D_1\) is the model deviance under the alternative hypothesis, then the decision rule is to reject \(H_0\) if
\[ F = \frac{(D_0 - D_1)/(p-1)}{D_1/(n-p)} > F_{p-1,n-p,\alpha}. \]
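The F statistic can be computed by fitting both models and comparing their deviances. The following R sketch assumes simulated data: the variables x1, x2, x3 and y and the true coefficients are made up for illustration and are not from the text. For a linear model, deviance() returns the residual sum of squares.

```r
# Simulated data purely for illustration (assumption, not from the text).
set.seed(2)
n  <- 60
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n)
y  <- 0.5 + 1.5 * x1 - 1 * x2 + rnorm(n)

m0 <- lm(y ~ 1)               # null model: intercept only
m1 <- lm(y ~ x1 + x2 + x3)    # full model

D0 <- deviance(m0)            # residual sum of squares under H0
D1 <- deviance(m1)            # residual sum of squares under H1
p  <- length(coef(m1))        # number of parameters in the full model
alpha <- 0.05

F_stat  <- ((D0 - D1) / (p - 1)) / (D1 / (n - p))
F_crit  <- qf(1 - alpha, p - 1, n - p)                   # F_{p-1, n-p, alpha}
p_value <- pf(F_stat, p - 1, n - p, lower.tail = FALSE)

c(F_stat = F_stat, F_crit = F_crit, p_value = p_value)
F_stat > F_crit               # TRUE means H0 is rejected at level alpha

# The same statistic as reported by R for the full model.
summary(m1)$fstatistic
```

Calling anova(m0, m1) is another route to the same comparison and reports the F statistic and p-value in a single table.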
Fuel consumption (continued)
For the data in Example 21.2.1 (Fuel consumption), the two competing models are
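\[ M_1: \text{fuel} = \beta_0 + \beta_1\,\text{dlic} + \beta_2\,\text{tax} + \beta_3\,\text{inc} + \beta_4\,\text{road} \]
\[ M_0: \text{fuel} = \beta_0 \]

The models have residual sum of squares \(D_1 = 193700\) and \(D_0 = 395694.1\), respectively. We test \(H_0: \beta_1 = \cdots = \beta_4 = 0\) vs. \(H_1: \beta_j \neq 0\) for some \(j = 1, \ldots, 4\) at level \(\alpha=0.05\).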


The decision rule is to reject \(H_0\) if
\[ F = \frac{(395694.1 - 193700)/(5-1)}{193700/(51-5)} = \frac{50498.525}{4210.870} = 11.99 > F_{4,46,0.05} = 2.574. \]

Therefore, we reject \(H_0\) and can say that the linear model has some power in explaining the variability in fuel.

Note that the p-value for the F test is \(P(F_{4,46} > 11.99) = 9.331 \times 10^{-7}\). This is given in R by 1-pf(11.99,4,46) and is reported in the last line of summary() for a linear model in R.
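The F test for the fuel data can be reproduced from the two residual sums of squares alone. This R sketch assumes only the values quoted in the example; the variable names are illustrative.

```r
# Residual sums of squares quoted in the fuel consumption example.
D0 <- 395694.1   # null model M0
D1 <- 193700     # full model M1
n  <- 51         # observations
p  <- 5          # parameters in M1
alpha <- 0.05

F_stat  <- ((D0 - D1) / (p - 1)) / (D1 / (n - p))
F_crit  <- qf(1 - alpha, p - 1, n - p)                   # F_{4, 46, 0.05}
p_value <- pf(F_stat, p - 1, n - p, lower.tail = FALSE)  # upper-tail probability

round(c(F_stat = F_stat, F_crit = F_crit), 3)
p_value
F_stat > F_crit   # TRUE means H0 is rejected at level alpha
```

Using lower.tail = FALSE computes the same upper-tail probability as 1 - pf(...) but avoids subtracting a number very close to 1, which can matter when the p-value is very small.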