Chapter 19 Hypothesis Testing

19.1 Introduction to hypothesis testing

In estimation we ask the question: what is the value of some particular parameter of interest in the population? For example, what is the average annual income of residents in the UK?

Often in statistics we are not interested in the specific value of a parameter, but rather in asserting some statement regarding the parameter of interest. Some examples:

  • We want to claim that the average annual income of UK residents is more than or equal to £35,000.
  • We want to assess whether the average annual income of men in academia in the UK is the same as that of women in similar ranks.
  • We want to determine whether the number of cars crossing a certain intersection follows a Poisson distribution or whether it is more likely to come from a geometric distribution.

To perform a statistical hypothesis test, one needs to specify two disjoint hypotheses in terms of the parameters of the distribution that are of interest. They are

  • H0: Null Hypothesis,
  • H1: Alternative Hypothesis.

Traditionally, we choose H0 to be the claim that we would like to assert.

Returning to our examples:

  • We want to claim that the average annual income of UK residents is more than or equal to £35,000. We test
    H0: μ ≥ 35,000 vs. H1: μ < 35,000.
  • We want to assess whether the average annual income of men in academia in the UK is the same as that of women at similar ranks. We test
    H0: μmen = μwomen vs. H1: μmen ≠ μwomen.
  • We want to determine whether the number of cars crossing a certain intersection follows a Poisson distribution or whether it is more likely to come from a geometric distribution. We test
    H0: X ∼ Po(2) vs. H1: X ∼ Geom(0.5).

Hypotheses where the distribution is completely specified are called simple hypotheses. For example, H0 and H1 in the car example and H0 in the gender wage example are all simple hypotheses.

Hypotheses where the distribution is not completely specified are called composite hypotheses. For example, H0 and H1 in the average annual income example and H1 in the gender wage example are all composite hypotheses.

Note that in the average annual income and gender wage examples, the null and alternative hypotheses cover all possibilities, whereas for the car example there are many other choices of distributions which could be hypothesized.

The conclusion of a hypothesis test
We will reject H0 if there is sufficient information from our sample indicating that the null hypothesis cannot be true, thereby concluding that the alternative hypothesis is true.

We will not reject H0 if there is not sufficient information in the sample to refute our claim.

The remainder of this section is structured as follows. We define Type I and Type II errors, which describe the probabilities of making the wrong decision in a hypothesis test. In Section 19.3 we show how to construct hypothesis tests, starting with tests for the mean of a normal distribution with known variance. This is extended to the case where the variance is unknown, and to the case where we have two samples we want to compare. We introduce p-values, which give a measure of how likely (or unlikely) the observed data are if the null hypothesis is true. We then consider hypothesis testing in a wide range of scenarios.

19.2 Type I and Type II errors

Type I error
A Type I error occurs when one chooses to incorrectly reject a true null hypothesis.

A Type I error is also commonly referred to as a false positive.

Type II error
A Type II error occurs when one fails to reject a false null hypothesis.

A Type II error is also commonly referred to as a false negative.

Type I error and Type II error are summarised in the following decision table.

                           One accepts the Null    One rejects the Null
Null hypothesis is true    Correct Conclusion      Type I Error
Null hypothesis is false   Type II Error           Correct Conclusion

Significance level
The significance level or size of the test is

α = P(Type I error) = P(Reject H0 | H0 true).

Typical choices for α are 0.01, 0.05 and 0.10.

Probability of Type II error
The probability of a Type II error is

β = P(Type II error) = P(Do Not Reject H0 | H1 true).

Consider the following properties of α and β:

  • It can be shown that there is an inverse relationship between α and β, that is, as α increases, β decreases and vice versa. Therefore, for a fixed sample size, one can only choose to control one of the two types of error. In hypothesis testing we choose to control the Type I error, and we set up our hypotheses at the outset so that the “worse” of the two errors is the Type I error.
  • The values of both α and β depend on the values of the underlying parameters. We can control α by choosing H0 to include an equality for the parameter: the Type I error is largest at this point of equality, so the size of the test is attained there. To illustrate, in the average annual income example above, α = P(reject H0 | μ = 35,000) ≥ P(reject H0 | μ > 35,000). Therefore H0: μ ≥ 35,000 is often just written as H0: μ = 35,000.
  • Because H0 specifies an equality, H1 is a composite hypothesis. Therefore β = P(Type II error) is a function of the parameter value within the alternative parameter space.

Power of a Test
The power of the test is

1 − β = 1 − P(Type II error) = P(Reject H0 | H1 true).

The power of a test can be thought of as the probability of correctly rejecting a false null hypothesis.

19.3 Tests for normal means, σ known

In this section we study a number of standard hypothesis tests that one might perform on a random sample.

We assume throughout this section that x1, x2, …, xn are i.i.d. samples from X with E[X] = μ, where μ is unknown, and that var(X) = σ² is known.

Test 1: H0: μ = μ0 vs. H1: μ < μ0; σ² known.

Watch Video 28 for the construction of Hypothesis Test 1.

Video 28: Hypothesis Test 1

A summary of the construction of Hypothesis Test 1 is given below.

Data assumptions. We assume either

  • X1, X2, …, Xn are a random sample from a normal distribution with known variance σ²;
  • the sample size n is sufficiently large that we can assume X̄ is approximately normally distributed by the Central Limit Theorem, and either the variance is known or the sample variance s² ≈ σ².

Step 1: Choose a test statistic based upon the random sample for the parameter we want to base our claim on. For example, we are interested in μ, so we want to choose a good estimator of μ as our test statistic. That is, μ̂ = X̄.

Step 2: Specify a decision rule. The smaller X̄ is, the more the evidence points towards the alternative hypothesis μ < μ0. Therefore our decision rule is to reject H0 if X̄ < c, where c is called the cut-off value for the test.

Step 3: Based upon the sampling distribution of the test statistic and the specified significance level of the test, solve for the cut-off value c. To find c,
α = P(Type I error) = P(Reject H0 | H0 true)
  = P(X̄ < c | μ = μ0)
  = P(X̄ < c | X̄ ∼ N(μ0, σ²/n))
  = P((X̄ − μ0)/(σ/√n) < (c − μ0)/(σ/√n))
  = P(Z < (c − μ0)/(σ/√n)).

Since P(Z < −z_α) = α, where z_α can be found using qnorm(1-alpha) (qnorm(1-alpha) returns the value z_α satisfying P(Z < z_α) = 1 − α) or statistical tables, we have (c − μ0)/(σ/√n) = −z_α and hence c = μ0 − z_α σ/√n.

So, the decision rule is to reject H0 if X̄ < μ0 − z_α σ/√n or, equivalently, Z = (X̄ − μ0)/(σ/√n) < −z_α.

Test 2: H0: μ = μ0 vs. H1: μ > μ0; σ² known.

This is similar to the previous test, except the decision rule is to reject H0 if X̄ > μ0 + z_α σ/√n or, equivalently, Z = (X̄ − μ0)/(σ/√n) > z_α.

Note that both these tests are called one-sided tests, since the rejection region falls on only one side of the outcome space.

Test 3: H0: μ = μ0 vs. H1: μ ≠ μ0; σ² known.

The test statistic X̄ does not change but the decision rule will. The decision rule is to reject H0 if X̄ is sufficiently far (above or below) from μ0. Specifically, reject H0 if X̄ < μ0 − z_{α/2} σ/√n or X̄ > μ0 + z_{α/2} σ/√n. Equivalent to both of these is |Z| = |(X̄ − μ0)/(σ/√n)| > z_{α/2}.

This is called a two-sided test because the rejection region consists of two disjoint intervals, one in each tail of the distribution.

Coffee machine.
Suppose that a coffee machine is designed to dispense 6 ounces of coffee per cup with standard deviation σ = 0.2, where we assume the amount of coffee dispensed is normally distributed. A random sample of n = 20 cups gives x̄ = 5.94. Test whether the machine is correctly filling the cups.

We test H0: μ = 6.0 vs. H1: μ ≠ 6.0 at significance level α = 0.05.

Using a two-sided test with known variance, the decision rule is to reject H0 if |Z| = |(x̄ − 6.0)/(0.2/√20)| > z_{0.05/2} = z_{0.025} = 1.96. Now
|Z| = |(5.94 − 6.0)/(0.2/√20)| = |−1.34| = 1.34 < 1.96.

Therefore, we conclude that there is not enough statistical evidence to reject H0 at α=0.05.
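The calculation above can be reproduced in R; a minimal sketch, using the values given in the example:

```r
# Two-sided z-test for the coffee machine example (sigma known)
xbar <- 5.94; mu0 <- 6.0; sigma <- 0.2; n <- 20; alpha <- 0.05

z <- (xbar - mu0) / (sigma / sqrt(n))  # observed test statistic, about -1.34
crit <- qnorm(1 - alpha / 2)           # z_{0.025} = 1.96
reject <- abs(z) > crit                # FALSE: do not reject H0
```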


19.4 p values

When our sample information determines a particular conclusion to our hypothesis test, we only report that we either reject or do not reject H0 at a particular significance level α. Hence when we report our conclusion the reader doesn’t know how sensitive our decision is to the choice of α.

To illustrate, in Example 19.3.1 (Coffee Machine) we would have reached the same conclusion that there is not enough statistical evidence to reject H0 at α = 0.05 if |Z| = 1.95 rather than |Z| = 1.34. Whereas, if the significance level were α = 0.10, we would have rejected H0 if |Z| = 1.95 > z_{0.10/2} = z_{0.05} = 1.6449, but we would not reject H0 if |Z| = 1.34 < 1.6449.

Note that the choice of α should be made before the test is performed; otherwise, we run the risk of inducing experimenter bias!

p-value
The p-value of a test is the probability of obtaining a test statistic at least as extreme as the observed data, given H0 is true.

So the p-value is the probability, computed assuming H0 is true, of obtaining a test statistic at least as extreme as the one observed. Equivalently, it is the smallest significance level α at which the data would lead us to reject H0.

If we report the conclusion of the test as well as the p value, then the reader can decide how sensitive our result is to the choice of α.

Coffee machine (continued).
Compute the p value for the test in Example 19.3.1.

In Example 19.3.1 (Coffee machine), we were given x̄ = 5.94, n = 20 and σ = 0.2. Our decision rule was to reject H0 if |Z| = |(x̄ − 6.0)/(0.2/√20)| > z_{0.025}.

To compute the p-value for the test, assume H0 is true, that is, μ = 6.0. We want to find
P(|X̄ − μ| > |5.94 − μ|) = P(|Z| > |(5.94 − 6.0)/(0.2/√20)|) = P(|Z| > 1.34) = 2P(Z > 1.34) = 2 × 0.0901 = 0.1802.

Consider the following remarks on Example 19.4.2.

  • The multiplication factor of 2 has arisen since we are computing the p value for a two-sided test, so there is an equal-sized rejection region at both tails of the distribution. For a one-tailed test we only need to compute the probability of rejecting in one direction.
  • The p value implies that if we had chosen an α of at least 0.1802 then we would have been able to reject H0.
  • In applied statistics, the p value is interpreted as the sample providing:
    strong evidence against H0, if p ≤ 0.01;
    evidence against H0, if p ≤ 0.05;
    slight evidence against H0, if p ≤ 0.10;
    no evidence against H0, if p > 0.10.
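The p value for the coffee machine test can be computed in R with pnorm; a minimal sketch (using the unrounded test statistic, which gives 0.1797 rather than the 0.1802 obtained from tables with the rounded value 1.34):

```r
# Two-sided p-value for the coffee machine z-test
z <- (5.94 - 6.0) / (0.2 / sqrt(20))  # observed test statistic, about -1.34
p <- 2 * pnorm(-abs(z))               # two tails, by symmetry of N(0,1)
```

Since p > 0.10, the sample provides no evidence against H0 on the scale above.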

19.5 Tests for normal means, σ unknown

Assume X1, X2, …, Xn is a random sample from a normal distribution with unknown variance σ².

Test 4: H0: μ = μ0 vs. H1: μ < μ0; σ² unknown.

As before, the decision rule is to reject H0 if X̄ < c for some cut-off value c that we need to find. We have
α = P(Type I error) = P(Reject H0 | H0 true) = P(X̄ < c | μ = μ0) = P(X̄ < c | X̄ ∼ N(μ0, σ²/n)).
However, now σ² is unknown. We have seen before that
(X̄ − μ0)/(s/√n) ∼ t_{n−1},

where s² is the sample variance.

Hence,
α = P(X̄ < c | X̄ ∼ N(μ0, σ²/n)) = P((X̄ − μ0)/(s/√n) < (c − μ0)/(s/√n)) = P(T < (c − μ0)/(s/√n)).
Now, P(T < −t_{n−1,α}) = α, where t_{n−1,α} can be found by using the qt function in R, with −t_{n−1,α} = qt(alpha, n-1), or by using statistical tables similar to the normal tables in Section 5.7. Therefore
(c − μ0)/(s/√n) = −t_{n−1,α}
and c = μ0 − t_{n−1,α} s/√n. Therefore, the decision rule is to reject H0 if X̄ < μ0 − t_{n−1,α} s/√n or, equivalently, if
T = (X̄ − μ0)/(s/√n) < −t_{n−1,α}.

Test 5: H0: μ = μ0 vs. H1: μ > μ0; σ² unknown.

This is similar to Test 4, except the decision rule is to reject H0 if X̄ > μ0 + t_{n−1,α} s/√n or, equivalently, if
T = (X̄ − μ0)/(s/√n) > t_{n−1,α}.

Test 6: H0: μ = μ0 vs. H1: μ ≠ μ0; σ² unknown.

Similarly to Test 3, the decision rule here is to reject H0 if
|T| = |(X̄ − μ0)/(s/√n)| > t_{n−1,α/2}.

Coffee machine (continued).
Suppose that σ is unknown in Example 19.3.1, though we still assume the amount of coffee dispensed is normally distributed. A random sample of n = 20 cups gives mean x̄ = 5.94 and sample standard deviation s = 0.1501.
Test whether the machine is correctly filling the cups.

We test H0: μ = 6.0 vs. H1: μ ≠ 6.0 at significance level α = 0.05.

The decision rule is to reject H0 if |T| = |(x̄ − 6.0)/(0.1501/√20)| > t_{20−1,0.05/2} = t_{19,0.025} = 2.093.

Now
|T| = |(5.94 − 6.0)/(0.1501/√20)| = |−1.7876| = 1.7876 < 2.093.
Therefore, we do not reject H0 at α = 0.05. The p value is
p = P(|X̄ − 6.0| > |5.94 − 6.0|) = 2P(t_{19} > 1.7876) = 2 × 0.0449 = 0.0898.
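A sketch of the same t-test in R from the summary statistics (with the raw data one could instead call t.test directly):

```r
# One-sample t-test from summary statistics (sigma unknown)
xbar <- 5.94; mu0 <- 6.0; s <- 0.1501; n <- 20

t_stat <- (xbar - mu0) / (s / sqrt(n))  # about -1.7876
crit <- qt(0.975, df = n - 1)           # t_{19,0.025} = 2.093
p <- 2 * pt(-abs(t_stat), df = n - 1)   # about 0.09
```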


19.6 Confidence intervals and two-sided tests

Consider the two-sided t-test of size α. We reject H0 if |T| = |(X̄ − μ0)/(s/√n)| > t_{n−1,α/2}. This implies we do not reject H0 if
|T| = |(X̄ − μ0)/(s/√n)| ≤ t_{n−1,α/2}
or, equivalently,
−t_{n−1,α/2} s/√n ≤ X̄ − μ0 ≤ t_{n−1,α/2} s/√n, that is, X̄ − t_{n−1,α/2} s/√n ≤ μ0 ≤ X̄ + t_{n−1,α/2} s/√n.
But
(X̄ − t_{n−1,α/2} s/√n, X̄ + t_{n−1,α/2} s/√n)

is a 100(1 − α)% confidence interval for μ. Consequently, if μ0, the value of μ under H0, falls within the 100(1 − α)% confidence interval for μ, then we will not reject H0 at significance level α.

In general, therefore, there is a correspondence between the “acceptance region” of a statistical test of size α and the related 100(1 − α)% confidence interval. Therefore, we will not reject H0: θ = θ0 vs. H1: θ ≠ θ0 at level α if and only if θ0 lies within the 100(1 − α)% confidence interval for θ.

Coffee machine (continued).
For the coffee machine in Example 19.5.1 (Coffee machine - continued) we wanted to test H0: μ = 6.0 vs. H1: μ ≠ 6.0 at significance level α = 0.05. We were given a random sample of n = 20 cups with x̄ = 5.94 and s² = 0.1501².
Construct a 95% confidence interval for μ.

The limits of a 95% confidence interval for μ are

x̄ ± t_{n−1,α/2} s/√n = 5.94 ± t_{20−1,0.05/2} × 0.1501/√20 = 5.94 ± 2.093 × 0.1501/√20 = 5.94 ± 0.0702

so the 95% confidence interval for μ is

(5.8698,6.0102).

If we use the confidence interval to perform our test, we see that

μ0 = 6.0 ∈ (5.8698, 6.0102),

so we will not reject H0 at α=0.05.
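The interval and the equivalent test decision can be checked in R; a minimal sketch using the summary values from the example:

```r
# 95% confidence interval for mu from the coffee machine data (sigma unknown)
xbar <- 5.94; s <- 0.1501; n <- 20; mu0 <- 6.0

ci <- xbar + c(-1, 1) * qt(0.975, df = n - 1) * s / sqrt(n)
inside <- ci[1] < mu0 && mu0 < ci[2]  # TRUE: do not reject H0 at alpha = 0.05
```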


19.7 Distribution of the variance

Thus far we have considered hypothesis testing for the mean but we can also perform hypothesis tests for the variance of a normal distribution. However, first we need to consider the distribution of the sample variance.

Suppose that Z1, Z2, …, Zn ∼ N(0, 1). Then we have shown that
Z1² ∼ χ²_1 = Gamma(1/2, 1/2)

in Section 14.2.

This can be extended to show that
∑_{i=1}^n Zi² ∼ χ²_n = Gamma(n/2, 1/2).
More generally, if X1, X2, …, Xn ∼ N(μ, σ²), so that Xi = μ + σZi, then
(1/σ²) ∑_{i=1}^n (Xi − X̄)² ∼ χ²_{n−1}.

Note that the degrees of freedom of the χ² distribution is n − 1: the number of observations n, minus 1 for the estimation of μ by X̄.

It follows that
(n − 1)s²/σ² ∼ χ²_{n−1}.
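This distributional fact can be illustrated by simulation in R. The sketch below (with arbitrary illustrative choices of n, μ and σ) checks that the statistic has mean n − 1 and variance 2(n − 1), the mean and variance of a χ²_{n−1} random variable:

```r
# Simulate (n-1)s^2/sigma^2 for normal samples and compare with chi^2_{n-1}
set.seed(1)
n <- 10; mu <- 5; sigma <- 2
stat <- replicate(5000, (n - 1) * var(rnorm(n, mu, sigma)) / sigma^2)
mean(stat)  # close to n - 1 = 9
var(stat)   # close to 2(n - 1) = 18
```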

19.8 Other types of tests

Test 7: H0: σ1² = σ2² vs. H1: σ1² ≠ σ2².

Let X1, X2, …, Xm ∼ N(μ1, σ1²) and Y1, Y2, …, Yn ∼ N(μ2, σ2²) be two independent random samples from normal populations.

The test statistic is F = s1²/s2², where
s1² = (1/(m−1)) ∑_{i=1}^m (Xi − X̄)², and s2² = (1/(n−1)) ∑_{i=1}^n (Yi − Ȳ)².
Recall that
(m−1)s1²/σ1² ∼ χ²_{m−1}, and (n−1)s2²/σ2² ∼ χ²_{n−1}.
Since the samples are independent, s1² and s2² are independent. Therefore,
(s1²/σ1²) / (s2²/σ2²) ∼ [χ²_{m−1}/(m−1)] / [χ²_{n−1}/(n−1)] ∼ F_{m−1,n−1}.
Under H0: σ1² = σ2², it follows that
F = s1²/s2² ∼ F_{m−1,n−1}.
The decision rule is to reject H0 if
F = s1²/s2² < F_{m−1,n−1,1−α/2}, or F = s1²/s2² > F_{m−1,n−1,α/2},
where F_{ν1,ν2,q} denotes the point exceeded with probability q.
The critical values F_{m−1,n−1,1−α/2} and F_{m−1,n−1,α/2} are given by qf(alpha/2,m-1,n-1) and qf(1-alpha/2,m-1,n-1) respectively. Alternatively, Statistical Tables can be used. For the latter you may need to use the identity
F_{ν1,ν2,q} = 1/F_{ν2,ν1,1−q}

to obtain the required values from the table.
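The identity can be verified numerically in R with qf; a minimal sketch, with illustrative degrees of freedom:

```r
# Critical values for a two-sided F-test at level alpha (illustrative df)
alpha <- 0.05; m <- 10; n <- 12

upper <- qf(1 - alpha / 2, m - 1, n - 1)  # upper 2.5% point F_{9,11,0.025}
lower <- qf(alpha / 2, m - 1, n - 1)      # lower 2.5% point F_{9,11,0.975}

# Table identity: the lower point is the reciprocal of the upper point
# of the F distribution with the degrees of freedom swapped
recip <- 1 / qf(1 - alpha / 2, n - 1, m - 1)
```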

Test 8: H0: μ1 = μ2 vs. H1: μ1 ≠ μ2; σ² unknown.

Assume X1, X2, …, Xm ∼ N(μ1, σ²) and Y1, Y2, …, Yn ∼ N(μ2, σ²) are two independent random samples with unknown but equal variance σ².

Note that

  • (X̄ − Ȳ) ∼ N(μ1 − μ2, σ²(1/m + 1/n)), which implies
    [(X̄ − Ȳ) − (μ1 − μ2)] / √(σ²(1/m + 1/n)) ∼ N(0, 1);
  • (m + n − 2) sp²/σ² ∼ χ²_{m+n−2};
  • sp² is independent of X̄ − Ȳ.
Therefore,
[(X̄ − Ȳ) − (μ1 − μ2)] / √(sp²(1/m + 1/n)) = { [(X̄ − Ȳ) − (μ1 − μ2)] / √(σ²(1/m + 1/n)) } / √{ (m+n−2)sp² / [(m+n−2)σ²] } ∼ t_{m+n−2}.
Under H0, μ1 − μ2 = 0, and this becomes
T = (X̄ − Ȳ) / √(sp²(1/m + 1/n)) ∼ t_{m+n−2}.
Therefore the decision rule is to reject H0 if
|T| = |(X̄ − Ȳ) / √(sp²(1/m + 1/n))| > t_{m+n−2,α/2},

where sp² = [(m−1)sX² + (n−1)sY²] / (m + n − 2) is the pooled sample variance.

Blood bank.
Suppose that one wants to test whether the time it takes to get from a blood bank to a hospital via two different routes is the same on average. Independent random samples are selected from each of the different routes and we obtain the following information:

Route X: m = 10, x̄ = 34, sX² = 17.111
Route Y: n = 12, ȳ = 30, sY² = 9.454

Figure 19.1: Routes from blood bank to hospital.

Test H0: μX = μY vs. H1: μX ≠ μY at significance level α = 0.05, where μX and μY denote the mean travel times on routes X and Y, respectively.

Attempt Example 19.8.1: Blood bank and then watch Video 29 for the solutions.

Video 29: Blood bank

Solution to Example 19.8.1: Blood bank
To perform the t-test we need the variances to be equal, so we first test H0: σX² = σY² vs. H1: σX² ≠ σY² at significance level α = 0.05. The decision rule is to reject H0 if
F = sX²/sY² < F_{m−1,n−1,1−α/2} or F = sX²/sY² > F_{m−1,n−1,α/2}.

Compute

  • F = sX²/sY² = 17.111/9.454 = 1.81;
  • F_{9,11,0.975} = 1/F_{11,9,0.025} = 1/3.915 = 0.256;
  • F_{9,11,0.025} = 3.588.

Hence F_{9,11,0.975} < F < F_{9,11,0.025}, so we do not reject H0 at α = 0.05. Therefore we can assume the variances of the two populations are the same.

Now we test H0: μX = μY vs. H1: μX ≠ μY at significance level α = 0.05.

The decision rule is to reject H0 if

|T| = |(X̄ − Ȳ) / √(sp²(1/m + 1/n))| > t_{m+n−2,α/2}.
Computing the pooled variance,
sp² = (9 × 17.111 + 11 × 9.454) / (10 + 12 − 2) = 12.9,
giving
|T| = |(34 − 30) / √(12.9 × (1/10 + 1/12))| = 2.601 > t_{20,0.025} = 2.086.
Therefore we reject H0, that the journey times are the same on average, at α = 0.05. The p value is
P(|T| > 2.601) = 2P(T > 2.601) = 2 × 0.00854 = 0.01708.
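The whole solution can be sketched in R from the summary statistics (qf replaces the table look-ups, so the critical values are slightly more precise than the tabulated ones):

```r
# Blood bank example: F-test for equal variances, then pooled two-sample t-test
m <- 10; xbar <- 34; s2x <- 17.111
n <- 12; ybar <- 30; s2y <- 9.454

F_stat <- s2x / s2y                            # about 1.81
equal_var <- F_stat > qf(0.025, m - 1, n - 1) &&
             F_stat < qf(0.975, m - 1, n - 1)  # TRUE: pooling is reasonable

s2p <- ((m - 1) * s2x + (n - 1) * s2y) / (m + n - 2)   # pooled variance, 12.9
t_stat <- (xbar - ybar) / sqrt(s2p * (1 / m + 1 / n))  # about 2.601
p <- 2 * pt(-abs(t_stat), df = m + n - 2)              # about 0.017
```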


Test 9: H0: μ1 = μ2 vs. H1: μ1 ≠ μ2; non-independent samples.

Suppose that we have two groups of observations X1, X2, …, Xn and Y1, Y2, …, Yn where there is an obvious pairing between the observations. For example, consider before-and-after studies, or comparisons of different measuring devices. This means the samples are no longer independent.

An equivalent hypothesis test to the one stated is H0: μd = μ1 − μ2 = 0 vs. H1: μd = μ1 − μ2 ≠ 0. With this in mind, define Di = Xi − Yi for i = 1, …, n, and assume that D1, D2, …, Dn are i.i.d. N(μd, σd²).

The decision rule is to reject H0 if
|D̄ / (sd/√n)| > t_{n−1,α/2}.

Drug Trial.
In a medical study of patients given a drug and a placebo, sixteen patients were paired up, with the members of each pair having a similar age and being the same sex. One of each pair received the drug and the other received the placebo. The response score for each patient was recorded.

Pair Number 1 2 3 4 5 6 7 8
Given Drug 0.16 0.97 1.57 0.55 0.62 1.12 0.68 1.69
Given Placebo 0.11 0.13 0.77 1.19 0.46 0.41 0.40 1.28

Are the responses for the drug and placebo significantly different?

This is a “matched-pair” problem, since we expect a relation between the values within each pair. The difference within each pair (drug minus placebo) is

Pair Number   1     2     3     4      5     6     7     8
Di            0.05  0.84  0.80  −0.64  0.16  0.71  0.28  0.41

We consider the Di’s to be a random sample from N(μD, σD²). We can calculate that D̄ = 0.326 and sD² = 0.24, so sD = 0.49.

To test H0: μD = 0 vs. H1: μD ≠ 0, the decision rule is to reject H0 if
|D̄ / (sD/√n)| > t_{n−1,α/2}.
From the data, |D̄ / (sD/√n)| = 0.326/(0.49/√8) = 1.882.

Now t_{7,0.05} = 1.895, so we would not reject H0 at the 10% level (just).
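This matched-pair calculation is exactly what t.test(..., paired = TRUE) performs in R; a minimal sketch using the data above:

```r
# Paired t-test for the drug trial data
drug    <- c(0.16, 0.97, 1.57, 0.55, 0.62, 1.12, 0.68, 1.69)
placebo <- c(0.11, 0.13, 0.77, 1.19, 0.46, 0.41, 0.40, 1.28)

d <- drug - placebo                           # within-pair differences
t_stat <- mean(d) / (sd(d) / sqrt(length(d))) # about 1.882
crit <- qt(0.95, df = length(d) - 1)          # t_{7,0.05} = 1.895
reject_10pc <- abs(t_stat) > crit             # FALSE (just)
```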

19.9 Sample size calculation

We have noted that, for a given sample x1, x2, …, xn, if we decrease the Type I error α then we increase the Type II error β, and vice versa.

To control both the Type I and Type II errors, ensuring that α and β are both sufficiently small, we need to choose an appropriate sample size n.

Sample size calculations are appropriate when we have two simple hypotheses to compare. For example, suppose we have a random variable X with unknown mean μ = E[X] and known variance σ² = Var(X). We compare the hypotheses:

  • H0:μ=μ0,
  • H1:μ=μ1.

Without loss of generality we will assume that μ0<μ1.

Suppose that x1, x2, …, xn represent i.i.d. samples from X. Then, by the central limit theorem,
X̄ = (1/n) ∑_{i=1}^n Xi ∼ N(μ, σ²/n) (approximately).
We reject H0 at significance level α if
(x̄ − μ0)/(σ/√n) > z_α.
That is, we reject the null hypothesis μ = μ0 in favour of the alternative hypothesis μ = μ1 if
x̄ > μ0 + z_α σ/√n.

Note that as n increases, the cut-off for rejecting H0 decreases towards μ0.

We now consider the choice of n to ensure that the Type II error is at most β, or equivalently, that the power of the test is at least 1β.

The power of the test is:
Power = P(Reject H0 | H0 is false).
Rewriting in terms of the test statistic, given that H0 is false (H1 is true):
Power = P((X̄ − μ0)/(σ/√n) > z_α | μ = μ1) = 1 − β.

Lemma 19.9.1 (Sample size calculation) gives the smallest sample size n that bounds the Type I and Type II errors by α and β in the case where the variance σ² is known.

Sample size calculation.
Suppose that X is a random variable with unknown mean μ and known variance σ². The required sample size n to ensure significance level α and power 1 − β for comparing the hypotheses:

  • H0:μ=μ0
  • H1:μ=μ1

is: n = [ (σ/(μ1 − μ0)) (z_α − z_{1−β}) ]².

The details of the proof of Lemma 19.9.1 (Sample size calculation) are provided but can be omitted.

Proof of Sample Size calculations.
The power of the test is:
Power = 1 − β = P((X̄ − μ0)/(σ/√n) > z_α | μ = μ1)
  = P((X̄ − μ1)/(σ/√n) + (μ1 − μ0)/(σ/√n) > z_α | μ = μ1)
  = P((X̄ − μ1)/(σ/√n) > z_α − (μ1 − μ0)/(σ/√n) | μ = μ1).
Given μ = μ1, we have that
(X̄ − μ1)/(σ/√n) ∼ Z = N(0, 1).
Therefore the power satisfies
P(Z > z_α − (μ1 − μ0)/(σ/√n)) = 1 − β = P(Z > z_{1−β}).
Hence,
z_α − (μ1 − μ0)/(σ/√n) = z_{1−β},
which rearranges to give
n = [ (σ/(μ1 − μ0)) (z_α − z_{1−β}) ]².


Note:

  1. We need larger n as σ increases. (More variability in the observations.)
  2. We need larger n as μ1 − μ0 gets closer to 0. (It is harder to detect a small difference in means.)
  3. We have α, β < 0.5, so z_α > 0 and z_{1−β} < 0. Hence, z_α − z_{1−β} becomes larger as α and β decrease. (Smaller errors require a larger n.)
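The lemma translates directly into a short R function; a sketch, where the numbers in the example call are illustrative:

```r
# Sample size from Lemma 19.9.1 (known variance, two simple hypotheses)
sample_size <- function(mu0, mu1, sigma, alpha, beta) {
  z_alpha  <- qnorm(1 - alpha)  # upper alpha point, positive
  z_1_beta <- qnorm(beta)       # z_{1-beta}, negative for beta < 0.5
  ceiling((sigma / (mu1 - mu0) * (z_alpha - z_1_beta))^2)
}

# e.g. detect a shift of half a standard deviation with alpha = 0.05, power 0.8
sample_size(mu0 = 0, mu1 = 0.5, sigma = 1, alpha = 0.05, beta = 0.2)
```

Halving the difference μ1 − μ0 roughly quadruples the required n, in line with note 2 above.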

The following R Shiny app lets you explore the effect of μ1 − μ0, σ and α on the sample size n or the power 1 − β.

R Shiny app: Sample size calculation app

Task: Session 10

Attempt the R Markdown file for Session 10:
Session 10: Confidence intervals and hypothesis testing

Student Exercises

Attempt the exercises below.

Note that throughout the exercises, for a random variable X and 0 < β < 1, cβ satisfies P(X > cβ) = β.


Eleven bags of sugar, each nominally containing 1 kg, were randomly selected from a large batch. The weights (in kg) of sugar were:
1.02, 1.05, 1.08, 1.03, 1.00, 1.06, 1.08, 1.01, 1.04, 1.07, 1.00.

You may assume these values are from a normal distribution.

  1. Calculate a 95% confidence interval for the mean weight for the batch.
  2. Test the hypothesis H0: μ = 1 vs. H1: μ ≠ 1. Give your answer in terms of a p-value.

Note that

β        0.1     0.05    0.025   0.01    0.005   0.001   0.0005  0.0001
t10: cβ  1.3722  1.8125  2.2281  2.7638  3.1693  4.1437  4.5869  5.6938
Z: cβ    1.2816  1.6449  1.96    2.3263  2.5758  3.0902  3.2905  3.719
Solution to Exercise 19.1.
The sample mean is x̄ = 11.44/11 = 1.04. The sample variance is
s² = (1/(n−1)) [∑ xi² − n x̄²] = (1/10) [11.9068 − 11.8976] = 0.00092.

Hence, the sample standard deviation is s = √0.00092 = 0.03033.

  1. The 95% confidence interval for the mean is given by x̄ ± t_{n−1,0.025} s/√n. Now t_{10,0.025} = 2.2281. Hence the confidence interval is
    1.04 ± 2.2281 × (0.03033/√11) = 1.04 ± 0.0204 = (1.0196, 1.0604).
  2. The population variance is unknown, so we apply a t test with test statistic
    t = (x̄ − μ0)/(s/√n) = (1.04 − 1)/(0.03033/√11) = 4.3741
    and n − 1 = 10 degrees of freedom. The p value is P(|t_{10}| > 4.3741). From the critical values given, P(t_{10} > 4.1437) = 0.001 and P(t_{10} > 4.5869) = 0.0005, so 0.0005 < P(t_{10} > 4.3741) < 0.001. Hence 0.001 < p < 0.002. Therefore, there is strong evidence that μ ≠ 1.



Random samples of 13 and 11 chicks, respectively, were given from birth a protein supplement, either oil meal or meat meal. The weights of the chicks at six weeks old were recorded and the following sample statistics obtained:
Oil meal data:  x̄1 = 247.9, s1² = 2925.8, n1 = 13
Meat meal data: x̄2 = 275.5, s2² = 4087.3, n2 = 11
  1. Carry out an F-test to examine whether or not the groups have significantly different variances.
  2. Calculate a 95% confidence interval for the difference between weights of 6-week-old chicks on the two diet supplements.
  3. Do you consider that the supplements have a significantly different effect? Justify your answer.

Note that

F10,12:
β    0.05    0.025   0.01    0.005
cβ   2.7534  3.3736  4.2961  5.0855

F12,10:
β    0.05    0.025   0.01    0.005
cβ   2.913   3.6209  4.7059  5.6613

β        0.1     0.05    0.025   0.01    0.005   0.001   0.0005  0.0001
t22: cβ  1.3212  1.7171  2.0739  2.5083  2.8188  3.505   3.7921  4.452
Solution to Exercise 19.2.

We regard the data as being from two independent normal distributions with unknown variances.

  1. F-test: H0: σ1² = σ2² vs. H1: σ1² ≠ σ2².
    We reject H0 if
    F = s1²/s2² > F_{n1−1,n2−1,α/2} or F = s1²/s2² < F_{n1−1,n2−1,1−α/2} = 1/F_{n2−1,n1−1,α/2}.
    Now, F_{12,10,0.025} = 3.6209 and F_{12,10,0.975} = 1/3.3736 = 0.2964. From the data, F = s1²/s2² = 2925.8/4087.3 = 0.7158, so we do not reject H0. There is no evidence against equal population variances.
  2. Assume σ1² = σ2² = σ² (unknown). The pooled estimate of the common variance σ² is
    sp² = [(n1−1)s1² + (n2−1)s2²] / (n1 + n2 − 2) = 3453.75,
    so sp = 58.77. The 95% confidence limits for μ1 − μ2 are
    x̄1 − x̄2 ± t_{22,0.025} sp √(1/n1 + 1/n2) = (247.9 − 275.5) ± (2.0739 × 0.4097 × 58.77) = −27.6 ± 49.9355.
    So the interval is (−77.5355, 22.3355).
  3. Since the confidence interval in (b) includes zero (where μ1 − μ2 = 0, i.e. μ1 = μ2), we conclude that the diet supplements do not have a significantly different effect (at the 5% level).



A random sample of 12 car drivers took part in an experiment to find out if alcohol increases the average reaction time. Each driver’s reaction time was measured in a laboratory before and after drinking a specified amount of alcoholic beverage. The reaction times were as follows:

Driver   1     2     3     4     5     6     7     8     9     10    11    12
Before   0.68  0.64  0.82  0.80  0.72  0.55  0.84  0.78  0.57  0.73  0.86  0.74
After    0.73  0.62  0.92  0.87  0.77  0.70  0.88  0.78  0.66  0.79  0.86  0.72

Let μB and μA be the population mean reaction time, before and after drinking alcohol.

  1. Test H0: μB = μA vs. H1: μB ≠ μA assuming the two samples are independent.
  2. Test H0: μB = μA vs. H1: μB ≠ μA assuming the two samples contain “matched pairs”.
  3. Which of the tests in (a) and (b) is more appropriate for these data, and why?

Note that

β        0.1     0.05    0.025   0.01    0.005   0.001   0.0005  0.0001
t11: cβ  1.3634  1.7959  2.201   2.7181  3.1058  4.0247  4.437   5.4528

and the critical values for t22 are given above in Exercise 19.2.

Solution to Exercise 19.3.
  1. The summary statistics of the reaction times before alcohol are x̄ = 0.7275 and sx² = 0.0103. Similarly, the summary statistics after alcohol are ȳ = 0.775 and sy² = 0.0088. Assuming both samples are from normal distributions with the same variance, the pooled variance estimate is
    sp² = [(n−1)sx² + (n−1)sy²] / (2(n−1)) = 0.0096,
    so sp = 0.098. The null hypothesis is rejected at α = 0.05 if
    t = |(x̄ − ȳ) / (sp √(1/n + 1/n))| > t_{22,0.025} = 2.0739.
    From the data,
    t = |(0.7275 − 0.775) / (0.098 × √(2/12))| = 1.1873.
    Hence, the null hypothesis is not rejected. There is no significant difference between the reaction times.
  2. The difference in reaction time for each driver is
    after − before = (0.05, −0.02, 0.10, 0.07, 0.05, 0.15, 0.04, 0.00, 0.09, 0.06, 0.00, −0.02).
    The sample mean and standard deviation of the differences are d̄ = 0.0475 and sd = 0.0517. Assuming the differences are samples from a normal distribution, the null hypothesis is rejected at α = 0.05 if
    t = |d̄ / (sd/√12)| > t_{11,0.025} = 2.201.
    From the data,
    t = |0.0475 / (0.0517/√12)| = 3.1827.
    Hence, the null hypothesis is rejected. There is a significant difference between the reaction times.
  3. The matched pair test in (b) is more appropriate. By recording each driver’s reaction time before and after, and looking at the difference for each driver we are removing the driver effect. The driver effect says that some people are naturally slow both before and after alcohol, others are naturally quick. By working with the difference we have removed this factor.