Chapter 20 Hypothesis Testing for Discrete Data

20.1 Introduction

In Section 19 (Hypothesis Testing) we studied hypothesis testing for normal random variables and, via the central limit theorem, for sums (and means) of random variables. The normal distribution is a continuous distribution, but there are many situations where we want to compare hypotheses with data or distributions which are discrete. These include testing the goodness-of-fit of a proposed distribution to observed counts and testing for independence between two categorical variables, both of which are covered in this chapter.

20.2 Goodness-of-fit motivating example

We start with a motivating example.

Film stars.
A film studio wants to decide which actor or actress to hire for the main role in a series of movies.

They have a shortlist of 5 and decide to ask the public who their favourite actor or actress is.

1,000 people are randomly selected and asked who their favourite actor or actress is from the shortlist of 5.

Results:

Preferred Actor    1     2     3     4     5
Frequency         225   189   201   214   171

An investor in the film claims “There is no difference in who the public prefer; we should hire the cheapest!”

Does the data support the investor’s claim?

We work through testing the investor’s claim via a series of steps.

Step 1
Interpret what the investor’s claim represents statistically.

“No difference in who the public prefers” means that if we choose an individual at random from the population they are equally likely to choose each of the five actors/actresses. That is, probability 1/5 of each actor/actress being chosen.

Step 2
What would we expect to observe in the data if the investor’s claim is true?

The investor’s claim has led us to a model where each actor/actress has probability 1/5 of being selected by a member of the public. Therefore when 1000 people are asked, we would expect each actor/actress to receive:
\[
1000 \times \frac{1}{5} = 200 \text{ votes}.
\]
Thus based on the model of the investor’s claim:
Preferred Actor    1     2     3     4     5
Frequency         225   189   201   214   171
Expected          200   200   200   200   200

Step 3
Is what we observe (Frequency) in the data consistent with what we expect (Expected) to see if the investor’s claim is a good model?

In hypothesis testing language, should we reject or not the null hypothesis:

H0: All actors equally popular.

In favour of the alternative hypothesis:

H1: There is a difference in popularity between at least two actors.

To compare competing hypotheses we require a test statistic and a sampling distribution for the test statistic under the assumption that the null hypothesis is true.

Test Statistic

For each outcome (actor), let Oi and Ei denote the number of observed and expected votes for actor i.

The test statistic $\chi^2_{\text{obs}}$ is
\[
\chi^2_{\text{obs}} = \sum_i \frac{(O_i - E_i)^2}{E_i}.
\]
For the actors example,
\[
\chi^2_{\text{obs}} = \frac{(225-200)^2}{200} + \frac{(189-200)^2}{200} + \cdots + \frac{(171-200)^2}{200} = 8.92.
\]

Sampling distribution

We reject H0 at a significance level α if
\[
\chi^2_{\text{obs}} \geq \chi^2_{\nu,\alpha},
\]
where $\chi^2_{\nu,\alpha}$ is the $(1-\alpha)100\%$ quantile of the $\chi^2$ distribution with ν degrees of freedom and
\[
\nu = \text{Number of categories} - 1 = 5 - 1 = 4.
\]
Thus if X has a $\chi^2$ distribution with ν degrees of freedom then
\[
P(X \leq \chi^2_{\nu,\alpha}) = 1 - \alpha,
\]
or equivalently,
\[
P(X > \chi^2_{\nu,\alpha}) = \alpha.
\]

Since $\chi^2_{4,0.05} = 9.488$ and $8.92 < 9.488$, we do not reject the null hypothesis at the 5% significance level. That is, the investor’s claim that all actors are equally popular is reasonable given the observed data.
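The same test can be run in R (the language used in the Session 11 tasks); a minimal sketch, noting that chisq.test assumes equal category probabilities by default:

```r
# Goodness-of-fit test for the actors example (equal probabilities under H0)
votes <- c(225, 189, 201, 214, 171)
chisq.test(votes)                          # X-squared = 8.92, df = 4

# Critical value and p-value computed directly
qchisq(0.95, df = 4)                       # 9.488
pchisq(8.92, df = 4, lower.tail = FALSE)   # approximately 0.063
```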

20.3 Goodness-of-fit

We describe the general procedure for testing the goodness-of-fit of a probability distribution to data using the $\chi^2$ distribution.

Suppose that we have N independent observations, $y_1, y_2, \ldots, y_N$, from an unknown probability distribution, Y. Suppose that there are n categories covering the possible outcomes and, for $i = 1, 2, \ldots, n$, let $C_i$ denote category i. For example, we could have $C_i = \{y = i\}$, the observations equal to i, or $C_i = \{a_i < y \leq b_i\}$, the observations in the range $(a_i, b_i]$.

For $i = 1, 2, \ldots, n$, let
\[
O_i = \#\{y_j \in C_i\},
\]
the number of data points observed in category i.

We propose a probability distribution X for the unknown probability distribution, Y. This gives us our null hypothesis:

H0: Y=X

with the alternative hypothesis

H1: Y ≠ X.

Under the null hypothesis, we calculate, for each category i, the expected number of observations belonging to category i. That is, for $i = 1, 2, \ldots, n$,
\[
E_i = N \times P(X \in C_i).
\]
We compute the test statistic
\[
\chi^2_{\text{obs}} = \sum_i \frac{(O_i - E_i)^2}{E_i},
\]
and the number of degrees of freedom, $\nu = n - 1$.

We choose a significance level α and reject the null hypothesis at that significance level if
\[
\chi^2_{\text{obs}} > \chi^2_{\nu,\alpha},
\]
where, as before, $\chi^2_{\nu,\alpha}$ is the $(1-\alpha)100\%$ quantile of the $\chi^2$ distribution with ν degrees of freedom.

Important points

  1. The test statistic, under the null hypothesis, does not exactly follow a χ2 distribution. As with the central limit theorem, the test statistic is approximately χ2 distributed with the approximation becoming better as the amount of data in each category increases.
  2. For discrete data it will often be natural to choose Ci={y=i}, whereas for continuous data we have considerable flexibility in choosing the number of categories and the category intervals. The considerations on choice of categories for goodness-of-fit testing are not dissimilar to the considerations on choice of bins for histograms.
  3. The expected frequencies in each category should not be too small, with a rule of thumb that $E_i \geq 5$. If some of the expected frequencies are less than 5 then we pool categories such that the expected frequency of the two (or more) categories combined is greater than or equal to 5.
  4. We will often want to fit a probability distribution X from a given family of probability distributions (e.g. Poisson, Gamma) without necessarily choosing the parameters of the distribution a priori. For example, we might choose to fit a Poisson distribution with mean λ to a data set and use the sample mean, $\bar{y}$, as the choice of λ. The goodness-of-fit procedure is as above except that we reduce the number of degrees of freedom by 1 for each parameter we estimate from the data (see the R sketch after this list),
\[
\nu = \#\text{Categories} - 1 - \#\text{Estimated Parameters}.
\]
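The following is a minimal R sketch of this general recipe; the function name gof_test and its arguments are illustrative rather than part of the notes:

```r
# General goodness-of-fit recipe.
# O: observed counts per category; p: category probabilities under H0
# (p may depend on parameters estimated from the data).
gof_test <- function(O, p, n_estimated = 0, alpha = 0.05) {
  E <- sum(O) * p                        # expected counts under H0
  stopifnot(all(E >= 5))                 # rule of thumb: pool categories first if violated
  chi_obs <- sum((O - E)^2 / E)
  nu <- length(O) - 1 - n_estimated      # degrees of freedom
  list(statistic = chi_obs,
       df        = nu,
       critical  = qchisq(1 - alpha, df = nu),
       p.value   = pchisq(chi_obs, df = nu, lower.tail = FALSE))
}
```

For example, gof_test(c(225, 189, 201, 214, 171), rep(1/5, 5)) reproduces the calculation for the actors example.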

Alleles.
Each person has one of the following genotypes: A/A, A/S or S/S.

The observed frequencies in a population of N=886 are:
A/A: 700,  A/S: 180,  S/S: 6
Hypothesis:
The proportion of people with each genotype is
\[
p^2, \quad 2p(1-p) \quad \text{and} \quad (1-p)^2,
\]
where p is the proportion of alleles that are of type A.

Is this a reasonable model for the data?

Watch Video 30 for the worked solutions to Example 20.3.1 (Alleles)

Video 30: Alleles

Solution to Example 20.3.1: Alleles.

We start with finding a suitable choice for p.

We can estimate p by $\hat{p}$, the proportion of alleles of type A in the population:
\[
\hat{p} = \frac{2 \times 700 + 180}{2 \times 886} = 0.8916.
\]

This is the MLE for p.

Therefore the probabilities for each genotype are:
\[
\begin{aligned}
P(A/A) &= p^2 = 0.8916479^2 = 0.795 \\
P(A/S) &= 2p(1-p) = 2 \times 0.8916479 \times (1 - 0.8916479) = 0.1932 \\
P(S/S) &= (1-p)^2 = (1 - 0.8916479)^2 = 0.0117.
\end{aligned}
\]
Multiply the probabilities by N=886 to give the expected numbers for each genotype:
\[
\begin{aligned}
A/A &: \; N \, P(A/A) = 886 \times 0.795 = 704.4 \\
A/S &: \; N \, P(A/S) = 886 \times 0.1932 = 171.2 \\
S/S &: \; N \, P(S/S) = 886 \times 0.0117 = 10.4.
\end{aligned}
\]
The test statistic is
\[
\chi^2_{\text{obs}} = \sum_i \frac{(O_i - E_i)^2}{E_i} = \frac{(700 - 704.4)^2}{704.4} + \frac{(180 - 171.2)^2}{171.2} + \frac{(6 - 10.4)^2}{10.4} = 0.0275 + 0.4523 + 1.8615 = 2.3413.
\]
Since we have n = 3 categories and estimated 1 parameter (p), the degrees of freedom are:
\[
\nu = 3 - 1 - 1 = 1.
\]

At the 5% significance level (α = 0.05): $\chi^2_{1,0.05} = 3.8415$.

Since $\chi^2_{\text{obs}} < \chi^2_{1,0.05}$, there is no evidence to reject the null hypothesis.

The p-value is 0.126 $(= P(W > \chi^2_{\text{obs}}))$, where W follows a $\chi^2$ distribution with ν = 1 degree of freedom.
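A minimal R sketch reproducing this solution:

```r
# Alleles example: goodness-of-fit with p estimated from the data
O <- c(700, 180, 6)                           # observed counts for A/A, A/S, S/S
N <- sum(O)                                   # 886
p_hat <- (2 * O[1] + O[2]) / (2 * N)          # 0.8916, the MLE for p

probs <- c(p_hat^2, 2 * p_hat * (1 - p_hat), (1 - p_hat)^2)
E <- N * probs                                # 704.4, 171.2, 10.4

chi_obs <- sum((O - E)^2 / E)                 # 2.34
nu <- 3 - 1 - 1                               # one parameter estimated from the data
pchisq(chi_obs, df = nu, lower.tail = FALSE)  # p-value approximately 0.126
```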

20.4 Testing Independence

Suppose that we have two categorical variables, A and B, where A can take mA possible values and B can take mB possible values.

Suppose that we have N observations with each observation belonging to one of the mA categories of variable A and one of the mB categories of variable B. For $i = 1, 2, \ldots, m_A$ and $j = 1, 2, \ldots, m_B$, let $O_{ij}$ denote the number of observations which belong to category i of variable A and category j of variable B.

For example, variable A could be hair colour with categories:
1 - Brown
2 - Black
3 - Blonde
and variable B could be eye colour with categories:
1 - Brown
2 - Blue
3 - Green

Then N will be the total number of observations and $O_{32}$ will be the number of observations (people) with Blonde hair and Blue eyes.

We often want to test the null hypothesis that the variables A and B are independent. For example, in the above scenario, the hypothesis that hair colour and eye colour are independent.

What does independence look like?

Let $p_i$ denote the probability that an individual in the population will belong to category i of variable A and let $p_j$ denote the probability that an individual in the population will belong to category j of variable B. Then if variables A and B are independent, the probability of an individual belonging both to category i of variable A and category j of variable B is
\[
p_i \times p_j.
\]
Let
\[
N_i = \sum_{j=1}^{m_B} O_{ij}
\]
denote the total number of observations with variable A in category i and similarly let
\[
N_j = \sum_{i=1}^{m_A} O_{ij}
\]
denote the total number of observations with variable B in category j.

We can estimate $p_i$ by
\[
\hat{p}_i = \frac{N_i}{N}
\]
and $p_j$ by
\[
\hat{p}_j = \frac{N_j}{N}.
\]
This will give an estimate of
\[
\hat{p}_i \times \hat{p}_j = \frac{N_i}{N} \times \frac{N_j}{N} = \frac{N_i N_j}{N^2}
\]

for the probability of an individual belonging both to category i of variable A and category j of variable B under the null hypothesis of independence between variables A and B.

Therefore, under the null hypothesis of independence, the expected number of observations belonging to category i of variable A and category j of variable B is
\[
E_{ij} = N \times \hat{p}_i \times \hat{p}_j = \frac{N_i N_j}{N}.
\]
The test statistic $\chi^2_{\text{obs}}$ is again the sum of the square of the difference between the observed, $O_{ij}$, and the expected, $E_{ij}$, values divided by the expected values. That is,
\[
\chi^2_{\text{obs}} = \sum_{i=1}^{m_A} \sum_{j=1}^{m_B} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}.
\]
The number of degrees of freedom is
\[
\nu = (m_A - 1)(m_B - 1).
\]
We reject the null hypothesis of independence between the variables A and B, in favour of the alternative hypothesis that the variables A and B are dependent, at a significance level α if
\[
\chi^2_{\text{obs}} > \chi^2_{\nu,\alpha}.
\]
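A minimal R sketch of this procedure, assuming a matrix of counts O with rows indexed by variable A and columns by variable B (the function name independence_test is illustrative):

```r
# Chi-squared test of independence for a table of counts O
independence_test <- function(O, alpha = 0.05) {
  N <- sum(O)
  E <- outer(rowSums(O), colSums(O)) / N       # E_ij = N_i N_j / N
  chi_obs <- sum((O - E)^2 / E)
  nu <- (nrow(O) - 1) * (ncol(O) - 1)
  list(statistic = chi_obs,
       df        = nu,
       critical  = qchisq(1 - alpha, df = nu),
       p.value   = pchisq(chi_obs, df = nu, lower.tail = FALSE))
}
```

The built-in chisq.test applied to a matrix of counts performs the same Pearson calculation (for 2 x 2 tables it applies a continuity correction unless correct = FALSE is given).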
School children.

A school take part in a study which involves recording the eye colour and hair colour of each child.
Observed                    Eye
                   Brown   Blue   Green
Hair   Brown        117     14      21
       Black         56      3      11
       Blonde        17     41      19
The hypothesis which we wish to test is:
Are eye and hair colour independent?


The first step is to compute the row and column totals which give the total number of individuals with each hair colour and each eye colour, respectively.

                            Eye
                   Brown   Blue   Green   Total
Hair   Brown        117     14      21     152
       Black         56      3      11      70
       Blonde        17     41      19      77
       Total        190     58      51     299

Then, using $E_{ij} = N_i N_j / N$, we can compute the expected number of individuals in each category under the assumption of independence.

For example, the expected number of people with brown hair and brown eyes is
\[
E_{11} = \frac{152 \times 190}{299} = 96.6.
\]
Therefore
Expected                    Eye
                   Brown   Blue   Green   Total
Hair   Brown        96.6   29.5    25.9    152
       Black        44.5   13.6    11.9     70
       Blonde       48.9   14.9    13.2     77
       Total        190     58      51     299
We can then compute the differences between the observed and expected values. For example, for brown hair (hair category 1) and blue eyes (eye category 2), we have that:
\[
\frac{(O_{12} - E_{12})^2}{E_{12}} = \frac{(14 - 29.5)^2}{29.5} = 8.14.
\]
Therefore
(O - E)^2 / E               Eye
                   Brown   Blue   Green
Hair   Brown        4.31   8.14    0.93
       Black        2.97   8.26    0.07
       Blonde      20.81  45.72    2.55
giving the test statistic
\[
\chi^2_{\text{obs}} = \sum_{i=1}^{3} \sum_{j=1}^{3} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = 93.76.
\]
Under the null hypothesis (independence), the test statistic approximately follows a $\chi^2$ distribution with
\[
\nu = (m_A - 1)(m_B - 1) = (3-1) \times (3-1) = 4
\]
degrees of freedom.

Given that, for a 0.1% significance level (α = 0.001), the critical value for the $\chi^2_4$ distribution is $\chi^2_{4,0.001} = 18.467$, there is very strong evidence to reject the null hypothesis. That is, there is very strong evidence that hair colour and eye colour are dependent.
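In R the same conclusion follows directly from the table of counts; a minimal sketch:

```r
# School children example: test of independence between hair and eye colour
O <- matrix(c(117, 14, 21,
               56,  3, 11,
               17, 41, 19),
            nrow = 3, byrow = TRUE,
            dimnames = list(Hair = c("Brown", "Black", "Blonde"),
                            Eye  = c("Brown", "Blue", "Green")))
chisq.test(O)   # X-squared approximately 93.6, df = 4, p-value far below 0.001
```

The slight difference from the hand calculation (93.76) arises because the expected counts were rounded to one decimal place above.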

Task: Session 11

Attempt the R Markdown file for Session 11:
Session 11: Goodness-of-fit

Student Exercises

Attempt the exercises below.


The following data give the frequency distribution of the size of casual groups of people on a spring afternoon in a park.
Size of Group     1      2      3     4     5    6
Frequency       1486    694    195    37    10    1
A suggested model for the probability $p_r$ of a group of size r is
\[
p_r = \frac{\mu^r \exp(-\mu)}{r! \, [1 - \exp(-\mu)]}, \qquad r = 1, 2, \ldots,
\]

where μ is estimated to be 0.89 for this data set.

Does this give a good fit to the data?

Solution to Exercise 20.1.
The total number of groups is
\[
1486 + 694 + 195 + 37 + 10 + 1 = 2423,
\]
and the fitted model is
\[
p_r = \frac{\mu^r \exp(-\mu)}{r! \, [1 - \exp(-\mu)]}
\]
with μ = 0.89, and so the expected frequency of a group of size r is $2423 p_r = 1688.3 \, (0.89)^r / r!$. The expected frequencies are:

Size of Group           1        2       3      4      5     6
Observed Frequency    1486      694     195     37     10     1
Expected Frequency   1502.6    668.7   198.4   44.1   7.9   1.3
Note that we have made the last group “6 or more”. To ensure no expected frequencies are less than 5, we combine groups “5” and “6 or more” to make the group “5 or more”, with expected frequency 7.9 + 1.3 = 9.2 and observed frequency 10 + 1 = 11.
There are now 5 groups and the degrees of freedom for the test are
\[
\nu = \#\text{Groups} - 1 - \#\text{Parameters} = 5 - 1 - 1 = 3.
\]
The test statistic is
\[
\chi^2_{\text{obs}} = \frac{(1486 - 1502.6)^2}{1502.6} + \frac{(694 - 668.7)^2}{668.7} + \cdots + \frac{(11 - 9.2)^2}{9.2} = 2.712.
\]

Test H0: the probability model is a good fit, with α = 0.05. We reject H0 if $\chi^2_{\text{obs}} \geq \chi^2_{3,0.05} = 7.815$. Since 2.712 < 7.815, we do not reject H0 at the 5% significance level.
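A minimal R sketch of this solution, treating the final category as “6 or more”:

```r
# Exercise 20.1: zero-truncated Poisson goodness-of-fit
obs <- c(1486, 694, 195, 37, 10, 1)              # group sizes 1,...,5 and "6 or more"
N   <- sum(obs)                                  # 2423
mu  <- 0.89

p <- dpois(1:5, mu) / (1 - dpois(0, mu))         # zero-truncated Poisson, r = 1,...,5
p <- c(p, 1 - sum(p))                            # remaining probability for "6 or more"
E <- N * p

# Pool the last two groups so that all expected counts are at least 5
O_pooled <- c(obs[1:4], sum(obs[5:6]))
E_pooled <- c(E[1:4],   sum(E[5:6]))

chi_obs <- sum((O_pooled - E_pooled)^2 / E_pooled)
pchisq(chi_obs, df = length(O_pooled) - 1 - 1, lower.tail = FALSE)   # well above 0.05
```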

NB. The probability model
\[
p_r = \frac{\mu^r \exp(-\mu)}{r! \, [1 - \exp(-\mu)]}, \qquad r = 1, 2, \ldots,
\]

is known as the zero-truncated Poisson distribution.



In order to test the lifetime of small batteries used to power clocks, 40 batteries were chosen at random and tested. Their times (in months) to failure were
1811253640723351112462887752411231345240791459173954163825220967263138
The manufacturer claims that the lifetimes, X, have an exponential distribution with mean 30 months. If we assume this, calculate a, b, c and d such that
\[
\frac{1}{5} = P(0 < X < a) = P(a < X < b) = P(b < X < c) = P(c < X < d) = P(d < X < \infty).
\]

Construct a table of expected and observed frequencies for the above five intervals and hence test the manufacturer’s claim by using a goodness-of-fit test at the 5% level.

Solution to Exercise 20.2.
We need to solve $F(x) = 1 - \exp(-\lambda x) = p$, where λ = 1/30, for p taking the values $\frac{1}{5}, \frac{2}{5}, \frac{3}{5}, \frac{4}{5}$. Hence
\[
x = -\frac{1}{\lambda}\log(1-p) = -30\log(1-p),
\]
so a = 6.7, b = 15.3, c = 27.5 and d = 48.3.
Observed frequencies are:
Interval    0-6.7   6.7-15.3   15.3-27.5   27.5-48.3   48.3+
Observed      6         9           7          10         8
The expected number in each cell is $40 \times \frac{1}{5} = 8$. The test statistic is:
\[
\chi^2_{\text{obs}} = \frac{(6-8)^2}{8} + \frac{(9-8)^2}{8} + \cdots + \frac{(8-8)^2}{8} = 1.25.
\]

The degrees of freedom are ν = 5 - 1 = 4.
Now $\chi^2_{4,0.05} = 9.488$, so the test is not significant at the 5% significance level and we do not reject H0. There is no reason to suppose that the manufacturer’s claim is incorrect.
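A minimal R sketch of this solution, using qexp to find the interval endpoints:

```r
# Exercise 20.2: testing the claimed Exp(mean = 30) lifetime distribution
rate <- 1 / 30
cuts <- qexp(c(0.2, 0.4, 0.6, 0.8), rate = rate)   # a, b, c, d = 6.7, 15.3, 27.5, 48.3

obs <- c(6, 9, 7, 10, 8)                           # observed counts in the five intervals
E   <- rep(40 / 5, 5)                              # 8 expected in each interval

chi_obs <- sum((obs - E)^2 / E)                    # 1.25
qchisq(0.95, df = 4)                               # 9.488, so do not reject H0
```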



In a clinical trial to test the effect of a new drug for influenza, 164 people with the condition were split into two equal groups, one of which was given the drug, the other a placebo. The table below indicates the response of the treatments.
             Helped   Harmed   No effect
Drug           50       10        22
Placebo        42       12        28

Test the hypothesis that the drug is no different from the placebo.

Solution to Exercise 20.3.
The marginal totals are:
             Helped   Harmed   No effect   Total
Drug           50       10        22         82
Placebo        42       12        28         82
Total          92       22        50        164
Therefore the expected frequencies are:
             Helped   Harmed   No effect
Drug           46       11        25
Placebo        46       11        25
Hence $(O_{ij} - E_{ij})^2 / E_{ij}$ for each cell is

             Helped   Harmed   No effect
Drug          16/46     1/11      9/25
Placebo       16/46     1/11      9/25
The test statistic is
\[
\chi^2_{\text{obs}} = 2\left(\frac{16}{46} + \frac{1}{11} + \frac{9}{25}\right) = 1.5975.
\]

The degrees of freedom are (r-1)(c-1) = (2-1)(3-1) = 2 and the critical value is $\chi^2_{2,0.05} = 5.991$ at the 5% significance level. Since 1.5975 < 5.991, we do not reject H0, that the drug is no different from the placebo, i.e. there is no evidence that the response to the drug differs from the placebo.
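A minimal R check of this solution using the table of counts:

```r
# Exercise 20.3: drug vs placebo, testing for no difference between treatments
O <- matrix(c(50, 10, 22,
              42, 12, 28),
            nrow = 2, byrow = TRUE,
            dimnames = list(Treatment = c("Drug", "Placebo"),
                            Response  = c("Helped", "Harmed", "No effect")))
chisq.test(O)   # X-squared approximately 1.6, df = 2, p-value approximately 0.45
```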