Chapter 20 Hypothesis Testing for Discrete Data

20.1 Introduction

In Section 19 (Hypothesis Testing) we studied hypothesis testing for normal random variables and, via the central limit theorem, for sums (and means) of random variables. The normal distribution is continuous, but there are many situations where we want to compare hypotheses with data or distributions which are discrete. These include goodness-of-fit tests and tests of independence between categorical variables, both of which we cover in this chapter.

20.2 Goodness-of-fit motivating example

We start with a motivating example.

Film stars.
A film studio wants to decide which actor or actress to hire for the main role in a series of movies.

They have a shortlist of 5 and decide to ask the public who their favourite actor or actress is.

1,000 people are randomly selected and asked who their favourite actor or actress is from the shortlist of 5.

Results:

\[ \begin{array}{l|ccccc} \mbox{Preferred Actor} & 1 & 2 & 3 & 4 & 5 \\ \hline \mbox{Frequency} & 225 & 189 & 201 & 214 & 171 \end{array} \]

An investor in the film claims “There is no difference in who the public prefer; we should hire the cheapest!”

Does the data support the investor’s claim?

We work through testing the investor’s claim via a series of steps.

Step 1
Interpret what the investor’s claim represents statistically.

“No difference in who the public prefers” means that if we choose an individual at random from the population they are equally likely to choose each of the five actors/actresses. That is, probability 1/5 of each actor/actress being chosen.

Step 2
What would we expect to observe in the data if the investor’s claim is true?

The investor’s claim has led us to a model where each actor/actress has probability 1/5 of being selected by a member of the public. Therefore when 1000 people are asked, we would expect each actor/actress to receive:
\[ 1000 \times \frac{1}{5} = 200 \mbox{ votes}. \]
Thus based on the model of the investor’s claim:
\[ \begin{array}{l|ccccc} \mbox{Preferred Actor} & 1 & 2 & 3 & 4 & 5 \\ \hline \mbox{Frequency} & 225 & 189 & 201 & 214 & 171 \\ \mbox{Expected} & 200 & 200 & 200 & 200 & 200 \end{array} \]

Step 3
Is what we observe (Frequency) in the data consistent with what we expect (Expected) to see if the investor’s claim is a good model?

In hypothesis testing language, should we reject the null hypothesis:

\(H_0\): All actors equally popular.

In favour of the alternative hypothesis:

\(H_1\): There is a difference in popularity between at least two actors.

To compare competing hypotheses we require a test statistic and a sampling distribution for the test statistic under the assumption that the null hypothesis is true.

Test Statistic

For each outcome (actor), let \(O_i\) and \(E_i\) denote the number of observed and expected votes for actor \(i\).

The test statistic \(\chi_{obs}^2\) is
\[ \chi^2_{obs} = \sum_i \frac{(O_i - E_i)^2 }{E_i}. \]
For the actors example,
\[ \chi^2_{obs} = \frac{(225-200)^2}{200} +\frac{(189-200)^2}{200} + \ldots + \frac{(171-200)^2}{200} = 8.92. \]

Sampling distribution

We reject \(H_0\) at an \(\alpha \%\) significance level if
\[ \chi^2_{obs} \geq \chi^2_{\nu, 1-\alpha}, \]
where \(\chi^2_{\nu, 1-\alpha}\) is the \((1- \alpha) 100 \%\) quantile of the \(\chi^2\) distribution with \(\nu\) degrees of freedom and
\[ \nu = \mbox{Number of categories} - 1 =5-1=4. \]
Thus if \(X\) follows a \(\chi^2\) distribution with \(\nu\) degrees of freedom, then
\[ \mathrm{P} \left(X \leq \chi^2_{\nu, 1-\alpha} \right) = 1- \alpha \]
or equivalently,
\[ \mathrm{P} \left(X > \chi^2_{\nu, 1-\alpha} \right) = \alpha. \]

Since \(\chi^2_{4,0.95} = 9.488\), we do not reject the null hypothesis at a \(5 \%\) significance level. That is, the investor’s claim of all actors being equally popular is reasonable given the observed data.
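The same calculation can be checked in R using the built-in `chisq.test` function, which by default tests against equal category probabilities:

```r
votes <- c(225, 189, 201, 214, 171)
chisq.test(votes)     # default null: all five actors equally likely
## X-squared = 8.92, df = 4, p-value = 0.0631
qchisq(0.95, df = 4)  # critical value: 9.488
```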

20.3 Goodness-of-fit

We describe the general procedure for testing the goodness-of-fit of a probability distribution to data using the \(\chi\)-squared distribution.

Suppose that we have \(N\) independent observations, \(y_1, y_2, \ldots, y_N\), from an unknown probability distribution, \(Y\). Suppose that there are \(n\) categories covering the possible outcomes and for \(i=1,2,\ldots, n\), let \(\mathcal{C}_i\) denote category \(i\). For example, we could have \(\mathcal{C}_i = \{ y = i\}\), the observations equal to \(i\), or \(\mathcal{C}_i = \{ a_i < y \leq b_i\}\), the observations in the range \((a_i, b_i]\).

For \(i=1,2,\ldots, n\), let
\[ O_i = \# \{y_j \in \mathcal{C}_i\}, \]

the number of data points observed in category \(i\).

We propose a probability distribution \(X\) for the unknown probability distribution, \(Y\). This gives us our null hypothesis:

\(H_0\): \(Y =X\)

with the alternative hypothesis

\(H_1\): \(Y \neq X\).

Under the null hypothesis, we calculate, for each category \(i\), the number of observations we would expect to belong to that category. That is, for \(i=1,2,\ldots,n\),
\[ E_i = N \times \mathrm{P} (X \in \mathcal{C}_i). \]
We compute the test statistic
\[ \chi^2_{obs} = \sum_i \frac{(O_i - E_i)^2 }{E_i}, \]

and the number of degrees of freedom, \(\nu = n -1\).

We choose an \(\alpha \%\) significance level and reject the null hypothesis if
\[ \chi^2_{obs} > \chi^2_{\nu, 1-\alpha}.\]

Important points

  1. The test statistic, under the null hypothesis, does not exactly follow a \(\chi^2\) distribution. As with the central limit theorem, the test statistic is approximately \(\chi^2\) distributed with the approximation becoming better as the amount of data in each category increases.
  2. For discrete data it will often be natural to choose \(\mathcal{C}_i = \{y = i\}\), whereas for continuous data we have considerable flexibility in choosing the number of categories and the category intervals. The considerations on choice of categories for goodness-of-fit testing are not dissimilar to the considerations on choice of bins for histograms.
  3. The expected frequencies in each category should not be too small with a rule of thumb that \(E_i \geq 5\). If some of the expected frequencies are less than 5 then we pool categories such that the expected frequency of the two (or more) categories combined is greater than or equal to 5.
  4. We will often want to fit a probability distribution \(X\) from a given family of probability distributions (e.g. Poisson, Gamma) without necessarily a priori choosing the parameters of the distribution. For example, we might choose to fit a Poisson distribution with mean \(\lambda\) to a data set and use the sample mean, \(\bar{y}\), as the choice of \(\lambda\). The goodness-of-fit procedure is as above except that we reduce the number of degrees of freedom by 1 for each parameter we estimate from the data,
    \[ \nu = \# \mbox{Categories} -1 - \# \mbox{Estimated Parameters}. \]
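To illustrate points 3 and 4, here is a minimal R sketch using hypothetical Poisson counts (the data here are illustrative only). Note that `chisq.test` does not adjust the degrees of freedom for estimated parameters, so the p-value is computed directly with `pchisq`:

```r
# Hypothetical counts of y = 0, 1, 2, 3 and y >= 4
obs <- c(31, 36, 21, 8, 4)
N <- sum(obs)
lambda <- sum(c(0:3, 4) * obs) / N          # sample mean, treating ">= 4" as 4
p <- c(dpois(0:3, lambda), ppois(3, lambda, lower.tail = FALSE))
E <- N * p                                  # approx 30.7 36.3 21.4 8.4 3.2
# The last expected count is below 5, so pool the final two categories:
obs <- c(obs[1:3], sum(obs[4:5]))
E <- c(E[1:3], sum(E[4:5]))
chi2 <- sum((obs - E)^2 / E)
nu <- length(obs) - 1 - 1                   # categories - 1 - estimated params
pchisq(chi2, df = nu, lower.tail = FALSE)   # p-value
```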

Alleles.
Each person has one of the following genotypes: \(A/A\), \(A/S\) or \(S/S\).

The observed frequencies in a population of \(N=886\) are:
\[ A/A: 700, \hspace{1cm} A/S: 180, \hspace{1cm} S/S: 6 \]
Hypothesis:
The proportion of people with each genotype is
\[ p^2, \; 2 p (1-p) \mbox{ and } (1-p)^2, \]

where \(p\) is the proportion of alleles that are of type \(A\).

Is this a reasonable model for the data?

Watch Video 30 for the worked solutions to Example 2 (Alleles)

Video 30: Alleles

Alternatively, worked solutions are provided:

Solution to Example 2: Alleles

We start with finding a suitable choice for \(p\).

We can estimate \(p\) by \(\hat{p}\), the proportion of alleles of type \(A\) in the population:
\[\hat{p}= \frac{2 \times 700 + 180}{2 \times 886} = 0.8916. \]

This is the MLE for \(p\).

Therefore the probabilities for each genotype are:
\[\begin{eqnarray*} \mathrm{P} (A/A) &=&p^2 = 0.8916^2 = 0.795 \\ \mathrm{P} (A/S) &=& 2p(1-p) = 2\times 0.8916 \times (1-0.8916)= 0.1933 \\ \mathrm{P} (S/S) &=&(1-p)^2 = (1-0.8916)^2 = 0.0118. \end{eqnarray*}\]
Multiply the probabilities by \(N=886\) to give the expected numbers for each genotype:
\[\begin{eqnarray*} A/A: N \mathrm{P} (A/A) &=& 886 \times 0.795 = 704.37 \\ A/S: N \mathrm{P} (A/S) &=& 886 \times 0.1933 = 171.26 \\ S/S: N \mathrm{P} (S/S) &=& 886 \times 0.0118 = 10.45. \end{eqnarray*}\]
The test statistic is
\[\begin{eqnarray*} \chi^2_{obs} &=& \sum_i \frac{(O_i - E_i)^2}{E_i} \\ &=& \frac{(700-704.37)^2}{704.37} + \frac{(180-171.26)^2}{171.26} + \frac{(6-10.45)^2}{10.45} \\ &=& 0.0271 + 0.446 + 1.895 = 2.3681. \end{eqnarray*}\]
Since we have \(n=3\) categories and estimated 1 parameter \((p)\), we have that the degrees of freedom is:
\[ \nu = 3 - 1 -1 = 1. \]

At a \(5\%\) significance level \((\alpha = 0.05)\): \(\chi^2_{1,0.95} = 3.8415\).

Since, \(\chi^2_{obs} < \chi^2_{1,0.95}\), there is no evidence to reject the null hypothesis.

The \(p\)-value is 0.1238 \((= \mathrm{P} (W > \chi^2_{obs}))\), where \(W\) follows a \(\chi^2\) distribution with \(\nu = 1\) degree of freedom.
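A quick R check of the alleles calculation (using the unrounded \(\hat{p}\), so the statistic and \(p\)-value differ slightly from the rounded hand calculation above):

```r
O <- c(700, 180, 6)                               # A/A, A/S, S/S
N <- sum(O)
p_hat <- (2 * O[1] + O[2]) / (2 * N)              # MLE of p: 0.8916
E <- N * c(p_hat^2, 2 * p_hat * (1 - p_hat), (1 - p_hat)^2)
chi2 <- sum((O - E)^2 / E)                        # approx 2.34 unrounded
pchisq(chi2, df = 3 - 1 - 1, lower.tail = FALSE)  # p-value approx 0.126
```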

20.4 Testing Independence

Suppose that we have two categorical variables, \(A\) and \(B\), where \(A\) can take \(m_A\) possible values and \(B\) can take \(m_B\) possible values.

Suppose that we have \(N\) observations with each observation belonging to one of the \(m_A\) categories of variable \(A\) and one of the \(m_B\) categories of variable \(B\). For \(i=1,2,\ldots, m_A\) and \(j=1,2,\ldots,m_B\), let \(O_{ij}\) denote the number of observations which belong to category \(i\) of variable \(A\) and category \(j\) of variable \(B\).

For example, variable \(A\) could be hair colour with categories:
1 - Brown
2 - Black
3 - Blonde
and variable \(B\) could be eye colour with categories:
1 - Brown
2 - Blue
3 - Green

Then \(N\) will be the total number of observations and \(O_{32}\) will be the number of observations (people) with Blonde hair and Blue eyes.

We often want to test the null hypothesis that the variables \(A\) and \(B\) are independent. For example, in the above scenario, the hypothesis that hair colour and eye colour are independent.

What does independence look like?

Let \(p_{i \cdot}\) denote the probability that an individual in the population will belong to category \(i\) of variable \(A\) and let \(p_{\cdot j}\) denote the probability that an individual in the population will belong to category \(j\) of variable \(B\). Then if variables \(A\) and \(B\) are independent, the probability of an individual belonging both to category \(i\) of variable \(A\) and category \(j\) of variable \(B\) is
\[ p_{i \cdot} \times p_{\cdot j}. \]
Let
\[N_{i \cdot} = \sum_{j=1}^{m_B} O_{ij}\]
denote the total number of observations with variable \(A\) in category \(i\) and similarly let
\[N_{\cdot j} = \sum_{i=1}^{m_A} O_{ij}\]

denote the total number of observations with variable \(B\) in category \(j\).

We can estimate \(p_{i \cdot}\) by
\[ \hat{p}_{i \cdot} = \frac{N_{i \cdot}}{N} \]
and \(p_{\cdot j}\) by
\[ \hat{p}_{\cdot j} = \frac{N_{\cdot j}}{N}. \]
This will give an estimate of
\[ \hat{p}_{i \cdot} \times \hat{p}_{\cdot j} = \frac{N_{i \cdot}}{N} \times \frac{N_{\cdot j}}{N} = \frac{N_{i \cdot} N_{\cdot j}}{N^2} \]

for the probability of an individual belonging both to category \(i\) of variable \(A\) and category \(j\) of variable \(B\) under the null hypothesis of independence between variables \(A\) and \(B\).

Therefore under the null hypothesis of independence the expected number of observations belonging to category \(i\) of variable \(A\) and category \(j\) of variable \(B\) is
\[ E_{ij} = N \times\hat{p}_{i \cdot} \times \hat{p}_{\cdot j} = \frac{N_{i \cdot} N_{\cdot j}}{N}. \]
The test statistic \(\chi_{obs}^2\) is again the sum of the square of the difference between the observed, \(O_{ij}\), and the expected, \(E_{ij}\), values divided by the expected values. That is,
\[ \chi^2_{obs} = \sum_{i=1}^{m_A} \sum_{j=1}^{m_B} \frac{(O_{ij} - E_{ij})^2 }{E_{ij}}. \]
The number of degrees of freedom is
\[ \nu = (m_A -1) (m_B-1). \]
We reject the null hypothesis of independence between the variables \(A\) and \(B\), in favour of the alternative hypothesis that the variables \(A\) and \(B\) are dependent, at an \(\alpha \%\) significance level if
\[ \chi^2_{obs} > \chi^2_{\nu, 1-\alpha}.\]
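The expected counts \(E_{ij}\) have a convenient one-line form in R via `outer`; a minimal sketch with hypothetical counts:

```r
O <- matrix(c(30, 10,
              20, 40), nrow = 2, byrow = TRUE)  # hypothetical 2 x 2 table
N <- sum(O)
E <- outer(rowSums(O), colSums(O)) / N          # E_ij = N_i. * N_.j / N
chi2 <- sum((O - E)^2 / E)
nu <- (nrow(O) - 1) * (ncol(O) - 1)
pchisq(chi2, df = nu, lower.tail = FALSE)       # p-value
```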
School children.

A school takes part in a study which involves recording the eye colour and hair colour of each child.
\[ \begin{array}{cc|ccc} \mbox{Observed} & & & \mbox{Eye } & \\ & & \mbox{Brown} & \mbox{Blue} & \mbox{Green} \\ \hline &\mbox{Brown} & 117 & 14 & 21 \\ \mbox{Hair} & \mbox{Black} & 56 & 3& 11 \\ & \mbox{Blonde} & 17 & 41 & 19 \end{array} \]
The hypothesis which we wish to test is:
Are eye and hair colour independent?


The first step is to compute the row and column totals which give the total number of individuals with each hair colour and each eye colour, respectively.

\[ \begin{array}{cc|ccc|c} & & & \mbox{Eye} & \\ & & \mbox{Brown} & \mbox{Blue} & \mbox{Green} & \mbox{Total}\\ \hline & \mbox{Brown} & 117 & 14 & 21 & 152 \\ \mbox{Hair} & \mbox{Black} & 56 & 3 & 11 & 70 \\ & \mbox{Blonde} & 17 & 41 & 19 & 77 \\ \hline & \mbox{Total} & 190 & 58 & 51 & 299 \end{array}. \]

Then using \(E_{ij} = N_{i \cdot} N_{\cdot j}/N\), we can compute the expected number of individuals in each category under the assumption of independence.

For example, the expected number of people with brown hair and brown eyes is
\[ E_{11} = \frac{N_{1 \cdot} N_{\cdot 1}}{N} = \frac{152 \times 190}{299} = 96.6. \]
Therefore
\[ \begin{array}{cc|ccc|c} \mbox{Expected} & & & \mbox{Eye} & \\ & & \mbox{Brown} & \mbox{Blue} & \mbox{Green} & \mbox{Total}\\ \hline & \mbox{Brown} & 96.6& 29.5 & 25.9& 152 \\ \mbox{Hair} & \mbox{Black} & 44.5 & 13.6 & 11.9 & 70 \\ & \mbox{Blonde} & 48.9 & 14.9 & 13.2 & 77 \\ \hline & \mbox{Total} & 190 & 58 & 51 & 299 \end{array}. \]
We can then compute each cell’s contribution to the test statistic. For example, for brown hair (hair category 1) and blue eyes (eye category 2), we have:
\[ \frac{(O_{12} - E_{12})^2}{E_{12}} = \frac{(14-29.5)^2}{29.5} = 8.14. \]
Therefore
\[ \begin{array}{cc|ccc} \frac{(O -E)^2}{E} & & & \mbox{Eye} \\ & & \mbox{Brown} & \mbox{Blue} & \mbox{Green} \\ \hline & \mbox{Brown} & 4.31 & 8.14 & 0.93 \\ \mbox{Hair} & \mbox{Black} & 2.97 & 8.26 & 0.07 \\ & \mbox{Blonde} & 20.81 & 45.72 & 2.55 \end{array},\]
giving the test statistic to be
\[ \chi_{obs}^2 = \sum_{i=1}^3 \sum_{j=1}^3 \frac{(O_{ij} - E_{ij})^2}{E_{ij}}= 93.76. \]
Under the null hypothesis (independence), the test statistic approximately follows a \(\chi^2\) distribution with
\[\nu = (m_A -1) (m_B-1) = (3-1)\times (3-1) =4 \]

degrees of freedom.

Given that for a \(0.1\%\) significance level \((\alpha=0.001)\), the critical value for the \(\chi^2\) distribution is \(\chi^2_{4, 0.999} =18.467\), there is very strong evidence to reject the null hypothesis. That is, there is very strong evidence that hair colour and eye colour are dependent.
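The whole test can be reproduced in R with `chisq.test` applied to the table of observed counts; it reports \(\chi^2_{obs} \approx 93.6\) (the value 93.76 above reflects rounding of the \(E_{ij}\)):

```r
hair_eye <- matrix(c(117, 14, 21,
                     56,  3, 11,
                     17, 41, 19),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(Hair = c("Brown", "Black", "Blonde"),
                                   Eye  = c("Brown", "Blue", "Green")))
chisq.test(hair_eye)
## X-squared = 93.62, df = 4, p-value < 2.2e-16
```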

Task: Lab 11

Attempt the R Markdown file for Lab 11:
Lab 11: Goodness-of-fit

Student Exercises

Attempt the exercises below.

Question 1.

The following data give the frequency distribution of the size of casual groups of people on a spring afternoon in a park.
\[ \begin{array}{l|cccccc} \mbox{Size of Group} & 1 & 2 & 3 & 4 & 5 & 6 \\ \hline \mbox{Frequency} & 1486 & 694 & 195 & 37 & 10 & 1 \end{array} \]
A suggested model for the probability \(p_r\) of a group of size \(r\) is
\[ p_r = \frac{\mu^r \exp(-\mu)}{r! [1- \exp(-\mu)]}, \hspace{1cm} r=1,2,\ldots, \]

where \(\mu\) is estimated to be 0.89 for this data set.

Does this give a good fit to the data?

Solution to Question 1.
The total number of groups is
\[ 1486+694+195+37+10+1 = 2423, \]
and the fitted model is
\[ p_r = \frac{\mu^r \exp(-\mu)}{r! [1- \exp(-\mu)]} \]
with \(\mu=0.89\), and so the expected frequency of a group of size \(r\) is \(2423 p_r = 1688.3 \frac{(0.89)^r}{r!}\). The expected frequencies are:
\[ \begin{array}{l|cccccc} \mbox{Size of Group} & 1 & 2 & 3 & 4 & 5 & \geq 6 \\ \hline \mbox{Observed Frequency} & 1486 & 694 & 195 & 37 & 10 & 1 \\ \mbox{Expected Frequency} & 1502.6 & 668.7 & 198.4 & 44.1 & 7.9 & 1.3 \end{array} \]
Note that we have made the last group “\(\geq 6\)”. To ensure no expected frequencies are less than 5, we combine the groups “5” and “\(\geq 6\)” to make the group “\(\geq 5\)” with expected frequency 9.2 and observed frequency 11.
There are now 5 groups and the degrees of freedom for the test is
\[ \nu = \# \mbox{Groups} -1 - \# \mbox{Parameters} = 5 -1-1 =3.\]
The test statistic is
\[ \chi^2_{obs} = \frac{(1486 -1502.6)^2}{1502.6} +\frac{(694-668.7)^2}{668.7} + \ldots + \frac{(11-9.2)^2}{9.2} =2.712.\]

Test \(H_0\): Probability model is a good fit, with \(\alpha =0.05\). We reject \(H_0\) if \(\chi^2_{obs} \geq \chi^2_{3,0.95} = 7.815\). Since \(2.712 < 7.815\), we do not reject \(H_0\) at the \(5\%\) significance level.

NB. The probability model
\[ p_r = \frac{\mu^r \exp(-\mu)}{r! [1- \exp(-\mu)]}, \hspace{1cm} r=1,2,\ldots, \]

is known as the zero-truncated Poisson distribution.
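A sketch of the calculation in R, handling the pooling of the last two categories and the reduced degrees of freedom explicitly:

```r
O <- c(1486, 694, 195, 37, 10, 1)          # group sizes 1, ..., 5 and >= 6
N <- sum(O)
mu <- 0.89
p <- dpois(1:5, mu) / (1 - dpois(0, mu))   # zero-truncated Poisson, r = 1..5
p <- c(p, 1 - sum(p))                      # P(group size >= 6)
E <- N * p                                 # 1502.6 668.7 198.4 44.1 7.9 1.3
# Pool the last two categories so that all expected counts are >= 5:
O <- c(O[1:4], sum(O[5:6]))                # observed ">= 5" count: 11
E <- c(E[1:4], sum(E[5:6]))                # expected ">= 5" count: 9.2
chi2 <- sum((O - E)^2 / E)                 # approx 2.7
pchisq(chi2, df = 5 - 1 - 1, lower.tail = FALSE)
```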


Question 2.

In order to test the lifetime of small batteries used to power clocks, 40 batteries were chosen at random and tested. Their times to failure (in months) were
\[ \begin{array}{rrrrrrrrrr} 18& 11& 25& 36& 40& 72& 33& 51& 1& 12 \\ 46& 28& 87& 75& 24& 11& 23& 13& 45& 2 \\ 40& 79& 14& 59& 1& 7& 39& 54& 16& 3 \\ 8& 2& 52& 20& 9& 6& 7& 26& 31& 38 \end{array} \]
The manufacturer claims that the lifetimes, \(X\), have an exponential distribution with mean 30 months. If we assume this, calculate \(a\), \(b\), \(c\) and \(d\) such that
\[\begin{eqnarray*} \frac{1}{5} &=& P(0 < X <a) = P(a<X<b) = P(b < X <c) \\ &=& P(c < X< d) = P(d < X < \infty). \end{eqnarray*}\]

Construct a table of expected and observed frequencies for the above five intervals and hence test the manufacturer’s claim by using a goodness-of-fit test at the \(5\%\) level.

Solution to Question 2.
Need to solve \(F(x) = 1- \exp(-\lambda x) = p\), where \(\lambda = 1/30\) for \(p\) taking the values \(\frac{1}{5}, \; \frac{2}{5}, \; \frac{3}{5}, \; \frac{4}{5}\). Hence
\[ x = - \frac{1}{\lambda} \log (1-p) = - 30 \log (1-p), \]
so \(a=6.7\), \(b=15.3\), \(c=27.5\) and \(d=48.3\).
Observed frequencies are:
\[ \begin{array}{c|c|c|c|c} 0-6.7 & 6.7-15.3 & 15.3-27.5 & 27.5-48.3 & 48.3+ \\ \hline 6 & 9 & 7 & 10 & 8 \end{array}. \]
The expected number in each cell is \(40 \times \frac{1}{5} =8\). The test statistic is:
\[ \chi^2_{obs} = \frac{(6 -8)^2}{8} +\frac{(9-8)^2}{8} + \ldots + \frac{(8-8)^2}{8} =1.25.\]

The number of degrees of freedom is \(\nu = 5-1=4\).
Now \(\chi^2_{4,0.95} = 9.488\), so the test is not significant at a \(5\%\) significance level and we do not reject \(H_0\). There is no reason to suppose that the manufacturer’s claim is incorrect.
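In R, the binning and the test can be done with `cut` and `chisq.test`:

```r
x <- c(18, 11, 25, 36, 40, 72, 33, 51,  1, 12,
       46, 28, 87, 75, 24, 11, 23, 13, 45,  2,
       40, 79, 14, 59,  1,  7, 39, 54, 16,  3,
        8,  2, 52, 20,  9,  6,  7, 26, 31, 38)
breaks <- c(0, qexp((1:4) / 5, rate = 1/30), Inf)  # 0, 6.7, 15.3, 27.5, 48.3
O <- table(cut(x, breaks))                         # observed: 6 9 7 10 8
chisq.test(O)                                      # equal probabilities 1/5
## X-squared = 1.25, df = 4, p-value = 0.8698
```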


Question 3.

In a clinical trial to test the effect of a new drug for influenza, 164 people with the condition were split into two equal groups, one of which was given the drug, the other a placebo. The table below indicates the response of the treatments.
\[ \begin{array}{l|ccc} & \mbox{Helped} & \mbox{Harmed} & \mbox{No effect} \\ \hline \mbox{Drug} & 50 & 10 & 22 \\ \mbox{Placebo} & 42 & 12 & 28 \end{array} \]

Test the hypothesis that the drug is no different from the placebo.

Solution to Question 3.
The marginal totals are:
\[ \begin{array}{l|ccc|c} & \mbox{Helped} & \mbox{Harmed} & \mbox{No effect} & \mbox{Total} \\ \hline \mbox{Drug} & 50 & 10 & 22 & 82 \\ \mbox{Placebo} & 42 & 12 & 28 & 82 \\ \hline \mbox{Total} & 92 & 22 & 50 & 164 \end{array}. \]
Therefore the expected frequencies are:
\[ \begin{array}{l|ccc} & \mbox{Helped} & \mbox{Harmed} & \mbox{No effect} \\ \hline \mbox{Drug} & 46 & 11 & 25 \\ \mbox{Placebo} & 46 & 11 & 25 \end{array} \]
Hence \(\frac{(O_{ij} -E_{ij})^2}{E_{ij}}\) for each cell is
\[ \begin{array}{l|ccc} & \mbox{Helped} & \mbox{Harmed} & \mbox{No effect} \\ \hline \mbox{Drug} & \frac{16}{46} & \frac{1}{11} & \frac{9}{25} \\ \mbox{Placebo} & \frac{16}{46} & \frac{1}{11} & \frac{9}{25} \end{array} \]
The test statistic is
\[ \chi^2_{obs} = 2 \left(\frac{16}{46} + \frac{1}{11} + \frac{9}{25} \right) =1.5975 \]

The degrees of freedom are \((r-1)(c-1)=(2-1)(3-1)=2\) and the critical value is \(\chi^2_{2,0.95} = 5.991\) at the \(5\%\) significance level. Since \(1.5975 < 5.991\), we do not reject \(H_0\), that the drug is no different from the placebo, i.e. there is no evidence that the response to the drug differs from the placebo.
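Equivalently in R (for tables larger than \(2 \times 2\), `chisq.test` applies no continuity correction):

```r
trial <- matrix(c(50, 10, 22,
                  42, 12, 28),
                nrow = 2, byrow = TRUE,
                dimnames = list(Treatment = c("Drug", "Placebo"),
                                Response = c("Helped", "Harmed", "No effect")))
chisq.test(trial)
## X-squared = 1.5975, df = 2, p-value = 0.4499
```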