Chapter 20 Hypothesis Testing Discrete Data
20.1 Introduction
In Section 19 (Hypothesis Testing) we studied hypothesis testing for normal random variables and, through the central limit theorem, for sums (and means) of random variables. The normal distribution is a continuous distribution, and there are many situations where we want to test hypotheses about data or distributions which are discrete. These include:
- Fitting a discrete probability distribution to data (goodness-of-fit).
- Testing independence between two discrete variables (contingency tables).
20.2 Goodness-of-fit motivating example
We start with a motivating example.
Film stars.
A film studio wants to decide which actor or actress to hire for the main role in a series of movies.
They have a shortlist of 5 and decide to ask the public who their favourite actor or actress is.
1,000 people are randomly selected and asked who their favourite actor or actress is from the shortlist of 5.
Results:
Preferred Actor | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
Frequency | 225 | 189 | 201 | 214 | 171 |
An investor in the film claims “There is no difference in who the public prefer, so we should hire the cheapest!”
Does the data support the investor’s claim?
We work through testing the investor’s claim via a series of steps.
Step 1
Interpret what the investor’s claim represents statistically.
“No difference in who the public prefers” means that if we choose an individual at random from the population they are equally likely to choose each of the five actors/actresses. That is, probability 1/5 of each actor/actress being chosen.
Step 2
What would we expect to observe in the data if the investor’s claim is true?
Preferred Actor | 1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|---|
Frequency | 225 | 189 | 201 | 214 | 171 |
Expected | 200 | 200 | 200 | 200 | 200 |
Step 3
Is what we observe (Frequency) in the data consistent with what we expect (Expected) to see if the investor’s claim is a good model?
In hypothesis testing language, should we reject or not the null hypothesis:
\(H_0\): All actors equally popular.
In favour of the alternative hypothesis:
\(H_1\): There is a difference in popularity between at least two actors.
To compare competing hypotheses we require a test statistic and a sampling distribution for the test statistic under the assumption that the null hypothesis is true.
Test Statistic
For each outcome (actor), let \(O_i\) and \(E_i\) denote the number of observed and expected votes for actor \(i\).
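The test statistic \(\chi_{obs}^2\) is
\[ \chi^2_{obs} = \sum_{i=1}^{5} \frac{(O_i - E_i)^2}{E_i} = \frac{25^2 + (-11)^2 + 1^2 + 14^2 + (-29)^2}{200} = \frac{1784}{200} = 8.92. \]
Sampling distribution
Under \(H_0\), the test statistic approximately follows a \(\chi^2\) distribution with \(\nu = 5 - 1 = 4\) degrees of freedom.
We reject \(H_0\) at significance level \(\alpha\) if
\[ \chi^2_{obs} \geq \chi^2_{\nu, 1-\alpha}. \]
Since \(\chi^2_{obs} = 8.92 < \chi^2_{4,0.95} = 9.488\), we do not reject the null hypothesis at a \(5 \%\) significance level. That is, the investor’s claim of all actors being equally popular is reasonable given the observed data.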
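The calculation for the film star data can be checked with a few lines of Python. This is a minimal sketch using only the standard library; the variable names are illustrative, the critical value 9.488 is the one quoted in the text, and the closed-form tail probability used for the \(p\)-value is valid only for 4 degrees of freedom.

```python
import math

# Observed votes for the five shortlisted actors (from the table above).
observed = [225, 189, 201, 214, 171]

# Under H0 every actor has probability 1/5, so each expected count is N/5.
n_total = sum(observed)
expected = [n_total / len(observed)] * len(observed)

# Pearson's statistic: sum over categories of (O_i - E_i)^2 / E_i.
chi2_obs = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Degrees of freedom: number of categories minus 1.
df = len(observed) - 1

# Critical value chi^2_{4, 0.95} quoted in the text.
critical_value = 9.488
reject_h0 = chi2_obs >= critical_value

# For df = 4 only, the upper tail has the closed form
# P(W > x) = exp(-x/2) * (1 + x/2).
p_value = math.exp(-chi2_obs / 2) * (1 + chi2_obs / 2)

print(f"chi2_obs = {chi2_obs:.2f}, df = {df}, reject H0: {reject_h0}")
print(f"p-value = {p_value:.4f}")
```

The \(p\)-value lies a little above 0.05, which agrees with the decision not to reject \(H_0\) at the \(5\%\) level.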
20.3 Goodness-of-fit
We describe the general procedure for testing the goodness-of-fit of a probability distribution to data using the \(\chi\)-squared distribution.
Suppose that we have \(N\) independent observations, \(y_1, y_2, \ldots, y_N\) from an unknown probability distribution, \(Y\). Suppose that there are \(n\) categories covering the possible outcomes and for \(i=1,2,\ldots, n\), let \(\mathcal{C}_i\) denote category \(i\). For example, we could have \(\mathcal{C}_i = \{ y = i\}\), the observations equal to \(i\), or \(\mathcal{C}_i = \{ a_i < y \leq b_i\}\), the observations in the range \((a_i, b_i]\).
For \(i=1,2,\ldots, n\), let
\[ O_i = \# \{ j : y_j \in \mathcal{C}_i \}, \]
the number of data points observed in category \(i\).
We propose a probability distribution \(X\) for the unknown probability distribution, \(Y\). This gives us our null hypothesis:
\(H_0\): \(Y =X\)
with the alternative hypothesis
\(H_1\): \(Y \neq X\).
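Under the null hypothesis, we calculate for each category \(i\) the number of observations we would expect to belong to category \(i\). That is, for \(i=1,2,\ldots,n\),
\[ E_i = N \, \mathrm{P} (X \in \mathcal{C}_i). \]
We then compute the test statistic
\[ \chi^2_{obs} = \sum_{i=1}^n \frac{(O_i - E_i)^2}{E_i} \]
and the number of degrees of freedom, \(\nu = n -1\).
We choose a significance level \(\alpha\) (for example, \(\alpha = 0.05\) for a \(5\%\) test) and reject the null hypothesis if
\[ \chi^2_{obs} \geq \chi^2_{\nu, 1-\alpha}, \]
where \(\chi^2_{\nu, 1-\alpha}\) is the \((1-\alpha)\) quantile of the \(\chi^2\) distribution with \(\nu\) degrees of freedom.
Important points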
- The test statistic, under the null hypothesis, does not exactly follow a \(\chi^2\) distribution. As with the central limit theorem, the test statistic is approximately \(\chi^2\) distributed with the approximation becoming better as the amount of data in each category increases.
- For discrete data it will often be natural to choose \(\mathcal{C}_i = \{y = i\}\), whereas for continuous data we have considerable flexibility in choosing the number of categories and the category intervals. The considerations on choice of categories for goodness-of-fit testing are not dissimilar to the considerations on choice of bins for histograms.
- The expected frequencies in each category should not be too small with a rule of thumb that \(E_i \geq 5\). If some of the expected frequencies are less than 5 then we pool categories such that the expected frequency of the two (or more) categories combined is greater than or equal to 5.
- We will often want to fit a probability distribution \(X\) from a given family of probability distributions (e.g. Poisson, Gamma) without necessarily a priori choosing the parameters of the distribution. For example, we might choose to fit a Poisson distribution with mean \(\lambda\) to a data set and use the sample mean, \(\bar{y}\), as the choice of \(\lambda\). The goodness-of-fit procedure is as above except that we reduce the number of degrees of freedom by 1 for each parameter we estimate from the data,
\[ \nu = \# \mbox{Categories} -1 - \# \mbox{Estimated Parameters}. \]
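The points above about pooling and estimated parameters can be illustrated with a short Python sketch. The counts below are hypothetical (they are not taken from these notes), the helper names `poisson_pmf`, `pool_small` and `chi2_statistic` are invented for illustration, and pooling adjacent categories from left to right is just one simple choice among several reasonable pooling rules.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def pool_small(observed, expected, min_expected=5.0):
    """Merge adjacent categories, left to right, until every pooled
    category has expected count >= min_expected. Any leftover tail is
    merged into the last pooled category."""
    pooled_o, pooled_e = [], []
    acc_o, acc_e = 0.0, 0.0
    for o, e in zip(observed, expected):
        acc_o += o
        acc_e += e
        if acc_e >= min_expected:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
            acc_o, acc_e = 0.0, 0.0
    if acc_o > 0 or acc_e > 0:
        if pooled_e:
            pooled_o[-1] += acc_o
            pooled_e[-1] += acc_e
        else:
            pooled_o.append(acc_o)
            pooled_e.append(acc_e)
    return pooled_o, pooled_e

def chi2_statistic(observed, expected):
    """Pearson's statistic: sum of (O_i - E_i)^2 / E_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# HYPOTHETICAL data: number of observations equal to 0, 1, 2, 3, 4, 5.
counts = [30, 42, 18, 6, 3, 1]
n_total = sum(counts)

# Estimate lambda by the sample mean (one estimated parameter).
lam_hat = sum(k * c for k, c in enumerate(counts)) / n_total

# Expected counts for y = 0, ..., 4, with a final "5 or more" category
# that absorbs the remaining tail probability.
probs = [poisson_pmf(k, lam_hat) for k in range(5)]
probs.append(1.0 - sum(probs))
expected = [n_total * p for p in probs]

# Pool categories whose expected counts are below 5.
pooled_o, pooled_e = pool_small(counts, expected)

chi2_obs = chi2_statistic(pooled_o, pooled_e)

# nu = #categories - 1 - #estimated parameters.
df = len(pooled_e) - 1 - 1

print(f"lambda_hat = {lam_hat:.2f}, pooled categories = {len(pooled_e)}")
print(f"chi2_obs = {chi2_obs:.3f}, df = {df}")
```

Note that the degrees of freedom are computed after pooling, so merging small categories reduces \(\nu\) as well as the number of terms in the statistic.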
Alleles.
Each person is one of the following genotypes \(A/A\), \(A/S\) or \(S/S\).
The proportion of people with each genotype is
\[ \mathrm{P}(A/A) = p^2, \qquad \mathrm{P}(A/S) = 2p(1-p), \qquad \mathrm{P}(S/S) = (1-p)^2, \]
where \(p\) is the proportion of alleles that are of type \(A\).
Is this a reasonable model for the data?
Watch Video 30 for the worked solutions to Example 2 (Alleles)
Video 30: Alleles
Alternatively worked solutions are provided:
Solution to Example 2: Alleles
We start with finding a suitable choice for \(p\).
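We can estimate \(p\) by \(\hat{p}\), the proportion of alleles of type \(A\) in the sample. This is the MLE for \(p\).
Therefore the probabilities for each genotype are obtained by substituting \(\hat{p}\) for \(p\) in the model, and the expected frequencies and \(\chi^2_{obs}\) follow as in the general procedure. Since one parameter has been estimated from the data, the number of degrees of freedom is \(\nu = 3 - 1 - 1 = 1\).
At the \(5\%\) significance level: \(\chi^2_{1,0.95} = 3.8415\).
Since \(\chi^2_{obs} < \chi^2_{1,0.95}\), there is insufficient evidence to reject the null hypothesis.
The \(p\)-value is 0.1238 (=\(\mathrm{P} (W > \chi^2_{obs})\)), where \(W\) follows a \(\chi^2\) distribution with \(\nu =1\) degree of freedom.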
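The steps of this kind of solution can be sketched in Python. The genotype counts below are hypothetical and are not the data from the example, and the genotype probabilities \(p^2\), \(2p(1-p)\), \((1-p)^2\) are an assumed (Hardy-Weinberg type) form of the model. The tail probability uses the identity \(\mathrm{P}(W > x) = \mathrm{erfc}(\sqrt{x/2})\), which holds for 1 degree of freedom only.

```python
import math

def chi2_sf_df1(x):
    """Upper tail P(W > x) for a chi-squared variable with 1 degree of
    freedom, using W = Z^2 with Z standard normal."""
    return math.erfc(math.sqrt(x / 2))

# HYPOTHETICAL genotype counts; not the data from the example.
counts = {"A/A": 60, "A/S": 30, "S/S": 10}
n_people = sum(counts.values())

# Each person carries two alleles: A/A contributes two A alleles,
# A/S contributes one, S/S contributes none.
p_hat = (2 * counts["A/A"] + counts["A/S"]) / (2 * n_people)

# ASSUMED model for the genotype probabilities given p.
probs = {
    "A/A": p_hat ** 2,
    "A/S": 2 * p_hat * (1 - p_hat),
    "S/S": (1 - p_hat) ** 2,
}
expected = {g: n_people * probs[g] for g in counts}

chi2_obs = sum((counts[g] - expected[g]) ** 2 / expected[g] for g in counts)

# Three categories, minus 1, minus 1 estimated parameter (p).
df = len(counts) - 1 - 1

p_value = chi2_sf_df1(chi2_obs)

print(f"p_hat = {p_hat}, chi2_obs = {chi2_obs}, df = {df}")
print(f"p-value = {p_value:.4f}")
```

As a sanity check on the helper, `chi2_sf_df1(3.8415)` is very close to 0.05, matching the critical value quoted above.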
20.4 Testing Independence
Suppose that we have two categorical variables, \(A\) and \(B\), where \(A\) can take \(m_A\) possible values and \(B\) can take \(m_B\) possible values.
Suppose that we have \(N\) observations with each observation belonging to one of the \(m_A\) categories of variable \(A\) and one of the \(m_B\) categories of variable \(B\). For \(i=1,2,\ldots, m_A\) and \(j=1,2,\ldots,m_B\), let \(O_{ij}\) denote the number of observations which belong to category \(i\) of variable \(A\) and category \(j\) of variable \(B\).
For example, variable \(A\) could be hair colour with categories:
1 - Brown
2 - Black
3 - Blonde
and variable \(B\) could be eye colour with categories:
1 - Brown
2 - Blue
3 - Green
Then \(N\) will be the total number of observations and \(O_{32}\) will be the number of observations (people) with Blonde hair and Blue eyes.
We often want to test the null hypothesis that the variables \(A\) and \(B\) are independent. For example, in the above scenario, the hypothesis that hair colour and eye colour are independent.
What does independence look like?
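Let \(p_{i \cdot}\) denote the probability that an individual in the population will belong to category \(i\) of variable \(A\) and let \(p_{\cdot j}\) denote the probability that an individual in the population will belong to category \(j\) of variable \(B\). Then if variables \(A\) and \(B\) are independent, the probability of an individual belonging both to category \(i\) of variable \(A\) and category \(j\) of variable \(B\) is
\[ p_{ij} = p_{i \cdot} \times p_{\cdot j}. \]
Let \(N_{i \cdot} = \sum_{j=1}^{m_B} O_{ij}\) denote the total number of observations with variable \(A\) in category \(i\), and let \(N_{\cdot j} = \sum_{i=1}^{m_A} O_{ij}\) denote the total number of observations with variable \(B\) in category \(j\).
We can estimate \(p_{i \cdot}\) by \(\hat{p}_{i \cdot} = N_{i \cdot}/N\) and \(p_{\cdot j}\) by \(\hat{p}_{\cdot j} = N_{\cdot j}/N\), giving the estimate
\[ \hat{p}_{i \cdot} \times \hat{p}_{\cdot j} = \frac{N_{i \cdot} N_{\cdot j}}{N^2} \]
for the probability of an individual belonging both to category \(i\) of variable \(A\) and category \(j\) of variable \(B\) under the null hypothesis of independence between variables \(A\) and \(B\).
Therefore under the null hypothesis of independence the expected number of observations belonging to category \(i\) of variable \(A\) and category \(j\) of variable \(B\) is
\[ E_{ij} = N \times \frac{N_{i \cdot} N_{\cdot j}}{N^2} = \frac{N_{i \cdot} N_{\cdot j}}{N}. \]
The test statistic is
\[ \chi^2_{obs} = \sum_{i=1}^{m_A} \sum_{j=1}^{m_B} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}, \]
which under the null hypothesis approximately follows a \(\chi^2\) distribution with \(\nu = (m_A - 1)(m_B - 1)\) degrees of freedom.
A school takes part in a study which involves recording the eye colour and hair colour of each child.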
The first step is to compute the row and column totals which give the total number of individuals with each hair colour and each eye colour, respectively.
Then using \(E_{ij} = N_{i \cdot} N_{\cdot j}/N\), we can compute the expected number of individuals in each category under the assumption of independence.
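For example, the expected number of people with brown hair and brown eyes is \(E_{11} = N_{1 \cdot} N_{\cdot 1}/N\). With three hair colours and three eye colours, the test statistic is compared with a \(\chi^2\) distribution with \((3-1)(3-1) = 4\) degrees of freedom.
For a \(0.1\%\) significance level \((\alpha=0.001)\), the critical value for the \(\chi^2\) distribution is \(\chi^2_{4, 0.999} =18.467\). The observed test statistic exceeds this value, so there is very strong evidence to reject the null hypothesis. That is, there is very strong evidence that hair colour and eye colour are dependent.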
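The independence test can be sketched in Python as follows. The \(3 \times 3\) table of counts is hypothetical (it is not the school data), the variable names are illustrative, and the critical value 18.467 is the one quoted in the text for 4 degrees of freedom.

```python
# HYPOTHETICAL counts; rows are hair colour (brown, black, blonde) and
# columns are eye colour (brown, blue, green). Not the school data.
observed = [
    [20, 10, 5],
    [15, 5, 5],
    [5, 25, 10],
]

# Row totals N_{i.}, column totals N_{.j} and grand total N.
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n_total = sum(row_totals)

# Expected counts under independence: E_ij = N_{i.} * N_{.j} / N.
expected = [
    [r * c / n_total for c in col_totals]
    for r in row_totals
]

# Pearson's statistic summed over every cell of the table.
chi2_obs = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom: (rows - 1) * (columns - 1).
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# Critical value chi^2_{4, 0.999} quoted in the text.
critical_value = 18.467
reject_h0 = chi2_obs >= critical_value

print(f"E_11 = {expected[0][0]}, chi2_obs = {chi2_obs:.3f}, df = {df}")
print(f"reject independence at the 0.1% level: {reject_h0}")
```

Computing the row and column totals first mirrors the steps described above, and every expected count in this hypothetical table is at least 5, so no pooling is needed.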
Task: Lab 11
Attempt the R Markdown file for Lab 11:
Lab 11: Goodness-of-fit
Student Exercises
Attempt the exercises below.
Question 1.
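The following data give the frequency distribution of the size of casual groups of people on a spring afternoon in a park.
A suggested probability model for the group size \(X\) is
\[ \mathrm{P}(X = x) = \frac{\mu^x e^{-\mu}}{x! \, (1 - e^{-\mu})}, \qquad x = 1, 2, \ldots, \]
where \(\mu\) is estimated to be 0.89 for this data set.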
Does this give a good fit to the data?
Solution to Question 1.
There are now 5 groups and the degrees of freedom for the test is \(\nu = 5 - 1 - 1 = 3\), since one parameter (\(\mu\)) is estimated from the data.
Test \(H_0\): Probability model is a good fit, with \(\alpha =0.05\). We reject \(H_0\) if \(\chi^2_{obs} \geq \chi^2_{3,0.95} = 7.815\). Therefore we do not reject \(H_0\) at the \(5\%\) significance level.
NB. The probability model
\[ \mathrm{P}(X = x) = \frac{\mu^x e^{-\mu}}{x! \, (1 - e^{-\mu})}, \qquad x = 1, 2, \ldots, \]
is known as the zero-truncated Poisson distribution.
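The zero-truncated Poisson probabilities can be computed with a short Python sketch. The function name `ztp_pmf` is invented for illustration, \(\mu = 0.89\) is the estimate given in the question, and the choice of categories (sizes 1 to 4 and "5 or more") is an assumption about how the five groups might be formed. Multiplying each probability by the total number of groups observed would give the expected frequencies.

```python
import math

def ztp_pmf(x, mu):
    """P(X = x) for a zero-truncated Poisson distribution, x = 1, 2, ...
    This is the Poisson probability rescaled by 1 - P(Poisson = 0)."""
    if x < 1:
        return 0.0
    return mu ** x * math.exp(-mu) / (math.factorial(x) * (1 - math.exp(-mu)))

mu_hat = 0.89  # estimate of mu given in the question

# ASSUMED categories: group sizes 1, 2, 3, 4 and "5 or more".
probs = [ztp_pmf(x, mu_hat) for x in range(1, 5)]
probs.append(1.0 - sum(probs))  # tail probability P(X >= 5)

for label, p in zip(["1", "2", "3", "4", "5+"], probs):
    print(f"P(X = {label}) = {p:.4f}")
```

Because the tail category is defined as one minus the sum of the others, the five probabilities sum to 1 by construction.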
Question 2.
In order to test the lifetime of small batteries used to power clocks, 40 batteries were chosen at random and tested. Their times (in months) to failure were:
Construct a table of expected and observed frequencies for the above five intervals and hence test the manufacturer’s claim by using a goodness-of-fit test at the \(5\%\) level.
Solution to Question 2.
Observed frequencies are:
The number of degrees of freedom is \(\nu = 5-1=4\).
Now \(\chi^2_{4,0.95} = 9.4877\), so the test is not significant at a \(5\%\) significance level and we do not reject \(H_0\). There is no reason to suppose that the manufacturer’s claim is incorrect.
Question 3.
In a clinical trial to test the effect of a new drug for influenza, 164 people with the condition were split into two equal groups, one of which was given the drug, the other a placebo. The table below indicates the response of the treatments.
Test the hypothesis that the drug is no different from the placebo.
Solution to Question 3.
The degrees of freedom are \((r-1)(c-1)=(2-1)(3-1)=2\) and the critical value is \(\chi^2_{2,0.95} = 5.991\) at the \(5\%\) significance level. Therefore we do not reject \(H_0\), that the drug is no different from placebo, i.e. there is no evidence that the response from the drug is different from the placebo.