30 Tests for one proportion

So far, you have learnt to ask a RQ, design a study, classify and summarise the data, and construct confidence intervals. In this chapter, you will learn to:

  • identify situations where conducting a test for a proportion is appropriate.
  • conduct hypothesis tests for one sample proportion, using a \(z\)-test.
  • determine whether the conditions for using these methods apply in a given situation.

30.1 Introduction: rolling dice

In a toy store one day (for my children, of course), I saw 'loaded dice' for sale. The packaging claimed 'one loaded & one normal'. I bought two sets! However, there was no indication as to which die was the loaded die. So how could I determine which of the dice was 'loaded'? I guess I had to roll the dice.

Suppose I selected one die to roll. If that die happened to be the fair die, I'd expect that each face would appear approximately (not exactly) one-sixth of the time (using classical probability; Sect. 18.4.1). So, I could roll one die, and see how often (for example) a ⚀ actually appeared. (I could have chosen any number on the die, of course.) Then, using the decision-making process (Sect. 19.3), I could decide if that die seemed to be the fair die.

For one specific die, I could ask the decision-making RQ:

For this die, is the population proportion of rolls that show a ⚀ equal to \(1/6\)?

Answering a decision-making RQ, such as this, requires a hypothesis test.

30.2 Statistical hypotheses and notation

First, define \(p\) as the population proportion of rolls that show a ⚀; that is, \(p = 1/6\) if the die is a fair die (and we are not sure, of course). Then, define \(\hat{p}\) as the proportion of rolls in the sample that show a ⚀.

Even if the die was fair, and the value of \(p\) really was \(1/6\), the value of \(\hat{p}\) would not necessarily be exactly \(1/6\), due to sampling variation. Sometimes the value of \(\hat{p}\) would be a bit smaller than \(1/6\), and sometimes a bit larger, even if the value \(p\) really was \(1/6\).

However, if I assume the value of \(p\) is \(1/6\), the possible values of the sample proportion from all possible rolls of the fair die could be described; that is, the sampling distribution could be described. The sampling distribution would show what values of \(\hat{p}\) could reasonably be expected from a die with \(p = 1/6\).

Suppose we find that the value of \(\hat{p}\) is not exactly \(1/6\). One of two reasons could explain why:

  • The population proportion really is \(p = 1/6\).
    However, the value of \(\hat{p}\) is not exactly \(1/6\) due to sampling variation.
  • The population proportion really is not \(p = 1/6\).
    That is, the value of \(\hat{p}\) is not exactly \(1/6\) because the die is not fair.

These two possible explanations are called statistical hypotheses. The hypotheses above can be written as:

  • \(p = 1/6\), called the null hypothesis (denoted \(H_0\)); and
  • \(p \ne 1/6\), called the alternative hypothesis (denoted \(H_1\), or sometimes \(H_a\)).

The hypotheses propose values for the unknown population proportion (the parameter \(p\)).

The decision-making process begins by assuming the null hypothesis is true. Thus, the onus is on the data to refute the null hypothesis, the initial assumption.

That is, the null hypothesis is retained unless compelling evidence emerges to change our mind.

Here, the RQ is open to the value of \(p\) being smaller or larger than \(1/6\); that is, two possibilities are considered. Hence, we write \(p\ne 1/6\), which is called a two-tailed alternative hypothesis. An alternative hypothesis like \(p > 1/6\) or \(p < 1/6\) is a one-tailed hypothesis.

The form of the alternative hypothesis (one- or two-tailed) depends on what the research question asks, not the data.

30.3 Sampling distribution of \(\hat{p}\)

As part of the decision-making process, hypothesis testing always begins by assuming the null hypothesis is true. Here, that means initially assuming \(p = 1/6\). In Chap. 23, the sampling distribution of a sample proportion was given when the value of \(p\) is known (Sect. 23.1). If I decide to use \(n = 100\) die rolls, for example, the sampling distribution can be described as:

  • an approximate normal distribution,
  • with mean whose value is \(1/6\),
  • with a standard deviation of \(\displaystyle \text{s.e.}(\hat{p}) = \sqrt{\frac{ (1/6) \times \left(1 - (1/6)\right)}{100}} = 0.037267\) from Eq. (23.1).

Provided certain conditions are met (Sect. 30.9), this describes how the values of \(\hat{p}\) would vary if \(p\) really was \(1/6\); see Fig. 30.1.
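As a quick check of the arithmetic, the standard error above can be reproduced in a few lines of Python. This is a minimal sketch using only the standard library; the function name `standard_error_null` is an illustrative choice, not something defined elsewhere in this book.

```python
import math

def standard_error_null(p0: float, n: int) -> float:
    """Standard error of the sample proportion, assuming H0: p = p0 is true."""
    return math.sqrt(p0 * (1 - p0) / n)

# Die-rolling example: hypothesised proportion 1/6, with n = 100 rolls
se = standard_error_null(1 / 6, 100)
print(f"s.e. = {se:.4f}")  # about 0.0373
```

Note that the hypothesised value \(p = 1/6\) is entered as a proportion, not as a percentage.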

The notation \(\text{s.e.}(\hat{p})\) denotes the standard error of the sample proportion. Its value is the standard deviation of the proportions computed from all possible samples of a given size \(n\).

The mean of this distribution (the sampling mean) is the mean of the values of \(\hat{p}\) from all possible samples; the value of that mean is \(p\). Similarly, the standard deviation of this distribution is the standard error, denoted \(\text{s.e.}(\hat{p})\), and is the standard deviation of all possible values of the statistic \(\hat{p}\) (Fig. 30.1).

When computing the standard error for a proportion, take care!

  • The standard error for a confidence interval uses the sample proportion \(\hat{p}\) (see Eq. (23.4)), since we only have sample information when forming a CI.
  • The standard error for a hypothesis test uses the population proportion \(p\) from the null hypothesis (see Eq. (23.1)), since hypothesis testing assumes the null hypothesis is true, and hence the value of \(p\) is known.

In both cases, use a proportion in the formula, not a percentage (i.e., \(0.16666...\) rather than \(16.666...\)%). Don't forget to take the square root!

Figure 30.1 shows how the sample proportion varies when \(n = 100\) across all possible samples, simply due to sampling variation, when \(p = 1/6 = 0.1666...\). Values of \(\hat{p}\) between about \(0.13\) and \(0.20\) would seem to occur reasonably frequently when \(p = 1/6\). Values of \(\hat{p}\) larger than \(0.25\) look unlikely when \(n = 100\); values less than \(0.10\) also appear unlikely, but not impossible. A value above \(0.30\) almost never occurs in any possible sample.


FIGURE 30.1: The sampling distribution, showing the distribution of the sample proportion of ones when the population proportion is \(1/6\), in \(100\) die rolls.
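The impressions read from Fig. 30.1 can be checked numerically, assuming the normal approximation described above is reasonable. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative only.

```python
import math
from statistics import NormalDist

p0, n = 1 / 6, 100                      # hypothesised proportion and number of rolls
se = math.sqrt(p0 * (1 - p0) / n)       # standard error under the null hypothesis
sampling_dist = NormalDist(mu=p0, sigma=se)

# Areas under the approximate sampling distribution of the sample proportion
prob_between = sampling_dist.cdf(0.20) - sampling_dist.cdf(0.13)
prob_above_025 = 1 - sampling_dist.cdf(0.25)
prob_below_010 = sampling_dist.cdf(0.10)
prob_above_030 = 1 - sampling_dist.cdf(0.30)

print(f"between 0.13 and 0.20: {prob_between:.3f}")
print(f"above 0.25:            {prob_above_025:.4f}")
print(f"below 0.10:            {prob_below_010:.4f}")
print(f"above 0.30:            {prob_above_030:.5f}")
```

Under this approximation, roughly two-thirds of all possible samples give a sample proportion between \(0.13\) and \(0.20\), while values above \(0.30\) occur in far fewer than one sample in a thousand.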

In my \(100\) rolls of one die, \(41\) showed a ⚀, so that \(\hat{p} = 41/100 = 0.41\). From Fig. 30.1, this is practically impossible if the die was fair: it basically never occurs when we look at all possible samples. What I observed was almost impossible; but I really did observe it. A reasonable conclusion is that the assumption I was making---that the die is fair---is not tenable, nor supported by the evidence (i.e., the data).

30.4 Computing the value of the test statistic: \(z\)-tests

One way to measure how far the sample proportion \(\hat{p} = 0.41\) is from the population proportion \(p = 1/6\) in \(100\) rolls is to use a \(z\)-score, since the sampling distribution (Fig. 30.1) has an approximate normal distribution. Since the mean of the distribution is \(p\) and standard deviation of the distribution is \(\text{s.e.}(\hat{p})\), the \(z\)-score is \[\begin{align*} z &= \frac{\text{sample statistic} - \text{mean of the distribution}}{\text{standard deviation of the distribution}}\\ &= \frac{\hat{p} - p }{\text{s.e.}(\hat{p})} = \frac{0.41 - 0.1666...}{0.037267} = 6.53. \end{align*}\] In this context, the \(z\)-score is called a test statistic. The observed sample proportion is more than six standard deviations from the mean of the distribution; this is highly unusual according to the \(68\)--\(95\)--\(99.7\) rule (and Fig. 30.1).
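The test statistic can be verified with a short calculation. This is a sketch only, with illustrative variable names, and it uses the hypothesised value \(p = 1/6\) (not \(\hat{p}\)) in the standard error.

```python
import math

p0 = 1 / 6          # population proportion assumed in the null hypothesis
n = 100             # number of rolls
p_hat = 41 / 100    # observed sample proportion

se = math.sqrt(p0 * (1 - p0) / n)   # standard error, assuming H0 is true
z = (p_hat - p0) / se               # number of standard errors between p_hat and p0

print(f"z = {z:.2f}")  # about 6.53
```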

30.5 Determining \(P\)-values

The value of the \(z\)-score shows that the observed value of \(\hat{p}\) is very unusual, but how unusual? This is quantified using a \(P\)-value, a measure used widely in scientific research.

\(P\)-values refer to the area more extreme than the calculated \(z\)-score in the normal distribution; that is, in the tails of the distribution. This is a way of measuring how unusual the calculated \(z\)-score is. For two-tailed alternative hypotheses, the \(P\)-value is the combined area in the lower and upper tails. For one-tailed alternative hypotheses, the \(P\)-value is the area in one tail only. Clearly, since the \(P\)-value is a probability, its value is always between \(0\) and \(1\).

\(P\)-values can be approximated using the \(68\)--\(95\)--\(99.7\) rule and a diagram (Sect. 21.5; Sect. 30.5.1), or more precisely using the \(z\)-tables in App. B.1 (Sect. 21.7; Sect. 30.5.2). \(P\)-values are also reported by software for most statistical tests.

30.5.1 Approximating \(P\)-values using the \(68\)--\(95\)--\(99.7\) rule

The \(68\)--\(95\)--\(99.7\) rule can be used to determine approximate \(P\)-values. To demonstrate, suppose the computed \(z\)-score was \(z = 1\). Then, the two-tailed \(P\)-value is the shaded area in Fig. 30.2 (left panel): about \(32\)%, based on the \(68\)--\(95\)--\(99.7\) rule. The two-tailed \(P\)-value would be the same if \(z = -1\). The one-tailed \(P\)-value would be the area in one-tail: about \(16\)%, based on the \(68\)--\(95\)--\(99.7\) rule.

As another example, suppose the calculated \(z\)-score was \(z = 2\). Then, the two-tailed \(P\)-value is the shaded area shown in Fig. 30.2 (right panel): about \(5\)%, based on the \(68\)--\(95\)--\(99.7\) rule. The two-tailed \(P\)-value would be the same if \(z = -2\). The one-tailed \(P\)-value would be the area in one-tail: about \(2.5\)%, based on the \(68\)--\(95\)--\(99.7\) rule.


FIGURE 30.2: The two-tailed \(P\)-value is the combined area in the two tails of the distribution. Left panel: if \(z = 1\) (or \(z = -1\)), the two-tailed \(P\)-value is approximately \(0.32\). Right panel: if \(z = 2\) (or \(z = -2\)), the two-tailed \(P\)-value is approximately \(0.05\). (The one-tailed \(P\)-values are half the two-tailed \(P\)-values; i.e., in one tail only.)
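To see how close the \(68\)--\(95\)--\(99.7\) rule comes to the areas under the normal curve itself, the rule-based values can be compared with those from the standard normal distribution. This is a minimal sketch; `two_tailed_p` is a helper name introduced here for illustration.

```python
from statistics import NormalDist

def two_tailed_p(z: float) -> float:
    """Combined area in both tails beyond |z| under the standard normal curve."""
    return 2 * NormalDist().cdf(-abs(z))

# Compare the rule-of-thumb values with the areas from the normal model
for z, rule_value in [(1, 0.32), (2, 0.05)]:
    exact = two_tailed_p(z)
    print(f"z = {z}: rule gives about {rule_value}; normal model gives {exact:.4f}")
    print(f"        one-tailed value: {exact / 2:.4f}")
```

The rule-based values (\(0.32\) and \(0.05\)) are close to the computed areas (about \(0.317\) and \(0.046\)), which is why the rule is adequate for rough work.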

Of course, calculated \(z\)-scores are unlikely to be exactly \(z = 1\) or \(z = -2\). However, suppose the \(z\)-score is a little larger than \(z = 1\); say \(z = 1.2\). Then, the tail area will be a little smaller than the tail area when \(z = 1\) (Fig. 30.3, left panel). The two-tailed \(P\)-value is a little smaller than \(0.32\).

Similarly, suppose the \(z\)-score is a bit smaller than \(z = 2\); say \(z = 1.9\). Then, the tail area will be a little larger than the tail area when \(z = 2\) (Fig. 30.3, right panel). The two-tailed \(P\)-value is a little larger than \(0.05\).


FIGURE 30.3: The two-tailed \(P\)-value is the combined area in the two tails of the distribution. Left panel: when \(z = 1.2\) (or \(z = -1.2\)). Right panel: when \(z = 1.9\) (or \(z = -1.9\)).

30.5.2 More precise \(P\)-values using tables

Using the tables of areas under normal distributions (Appendix B.1), more precise \(P\)-values can be found using the ideas from Sect. 21.6. For instance (see Fig. 30.3):

  • For \(z = 1.2\): the area to the left of \(z = -1.2\) is \(0.1151\), and the area to the right of \(z = 1.2\) is \(0.1151\), so the two-tailed \(P\)-value is \(0.1151 + 0.1151 = 0.2302\). This is a little smaller than \(0.32\), as estimated above.
  • For \(z = 1.9\): the area to the left of \(z = -1.9\) is \(0.0287\), and the area to the right of \(z = 1.9\) is \(0.0287\), so the two-tailed \(P\)-value is \(0.0287 + 0.0287 = 0.0574\). This is a little larger than \(0.05\), as estimated above.

In this die-rolling example, where \(z = 6.53\), the tail area is very small (using Appendix B.1), and zero to four decimal places (Fig. 30.1). \(P\)-values are never exactly zero, so we write \(P < 0.001\) (that is, the \(P\)-value is less than \(0.001\)).
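The table look-ups above can be mimicked in software. The sketch below computes the lower-tail and upper-tail areas separately, as in the two examples, and then adds them; `tail_areas` is an illustrative helper, and the standard normal distribution is assumed to be an adequate model.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def tail_areas(z: float) -> tuple[float, float]:
    """Return (area to the left of -|z|, area to the right of +|z|)."""
    lower = std_normal.cdf(-abs(z))
    upper = 1 - std_normal.cdf(abs(z))
    return lower, upper

for z in (1.2, 1.9, 6.53):
    lower, upper = tail_areas(z)
    print(f"z = {z}: lower tail {lower:.4f}, upper tail {upper:.4f}, "
          f"two-tailed P = {lower + upper:.4f}")
```

Software works with unrounded tail areas, so the final digit can differ slightly from a sum of rounded table entries. For \(z = 6.53\), the printed value is zero to four decimal places, which is why the result is reported as \(P < 0.001\) rather than as zero.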

30.6 Making decisions with \(P\)-values

A \(P\)-value tells us the probability of observing the sample statistic (or one even more extreme), assuming the null hypothesis is true. In the die-rolling example, the \(P\)-value is the probability of observing the value of \(\hat{p} = 0.41\) (or more extreme), just through sampling variation (chance) if \(p = 1/6\). Since the \(P\)-value is a probability, it is a value between \(0\) and \(1\). Then:

  • 'Big' \(P\)-values mean the sample statistic (i.e., \(\hat{p}\)) could reasonably have occurred through sampling variation in one of the many possible samples, if the assumption made about the parameter (stated in \(H_0\)) was true: the data do not contradict the assumption in \(H_0\). There is no compelling evidence to support the alternative hypothesis.
  • 'Small' \(P\)-values mean the sample statistic (i.e., \(\hat{p}\)) is unlikely to have occurred through sampling variation in one of the many possible samples, if the assumption made about the parameter (stated in \(H_0\)) was true: the data do contradict the assumption in \(H_0\). There is compelling evidence to support the alternative hypothesis.

What is meant by 'small' and 'big' in this context? In other words, what represents compelling evidence to support the alternative hypothesis? A \(P\)-value smaller than \(5\)% (or \(0.05\)) is usually considered 'small', and compelling evidence to support the alternative hypothesis. In contrast, a \(P\)-value larger than \(5\)% (or \(0.05\)) is usually considered 'big', and not compelling evidence to support the alternative hypothesis. The value of \(0.05\) is arbitrary, and in some applications the distinction is made when \(P = 0.01\) or \(P = 0.10\) instead. The decision-making process is shown in Fig. 30.4.


FIGURE 30.4: A way to make decisions for the loaded-dice example.

In this die-rolling example, where the \(P\)-value is very small, the data contradict the null hypothesis (that \(p = 1/6\)), and there is compelling evidence to support the alternative hypothesis that \(p \ne 1/6\). This suggests that the die is very likely not fair.

Be careful interpreting the results! We cannot be sure that the die is unfair. A small \(P\)-value is not proof that the die is loaded. The die may be fair but, due to sampling variation, the sample we observed may simply have produced, by chance, an unusually high proportion of rolls showing a ⚀.

The result is interpreted as 'there is evidence that the die is unfair'. Remember: the onus is on the data to refute the null hypothesis, the initial assumption.

Example 30.1 (Interpreting $P$-values) In the die example, suppose we found that the two-tailed \(P\)-value was \(0.26\). This is relatively 'large' (i.e., much larger than \(0.05\)). This means that the observed value of \(\hat{p}\) could easily be explained by chance, and is not compelling evidence to support the alternative hypothesis (that the die is unfair). We would say that there is no evidence that \(p\) is not \(1/6\).
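The decision rule described in this section can be written as a small function. This is a sketch under the common (but arbitrary) threshold of \(0.05\); the function name and the wording of the returned messages are illustrative choices.

```python
def evidence_for_alternative(p_value: float, threshold: float = 0.05) -> str:
    """Describe the evidence for the alternative hypothesis, given a P-value."""
    if not 0 <= p_value <= 1:
        raise ValueError("A P-value is a probability, so must lie between 0 and 1.")
    if p_value < threshold:
        return "compelling evidence to support the alternative hypothesis"
    return "no compelling evidence to support the alternative hypothesis"

print(evidence_for_alternative(0.26))     # 'big' P-value, as in Example 30.1
print(evidence_for_alternative(0.0005))   # 'small' P-value, as for the first die
```

The threshold is a parameter because, as noted above, some applications use \(0.01\) or \(0.10\) instead. Neither outcome is proof: the function only describes how strongly the data support the alternative hypothesis.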

30.7 Writing conclusions

In general, communicating the results of any hypothesis test requires:

  • an answer to the RQ, worded in terms of how much evidence exists to support the alternative hypothesis.
  • a summary of the evidence used to reach that conclusion (such as the \(z\)-score and \(P\)-value, including if the \(P\)-value is one- or two-tailed).
  • sample summary information, including a CI (see Chap. 23), summarising the data used to make the decision.

So for the die-rolling example, write:

The sample provides very strong evidence (\(z = 6.53\); two-tailed \(P < 0.001\)) that the proportion of ones is not \(1/6\) (\(\hat{p} = 0.41\); approx. \(95\)% CI: \(0.312\) to \(0.508\); \(n = 100\) rolls) in the population.

This statement includes the three necessary components:

  • an answer to the RQ: 'The sample provides very strong evidence... that the population proportion is not \(1/6\)'. The wording states how much evidence exists in the sample to support the alternative hypothesis.
  • the evidence used to reach the conclusion: '\(z = 6.53\); two-tailed \(P < 0.001\)'.
  • sample summary information (including a CI).
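The approximate \(95\)% CI quoted in the conclusion can be reproduced as follows. This sketch assumes the CI is formed as \(\hat{p} \pm 2\times\text{s.e.}(\hat{p})\), with the multiplier of \(2\) taken from the \(68\)--\(95\)--\(99.7\) rule; that multiplier is an assumption here (it reproduces the quoted limits), and software that uses \(1.96\) gives slightly narrower limits.

```python
import math

p_hat, n = 0.41, 100

# For a CI, the standard error uses the sample proportion (not the hypothesised p)
se_ci = math.sqrt(p_hat * (1 - p_hat) / n)

multiplier = 2  # approximate 95% multiplier from the 68-95-99.7 rule (assumed)
lower = p_hat - multiplier * se_ci
upper = p_hat + multiplier * se_ci

print(f"approx. 95% CI: {lower:.3f} to {upper:.3f}")  # 0.312 to 0.508
```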

Since the null hypothesis is initially assumed to be true, the onus is on the evidence to refute the null hypothesis. That is, we retain the null hypothesis unless there is compelling evidence to stop doing so. Hence, conclusions are worded in terms of how strongly the evidence (i.e., sample data) supports the alternative hypothesis.

The alternative hypothesis may or may not be true, but we report how strongly the evidence (data) supports the alternative hypothesis. Conclusions are not worded in terms of how much evidence supports the null hypothesis.

In \(100\) rolls of the other die, I found a ⚀ on \(15\) rolls, so that \(\hat{p} = 0.15\). Following the procedures above (check!) and using the same hypotheses, \(z = -0.45\) and (using tables) the two-tailed \(P\)-value is \(2\times 0.3264 = 0.6528\). This means that the sample result was not unusual if \(p = 1/6\), and there is no compelling evidence to support the alternative hypothesis. There is no evidence to suggest the second die is loaded.

This all suggests that the first die was the loaded die. Now I need to decide how to remember which die is the loaded one.

A large \(P\)-value does not necessarily mean that the die is fair! It only means that the proportion of rolls that produce a ⚀ is not unusual... but perhaps the die is loaded in some other way (i.e., to produce more-than-expected rolls of a different face).

A large \(P\)-value does not necessarily mean that the die is fair! The die may indeed be loaded to produce a larger-than-expected number of rolls showing a ⚀, but (due to sampling variation) the sample we observed simply did not provide evidence to make that conclusion.

The result is interpreted in terms of how much evidence exists to support the alternative hypothesis. The onus is on the data (i.e., evidence) to refute the assumption made in the null hypothesis.

30.8 Process overview

Let's recap the decision-making process, in this context about rolling a ⚀:

  1. Assumption: Write the null hypothesis and alternative hypothesis about the parameter (based on the RQ), where \(p\) is the population proportion of rolls that are a ⚀:
    • \(H_0\): \(p = 1/6\), and
    • \(H_1\): \(p \ne 1/6\) (this is a two-tailed alternative hypothesis).
  2. Expectation: The sampling distribution describes what values to reasonably expect from the sample statistic across all possible samples, if the null hypothesis is true. In this situation, the sampling distribution has a normal distribution.
  3. Observation: Compute the \(z\)-score (\(z = 6.53\)), a measure of the discrepancy between the assumed population value, and the observed sample value.
  4. Decision: Determine if the data are consistent with the assumption, by computing the \(P\)-value.

Here, the \(P\)-value is (much) less than \(0.001\), so very strong evidence exists that \(p\) is not \(1/6\).
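The four steps can be gathered into a single function. The sketch below is one possible arrangement, not a definitive implementation: the name `one_proportion_z_test` is hypothetical, the test is two-tailed only, and the normal approximation is assumed to be valid (Sect. 30.9). It is applied to both dice, using the counts reported earlier in this chapter.

```python
import math
from statistics import NormalDist

def one_proportion_z_test(successes: int, n: int, p0: float) -> tuple[float, float, float]:
    """Two-tailed z-test of H0: p = p0. Returns (p_hat, z, P-value)."""
    p_hat = successes / n
    se = math.sqrt(p0 * (1 - p0) / n)        # Expectation: spread of p_hat if H0 is true
    z = (p_hat - p0) / se                    # Observation: discrepancy in standard errors
    p_value = 2 * NormalDist().cdf(-abs(z))  # Decision: combined area in both tails
    return p_hat, z, p_value

# First die: 41 of 100 rolls; second die: 15 of 100 rolls
for label, count in [("first die", 41), ("second die", 15)]:
    p_hat, z, p_value = one_proportion_z_test(count, 100, 1 / 6)
    print(f"{label}: p-hat = {p_hat:.2f}, z = {z:.2f}, two-tailed P = {p_value:.4f}")
```

For the second die, the computed \(P\)-value is about \(0.65\); it differs slightly in later decimal places from the table-based value because the \(z\)-score is not rounded to two decimal places before the tail area is found.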

30.9 Statistical validity conditions

The hypothesis tests conducted in this chapter assume the sampling distribution is approximately a normal distribution (and so, for example, the \(68\)--\(95\)--\(99.7\) rule can be applied). This is only true if certain conditions are met. For a hypothesis test for one proportion, these conditions are similar to those for the CI for one proportion (Sect. 23.6).

The statistical validity condition for a test for a single proportion is that the expected number of individuals in the group of interest (i.e., \(n\times p\)) and in the group not of interest (i.e., \(n\times (1 - p)\)) both exceed five; that is:

  • \(n\times p > 5\), and \(n\times (1 - p) > 5\).

The value of \(5\) here is a rough figure; some books give other values (such as \(10\)). This condition ensures that the sampling distribution of the sample proportions has an approximate normal distribution (so that, for example, the \(68\)--\(95\)--\(99.7\) rule can be used).

The units of analysis are also assumed to be independent (e.g., from a simple random sample).

If the statistical validity conditions are not met, other similar options include using a binomial test (Conover 2003).

Example 30.2 (Statistical validity) The hypothesis test regarding the dice is statistically valid. Firstly, \(n\times p = 100 \times (1/6) = 16.666\dots\) (i.e., I would expect about 16.7 rolls to show a ⚀), and \(n\times (1 - p) = 83.333\dots\) (i.e., I would expect about 83.3 rolls to not show a ⚀). Both comfortably exceed five.
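The condition is simple to check in code. In this minimal sketch, the function name and the `minimum` parameter are illustrative; the default of five follows this chapter, though (as noted above) some books use other values such as ten.

```python
def validity_condition_met(n: int, p0: float, minimum: float = 5) -> bool:
    """Check that the expected counts n*p0 and n*(1 - p0) both exceed `minimum`."""
    return n * p0 > minimum and n * (1 - p0) > minimum

print(validity_condition_met(100, 1 / 6))              # die example: 16.7 and 83.3
print(validity_condition_met(100, 1 / 6, minimum=10))  # stricter rule used by some books
print(validity_condition_met(20, 0.1))                 # expected count of only 2
```

The check uses the hypothesised value of \(p\) from the null hypothesis, since the test assumes the null hypothesis is true. Independence of the units of analysis cannot be checked this way; it depends on how the data were collected.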

30.10 Example: dominance of birds

Barve and Dhondt (2017) compared two types of birds (male green-backed tits; male cinereous tits) to see which was more behaviourally dominant over winter. If the species were equally dominant, then about \(50\)% of the interactions would be won by each species. If we define \(p\) as the proportion of interactions won by green-backed tits, then we would expect \(p = 0.50\). However, in the \(45\) interactions observed between the two species, green-backed tits won \(37\) of these interactions (i.e., \(\hat{p} = 37/45 = 0.82222\)).

Of course, every sample of \(45\) interactions would produce a different sample proportion, so the difference between this sample proportion and \(p = 0.5\) could be due to sampling variation. To test if the population proportion of interaction wins could be equally shared, the hypotheses are: \[ \text{$H_0$: } p = 0.5\quad\text{and}\quad\text{$H_1$: } p \ne 0.5 \text{ (two-tailed)}. \] The test is statistically valid, since \(n\times p = 45\times 0.5 = 22.5\) and \(n\times (1 - p) = 22.5\); both exceed five (i.e., we expect half of the \(45\) interactions to be won by each species). The standard error is \[ \text{s.e.}(\hat{p}) = \sqrt{\frac{p \times (1 - p)}{n}} = \sqrt{\frac{0.50 \times (1 - 0.50)}{45}} = 0.0745356.... \] Note the value of \(p\), not \(\hat{p}\), is used in the calculation. Then, the value of the test statistic is: \[ z = \frac{\hat{p} - p}{\text{s.e.}(\hat{p})} = \frac{0.82222 - 0.50}{0.0745356} = 4.323. \] This is a very large \(z\)-score, so the \(P\)-value will be very small, using the \(68\)--\(95\)--\(99.7\) rule, or using tables. This is compelling evidence to support the alternative hypothesis.

Computing the \(95\)% CI for the proportion requires using the standard error computed with \(\hat{p}\) (not \(p\)): \[ \text{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p} \times (1 - \hat{p})}{n}} = \sqrt{\frac{0.82222 \times (1 - 0.82222)}{45}} = 0.056994... \] An approximate \(95\)% CI is from \(0.708\) to \(0.936\). We write:

Very strong evidence exists in the sample (\(P < 0.001\); \(z = 4.323\)) that the interactions were not won equally by each species (\(\hat{p} = 0.8222\) won by green-backed tits; \(n = 45\); approximate \(95\)% CI: \(0.708\) to \(0.936\)) in the population.
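The calculations in this example use two different standard errors: one for the test and one for the CI. The following sketch makes the distinction explicit. The variable names are illustrative, and the CI multiplier of \(2\) is an assumption based on the \(68\)--\(95\)--\(99.7\) rule (it reproduces the limits quoted above).

```python
import math
from statistics import NormalDist

wins, n, p0 = 37, 45, 0.5
p_hat = wins / n

# Hypothesis test: the standard error uses the hypothesised value p0
se_test = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se_test
p_value = 2 * NormalDist().cdf(-abs(z))   # two-tailed

# Confidence interval: the standard error uses the sample proportion p_hat
se_ci = math.sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - 2 * se_ci, p_hat + 2 * se_ci

print(f"test: s.e. = {se_test:.4f}, z = {z:.3f}, P < 0.001? {p_value < 0.001}")
print(f"CI:   s.e. = {se_ci:.4f}, approx. 95% CI {lower:.3f} to {upper:.3f}")
```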

30.11 Chapter summary

To test a hypothesis about a population proportion \(p\):

  • Write the null hypothesis (\(H_0\)) and the alternative hypothesis (\(H_1\)).
  • Initially assume the value of \(p\) in the null hypothesis to be true.
  • Then, describe the sampling distribution, which describes what to expect from the sample statistic across all possible samples, based on this assumption: under certain statistical validity conditions, the sample proportion varies with:
    • an approximate normal distribution,
    • with sampling mean whose value is the value of \(p\),
    • with a standard deviation of \(\displaystyle \text{s.e.}(\hat{p}) = \sqrt{\frac{p \times (1 - p)}{n}}\), where \(p\) is the hypothesised value given in the null hypothesis, and \(n\) is the sample size.
  • Compute the value of the test statistic: \[ z = \frac{ \hat{p} - p}{\text{s.e.}(\hat{p})}. \]
  • Compute an approximate \(P\)-value using the \(68\)--\(95\)--\(99.7\) rule, or using tables.
  • Make a decision, and write a conclusion.
  • Check the statistical validity conditions.

30.12 Quick review questions

A study of diseases in native Americans (Kizer et al. 2006) found \(381\) obese or overweight patients in \(449\) patients. In the general population of the USA, the percentage obese or overweight is \(65\)%. The researchers wanted to determine if the percentage of obesity/overweight native Americans was greater than that of the general population.

  1. True or false: We initially assume the population proportion of overweight/obese native Americans is \(0.65\).
  2. True or false: The sample size is \(n = 381\).
  3. What is the value of the sample proportion \(\hat{p}\)? (Use four decimal places.)
  4. True or false: The null hypothesis is \(H_0\): \(p = 0.65\).
  5. True or false: The alternative hypothesis is one-tailed.
  6. True or false: In a one-sample test of proportion, the \(z\)-score is always large.
  7. What is the value of \(z\)-score for this example? (Use two decimal places.)
  8. True or false: We have compelling evidence to support the alternative hypothesis in this example.
  9. True or false: We always accept the null hypothesis.

30.13 Exercises

Answers to odd-numbered exercises are available in App. E.

Exercise 30.1 Explain why the standard error is computed using \(p\) for hypothesis testing, but using \(\hat{p}\) in confidence intervals.

Exercise 30.2 Explain why we compute \(\text{s.e.}(\hat{p})\) and not \(\text{s.e.}(p)\).

Exercise 30.3 What is wrong with the following statement, after testing \(H_0\): \(p = 0.25\):

There is very strong evidence that the sample proportion is greater than \(0.25\).

Exercise 30.4 Consider this statement from Davis et al. (2024), that appears under their Table 2:

One proportion \(z\)-test with \(H_0 = 0.076\), the proportion of UDT in our sample

What is wrong with this statement?

Exercise 30.5 The study of herbal medicines is complicated, as blinding subjects is difficult: placebos are often easily identifiable by eye, by taste, or by smell.

Loyeung et al. (2018) studied if subjects could identify potential placebos at a better rate than just guessing. The \(81\) subjects were each presented with a choice of five different supplements, four of which were placebos. Subjects were asked to select which one was the legitimate herbal supplement based on the taste. Of these, \(50\) correctly selected the true herbal supplement.

  1. If the subjects were selecting the true herbal supplement randomly, what proportion of subjects would be expected to select the correct supplement as the true herbal medicine?
  2. Write the hypotheses for addressing the aims of the study.
  3. Is this a one- or two-tailed test? Explain.
  4. Sketch the sampling distribution of the sample proportion, assuming the null hypothesis is correct.
  5. Is there evidence that people can identify the true supplement by taste?
  6. Are the statistical validity conditions satisfied?

Exercise 30.6 S.-S. Kim et al. (2004) studied the measles-rubella vaccination-rates in Korea. They compared the proportion of children with measles antibodies to the World Health Organization (WHO) target proportion (for children aged \(5\) to \(9\) years old: \(10\)%).

The aim of the study was to test if the proportion of Korean children with the measles antibody in the population was \(10\)% or lower (i.e., better). In the study, \(55\) children out of \(972\) had the antibody present.

  1. Compute the sample proportion \(\hat{p}\) of children with measles antibodies.
  2. Write the hypotheses for the test. Is the test one- or two-tailed?
  3. Compute the standard error for the test.
  4. Compute the \(z\)-score and determine the \(P\)-value.
  5. Write a conclusion.
  6. Are the statistical validity conditions satisfied?

Exercise 30.7 Streeting et al. (2022) studied western saw-shelled turtles. When eggs were incubated at \(27^\circ\)C, they observed that \(29\) males and \(44\) females hatched. Are the proportions of male and female turtles that hatch at this temperature equal?

Exercise 30.8 [Dataset: PremierL] In the 2019/2020 English Premier League (EPL), the home team won \(91\) games, and the away team won \(67\) games. (Another \(50\) games were draws.)

Use the \(158\) games with a result to determine if there is evidence that the home team wins more often than \(50\)% (i.e., that there is a home-side advantage).

Exercise 30.9 Maeda (2013) introduced pedal machines on the first floor of Joyner Library for use by students at East Carolina University, to increase activity in library users. At ECU, \(60.2\)% of all students were females (i.e., in the population). Students were observed using the machine on \(589\) occasions, of which \(295\) were by females.

Is there evidence that the proportion of female users of the machines was lower than the overall female proportion at the university? What would you conclude?

Exercise 30.10 Koenen (1995) found that \(88\) of the \(357\) visitors to Las Vegas casinos in 1995 were smokers. At the time, \(25.5\)% of the general US population were smokers (based on data from the US National Center for Health Statistics). Are casino-goers just as likely to be smokers as the general US population?

Exercise 30.11 Nochera and Ragone (2019) developed a gluten-free pasta made from breadfruit. In the study sample, \(57\) of the \(71\) participants stated that they liked the pasta. Do the researchers have sufficient evidence to claim that the 'majority of people like breadfruit pasta'?

Exercise 30.12 Carpal Tunnel Syndrome (CTS) is a painful condition in the wrists. Boltuch et al. (2020) were interested in whether 'a relationship exists between the palmaris tendon [and] carpal tunnel syndrome (CTS)' (p. 493). The palmaris longus (PL) tendon is visually absent in about \(15\)% of the population. The researchers found PL was visually absent in \(33\) of \(516\) CTS wrists in their sample.

Is there evidence to suggest that the rate of PL absence is different in CTS cases, compared to the general population?

Exercise 30.13 Siegfried et al. (2014) studied resistance of some commercial corn varieties to the European corn borer. Borers were collected from corn in Iowa and Nebraska.

Researchers aimed to estimate the frequency of resistance to the toxin in the corn. By mating borers collected from the field with various resistant laboratory individuals, they could determine what proportion of resistant individuals to expect in the second generation offspring. In one study of \(n = 172\) second-generation individuals, \(24\) were found to be resistant. The expectation was that \(1\)-in-\(16\) would be resistant if the field borers were resistant.

Perform a hypothesis test to determine if the data suggest that the borers were resistant (that is, if the population proportion is \(1/16\)) as expected.

Exercise 30.14 Davidovic et al. (2019) studied streetlight preferences of drivers. Drivers were asked to conduct a series of manoeuvres under \(3000\)K LED light and then under \(4000\)K LED lights. They were then asked to decide which streetlight they preferred.

Out of the \(52\) subjects, \(29\) preferred the \(3000\)K LED lights. Is there evidence that the choice between the two streetlights is random, or is there evidence of a preference for one over the other?

Exercise 30.15 The euro was introduced as a currency on 01 January 1999. According to a report by the New Scientist, students in Poland spun a Belgian one-euro coin \(250\) times, and found \(140\) heads (as reported by Gelman and Nolan (2002)). This resulted in an 'accusation of bias' in the New Scientist article. However, every set of \(250\) spins can produce a different proportion of heads, so perhaps the result is just due to randomness.

Does this sample of \(250\) spins suggest that the one-euro Belgian coin is biased?

Exercise 30.16 As noted in Sect. 18.4.2, the Australian Bureau of Statistics (ABS) stated that:

The sex ratio for all births registered in Australia generally fluctuates around \(105.5\) male births per \(100\) female births.

(This statistic does not use births registered as 'other' or 'not stated'.)

  1. The value of \(105.5\) is effectively a population odds ratio of male-to-female births. Show that this is equivalent to stating the population proportion of male births is \(0.51338\) (not including 'other' or 'not stated').
  2. In 2021, there were \(148\ 636\) male births and \(140\ 944\) female births. Compute the sample proportion of male births in 2021 (to five decimal places). (Another \(23\) births were registered as 'other' or 'not stated', but are not used.)
  3. Conduct a hypothesis test to determine if the 2021 data appear different to the long-term proportion.