Module 8 Inference for a Proportion
In this module, we will continue our discussion of statistical inference by turning to hypothesis testing. In hypothesis testing, we take a more active approach to our data by asking questions about population parameters and developing a framework to answer those questions. We will root this discussion in confidence intervals before learning about several other approaches to hypothesis testing.
Module Learning Outcomes/Objectives
- Perform and interpret inference for a population proportion.
R Objectives
- Generate hypothesis tests for a proportion.
- Interpret R output for tests of a proportion.
This module’s outcomes correspond to course outcomes (6) apply statistical inference techniques of parameter estimation such as point estimation and confidence interval estimation and (7) apply techniques of testing various statistical hypotheses concerning population parameters.
8.1 Confidence Intervals for a Proportion
Inference for a proportion is really similar to inference for a mean! It turns out we can apply the Central Limit Theorem to the sampling distribution for a proportion. But wait - isn’t our Central Limit Theorem only for means?
Think back to the binomial distribution (Section 4.3). A binomial experiment is made up of a series of Bernoulli trials, which result in 0s and 1s. If we add up these values, we get the number of successes $x$. If we take the mean of these values, we get the proportion of successes. In short, $\bar{x} = \hat{p}$, and we can work with the sampling distribution for a sample mean!
The mean of a Bernoulli random variable is $\mu = p$ and the standard deviation is $\sigma = \sqrt{p(1-p)}$. So if we apply the Central Limit Theorem, $\hat{p}$ is approximately normally distributed with mean $\mu_{\hat{p}} = p$ and standard error
$$\sigma_{\hat{p}} = \frac{\sqrt{p(1-p)}}{\sqrt{n}} = \sqrt{\frac{p(1-p)}{n}}$$
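A quick simulation can make this concrete. The sketch below draws many samples of Bernoulli trials and compares the mean and standard deviation of the resulting sample proportions with $p$ and $\sqrt{p(1-p)/n}$. The values `p = 0.3`, `n = 100`, and `reps = 10000` are arbitrary choices made for illustration, not values from this module.

```r
# Simulate the sampling distribution of the sample proportion.
# p, n, and reps are illustrative assumptions, not values from the text.
set.seed(1)
p    <- 0.3      # assumed true proportion
n    <- 100      # assumed sample size
reps <- 10000    # number of simulated samples

# Each draw counts the successes in n Bernoulli trials; dividing by n
# turns the count into a sample proportion (the mean of the 0s and 1s).
p_hats <- rbinom(reps, size = n, prob = p) / n

# Standard error predicted by the formula above.
se_theory <- sqrt(p * (1 - p) / n)

mean(p_hats)   # should be close to p
sd(p_hats)     # should be close to se_theory
se_theory
```

Rerunning with a larger `n` should shrink both `sd(p_hats)` and `se_theory`, since the standard error decreases as the sample size grows.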
Each of the confidence intervals for a mean uses the same logic:
$$\text{estimate} \pm \text{critical value} \times \text{standard error}$$
Confidence intervals for a proportion will do the same. We do not know the true value of $p$ for the standard error, so we will plug in $\hat{p}$:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}$$
To use this formula, we need to check that $n\hat{p} > 10$ and $n(1-\hat{p}) > 10$. (Note that $n\hat{p}$ is the number of successes and $n(1-\hat{p})$ is the number of failures, so counting the successes and failures is another way to check this condition!)
Why? This relies on a normal approximation that does not work well if either of those quantities is less than or equal to 10. (This is a topic which we have skipped, but the theory behind it is similar to the theory presented here for why we can use the Central Limit Theorem with proportions.)
Example: Suppose we take a random sample of 27 US households and find that 15 of them have dogs. Find a 95% confidence interval for the proportion of US households with dogs.
Solution: From the problem statement, $\alpha = 0.05$. Also, $\hat{p} = 15/27 = 0.56$. The number of successes (households with dogs) in the sample is 15 and the number of failures is 12, both greater than 10, so our assumptions are satisfied.
The critical value is $z_{\alpha/2}$. Using the normal distribution applet with $\alpha = 0.05$, this yields a value of 1.96. Plugging everything in,
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = 0.56 \pm 1.96\sqrt{\frac{0.56 \times 0.44}{27}} = 0.56 \pm 0.19$$
or a 95% confidence interval of (0.37, 0.75).
Based on our sample, we can be 95% confident that the proportion of US households with dogs is between 0.37 and 0.75.
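The same calculation can be sketched in R. The variable names here are chosen for illustration. Because the code keeps $\hat{p} = 15/27$ unrounded, its endpoints can differ from the hand calculation in the second decimal place.

```r
# Large-sample confidence interval for a proportion, computed step by step.
x <- 15            # households with dogs (successes)
n <- 27            # sample size
conf_level <- 0.95

p_hat <- x / n
alpha <- 1 - conf_level

# Condition check: successes and failures should both exceed 10.
c(successes = x, failures = n - x)

z_crit <- qnorm(1 - alpha / 2)             # critical value, about 1.96
se     <- sqrt(p_hat * (1 - p_hat) / n)    # standard error using p-hat
ci     <- p_hat + c(-1, 1) * z_crit * se   # estimate +/- margin of error
round(ci, 3)
```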
The concepts and interpretation behind these confidence intervals are the same as those for confidence intervals for a mean. Refer back to Module 6 for details.
Section Exercises
A survey of Stat 1 students resulted in the following data:
Year | Count |
---|---|
Freshman | 16 |
Sophomore | 11 |
Junior | 3 |
Senior | 5 |
- We wish to find a 95% confidence interval for the proportion of freshmen.
  - Based on the data provided above, find $\hat{p}$, the sample proportion of freshmen.
  - Confirm the condition that $n\hat{p} > 10$ and $n(1-\hat{p}) > 10$.
  - Determine the critical value.
  - Find the 95% confidence interval for the proportion of freshmen.
  - Interpret your interval in the context of the problem.
- Find a 98% confidence interval for the proportion of sophomores.
- Can you use the methods from this section to find a confidence interval for the proportion of juniors? Explain your thought process.
8.2 Hypothesis Tests for a Proportion
For a single proportion, the null and alternative hypotheses are
- $H_0: p = p_0$
- $H_A: p \ne p_0$
We can perform a hypothesis test for p using the confidence interval, critical value, or p-value approach we covered previously. The concepts and interpretation are the same as those described in Module 7. You will also notice that the steps for each approach have not changed! The only modifications we need to make are to our setting, assumption, and a couple of formulas.
Setting and Assumptions: $p$ is the target parameter, $np_0 > 10$, and $n(1-p_0) > 10$.
8.2.1 Confidence Interval Approach
The $100(1-\alpha)\%$ confidence interval for $p$ is
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{p_0(1-p_0)}{n}}$$
Notice that we use $p_0$ in the standard error and not the sample proportion. This is different from how we dealt with the standard error when calculating confidence intervals outside of a hypothesis testing context. We do this because the standard error is calculated from the sampling distribution under the null hypothesis, which says that $p = p_0$.
Steps:
- State null and alternative hypotheses.
- Decide on significance level α. Check assumptions.
- Find the critical value.
- Compute confidence interval.
- If the null value is not in the confidence interval, reject the null hypothesis. Otherwise, do not reject.
- Interpret results in the context of the problem.
Example: A quick internet search suggests that 38.4% of US households have dogs. Based on the sample described previously, is it reasonable to assume that the internet search is correct? Test at the 0.05 level of significance using a confidence interval approach.
Solution: We know from the previous example that $\hat{p} = 0.56$ and $n = 27$.
- We want to see if the internet search is correct, so the null and alternative hypotheses are $H_0: p = 0.384$ and $H_A: p \ne 0.384$.
- From the problem statement, $\alpha = 0.05$. Also, $np_0 = 27(0.384) = 10.4$ and $n(1-p_0) = 27(0.616) = 16.6$, both greater than 10, so our assumptions are satisfied.
- The critical value is $z_{\alpha/2}$. Using the normal distribution applet with $\alpha = 0.05$, this yields a value of 1.96.
- Plugging everything in, $$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{p_0(1-p_0)}{n}} = 0.56 \pm 1.96\sqrt{\frac{0.384 \times (1-0.384)}{27}} = 0.56 \pm 0.18$$ or a 95% confidence interval of (0.38, 0.74).
- The null value (0.384) is in the interval, so we fail to reject $H_0$.
- At the 0.05 level of significance, the data provide insufficient evidence to conclude that the proportion of US households with dogs differs from 0.384.
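The steps of the confidence interval approach can be sketched in R as follows. The variable names are illustrative, and the code keeps $\hat{p} = 15/27$ unrounded, so the endpoints may differ slightly from the rounded hand calculation.

```r
# Confidence interval approach: the standard error uses the null value p0.
x     <- 15      # successes in the sample
n     <- 27      # sample size
p0    <- 0.384   # null value
alpha <- 0.05    # significance level

# Assumption check: both quantities should exceed 10.
c(n * p0, n * (1 - p0))

p_hat  <- x / n
z_crit <- qnorm(1 - alpha / 2)
se0    <- sqrt(p0 * (1 - p0) / n)          # standard error under H0
ci     <- p_hat + c(-1, 1) * z_crit * se0

# Reject H0 only if the null value falls outside the interval.
reject <- p0 < ci[1] || p0 > ci[2]
round(ci, 3)
reject
```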
8.2.2 Critical Value Approach
The critical value is $z_{\alpha/2}$ and the test statistic is
$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}}$$
Notice that we again plug in $p_0$ for the standard error!
Steps:
- State the null and alternative hypotheses.
- Determine the significance level α. Check assumptions.
- Compute the value of the test statistic.
- Determine the critical values.
- If the test statistic is in the rejection region, reject the null hypothesis. Otherwise, do not reject.
- Interpret results.
Example: A quick internet search suggests that 38.4% of US households have dogs. Based on the sample described previously, is it reasonable to assume that the internet search is correct? Test at the 0.05 level of significance using a critical value approach.
Solution: We know from a previous example that $\hat{p} = 0.56$ and $n = 27$.
- We want to see if the internet search is correct, so the null and alternative hypotheses are $H_0: p = 0.384$ and $H_A: p \ne 0.384$.
- From the problem statement, $\alpha = 0.05$. Also, $np_0 = 27(0.384) = 10.4$ and $n(1-p_0) = 27(0.616) = 16.6$, both greater than 10, so our assumptions are satisfied.
- The test statistic is $$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.56 - 0.384}{\sqrt{\frac{0.384(0.616)}{27}}} = \frac{0.176}{0.0936} = 1.88$$
- The critical value is $z_{\alpha/2}$. Using the normal distribution applet with $\alpha = 0.05$, this yields a value of 1.96.
- The test statistic is not in the rejection region ($|1.88| < 1.96$), so we fail to reject $H_0$.
- At the 0.05 level of significance, the data provide insufficient evidence to conclude that the proportion of US households with dogs differs from 0.384.
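A minimal R sketch of the critical value approach is shown below, with illustrative variable names. The code uses the unrounded $\hat{p} = 15/27$, so the test statistic it reports can differ slightly from one computed by hand with the rounded value 0.56.

```r
# Critical value approach for a two-sided test of a proportion.
x     <- 15
n     <- 27
p0    <- 0.384
alpha <- 0.05

p_hat <- x / n

# Test statistic: distance between p-hat and p0 in standard errors,
# with the standard error computed under the null hypothesis.
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-sided rejection region: |z| greater than the critical value.
z_crit <- qnorm(1 - alpha / 2)
reject <- abs(z) > z_crit

round(z, 2)
round(z_crit, 2)
reject
```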
8.2.3 P-Value Approach
The p-value is $2P(Z > |z|)$, where $z$ is the test statistic described above.
Steps:
- State the null and alternative hypotheses.
- Determine the significance level α. Check assumptions.
- Compute the value of the test statistic.
- Determine the p-value.
- If p-value $< \alpha$, reject the null hypothesis. Otherwise, do not reject.
- Interpret results.
Example: A quick internet search suggests that 38.4% of US households have dogs. Based on the sample described previously, is it reasonable to assume that the internet search is correct? Test at the 0.05 level of significance using a p-value approach.
Solution: We know from a previous example that $\hat{p} = 0.56$ and $n = 27$.
- We want to see if the internet search is correct, so the null and alternative hypotheses are $H_0: p = 0.384$ and $H_A: p \ne 0.384$.
- From the problem statement, $\alpha = 0.05$. Also, $np_0 = 27(0.384) = 10.4$ and $n(1-p_0) = 27(0.616) = 16.6$, both greater than 10, so our assumptions are satisfied.
- The test statistic is $$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1-p_0)}{n}}} = \frac{0.56 - 0.384}{\sqrt{\frac{0.384(0.616)}{27}}} = \frac{0.176}{0.0936} = 1.88$$
- The p-value is $2P(Z > |z|) = 2P(Z > 1.88)$. Using the normal distribution applet, we find this probability to be $2(0.030) = 0.060$.
- The p-value $= 0.060 > \alpha = 0.05$, so we fail to reject $H_0$.
- At the 0.05 level of significance, the data provide insufficient evidence to conclude that the proportion of US households with dogs differs from 0.384.
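The p-value calculation can be sketched in R with `pnorm`, which plays the role of the normal distribution applet. As a cross-check, the sketch also calls base R's `prop.test` with `correct = FALSE`; that function is not used elsewhere in this module, and it is included here only because, without the continuity correction, it reports $z^2$ as its statistic and gives the same two-sided p-value. The code uses the unrounded $\hat{p} = 15/27$, so its results can differ slightly from a hand calculation with 0.56.

```r
# P-value approach for a two-sided test of a proportion.
x     <- 15
n     <- 27
p0    <- 0.384
alpha <- 0.05

p_hat <- x / n
z <- (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-sided p-value: twice the upper-tail area beyond |z|.
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)
p_value
p_value < alpha    # TRUE would mean "reject H0"

# Cross-check (not the module's method): z-test via prop.test()
# with the continuity correction turned off.
check <- prop.test(x, n, p = p0, correct = FALSE)
check$p.value
```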
Section Exercises
Another survey of Stat 1 students resulted in the following data:
Year | Count |
---|---|
Freshman | 26 |
Sophomore | 21 |
Junior | 13 |
Senior | 15 |
- Suppose we wish to test the hypothesis that 50% of all Stat 1 students are freshmen using the confidence interval approach. We will test at the 0.05 level of significance.
  - State the null and alternative hypotheses.
  - Find $\hat{p}$, the sample proportion of freshmen.
  - Determine $\alpha$ and check assumptions.
  - Determine the critical value.
  - How will this confidence interval formula differ from the one you found in Section 8.1 Exercise 1?
  - Compute the confidence interval.
  - What is your conclusion?
  - Interpret your results in the context of the problem.
- Suppose we want to know if 25% of Stat 1 students are sophomores. Use the confidence interval approach to complete this test at the 0.1 level of significance.
- Use the critical value approach at the 0.05 level of significance to test the hypothesis that 25% of all Stat 1 students are juniors.
  - State the null and alternative hypotheses.
  - Find $\hat{p}$, the sample proportion of juniors.
  - Determine $\alpha$ and check assumptions.
  - Compute the test statistic.
  - Determine the critical values.
  - Sketch a normal curve, label the critical values, and indicate the rejection regions.
  - What is your conclusion?
  - Interpret your results in the context of the problem.
- Use the critical value approach at the 0.02 level of significance to test the hypothesis that 20% of all Stat 1 students are seniors.
- Use the p-value approach at the 0.01 level of significance to test the hypothesis that 40% of all Stat 1 students are freshmen.
  - State the null and alternative hypotheses.
  - Find $\hat{p}$, the sample proportion of freshmen.
  - Determine $\alpha$ and check assumptions.
  - Compute the test statistic.
  - Use an applet to determine the p-value.
  - What is your conclusion?
  - Interpret your results in the context of the problem.
- Use the p-value approach at the 0.05 level of significance to test the hypothesis that 10% of all Stat 1 students are seniors.
R: Hypothesis Tests for a Proportion
To generate confidence intervals and hypothesis tests for a proportion, we will use the command `binom.test`. This will give us slightly different results than the z-test we used throughout this module, but it is actually going to be more exact! This approach also does not have any limitations on the values $n\hat{p}$ or $np_0$. We use the z-test when working by hand because the exact binomial test is difficult to do on paper. The arguments we need are:
- `x`: the number of successes.
- `n`: the number of trials.
- `p`: the null value $p_0$.
- `conf.level`: the desired confidence level $(1-\alpha)$.
Let’s continue to use the example seen throughout this module. We have a random sample of 27 US households and 15 of them have dogs. We also have the claim that, in fact, 38.4% of US households have dogs. We will use a significance level of α=0.05.
Based on the prompt, there are x=15 successes; n=27 trials; and p0=0.384. So the R command will look like
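A call consistent with these values, and with the output shown next, might be written as follows. This is a sketch: the object name `result` is an assumption introduced for illustration, and `conf.level = 0.95` corresponds to $\alpha = 0.05$.

```r
# Exact binomial test for the dog-ownership example:
# 15 successes in 27 trials, tested against the null value 0.384.
result <- binom.test(x = 15, n = 27, p = 0.384, conf.level = 0.95)
result             # prints the full test summary

# Individual pieces can be pulled out of the returned object.
result$p.value     # p-value for the two-sided test
result$conf.int    # confidence interval for p
result$estimate    # sample proportion
```

Storing the result in an object is optional; calling `binom.test(...)` on its own prints the same summary.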
##
## Exact binomial test
##
## data: 15 and 27
## number of successes = 15, number of trials = 27, p-value = 0.07591
## alternative hypothesis: true probability of success is not equal to 0.384
## 95 percent confidence interval:
## 0.3532642 0.7452012
## sample estimates:
## probability of success
## 0.5555556
The output shows (top to bottom):
- a summary of the data we entered, along with the p-value.
- the alternative hypothesis.
- a 95% confidence interval for p.
- the sample proportion $\hat{p}$.
Since this is slightly different from the test we used when doing these calculations by hand, we will not use the critical value approach when we do hypothesis tests for a proportion in R. The confidence interval (0.353, 0.745) contains the null value 0.384, and the p-value of 0.076 is greater than $\alpha = 0.05$. So, at the 0.05 level of significance, the data provide insufficient evidence to conclude that the proportion of US households with dogs differs from 0.384. (In general, we will come to the same conclusion whether we do these tests by hand or using R.)