Chapter 26 Inference on Two Proportions

Let us perform a hypothesis test to see if there is a difference between two proportions. For this example, a random survey of 705 elderly men, aged 70 and up, was taken, and 14.6% were found to have Type 2 diabetes. At the same time, a random survey of 688 elderly women, aged 70 and up, was taken, and 9.1% were found to have Type 2 diabetes. (Data from the Journal of Clinical Endocrinology & Metabolism, October 2016.)

26.1 Enter Data as a Vector

For statistical inference on proportions in R, whether for a single proportion or for two proportions, we use the function prop.test( ):
prop.test(success_vector, total_count_vector, p = probability of success, …)

The following arguments may be added as needed:

alternative = "less" or "greater" - for a one-sided test. If nothing is specified, the two-sided alternative is the default.
conf.level = confidence level desired. If no confidence level is specified, the argument defaults to a 95% confidence level.

One way to test for significance and to compute the confidence interval is to enter the data as a vector into the prop.test( ) function. We will call the first vector, diabetes, which will contain the diabetes count for men and women. Note that the diabetes count = (total men (or women)) * (respective percentages with Type 2 diabetes). The second vector will be called total, which will contain the total count of men and women surveyed.
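Because the survey reports rounded percentages rather than counts, the products described above are not whole numbers. The following is a minimal sketch, assuming we round to the nearest person; the names n_surveyed, pct_diabetes, raw_counts, and counts are introduced here for illustration only.

```r
# Survey sizes and reported percentages for men and women
n_surveyed   <- c(Men = 705, Women = 688)
pct_diabetes <- c(Men = 0.146, Women = 0.091)

# The raw products are fractional because the percentages are rounded
raw_counts <- n_surveyed * pct_diabetes
raw_counts

# Rounding to the nearest whole person gives integer counts
counts <- round(raw_counts)
counts
```

The rounded counts, 103 and 63, are the ones entered in the matrix in Section 26.2.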

Two-Sided Hypothesis Test

The two-sided hypothesis test, with the default 95% confidence level for the interval, is as follows.

diabetes <- c(705*0.146, 688*0.091)
total <- c(705, 688)
prop.test(diabetes, total)

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  diabetes out of total
## X-squared = 9.5405, df = 1, p-value = 0.00201
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.01978163 0.09021837
## sample estimates:
## prop 1 prop 2 
##  0.146  0.091

The hypothesis test shows that the difference is statistically significant, with a P-value of 0.00201. With 95% confidence, the proportion of elderly men with Type 2 diabetes exceeds the proportion of elderly women with Type 2 diabetes by between 0.02 and 0.09.

We can also say that the percentage of men with Type 2 diabetes is 5.5 percentage points higher than the percentage of women, with a margin of error of 3.5 percentage points.

Calculation:
$\begin{align} D & = \mbox{Estimate of the difference in the two population proportions} \\ & = \mbox{prop1} - \mbox{prop2} \\ & = 0.146 - 0.091 \\ & = 0.055 \\ & = 5.5\% \end{align}$

$\begin{align} \mbox{Margin of Error} & = (0.055 - 0.01978163) \mbox{ or } (0.09021837 - 0.055) \\ & = 0.03521837 \\ & \approx 3.5\% \end{align}$
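The same two quantities can be read off the object that prop.test( ) returns. This is a sketch only; the names result, D, and margin are chosen here for illustration, and the margin is taken as half the width of the reported interval.

```r
diabetes <- c(705 * 0.146, 688 * 0.091)
total    <- c(705, 688)
result   <- prop.test(diabetes, total)

# Point estimate of the difference: prop 1 minus prop 2
D <- unname(result$estimate[1] - result$estimate[2])

# Margin of error: half the width of the confidence interval
margin <- diff(result$conf.int) / 2

round(c(difference = D, margin_of_error = margin), 3)
```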

To report a 98% confidence interval instead, add the argument conf.level. The test statistic and P-value do not change; only the interval does.

prop.test(diabetes, total, conf.level = 0.98)

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  diabetes out of total
## X-squared = 9.5405, df = 1, p-value = 0.00201
## alternative hypothesis: two.sided
## 98 percent confidence interval:
##  0.01346655 0.09653345
## sample estimates:
## prop 1 prop 2 
##  0.146  0.091

If you do not want to assign names to vectors, you can also enter the values directly into prop.test( ), as follows.

prop.test(c(705*0.146, 688*0.091), c(705, 688), conf.level = 0.80)

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  c(705 * 0.146, 688 * 0.091) out of c(705, 688)
## X-squared = 9.5405, df = 1, p-value = 0.00201
## alternative hypothesis: two.sided
## 80 percent confidence interval:
##  0.03147491 0.07852509
## sample estimates:
## prop 1 prop 2 
##  0.146  0.091

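The test statistic and P-value are exactly the same as before. Only the confidence interval differs, because this call uses an 80% confidence level.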

Calculating Confidence Interval Only

To calculate the confidence interval only, append $conf.int after the prop.test( ) function. If no confidence level is specified, R defaults to 95%.

prop.test(diabetes, total)$conf.int

## [1] 0.01978163 0.09021837
## attr(,"conf.level")
## [1] 0.95

To use a confidence level other than 95%, specify the confidence level.

# Calculates the 90% confidence interval
prop.test(diabetes, total, conf.level = 0.90)$conf.int

## [1] 0.02521295 0.08478705
## attr(,"conf.level")
## [1] 0.9
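To see where these endpoints come from, the interval can be rebuilt by hand. This is a sketch that assumes the default continuity correction of 0.5 and an unpooled standard error, which is what the printed output ("with continuity correction") suggests; the names p_hat, se, z_star, correction, and manual_ci are illustrative.

```r
diabetes   <- c(705 * 0.146, 688 * 0.091)
total      <- c(705, 688)
conf_level <- 0.90

# Sample proportions and their difference
p_hat <- diabetes / total
D     <- p_hat[1] - p_hat[2]

# Unpooled standard error of the difference
se <- sqrt(sum(p_hat * (1 - p_hat) / total))

# Critical value for a two-sided interval
z_star <- qnorm((1 + conf_level) / 2)

# Continuity correction: half a count spread over both samples
correction <- 0.5 * sum(1 / total)

manual_ci <- D + c(-1, 1) * (z_star * se + correction)
manual_ci

# Compare with the interval reported by prop.test()
r_ci <- prop.test(diabetes, total, conf.level = conf_level)$conf.int
all.equal(manual_ci, as.numeric(r_ci))
```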

One-Sided Hypothesis Test

Suppose we want to see if the prevalence of diabetes is higher in men than in women. In this case, we do a one-sided hypothesis test. Because men are the first group in the vectors, the alternative that men have the higher proportion is alternative = "greater"; use alternative = "less" for the opposite direction.

prop.test(diabetes, total, alternative = "greater")

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  diabetes out of total
## X-squared = 9.5405, df = 1, p-value = 0.001005
## alternative hypothesis: greater
## 95 percent confidence interval:
##  0.02521295 1.00000000
## sample estimates:
## prop 1 prop 2 
##  0.146  0.091

Note that the P-value for the one-sided test is half the P-value obtained for the two-sided test, because the observed difference lies in the direction of the alternative. The test gives statistically significant evidence that the proportion of elderly men with Type 2 diabetes is greater than the proportion of elderly women with Type 2 diabetes.
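The relationship between the one-sided and two-sided P-values can be checked directly. This is a sketch with illustrative names (p_two, p_greater, p_less); it also shows that choosing the wrong direction gives a P-value close to 1.

```r
diabetes <- c(705 * 0.146, 688 * 0.091)
total    <- c(705, 688)

p_two     <- prop.test(diabetes, total)$p.value
p_greater <- prop.test(diabetes, total, alternative = "greater")$p.value
p_less    <- prop.test(diabetes, total, alternative = "less")$p.value

# The observed difference is positive, so "greater" gets half the two-sided value
all.equal(p_greater, p_two / 2)

# The opposite direction gets the remaining probability
all.equal(p_less, 1 - p_two / 2)
```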

26.2 Enter Data in Matrix Form

Another way to test for significance and to calculate the confidence interval is to enter the data in matrix form.

A couple of things to note regarding the function matrix( ):

The default for listing entries of the matrix is by column.
To list the entries by row, add the argument, byrow = TRUE.
State the number of rows using the argument, nrow, and the number of columns using the argument, ncol.
The argument, dimnames, will give headings to the rows and columns.
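The effect of byrow is easiest to see side by side. The sketch below uses the four counts from this example; the names values, by_col, and by_row are illustrative.

```r
values <- c(103, 602, 63, 625)

# Default: entries fill down the first column, then the second
by_col <- matrix(values, nrow = 2, ncol = 2)

# byrow = TRUE: entries fill across the first row, then the second
by_row <- matrix(values, nrow = 2, ncol = 2, byrow = TRUE)

by_col
by_row

# For a square matrix, one layout is the transpose of the other
identical(by_row, t(by_col))
```

Only the byrow = TRUE layout puts each group's two counts (diabetes, no diabetes) on the same row, which is the layout prop.test( ) expects.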

We will call our new matrix, sugar.

sugar <- matrix(c(103, 602, 63, 625), 
                nrow = 2, ncol = 2, byrow = TRUE, 
                dimnames = list(c("Men", "Women"), 
                                c("Diabetes", "No Diabetes")))
sugar       # Print the matrix to check that it looks right

##       Diabetes No Diabetes
## Men        103         602
## Women       63         625

Two-Sided Hypothesis Test

To run a significance test or compute a confidence interval, we use the function prop.test(matrix). The default is a two-sided test with a 95% confidence interval.

prop.test(sugar)

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  sugar
## X-squared = 9.351, df = 1, p-value = 0.002229
## alternative hypothesis: two.sided
## 95 percent confidence interval:
##  0.01926703 0.08979202
## sample estimates:
##     prop 1     prop 2 
## 0.14609929 0.09156977

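Compare the results with the vector method. They are close but not identical (X-squared = 9.351 here versus 9.5405 earlier), because the vector method used the unrounded products 705*0.146 and 688*0.091 as counts, while the matrix uses the whole-number counts 103 and 63.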
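As a rough check of how the two input styles relate, the sketch below gives prop.test( ) the whole-number counts 103 and 63 in vector form and compares the result with the matrix form. The names from_matrix and from_vector are illustrative.

```r
sugar <- matrix(c(103, 602, 63, 625),
                nrow = 2, ncol = 2, byrow = TRUE,
                dimnames = list(c("Men", "Women"),
                                c("Diabetes", "No Diabetes")))

from_matrix <- prop.test(sugar)

# Same whole-number counts, entered as successes and totals
from_vector <- prop.test(c(103, 63), rowSums(sugar))

all.equal(unname(from_matrix$statistic), unname(from_vector$statistic))
all.equal(from_matrix$p.value, from_vector$p.value)
all.equal(as.numeric(from_matrix$conf.int), as.numeric(from_vector$conf.int))
```

With identical counts, the two forms give the same statistic, P-value, and interval.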

If you want a different confidence level, specify it in the argument.

prop.test(sugar, conf.level = 0.90)

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  sugar
## X-squared = 9.351, df = 1, p-value = 0.002229
## alternative hypothesis: two.sided
## 90 percent confidence interval:
##  0.02470544 0.08435361
## sample estimates:
##     prop 1     prop 2 
## 0.14609929 0.09156977

We see that the confidence interval narrows as the confidence level goes down.

Calculating Confidence Interval Only

To calculate the confidence interval only, append $conf.int after the prop.test( ) function. If no confidence level is specified, R defaults to 95%.

prop.test(sugar)$conf.int

## [1] 0.01926703 0.08979202
## attr(,"conf.level")
## [1] 0.95

To calculate a confidence interval at a level other than 95%, specify the confidence level.

# Calculates the 99% confidence interval
prop.test(sugar, conf.level = 0.99)$conf.int

## [1] 0.008637962 0.100421085
## attr(,"conf.level")
## [1] 0.99

We see that the confidence interval widens as the confidence level goes up.
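The pattern can be tabulated over several confidence levels at once. This is a sketch; the names conf_levels and widths, and the particular levels chosen, are illustrative.

```r
sugar <- matrix(c(103, 602, 63, 625),
                nrow = 2, ncol = 2, byrow = TRUE,
                dimnames = list(c("Men", "Women"),
                                c("Diabetes", "No Diabetes")))

conf_levels <- c(0.80, 0.90, 0.95, 0.99)

# Width of the interval (upper minus lower) at each confidence level
widths <- sapply(conf_levels, function(cl) {
  diff(prop.test(sugar, conf.level = cl)$conf.int)
})

data.frame(conf_level = conf_levels, width = round(widths, 4))
```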

One-Sided Hypothesis Test

To do a one-sided hypothesis test, add the argument alternative = "less" or alternative = "greater". At the same time, let us change the confidence level to 99%.

prop.test(sugar, alternative = "greater", conf.level = 0.99)

## 
##  2-sample test for equality of proportions with continuity correction
## 
## data:  sugar
## X-squared = 9.351, df = 1, p-value = 0.001114
## alternative hypothesis: greater
## 99 percent confidence interval:
##  0.0129437 1.0000000
## sample estimates:
##     prop 1     prop 2 
## 0.14609929 0.09156977

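Again, the results are close to those from the vector method. The P-value differs slightly (0.001114 versus 0.001005) because the matrix uses counts rounded to whole numbers, and the interval differs because this call uses a 99% confidence level rather than 95%.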