Chapter 5 Week 9

5.1 prop.test

Test of Equal or Given Proportions

  • When applied to data from several samples, the prop.test() command tests whether the proportions are equal (or equal to given values). When there are exactly two samples, the output also includes a confidence interval for the difference in proportions.
prop.test(x, n, p = NULL,
          alternative = c("two.sided", "less", "greater"),
          conf.level = 0.95, correct = TRUE)
  • The input could be:

    • x only:

      • A two-dimensional table (or matrix) with 2 columns

      • If x is a table or matrix, n is ignored.

      • The first column gives the counts of successes (e.g. \(n(D)\)), and the second gives the counts of failures (e.g. \(n(\overline D)\)).

    • x and n:

      • x: a vector of counts of successes
      • n: a vector of counts of trials
    • x, n and p:

      • p: a vector of null probabilities of success.
    • Note: The length of n or p must be the same as the number of groups specified by x.

  • Test Assumptions: The function operates on the assumption that each of the length(x) samples is independent of the others, and that each sample consists of a pre-determined number n[i] of independent trials, for which the true probability of success is constant.

  • Hypothesis: If the argument p=NULL, and there are at least two groups, the null hypothesis states that the true probability of success is the same in every group.

    • When there are two groups, the alternative hypothesis asserts that the probability of success in the first group is greater than, less than, or simply not equal to that in the second group, depending on the value of the argument alternative.

    • When there are more than two groups, the alternative hypothesis is that there is at least one group whose probability of success is different from the others; thus alternative is two.sided.

    • If the argument p is not NULL, the null hypothesis states that the true probability of success in group i is p[i], for each value of i. The alternative hypothesis, when there are at least two groups, is that there is some group for which this relation does not hold; thus alternative is two.sided.
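
As a minimal sketch of the input forms and the two-group case described above, the following uses made-up counts (15 successes out of 50 trials in one group and 25 out of 50 in the other). These numbers are hypothetical and are not taken from any data set in this chapter; the object names (tab, res_vec, res_mat, res_less) are likewise chosen only for illustration.

```r
# Hypothetical counts, for illustration only
x <- c(15, 25)   # successes in group 1 and group 2
n <- c(50, 50)   # trials in group 1 and group 2

# Form 1: x and n supplied as vectors
res_vec <- prop.test(x, n)

# Form 2: a two-column matrix (successes, failures); n is not needed
tab <- cbind(success = x, failure = n - x)
res_mat <- prop.test(tab)

# Both forms describe the same data, so the results agree
all.equal(res_vec$p.value, res_mat$p.value)

# With two groups, the output includes a confidence interval for p1 - p2
res_vec$conf.int

# One-sided alternative: success probability in group 1 is less than in group 2
res_less <- prop.test(x, n, alternative = "less")
res_less$p.value
```

With two groups the alternative argument matters; with three or more groups the test is two-sided regardless.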

5.1.1 Example: Survival by multiple levels of ticket class

  • Using the titanic data set, conduct an overall test of association between dying (survived) and a passenger’s ticket class (pclass), which takes the levels 1st, 2nd and 3rd.
table <- with(titanic, table(pclass, survived))
table
##       survived
## pclass Died Survived
##      1  123      200
##      2  158      119
##      3  528      181
# Either column can serve as the event count; here the event is death
x <- table[,"Died"] # counts of D (Died); equivalently x <- table[,1]
x
##   1   2   3 
## 123 158 528
n <- rowSums(table) # counts of total sample size in each level
n
##   1   2   3 
## 323 277 709
p <- rep(sum(x)/sum(n), 3) # H0: probabilities of D (Died) are the same among 3 exposure levels
p
## [1] 0.618029 0.618029 0.618029
  • \(H_0\): Risk of dying as a passenger is independent of the ticket class.

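  • Your test in R can follow any of the formats below. All three give the same X-squared statistic. The first two also agree on the degrees of freedom and the p-value; supplying p makes prop.test() treat the common proportion as given rather than estimated, so it reports df = 3 instead of df = 2:

prop.test(table)
prop.test(x, n)
prop.test(x, n, p)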
prop.test(table)
## 
##  3-sample test for equality of proportions without continuity
##  correction
## 
## data:  table
## X-squared = 127.86, df = 2, p-value < 2.2e-16
## alternative hypothesis: two.sided
## sample estimates:
##    prop 1    prop 2    prop 3 
## 0.3808050 0.5703971 0.7447109
  • Comparing 127.86 to a \(\chi^2\) distribution with 3-1=2 degrees of freedom, we get a \(p\)-value very close to zero.

  • We therefore reject the null hypothesis that the risk of dying is equal across ticket classes (i.e. that death and ticket class are independent).

  • Note that, for a table with two columns (2 x 2, or 3 x 2 as here), the standard chi-square test in chisq.test() gives the same result as prop.test(); chisq.test() takes the data in table or matrix form.

# perform the chi-square test of association
chisq.test(table)
## 
##  Pearson's Chi-squared test
## 
## data:  table
## X-squared = 127.86, df = 2, p-value < 2.2e-16
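
To make the comparison above concrete, here is a short self-contained sketch. It rebuilds the 3 x 2 table from the counts printed earlier (so it runs without the titanic data set), checks that prop.test() and chisq.test() agree, and recovers the p-value from the \(\chi^2\) distribution with 2 degrees of freedom. The names tab, res_prop, res_chisq, res_p and p_manual are introduced here only for illustration. The last step reflects how prop.test() behaves in current versions of R when p is supplied.

```r
# Rebuild the table from the printed counts (filled column by column)
tab <- matrix(c(123, 158, 528,    # Died
                200, 119, 181),   # Survived
              ncol = 2,
              dimnames = list(pclass = c("1", "2", "3"),
                              survived = c("Died", "Survived")))

res_prop  <- prop.test(tab)
res_chisq <- chisq.test(tab)

# Same statistic from both functions
c(prop.test  = unname(res_prop$statistic),
  chisq.test = unname(res_chisq$statistic))

# p-value as the upper tail of a chi-squared distribution with 3 - 1 = 2 df
p_manual <- pchisq(unname(res_prop$statistic), df = 2, lower.tail = FALSE)
p_manual

# Supplying p: same statistic, but the test uses 3 degrees of freedom
x <- tab[, "Died"]
n <- rowSums(tab)
res_p <- prop.test(x, n, p = rep(sum(x) / sum(n), 3))
c(statistic = unname(res_p$statistic), df = unname(res_p$parameter))
```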

5.2 prop.trend.test

Test for trend in proportions

prop.trend.test(x, n, score = seq_along(x))
  • Input: Note that input for prop.trend.test cannot be a matrix

    • x: Number of events

    • n: Number of trials

    • score: Group score

  • Hypotheses: With at least three groups, the null hypothesis states that there is no trend among the proportions (independence). The alternative states that the proportions have an increasing or decreasing trend across the group scores.

5.2.1 Example: Test of trend: Survival by multiple levels of ticket class

  • Using the titanic data set, conduct a test of trend in dying (survived) as a passenger’s ticket class (pclass) changes from 1st to 2nd to 3rd.

    • Null hypothesis: \(H_0: P(Died|pclass = 1) = P(Died|pclass = 2) = P(Died|pclass = 3)\)

    • Alternative hypothesis: \(H_A: P(Died|pclass = 1) < P(Died|pclass = 2) < P(Died|pclass = 3)\) or \(P(Died|pclass = 1) > P(Died|pclass = 2) > P(Died|pclass = 3)\)

prop.trend.test(x, n)
## 
##  Chi-squared Test for Trend in Proportions
## 
## data:  x out of n ,
##  using scores: 1 2 3
## X-squared = 127.81, df = 1, p-value < 2.2e-16
  • Comparing 127.81 to a \(\chi^2\) distribution with 1 degree of freedom, we get a \(p\)-value very close to zero.

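  • We therefore reject the null hypothesis that the risk of death is equal across levels of ticket class. The estimated proportions of deaths (0.38, 0.57 and 0.74 for 1st, 2nd and 3rd class) indicate that the risk increases from 1st to 3rd class.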
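
As a rough check on where the trend statistic comes from, the sketch below computes it by hand from the counts shown earlier and compares it with prop.trend.test(). It assumes the usual closed form for the chi-squared test for linear trend, \(X^2 = \dfrac{\left[\sum_i x_i (s_i - \bar s)\right]^2}{\hat p (1 - \hat p) \sum_i n_i (s_i - \bar s)^2}\), where \(s_i\) are the scores, \(\bar s = \sum_i n_i s_i / \sum_i n_i\) and \(\hat p = \sum_i x_i / \sum_i n_i\). The names score, p_pool, s_bar, trend_stat and res_trend are introduced here only for illustration.

```r
x <- c(123, 158, 528)   # deaths by ticket class, from the table above
n <- c(323, 277, 709)   # passengers by ticket class
score <- seq_along(x)   # default scores 1, 2, 3

p_pool <- sum(x) / sum(n)           # pooled proportion of deaths under H0
s_bar  <- sum(n * score) / sum(n)   # mean score, weighted by group size

# Closed-form trend statistic (1 degree of freedom)
trend_stat <- sum(x * (score - s_bar))^2 /
  (p_pool * (1 - p_pool) * sum(n * (score - s_bar)^2))

res_trend <- prop.trend.test(x, n)
c(manual = trend_stat, prop.trend.test = unname(res_trend$statistic))

# p-value from the chi-squared distribution with 1 df
pchisq(trend_stat, df = 1, lower.tail = FALSE)
```

The score argument sets the spacing assumed between groups; the default 1, 2, 3 treats the ticket classes as equally spaced.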