Module 9 Inference: Comparing Parameters
In this module, we extend the concepts from Module 6 to answer questions like “is there a difference between these means?” We will also consider hypothesis tests for whether a sample represents the population or closely matches a particular distribution.
Module Learning Outcomes/Objectives
- Perform and interpret inference for
  - the difference of two proportions.
  - paired data and two sample means.
R Objectives
- Generate hypothesis tests for the difference of two proportions.
- Generate hypothesis tests for the difference of two means.
- Interpret R output for tests of two proportions and two means.
This module’s outcomes correspond to course outcomes (6) apply statistical inference techniques of parameter estimation such as point estimation and confidence interval estimation and (7) apply techniques of testing various statistical hypotheses concerning population parameters.
9.1 Hypothesis Tests for Two Proportions
Sometimes, we might like to compare two proportions. We do this by looking at their difference: $p_1-p_2$. This is going to be fairly similar to the tests we used for a single proportion. Let $n_1$ be the sample size for the first group and $p_1$ the proportion for the first group. Similarly, let $n_2$ be the sample size for the second group and $p_2$ the proportion for the second group.
Conditions:
- Independence within and between groups (generally satisfied if the data are from random samples or a randomized experiment).
- We need $n_1p_1 > 10$, $n_1(1-p_1) > 10$, $n_2p_2 > 10$, and $n_2(1-p_2) > 10$.
If these conditions are satisfied, the standard error is
$$\sqrt{\frac{p_1(1-p_1)}{n_1}+\frac{p_2(1-p_2)}{n_2}}$$
and we can calculate confidence intervals and perform hypothesis tests on $p_1-p_2$.
9.1.1 Confidence Intervals for Two Proportions
A $100(1-\alpha)\%$ confidence interval for $p_1-p_2$ is
$$\hat{p}_1-\hat{p}_2 \pm z_{\alpha/2}\times\sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1}+\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}}$$
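To make the interval formula concrete, here is a minimal R sketch that computes it step by step. The counts `x1`, `n1`, `x2`, and `n2` are hypothetical values chosen only for illustration; they do not come from any data set in this module.

```r
# Hypothetical counts, for illustration only (not from the text)
x1 <- 45; n1 <- 100   # successes and sample size, group 1
x2 <- 60; n2 <- 120   # successes and sample size, group 2

p1_hat <- x1 / n1
p2_hat <- x2 / n2

# Success-failure check, using the sample proportions in place of p1 and p2
stopifnot(min(n1 * p1_hat, n1 * (1 - p1_hat),
              n2 * p2_hat, n2 * (1 - p2_hat)) > 10)

alpha  <- 0.05
z_crit <- qnorm(1 - alpha / 2)                # z_{alpha/2}
se     <- sqrt(p1_hat * (1 - p1_hat) / n1 +
               p2_hat * (1 - p2_hat) / n2)    # unpooled standard error

# Point estimate plus or minus the margin of error
ci <- (p1_hat - p2_hat) + c(-1, 1) * z_crit * se
ci
```

With these made-up counts the interval contains 0, so it would not point to a difference between the two proportions.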
9.1.2 Critical Values, Test Statistics, and P-Values
Often, we are interested in checking whether $p_1=p_2$, which results in a null hypothesis of $H_0: p_1-p_2=0$ (where the null value is zero). In this case, we use a pooled proportion to estimate $p$ in the standard error.
This pooled proportion is calculated as
$$\hat{p}_{pooled}=\frac{\text{total number of successes}}{\text{total number of cases}}=\frac{\hat{p}_1n_1+\hat{p}_2n_2}{n_1+n_2}$$
which makes the standard error in this case
$$\text{Standard Error}=\sqrt{\frac{\hat{p}_{pooled}(1-\hat{p}_{pooled})}{n_1}+\frac{\hat{p}_{pooled}(1-\hat{p}_{pooled})}{n_2}}$$
The critical value is $z_{\alpha/2}$. The test statistic is
$$z=\frac{\hat{p}_1-\hat{p}_2}{\sqrt{\frac{\hat{p}_{pooled}(1-\hat{p}_{pooled})}{n_1}+\frac{\hat{p}_{pooled}(1-\hat{p}_{pooled})}{n_2}}}$$
and the p-value is $2P(Z>|z|)$ where $z$ is the test statistic.
Steps:
- State the null and alternative hypotheses.
- Determine the significance level $\alpha$. Check assumptions: $n_1p_1 > 10$, $n_1(1-p_1) > 10$, $n_2p_2 > 10$, and $n_2(1-p_2) > 10$.
- Compute the value of the test statistic.
- Determine the critical value or p-value.
- For the critical value approach: If the test statistic is in the rejection region, reject the null hypothesis. For the p-value approach: If p-value<α, reject the null hypothesis. Otherwise, do not reject.
- Interpret results.
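The steps above can be sketched in R as a by-hand calculation. The counts below are hypothetical and used only for illustration. The last line compares the result with `prop.test` run with `correct = FALSE`, whose X-squared statistic is the square of the pooled z statistic.

```r
# Hypothetical counts, for illustration only (not from the text)
x1 <- 45; n1 <- 100
x2 <- 60; n2 <- 120
alpha <- 0.05

p1_hat <- x1 / n1
p2_hat <- x2 / n2

# Pooled proportion: total successes over total cases
p_pool <- (x1 + x2) / (n1 + n2)

# Standard error under H0: p1 - p2 = 0
se_pool <- sqrt(p_pool * (1 - p_pool) / n1 + p_pool * (1 - p_pool) / n2)

z       <- (p1_hat - p2_hat) / se_pool                # test statistic
z_crit  <- qnorm(1 - alpha / 2)                       # critical value z_{alpha/2}
p_value <- 2 * pnorm(abs(z), lower.tail = FALSE)      # 2 P(Z > |z|)

reject_h0 <- p_value < alpha    # same decision as abs(z) > z_crit
c(z = z, p_value = p_value)

# Cross-check: without the continuity correction, X-squared equals z^2
chk <- prop.test(x = c(x1, x2), n = c(n1, n2), correct = FALSE)
all.equal(z^2, unname(chk$statistic))
```

Both decision rules agree here: the p-value is well above 0.05 and the test statistic is not in the rejection region, so we would not reject the null hypothesis for these made-up counts.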
9.2 Hypothesis Tests for Two Means
What if we wanted to compare two means? We begin by discussing paired samples. This will feel very familiar, since it’s essentially the same as hypothesis testing for a single mean. Then we will move on to independent samples, which will require a couple of adjustments.
9.2.1 Paired Samples
Sometimes there is a special correspondence between two sets of observations. We say that two sets of observations are paired if each observation has a natural connection with exactly one observation in the other data set. Consider the following data from 30 students given a pre- and post-test on a course concept:
Student | Pre-Test | Post-Test |
---|---|---|
1 | 52 | 70 |
2 | 71 | 98 |
3 | 13 | 65 |
… | … | … |
30 | 48 | 81 |
The natural connection between “pre-test” and “post-test” is the student who took each test! Often, paired data will involve similar measures taken on the same item or individual. We pair these data because we want to compare two means, but we also want to account for the pairing.
Why? Consider: If a student got a 13% on the pre-test, I would love to see them get a 60% on the post-test - that’s a huge improvement! But if a student got an 82% on the pre-test, I would not like to see them get a 60% on the post-test. Pairing the data lets us account for this connection.
So what do we do with paired data? Fortunately, this part is easy! We start by taking the difference between the two sets of observations. In the pre- and post-test example, I will take the post-test score and subtract the pre-test score, so a positive difference means the student improved:
Student | Pre-Test | Post-Test | Difference |
---|---|---|---|
1 | 52 | 70 | 18 |
2 | 71 | 98 | 27 |
3 | 13 | 65 | 52 |
… | … | … | … |
30 | 48 | 81 | 33 |
Then, we do a test of a single mean on the differences where
- $H_0: \mu_d = 0$
- $H_A: \mu_d \ne 0$
Note that the subscript “d” denotes “difference”. We will use the exact same test(s) as in the previous sections:
Large Sample Setting: $\mu_d$ is the target parameter, $n_d \ge 30$, $$z=\frac{\bar{x}_d}{s_d/\sqrt{n_d}}$$ and the p-value is $2P(Z>|z|)$ where $z$ is the test statistic.
Small Sample Setting: $\mu_d$ is the target parameter, $n_d < 30$, $$t=\frac{\bar{x}_d}{s_d/\sqrt{n_d}}$$ and the p-value is $2P(t_{df}>|t|)$ where $t$ is the test statistic and $df = n_d - 1$.
Here, $n_d$ is the number of pairs.
Steps:
- State the null and alternative hypotheses.
- Determine the significance level α. Check assumptions (decide which setting to use).
- Compute the value of the test statistic.
- Determine the critical values or p-value.
- For the critical value approach: If the test statistic is in the rejection region, reject the null hypothesis. For the p-value approach: If p-value<α, reject the null hypothesis. Otherwise, do not reject.
- Interpret results.
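Here is a minimal sketch of these steps in R. It uses only the four students whose scores are printed in the table above (students 1, 2, 3, and 30). The other 26 rows are not shown in the text, so this is an illustration of the mechanics and not the full analysis.

```r
# Only the four rows visible in the table; the remaining students are elided
pre  <- c(52, 71, 13, 48)
post <- c(70, 98, 65, 81)

d   <- post - pre     # matches the Difference column: 18 27 52 33
n_d <- length(d)      # number of pairs

# Small sample setting: one-sample t test on the differences, null value 0
t_stat  <- mean(d) / (sd(d) / sqrt(n_d))
p_value <- 2 * pt(abs(t_stat), df = n_d - 1, lower.tail = FALSE)
c(t = t_stat, df = n_d - 1, p_value = p_value)

# The built-in paired test gives the same statistic and p-value
res <- t.test(post, pre, paired = TRUE, mu = 0)
res$statistic
```

Passing `paired = TRUE` tells `t.test` to form the differences itself, which is why the two approaches agree.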
9.2.2 Independent Samples
In independent samples, the sample from one population does not impact the sample from the other population. In short, we take two separate samples and compare them.
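- $H_0: \mu_1 = \mu_2 \rightarrow H_0: \mu_1 - \mu_2 = 0$
- $H_A: \mu_1 \ne \mu_2 \rightarrow H_A: \mu_1 - \mu_2 \ne 0$
If we use $\bar{x}$ to estimate $\mu$, intuitively we might use $\bar{x}_1-\bar{x}_2$ to estimate $\mu_1-\mu_2$. To do this, we need to know something about the sampling distribution of $\bar{x}_1-\bar{x}_2$.
Consider: if $X_1$ is Normal($\mu_1$, $\sigma_1$) and $X_2$ is Normal($\mu_2$, $\sigma_2$) with $\sigma_1$ and $\sigma_2$ known, then for independent samples of size $n_1$ and $n_2$,
- $\bar{X}_1-\bar{X}_2$ is Normal($\mu_{\bar{X}_1-\bar{X}_2}$, $\sigma_{\bar{X}_1-\bar{X}_2}$).
- $\mu_{\bar{X}_1-\bar{X}_2}=\mu_1-\mu_2$
- $\sigma_{\bar{X}_1-\bar{X}_2}=\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}$
so then $$Z=\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{\sigma_1^2/n_1+\sigma_2^2/n_2}}$$ has a standard normal distribution. But, as we mentioned earlier, we rarely work in that setting where the population standard deviations are known. Instead, we will use $s_1$ and $s_2$ to estimate $\sigma_1$ and $\sigma_2$. For independent samples of size $n_1$ and $n_2$, $$t=\frac{(\bar{X}_1-\bar{X}_2)-(\mu_1-\mu_2)}{\sqrt{s_1^2/n_1+s_2^2/n_2}}$$ has approximately a t-distribution with degrees of freedom $$\Delta=\frac{[(s_1^2/n_1)+(s_2^2/n_2)]^2}{\frac{(s_1^2/n_1)^2}{n_1-1}+\frac{(s_2^2/n_2)^2}{n_2-1}}$$ rounded down to the nearest whole number. (Note that Δ is the uppercase Greek letter, “delta”.) If $n_1=n_2=n$, this simplifies to $$\Delta=(n-1)\left(\frac{(s_1^2+s_2^2)^2}{s_1^4+s_2^4}\right)$$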
Tip: Generally, people do not calculate Δ by hand. Instead, we use a computer to do these kinds of tests.
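As a small illustration of letting the computer do this, the sketch below wraps the formula for Δ in an R function. The name `welch_df` and the input values are hypothetical, chosen only for this example. It also checks numerically that the equal-sample-size shortcut agrees with the general formula.

```r
# Hypothetical helper: degrees of freedom Delta from sample SDs and sizes
welch_df <- function(s1, n1, s2, n2) {
  v1 <- s1^2 / n1   # s1^2 / n1
  v2 <- s2^2 / n2   # s2^2 / n2
  (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
}

# Made-up values with equal sample sizes, for illustration only
s1 <- 3; s2 <- 5; n <- 10

delta_general <- welch_df(s1, n, s2, n)
delta_equal_n <- (n - 1) * (s1^2 + s2^2)^2 / (s1^4 + s2^4)

all.equal(delta_general, delta_equal_n)   # the two formulas agree
floor(delta_general)                      # round down to a whole number
```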
Assumptions:
- Simple random samples.
- Independent samples.
- Normal populations or large (n≥30) samples.
Steps for Critical Value Approach:
1. $H_0: \mu_1-\mu_2=0$ and $H_A: \mu_1-\mu_2\ne 0$
2. Check assumptions; select the significance level α.
3. Compute the test statistic $$t=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{s_1^2/n_1+s_2^2/n_2}}$$ Note that we assume under the null hypothesis that $\mu_1-\mu_2=0$, which is why we replace this quantity with 0 in the test statistic.
4. The critical value is $\pm t_{df,\alpha/2}$ with $df=\Delta$.
5. If the test statistic falls in the rejection region, reject the null hypothesis.
6. Interpret in the context of the problem.
Steps for P-Value Approach:
1. $H_0: \mu_1-\mu_2=0$ and $H_A: \mu_1-\mu_2\ne 0$
2. Check assumptions; select the significance level α.
3. Compute the test statistic $$t=\frac{\bar{x}_1-\bar{x}_2}{\sqrt{s_1^2/n_1+s_2^2/n_2}}$$ Note that we assume under the null hypothesis that $\mu_1-\mu_2=0$, which is why we replace this quantity with 0 in the test statistic.
4. The p-value is $2P(t_{df}>|t|)$ with $df=\Delta$.
5. If p-value<α, reject the null hypothesis.
6. Interpret in the context of the problem.
Notice that the critical value and p-value approaches differ only in steps 4 and 5.
Example: Researchers wanted to determine whether a dynamic or static approach would impact the time needed to complete neurosurgeries. The experiment resulted in the following data from simple random samples of patients:
Dynamic | Static |
---|---|
$\bar{x}_1=394.6$ | $\bar{x}_2=468.3$ |
$s_1=84.7$ | $s_2=38.2$ |
$n_1=14$ | $n_2=6$ |
Times are measured in minutes. Assume $X_1$ and $X_2$ are reasonably normal.
1. $H_0: \mu_1=\mu_2$ and $H_A: \mu_1\ne\mu_2$
2. Let α=0.05 (this will be our default when a significance level is not given). Checking the assumptions:
   - We are told these are simple random samples.
   - There’s no reason that time for a neurosurgery with the dynamic system would impact time for the static system (or vice versa), so it’s reasonable to assume these samples are independent.
   - We are told to assume that $X_1$ and $X_2$ are reasonably normal.
3. The test statistic is $$t=\frac{394.6-468.3}{\sqrt{84.7^2/14+38.2^2/6}}=-2.681$$
4. Then $$\Delta=\frac{[(84.7^2/14)+(38.2^2/6)]^2}{\frac{(84.7^2/14)^2}{14-1}+\frac{(38.2^2/6)^2}{6-1}}\approx 17.8$$ so $df=17$ when rounded down. The critical value is $t_{17,0.025}=2.110$ and the p-value is $2P(t_{17}>|-2.681|)=2(0.0079)=0.0158$.
5. For the critical value approach, since $|-2.681|>2.110$, the test statistic is in the rejection region and we reject the null hypothesis. For the p-value approach, since p-value $=0.0158<\alpha=0.05$, we reject the null hypothesis.
6. At the 0.05 level of significance, the data provide sufficient evidence to conclude that the mean time for the dynamic system differs from the mean time for the static system; the sample means indicate that the dynamic system takes less time.
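The calculations in this example can be checked in R directly from the summary statistics. This is a minimal sketch; the variable names are my own, and the degrees of freedom are rounded down as in the text (R's own `t.test` would keep the unrounded value when given raw data).

```r
# Summary statistics from the neurosurgery example (times in minutes)
xbar1 <- 394.6; s1 <- 84.7; n1 <- 14   # dynamic
xbar2 <- 468.3; s2 <- 38.2; n2 <- 6    # static
alpha <- 0.05

v1 <- s1^2 / n1
v2 <- s2^2 / n2

t_stat <- (xbar1 - xbar2) / sqrt(v1 + v2)   # null value 0 is subtracted implicitly

# Degrees of freedom Delta, rounded down to a whole number
df <- floor((v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1)))

t_crit  <- qt(1 - alpha / 2, df)                        # critical value t_{df, alpha/2}
p_value <- 2 * pt(abs(t_stat), df, lower.tail = FALSE)  # 2 P(t_df > |t|)

round(c(t = t_stat, df = df, t_crit = t_crit, p_value = p_value), 4)
```

The absolute value of the test statistic exceeds the critical value and the p-value is below 0.05, so both approaches reject the null hypothesis.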
We can also construct a $(1-\alpha)100\%$ confidence interval for the difference of the two population means: $$(\bar{x}_1-\bar{x}_2)\pm t_{df,\alpha/2}\sqrt{\frac{s_1^2}{n_1}+\frac{s_2^2}{n_2}}$$ which we interpret as we interpret other confidence intervals, keeping in mind that we are now considering the difference of two means.
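As a quick sketch, here is the 95% interval for the neurosurgery example, computed in R from the summary statistics with $df=17$ as found in that example. The variable names are my own.

```r
# Summary statistics from the neurosurgery example (times in minutes)
xbar1 <- 394.6; s1 <- 84.7; n1 <- 14   # dynamic
xbar2 <- 468.3; s2 <- 38.2; n2 <- 6    # static
df    <- 17                            # Delta rounded down, from the example
alpha <- 0.05

margin <- qt(1 - alpha / 2, df) * sqrt(s1^2 / n1 + s2^2 / n2)
ci <- (xbar1 - xbar2) + c(-1, 1) * margin
round(ci, 1)
```

Both endpoints are negative, so the interval excludes 0. That agrees with the decision to reject the null hypothesis at the 0.05 level.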
R Lab: Comparing Parameters
Hypothesis Tests for Two Proportions
To compare two proportions, we will use the command `prop.test`. This is similar to `binom.test`, but the latter command does not allow us to compare two proportions. We will need the following arguments:
- `x`: a listing of the numbers of successes in each of the two groups. This will take the form `x = c(x1, x2)`.
- `n`: a listing of the numbers of trials for each group. This will take the form `n = c(n1, n2)`.
- `conf.level`: the confidence level (1−α).
Note that order matters in `c(x1, x2)` and `c(n1, n2)`. Make sure to keep track of which variable you have set as 1 and which as 2. This test also assumes a null hypothesis of $p_1=p_2$.
This test has a few behind-the-scenes tweaks relative to what we do by hand. This means that the results might be slightly different than the results you get when running these tests by hand. That’s ok!
The `sleep` dataset in R contains data on two groups (10 in each) of patients given soporific drugs (drugs designed to induce sleep). We want to examine whether the proportion of patients who experienced an increase in hours of sleep differs between the two groups.
I have this set up with two variables, `d1` and `d2`, which represent drug 1 and drug 2. Each variable is 1 if the patient experienced an increase in hours of sleep and 0 if they did not. Let’s print these out and find out how many successes were in each group.
## [1] 1 0 0 0 0 1 1 1 0 1
## [1] 1 1 1 1 0 1 1 1 1 1
We can find the total number of successes for each by summing the values in each variable. Let’s do that in R using the `sum` command:
## [1] 5
## [1] 9
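The code that created `d1` and `d2` is not shown above. One way to reproduce the printed values is sketched below, under the assumption that a “success” means a positive value of the `extra` column (the change in hours of sleep) in the built-in `sleep` data.

```r
# Assumption: d1 and d2 flag patients whose sleep increased (extra > 0)
# for drug 1 and drug 2 in the built-in sleep data
d1 <- as.integer(sleep$extra[sleep$group == "1"] > 0)
d2 <- as.integer(sleep$extra[sleep$group == "2"] > 0)

d1        # indicators for drug 1
d2        # indicators for drug 2

sum(d1)   # number of successes, drug 1
sum(d2)   # number of successes, drug 2
```

Note that a patient with exactly zero change counts as “no increase” under this definition.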
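So the numbers of successes are $x_1=5$ and $x_2=9$ for group sizes $n_1=n_2=10$. For the `prop.test` command, this will look like `x = c(5, 9)` and `n = c(10, 10)`. We will use an α=0.1 level of significance, so `conf.level = 0.9`. Then the command `prop.test(x = c(5, 9), n = c(10, 10), conf.level = 0.9)` returns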
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(5, 9) out of c(10, 10)
## X-squared = 2.1429, df = 1, p-value = 0.1432
## alternative hypothesis: two.sided
## 90 percent confidence interval:
## -0.803296023 0.003296023
## sample estimates:
## prop 1 prop 2
## 0.5 0.9
The output of this test is (top to bottom)
- The data provided in the input.
- A test statistic and degrees of freedom (these are part of the behind-the-scenes tweaks and you can ignore them!) along with a p-value.
- The alternative hypothesis. When a hypothesis test says “two sided”, that means the alternative hypothesis is the “not equal” condition that we work with.
- The requested confidence interval.
- The sample proportions.
Although the sample proportions appear to be different, the sample sizes are very small! Therefore it is unsurprising that the data provide insufficient evidence to conclude that the drugs differ in their ability to increase hours slept (p=0.143 and the confidence interval includes 0).
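For reference, here is a sketch of how this test can be run and how individual pieces of the result can be pulled out. The second call turns off the continuity correction with `correct = FALSE`; in that case the X-squared statistic is the square of the pooled z statistic we would compute by hand. With counts this small, R may also warn that the chi-squared approximation may be incorrect.

```r
# Two-proportion test for the sleep example: 5 of 10 versus 9 of 10
res <- prop.test(x = c(5, 9), n = c(10, 10), conf.level = 0.90)
res$p.value     # p-value with the continuity correction
res$conf.int    # 90 percent confidence interval for p1 - p2

# Same test without the continuity correction
res_nc <- prop.test(x = c(5, 9), n = c(10, 10), conf.level = 0.90,
                    correct = FALSE)

# By-hand pooled z statistic for comparison
p_pool <- (5 + 9) / (10 + 10)
z <- (5 / 10 - 9 / 10) / sqrt(p_pool * (1 - p_pool) * (1 / 10 + 1 / 10))

c(z_squared = z^2, X_squared = unname(res_nc$statistic))
```

The continuity correction is one of the behind-the-scenes tweaks, which is why the default output differs somewhat from a by-hand calculation.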
Hypothesis Tests for Two Means
The math has only gotten more cumbersome! Let’s use R to quickly run these types of tests without having to do any calculations by hand.
There is data built into R that shows the effect of Vitamin C on tooth growth in guinea pigs through (A) ascorbic acid or (B) orange juice. (Each guinea pig was randomly assigned to either ascorbic acid or orange juice.) We want to compare the ascorbic acid group to the orange juice group to see if one has more tooth growth than the other. This is currently in a data set called `teeth`, which contains two variables: `aa`, the tooth length for guinea pigs in the ascorbic acid group, and `oj`, the tooth length for the orange juice group.
To run a two-sample test comparing means in R, we continue to use the command `t.test`. The arguments we need in this case are:
- `x`: the first variable.
- `y`: the other variable.
- `mu`: the null value, usually $\mu_1-\mu_2=0$.
- `paired`: set this equal to `TRUE` for paired t tests; set it equal to `FALSE` for independent samples.
- `conf.level`: the desired confidence level (1−α).
In this case, we are interested in variables `x = aa` and `y = oj`. The null value is `mu = 0`. Guinea pigs were randomly assigned to each treatment group, so these are independent samples and `paired = FALSE`. Finally, we will go ahead and test this at a 0.05 level of significance, so `conf.level = 0.95`. Putting that all together, the R command looks like `t.test(x = aa, y = oj, mu = 0, paired = FALSE, conf.level = 0.95)`, which returns
##
## Welch Two Sample t-test
##
## data: aa and oj
## t = -1.9153, df = 55.309, p-value = 0.06063
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -7.5710156 0.1710156
## sample estimates:
## mean of x mean of y
## 16.96333 20.66333
The R output shows (top to bottom)
- variables entered.
- the test statistic, degrees of freedom, and p-value.
- the alternative hypothesis.
- a confidence interval for the difference of the two means.
- sample means for each variable.
Based on the output, at the 0.05 level of significance, the data provide insufficient evidence to conclude that the mean tooth length for guinea pigs receiving ascorbic acid differs from that of guinea pigs receiving orange juice (p=0.061 and the confidence interval includes 0).
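The `teeth` data set is not created in the text above. The sketch below shows one way to rebuild it and rerun the test, assuming `teeth` was derived from the built-in `ToothGrowth` data, where `supp == "VC"` marks ascorbic acid and `supp == "OJ"` marks orange juice.

```r
# Assumption: teeth holds the ToothGrowth lengths split by supplement type
teeth <- data.frame(
  aa = ToothGrowth$len[ToothGrowth$supp == "VC"],   # ascorbic acid
  oj = ToothGrowth$len[ToothGrowth$supp == "OJ"]    # orange juice
)

# Independent-samples (Welch) t test of H0: mu1 - mu2 = 0
res <- with(teeth, t.test(x = aa, y = oj, mu = 0,
                          paired = FALSE, conf.level = 0.95))
res

# Individual pieces of the output
res$statistic   # test statistic t
res$parameter   # degrees of freedom (not rounded down by R)
res$p.value     # two-sided p-value
res$conf.int    # confidence interval for mu1 - mu2
```

Notice that R reports a fractional value for the degrees of freedom, whereas by hand we round Δ down to a whole number.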