Chapter 4 Statistical Inference (FQA)
4.1 Interval Estimates
4.1.1 Proportions
Suppose that in a survey of one hundred adult cell phone users, 30% (thirty people) switched carriers in the past two years. We can calculate a confidence interval for this proportion using the prop.test() function:
##
## 1-sample proportions test with continuity correction
##
## data: 30 out of 100, null probability 0.5
## X-squared = 15.21, df = 1, p-value = 9.619e-05
## alternative hypothesis: true p is not equal to 0.5
## 95 percent confidence interval:
## 0.2145426 0.4010604
## sample estimates:
## p
## 0.3
In this output, the numbers under the line 95 percent confidence interval: tell us that our 95% confidence interval is (21.45%; 40.11%).
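The call that produced this output is not shown. Judging from its data: line (30 out of 100, null probability 0.5), it was presumably prop.test(30, 100) with default arguments. The sketch below reproduces that call and, as a cross-check, recomputes the bounds with the continuity-corrected Wilson score formula, which is why the interval is not centered on 30%. The helper wilson_bound() is hypothetical, written only for illustration, and this hand formula matches prop.test() only in cases like this one, where neither bound is clipped at 0 or 1.

```r
# Reproduce the one-sample interval: 30 switchers out of 100 respondents.
# prop.test() defaults: p = 0.5, conf.level = 0.95, correct = TRUE.
res <- prop.test(x = 30, n = 100)
res$conf.int   # 95% interval, roughly 0.2145 to 0.4011
res$estimate   # sample proportion, 0.3

# Cross-check: continuity-corrected Wilson score bounds computed by hand.
x     <- 30
n     <- 100
p_hat <- x / n
z     <- qnorm(0.975)        # two-sided 95% critical value
k     <- z^2 / (2 * n)

# Hypothetical helper: one Wilson bound for a continuity-shifted proportion pc.
wilson_bound <- function(pc, sign) {
  (pc + k + sign * z * sqrt(pc * (1 - pc) / n + k / (2 * n))) / (1 + 2 * k)
}

lower <- wilson_bound(p_hat - 0.5 / n, -1)   # shift down by the 0.5 correction
upper <- wilson_bound(p_hat + 0.5 / n, +1)   # shift up by the 0.5 correction
c(lower, upper)
```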
4.1.2 Means
To calculate a confidence interval for the average salary in the data set data (its Salary column), we can use the t.test() function:
##
## One Sample t-test
##
## data: data$Salary
## t = 120.22, df = 919, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 153931.5 159040.5
## sample estimates:
## mean of x
## 156486
From this output we see that our 95% confidence interval is ($153,931.5; $159,040.5).
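The output above presumably comes from t.test(data$Salary) with default arguments. Because that data set is not reproduced here, the sketch below uses simulated salaries (the vector salary and the seed are hypothetical) to show that the reported interval is simply the sample mean plus or minus a t critical value times the standard error.

```r
# Hypothetical salaries (simulated); the document's data$Salary is not reproduced here.
set.seed(1)
salary <- rnorm(50, mean = 150000, sd = 40000)

n  <- length(salary)
se <- sd(salary) / sqrt(n)           # standard error of the mean
tq <- qt(0.975, df = n - 1)          # two-sided 95% critical value, n - 1 df
manual_ci <- mean(salary) + c(-1, 1) * tq * se

builtin_ci <- t.test(salary)$conf.int   # the interval t.test() reports
manual_ci
builtin_ci
```

Both vectors should agree up to floating-point rounding; t.test() also attaches a conf.level attribute to its interval.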
4.2 Hypothesis Testing
4.2.1 One Sample
4.2.1.1 Proportions
Suppose we have the following null and alternative hypotheses:
\(H_o\): In the past two years, the proportion of adult cell phone users who switched carriers equals 35%.
\(H_a\): In the past two years, the proportion of adult cell phone users who switched carriers does not equal 35%.
We then survey one hundred adult cell phone users, and thirty of them report that they switched carriers in the past two years. We can run this hypothesis test in R using the prop.test() function, where the argument p specifies the value in the null hypothesis (35%).
##
## 1-sample proportions test with continuity correction
##
## data: 30 out of 100, null probability 0.35
## X-squared = 0.89011, df = 1, p-value = 0.3454
## alternative hypothesis: true p is not equal to 0.35
## 95 percent confidence interval:
## 0.2145426 0.4010604
## sample estimates:
## p
## 0.3
In this output the p-value (0.3454) is well above the conventional 0.05 significance level, so we fail to reject the null hypothesis and cannot conclude that the true proportion is different from 35%.
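Judging from its data: line, the call behind this output is presumably prop.test(30, 100, p = 0.35). The sketch below reproduces it and recomputes the X-squared statistic by hand as (|x - n p0| - 0.5)^2 / (n p0 (1 - p0)), a chi-squared statistic with Yates' continuity correction, then takes the p-value from a chi-squared distribution with 1 degree of freedom. The variable names are our own.

```r
# One-sample test of H0: p = 0.35 using the survey counts from the text.
res <- prop.test(x = 30, n = 100, p = 0.35)
res$statistic   # X-squared, about 0.89
res$p.value     # about 0.345

# The same statistic by hand, with Yates' 0.5 continuity correction.
x  <- 30
n  <- 100
p0 <- 0.35
chisq_manual <- (abs(x - n * p0) - 0.5)^2 / (n * p0 * (1 - p0))
p_manual     <- pchisq(chisq_manual, df = 1, lower.tail = FALSE)
c(statistic = chisq_manual, p.value = p_manual)
```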
4.2.1.2 Means
Suppose we have the following null and alternative hypotheses:
\(H_o\): The true average rating of all employees at a company equals five.
\(H_a\): The true average rating of all employees at a company does not equal five.
The data set data contains a sample of employees from the company, and its Rating column contains each employee’s rating. We can run this hypothesis test in R using the t.test() function, where the argument mu specifies the value in the null hypothesis (5).
##
## One Sample t-test
##
## data: data$Rating
## t = 33.04, df = 999, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 5
## 95 percent confidence interval:
## 6.87463 7.11137
## sample estimates:
## mean of x
## 6.993
In this output the p-value (< 2.2e-16) is far below any conventional significance level, so we reject the null hypothesis and conclude that the true average rating is likely different from five.
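The output's data: and alternative hypothesis: lines suggest the call was t.test(data$Rating, mu = 5). Since the Rating data are not reproduced here, the sketch below uses simulated ratings (the vector rating, its 1 to 10 scale, and the seed are hypothetical) to show what that call computes: the t statistic (mean - 5) / (sd / sqrt(n)) and a two-sided p-value from a t distribution with n - 1 degrees of freedom.

```r
# Hypothetical ratings on a 1-10 scale; the document's data$Rating is not reproduced here.
set.seed(42)
rating <- sample(1:10, size = 200, replace = TRUE)

res <- t.test(rating, mu = 5)   # H0: true mean rating equals 5

# The t statistic and two-sided p-value by hand.
n        <- length(rating)
t_manual <- (mean(rating) - 5) / (sd(rating) / sqrt(n))
p_manual <- 2 * pt(-abs(t_manual), df = n - 1)

# TRUE here would mean "reject H0 at the 5% level".
res$p.value < 0.05
```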
4.2.2 Two Sample
4.2.2.1 Proportions
Suppose that Professor Yael and Professor Michael were both given a section of entering students for a statistics boot camp before fall classes started. After the boot camp ended, a survey was given to all the participants. Of the 75 who had Yael as an instructor, 45 said they were satisfied, whereas 48 of the 90 who had Michael were satisfied. Is there a significant difference in the percentage of students who were satisfied between the two instructors? To test this, our null and alternative hypotheses would be:
- \(H_o\): There is no difference in the proportion of satisfied students in Michael and Yael’s classes.
- \(H_a\): There is a difference in the proportion of satisfied students in Michael and Yael’s classes.
We can use prop.test() in R to calculate the appropriate p-value from this sample data, passing the counts of satisfied students and the class sizes for Yael and Michael, in that order:
##
## 2-sample test for equality of proportions with continuity correction
##
## data: c(45, 48) out of c(75, 90)
## X-squared = 0.49304, df = 1, p-value = 0.4826
## alternative hypothesis: two.sided
## 95 percent confidence interval:
## -0.09693574 0.23026908
## sample estimates:
## prop 1 prop 2
## 0.6000000 0.5333333
In this output the p-value (0.4826) is well above the conventional 0.05 significance level, so we fail to reject the null hypothesis and cannot conclude that there is a difference in the proportion of satisfied students in Michael and Yael’s classes.
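The output's data: line suggests the call was prop.test(c(45, 48), c(75, 90)), with Yael's class first (prop 1 = 45/75 = 0.60) and Michael's second (prop 2 = 48/90, about 0.533). The sketch below reproduces it and checks that, for two groups, this test gives the same statistic and p-value as chisq.test() on the 2 x 2 table of satisfied and not-satisfied counts, since both apply Yates' continuity correction by default. The table layout and its dimnames are our own choices.

```r
# Two-sample test: 45 of 75 satisfied (Yael) vs 48 of 90 satisfied (Michael).
res <- prop.test(x = c(45, 48), n = c(75, 90))
res$estimate   # 0.600 and 0.533
res$p.value    # about 0.48

# Equivalent view: chi-squared test on the 2 x 2 table of counts.
counts <- matrix(c(45, 75 - 45,
                   48, 90 - 48),
                 nrow = 2, byrow = TRUE,
                 dimnames = list(instructor = c("Yael", "Michael"),
                                 response   = c("satisfied", "not satisfied")))
res_chisq <- chisq.test(counts)   # Yates' correction is the default for 2 x 2
res_chisq$statistic
```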
4.2.2.2 Means
The data set gss contains data from the General Social Survey, which tracks American attitudes on a wide variety of topics. Within gss, the INCOME column records the income of each respondent (a quantitative variable) and WRKGOVT indicates whether each respondent works for the government or in the private sector (a categorical variable). Suppose we have the following null and alternative hypotheses:
\(H_o\): On average, government workers earn the same as those in the private sector.
\(H_a\): On average, government workers do not earn the same as those in the private sector.
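We can run this hypothesis test in R using the t.test() function, which by default performs Welch’s t-test and so does not assume that the two groups have equal variances: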
##
## Welch Two Sample t-test
##
## data: gss$INCOME by gss$WRKGOVT
## t = 1.497, df = 343.21, p-value = 0.1353
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -1184.580 8732.605
## sample estimates:
## mean in group 1 mean in group 2
## 44621.83 40847.81
In this output the p-value (0.1353) is greater than both 0.05 and 0.10, so we fail to reject the null hypothesis at either significance level. This means we cannot conclude that average income differs between government and private sector workers.
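The output's data: line (gss$INCOME by gss$WRKGOVT) suggests a formula call such as t.test(gss$INCOME ~ gss$WRKGOVT). Since the gss data are not reproduced here, the sketch below uses simulated incomes (the incomes data frame, its sector labels, and the group sizes are hypothetical) to show what the Welch test computes: the difference in means divided by sqrt(s1^2/n1 + s2^2/n2), with Welch-Satterthwaite degrees of freedom, which is why the output reports a non-integer value such as df = 343.21.

```r
# Hypothetical incomes for two sectors; the document's gss data is not reproduced here.
set.seed(7)
incomes <- data.frame(
  income = c(rnorm(40, mean = 45000, sd = 15000),
             rnorm(60, mean = 41000, sd = 12000)),
  sector = rep(c("government", "private"), times = c(40, 60))
)

# Formula interface; var.equal = FALSE (the default) gives the Welch test.
res <- t.test(income ~ sector, data = incomes)

# Welch t statistic and Welch-Satterthwaite degrees of freedom by hand.
x  <- incomes$income[incomes$sector == "government"]
y  <- incomes$income[incomes$sector == "private"]
vx <- var(x) / length(x)   # squared standard error, group 1
vy <- var(y) / length(y)   # squared standard error, group 2
t_manual  <- (mean(x) - mean(y)) / sqrt(vx + vy)
df_manual <- (vx + vy)^2 / (vx^2 / (length(x) - 1) + vy^2 / (length(y) - 1))
c(t = t_manual, df = df_manual)
```

The factor levels are sorted alphabetically, so "government" is group 1 here; the sign of t follows that ordering.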