3.4 Confidence Intervals for the Population Mean

As stressed before, we will never estimate the exact value of the population mean of Y using a random sample. However, we can compute confidence intervals for the population mean. In general, a confidence interval for an unknown parameter is a recipe that, in repeated samples, yields intervals that contain the true parameter with a prespecified probability, the confidence level. Confidence intervals are computed using the information available in the sample. Since this information is the result of a random process, confidence intervals are random variables themselves.

Key Concept 3.7 shows how to compute confidence intervals for the unknown population mean E(Y).

Key Concept 3.7

Confidence Intervals for the Population Mean

A 95% confidence interval for μY is a random interval that contains the true μY in 95% of all possible random samples. When n is large we can use the normal approximation. Then the 99%, 95% and 90% confidence intervals are

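$$
\begin{aligned}
\text{99\% confidence interval for } \mu_Y &= \left[\overline{Y} \pm 2.58 \times SE(\overline{Y})\right], \\
\text{95\% confidence interval for } \mu_Y &= \left[\overline{Y} \pm 1.96 \times SE(\overline{Y})\right], \\
\text{90\% confidence interval for } \mu_Y &= \left[\overline{Y} \pm 1.64 \times SE(\overline{Y})\right].
\end{aligned}
$$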

These confidence intervals are the sets of null hypotheses that we cannot reject in a two-sided hypothesis test at the corresponding significance levels of 1%, 5% and 10%.
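The formulas in Key Concept 3.7 can be evaluated by hand. The following is a minimal sketch, assuming a simulated sample; the object names (y, y_bar, se_y_bar, ci_normal) and the parameters of the simulated distribution are illustrative choices and not part of the Key Concept.

```r
# simulate a sample (distribution and sample size are illustrative assumptions)
set.seed(1)
y <- rnorm(100, mean = 10, sd = 10)

# sample mean and its standard error
y_bar <- mean(y)
se_y_bar <- sd(y) / sqrt(length(y))

# two-sided critical values of the standard normal distribution
conf_levels <- c(0.99, 0.95, 0.90)
crit <- qnorm(1 - (1 - conf_levels) / 2)

# intervals of the form [y_bar - crit * SE, y_bar + crit * SE]
ci_normal <- cbind(
  level = conf_levels,
  lower = y_bar - crit * se_y_bar,
  upper = y_bar + crit * se_y_bar
)
ci_normal
```

Rounded to two digits, the values in crit are the constants 2.58, 1.96 and 1.64 used above. The intervals become narrower as the confidence level decreases.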

Now consider the following statements.

  1. In repeated sampling, the interval [Y¯±1.96×SE(Y¯)] covers the true value of μY with a probability of 95%.

  2. We have computed Y¯=5.1 and SE(Y¯)=2.5 so the interval [5.1±1.96×2.5]=[0.2,10] covers the true value of μY with a probability of 95%.

While 1. is right (this is in line with the definition above), 2. is wrong and none of your lecturers wants to read such a sentence in a term paper, written exam or similar, believe us. The difference is that, while 1. is the definition of a random variable, 2. is one possible outcome of this random variable so there is no meaning in making any probabilistic statement about it. Either the computed interval does cover μY or it does not!
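Statement 1 can be illustrated with a small simulation. The sketch below assumes a normal population with μY = 10 and standard deviation 10 and draws 10000 samples of size 100; these settings and all object names are hypothetical choices made for illustration only.

```r
# simulation settings (illustrative assumptions)
set.seed(123)
n <- 100       # sample size
mu <- 10       # true population mean
sigma <- 10    # true population standard deviation
reps <- 10000  # number of repeated samples

# for each sample, check whether the 95% interval covers the true mean
covers <- replicate(reps, {
  y <- rnorm(n, mean = mu, sd = sigma)
  se <- sd(y) / sqrt(n)
  lower <- mean(y) - 1.96 * se
  upper <- mean(y) + 1.96 * se
  lower <= mu && mu <= upper
})

# share of intervals that cover the true mean
coverage <- mean(covers)
coverage
```

Each entry of covers is either TRUE or FALSE: a single computed interval either covers μY or it does not. Only the share across repeated samples, which should be close to 0.95, has a probabilistic interpretation.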

In R, testing hypotheses about the mean of a population on the basis of a random sample is very easy due to functions like t.test() from the stats package. It produces an object of type list. Luckily, one of the simplest ways to use t.test() is to obtain a 95% confidence interval for some population mean. We start by generating some random data and calling t.test() in conjunction with ls() to obtain a breakdown of the output components.

# set seed
set.seed(1)

# generate some sample data
sampledata <- rnorm(100, 10, 10)

# check the type of the outcome produced by t.test
typeof(t.test(sampledata))
## [1] "list"
# display the list elements produced by t.test
ls(t.test(sampledata))
## [1] "alternative" "conf.int"    "data.name"   "estimate"    "method"     
## [6] "null.value"  "p.value"     "parameter"   "statistic"

Though we find that many items are reported, at the moment we are only interested in computing a 95% confidence set for the mean.

t.test(sampledata)$"conf.int"
## [1]  9.306651 12.871096
## attr(,"conf.level")
## [1] 0.95

This tells us that the 95% confidence interval is

[9.31,12.87].

In this example, the computed interval obviously does cover the true μY which we know to be 10.
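Other confidence levels can be requested through the conf.level argument of t.test(). A short sketch, regenerating the same data as above; the names ci_90, ci_95 and ci_99 are chosen here for illustration.

```r
# regenerate the sample used above
set.seed(1)
sampledata <- rnorm(100, 10, 10)

# confidence intervals at different confidence levels
ci_90 <- t.test(sampledata, conf.level = 0.90)$conf.int
ci_95 <- t.test(sampledata)$conf.int  # conf.level defaults to 0.95
ci_99 <- t.test(sampledata, conf.level = 0.99)$conf.int

# compare the interval widths
c(width_90 = diff(ci_90), width_95 = diff(ci_95), width_99 = diff(ci_99))
```

A higher confidence level yields a wider interval, in line with the critical values in Key Concept 3.7.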

Let us have a look at the whole standard output produced by t.test().

t.test(sampledata)
## 
##  One Sample t-test
## 
## data:  sampledata
## t = 12.346, df = 99, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
##   9.306651 12.871096
## sample estimates:
## mean of x 
##  11.08887

We see that t.test() not only computes a 95% confidence interval but also automatically conducts a two-sided significance test of the hypothesis H0:μY=0 at the level of 5% and reports the relevant quantities: the alternative hypothesis, the estimated mean, the resulting t-statistic, the degrees of freedom of the underlying t distribution (t.test() uses the t distribution with n−1 degrees of freedom rather than the normal approximation, so its critical value is slightly larger than 1.96) and the corresponding p-value. This is very convenient!
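To see where these numbers come from, the reported quantities can be reproduced by hand. The following sketch assumes the same simulated data as above and uses the t distribution with n−1 degrees of freedom; the object names (t_stat, p_value, ci_t and so on) are illustrative.

```r
# regenerate the sample used above
set.seed(1)
sampledata <- rnorm(100, 10, 10)

n <- length(sampledata)
y_bar <- mean(sampledata)
se <- sd(sampledata) / sqrt(n)
df <- n - 1

# t-statistic for H0: mu = 0 and the two-sided p-value
t_stat <- (y_bar - 0) / se
p_value <- 2 * pt(-abs(t_stat), df = df)

# 95% confidence interval based on the t distribution
crit_t <- qt(0.975, df = df)
ci_t <- c(y_bar - crit_t * se, y_bar + crit_t * se)

# compare with the output of t.test()
res <- t.test(sampledata)
all.equal(unname(res$statistic), t_stat)
all.equal(as.numeric(res$conf.int), ci_t)
all.equal(res$p.value, p_value)

# critical value of the t distribution versus the normal approximation
c(t = crit_t, normal = qnorm(0.975))
```

With 99 degrees of freedom the two critical values are close, so the interval from t.test() is only slightly wider than the one based on the normal approximation.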

In this example, we come to the conclusion that the population mean is significantly different from 0 (which is correct) at the level of 5%, since μY=0 is not an element of the 95% confidence interval

$$0 \notin [9.31, 12.87].$$

We come to an equivalent result when using the p-value rejection rule since

$$p\text{-value} < 2.2 \times 10^{-16} \ll 0.05.$$
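The equivalence between the confidence interval and the p-value rejection rule can be checked for several hypothesized means via the mu argument of t.test(). This is a sketch only; the candidate values in candidates are arbitrary choices for illustration.

```r
# regenerate the sample used above
set.seed(1)
sampledata <- rnorm(100, 10, 10)

# 95% confidence interval for the population mean
ci <- t.test(sampledata)$conf.int

# hypothetical null values, some inside and some outside the interval
candidates <- c(0, 9, 10, 12, 13)

# two-sided p-value for each hypothesized mean
p_values <- sapply(candidates, function(m) t.test(sampledata, mu = m)$p.value)

# a null value is rejected at the 5% level exactly when it lies outside the interval
duality <- data.frame(
  mu0 = candidates,
  p_value = p_values,
  inside_ci = candidates >= ci[1] & candidates <= ci[2],
  reject_at_5pct = p_values < 0.05
)
duality
```

For every candidate value, inside_ci and reject_at_5pct are opposites of each other, which is the sense in which the interval collects the null hypotheses that cannot be rejected.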