3.4 Confidence intervals

As well as reporting the sample mean, it is very useful to report a confidence interval, because it gives us an idea of how confident we can be in our estimate.

For example, suppose we surveyed a sample of \(n = 50\) university students who answered the question "In hours, what was your phone screen time yesterday?". Further suppose that from this sample we obtained an estimate of \(\bar{x} = 4 \text{ hours}\). Remember that \(\bar{x}\) is only an estimate of \(\mu\): it gives us an idea of the true average screen time of all university students, not just the 50 in our sample. A confidence interval tells us how precisely \(\bar{x}\) has estimated \(\mu\). Consider the following two statements:

  1. We are 95% confident that the true average phone screen time of university students is between 3.9 and 4.1 hours.
  2. We are 95% confident that the true average phone screen time of university students is between 1 and 7 hours.

Which statement do you prefer, and why?

In both scenarios our sample gave the same estimated average screen time of 4 hours, but we can be much more confident in the estimate in the first statement, because its confidence interval is much narrower. In general, a narrow confidence interval indicates a precise estimate. On the other hand, some confidence intervals are so wide that they are barely informative at all!

For a fixed confidence level, the width of a confidence interval is determined by two things: the sample size and the estimated variability in the sample. Larger samples and less variable data both give narrower intervals. To calculate a confidence interval, we take \(\bar{x}\) and then add and subtract a margin of error. Consider the following definition:

95% Confidence interval calculation:

\[\bar{x} \pm t_{\text{df,}0.975}\times\text{SE},\]

where:

  • \(t_{\text{df,}0.975}\) is the value from the \(t_{\text{df}}\) distribution, with \(\text{df} = n - 1\) degrees of freedom, such that \(P(T \leq t_{\text{df,}0.975}) = 0.975\), i.e. the 0.975 quantile
  • \(\text{SE}\), the standard error, is equal to \(\frac{s}{\sqrt{n}}\).
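To make the definition concrete, here is a minimal Python sketch of the 95% formula using only the standard library. In practice you would get the critical value from software (for example R's `qt(0.975, df)`); the helpers `t_pdf`, `t_cdf`, `t_quantile` and `ci95` below are hypothetical names, and the quantile is found numerically (Simpson's rule for the CDF plus bisection), which is an illustrative substitute for a library routine, not the method the notes prescribe. The usage line borrows the cholesterol summary statistics from the worked example in this section.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x): Simpson's rule on [0, |x|], then symmetry about 0."""
    h = abs(x) / steps  # steps must be even for Simpson's rule
    area = t_pdf(0.0, df) + t_pdf(abs(x), df)
    for i in range(1, steps):
        area += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Value q with P(T <= q) = p, found by bisection on t_cdf.

    Assumes q lies in [-100, 100], which covers the 90%, 95% and 99%
    critical values for every df >= 1.
    """
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def ci95(xbar: float, s: float, n: int) -> tuple:
    """95% CI: xbar -/+ t_{n-1, 0.975} * s / sqrt(n)."""
    se = s / math.sqrt(n)              # standard error
    t_crit = t_quantile(0.975, n - 1)  # df = n - 1
    return xbar - t_crit * se, xbar + t_crit * se


# Cholesterol summary statistics: xbar = 5.13, s = 0.5, n = 72
lower, upper = ci95(5.13, 0.5, 72)
print(round(lower, 2), round(upper, 2))  # 5.01 5.25
```

The numerical quantile keeps the sketch dependency-free; with SciPy available, `scipy.stats.t.ppf(0.975, df)` would be the more usual choice.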

As we know, most of the time we use a significance level of \(\alpha = 0.05\), although other significance levels can be chosen, such as \(\alpha = 0.01\) or \(\alpha = 0.1\). Changing the significance level does not change the \(p\)-value, only the threshold we compare the \(p\)-value to. However, it does change the confidence interval. Our level of confidence depends on \(\alpha\): we obtain a \(100(1 - \alpha)\%\) confidence interval. If \(\alpha = 0.05\), we have a \(100(1 - 0.05)\% = 95\%\) confidence interval. Similarly, if \(\alpha = 0.01\), we have a \(100(1 - 0.01)\% = 99\%\) confidence interval, and if \(\alpha = 0.1\), we have a \(100(1 - 0.1)\% = 90\%\) confidence interval. Consider the following, more general definition:

\(100(1 - \alpha)\%\) Confidence interval calculation:

\[\bar{x} \pm t_{\text{df,}1 - \alpha/2}\times\text{SE},\]

where:

  • \(t_{\text{df,}1 - \alpha/2}\) is the value from the \(t_{\text{df}}\) distribution such that \(P(T \leq t_{\text{df,}1 - \alpha/2}) = 1 - \alpha/2\), i.e. the \((1 - \alpha/2)\)th quantile
  • \(\text{SE}\), the standard error, is equal to \(\frac{s}{\sqrt{n}}\).
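The general formula only changes which quantile we look up: \(1 - \alpha/2\) instead of 0.975. The sketch below is a minimal illustration, assuming the same numerical approach as a stand-in for software such as R's `qt()`; `t_cdf`, `t_quantile` and `confidence_interval` are hypothetical helper names. It computes 90%, 95% and 99% intervals for the cholesterol summary statistics used later in this section, showing that a smaller \(\alpha\) gives a wider interval.

```python
import math


def t_cdf(x: float, df: int, steps: int = 2000) -> float:
    """P(T <= x) for T ~ t_df, via Simpson's rule on [0, |x|] plus symmetry."""
    c = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def pdf(u: float) -> float:
        return c * (1 + u * u / df) ** (-(df + 1) / 2)

    h = abs(x) / steps  # steps is even, as Simpson's rule requires
    area = pdf(0.0) + pdf(abs(x))
    area += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area *= h / 3
    return 0.5 + area if x >= 0 else 0.5 - area


def t_quantile(p: float, df: int) -> float:
    """Bisection for q with P(T <= q) = p, searching in [-100, 100]."""
    lo, hi = -100.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if t_cdf(mid, df) < p else (lo, mid)
    return (lo + hi) / 2


def confidence_interval(xbar: float, s: float, n: int, alpha: float = 0.05) -> tuple:
    """100(1 - alpha)% CI: xbar -/+ t_{n-1, 1 - alpha/2} * s / sqrt(n)."""
    se = s / math.sqrt(n)
    t_crit = t_quantile(1 - alpha / 2, n - 1)
    return xbar - t_crit * se, xbar + t_crit * se


# Same data, three significance levels: smaller alpha -> wider interval
for alpha in (0.10, 0.05, 0.01):
    lower, upper = confidence_interval(5.13, 0.5, 72, alpha)
    print(f"{100 * (1 - alpha):.0f}% CI: ({lower:.2f}, {upper:.2f})")
```

Only the critical value changes between the three lines of output; \(\bar{x}\) and \(\text{SE}\) stay the same, which is why raising the confidence level can only widen the interval.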

Returning to the cholesterol example

Recall the confidence interval of (5.01, 5.25) we reported earlier for the cholesterol example, and the associated statement: We are 95% confident that the true average cholesterol level for this particular population lies within the interval (5.01, 5.25).

How did we arrive at this result? We can use the above definition to find out. While we will be using statistical software packages to calculate confidence intervals, we encourage you to work through the example below to better understand how confidence intervals work.

First of all, let's make a list of what we need to know to use the definition:

  • With \(\alpha = 0.05\), we want a \(100(1 - \alpha)\% = 95\%\) confidence interval
  • \(\bar{x}\) is equal to \(5.13\)
  • \(\text{df} = n - 1 = 72 - 1 = 71\)
  • We need to find \(t_{71,1 - \alpha/2} = t_{71,1 - 0.05/2} = t_{71,0.975}\). Using R (for example, qt(0.975, df = 71)), this value is approximately \(1.99\)
  • The sample standard deviation is \(s = 0.5\)
  • The sample size is \(n = 72\)
  • \(\text{SE} = \frac{s}{\sqrt{n}} = \frac{0.5}{\sqrt{72}} = 0.0589\)
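The first few ingredients in the list are plain arithmetic. A minimal Python sketch, using the summary statistics listed above, computes the degrees of freedom, the quantile level \(1 - \alpha/2\) that the critical value is looked up at, and the standard error; the variable names are illustrative only.

```python
import math

# Summary statistics from the cholesterol example
n = 72        # sample size
s = 0.5       # sample standard deviation
alpha = 0.05  # significance level

df = n - 1              # degrees of freedom
p = 1 - alpha / 2       # quantile level for the t critical value
se = s / math.sqrt(n)   # standard error
print(df, round(se, 4))  # 71 0.0589
```

The critical value itself, \(t_{71,0.975}\), still has to come from the \(t\) distribution, e.g. via software.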

Putting all of this together, we can calculate our 95% confidence interval as follows.

First, we have that \(t_{71,0.975} \times \text{SE} = 1.99 \times 0.0589 \approx 0.1172\).

Next, we subtract this margin of error from, and add it to, \(\bar{x} = 5.13\) to calculate our confidence interval as follows:

  • \(5.13 - 0.1172 = 5.0128\)
  • \(5.13 + 0.1172 = 5.2472\),

for a confidence interval of (5.01, 5.25), rounded to two decimal places.
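The hand calculation can be checked with a few lines of Python. This sketch deliberately mirrors the rounding used in the text (the critical value rounded to 1.99 and the standard error rounded to four decimal places), so the intermediate limits match 5.0128 and 5.2472 exactly.

```python
import math

xbar, s, n = 5.13, 0.5, 72
t_crit = 1.99  # t_{71, 0.975}, rounded to two decimals as in the text

# Round the standard error to 4 d.p., as in the hand calculation
se = round(s / math.sqrt(n), 4)  # 0.0589
margin = t_crit * se             # margin of error, about 0.1172
lower, upper = xbar - margin, xbar + margin
print(round(lower, 4), round(upper, 4))  # 5.0128 5.2472
print(round(lower, 2), round(upper, 2))  # 5.01 5.25
```

Carrying the unrounded values through (\(\text{SE} \approx 0.058926\) and \(t_{71,0.975} \approx 1.9939\)) changes the limits only in the fourth decimal place, so the reported interval is still (5.01, 5.25).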