26 More about CIs

So far, you have learnt to ask a RQ, design a study, classify and summarise the data, and have also been introduced to confidence intervals. In this chapter, you will learn more about forming confidence intervals. You will learn to

  • communicate confidence intervals.
  • interpret confidence intervals.

26.1 General comments

The previous chapters discussed forming confidence intervals (CIs) for estimating a population proportion, and for estimating a population mean. CIs in other contexts will also be studied. This chapter discusses some principles that apply to CIs in general.

CIs are formed for an unknown population parameter (such as the population proportion \(p\)), using the best estimate of the parameter: the sample statistic (such as the sample proportion \(\hat{p}\)). Most CIs have the form
\[ \text{Statistic} \pm (\text{multiplier} \times \text{standard error}), \] where \((\text{multiplier} \times \text{standard error})\) is called the margin of error. For an approximate \(95\)% CI, the multiplier is \(2\) (from the \(68\)--\(95\)--\(99.7\) rule), provided the statistical validity conditions are met. The statistical conditions should always be checked to see if the CI is (at least approximately) statistically valid.
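As a minimal sketch of this general form, the following Python function computes the margin of error and the limits of the interval. The function name `approx_ci` and the values of the statistic and standard error are hypothetical, chosen only for illustration. The multipliers \(1\), \(2\) and \(3\) are those implied by the \(68\)--\(95\)--\(99.7\) rule.

```python
def approx_ci(statistic, standard_error, multiplier=2):
    """Return (lower, upper, margin_of_error) for
    statistic ± (multiplier × standard error)."""
    margin_of_error = multiplier * standard_error
    lower = statistic - margin_of_error
    upper = statistic + margin_of_error
    return lower, upper, margin_of_error


# Hypothetical sample proportion and standard error (not from the text).
p_hat = 0.40
se = 0.05

# Multipliers implied by the 68-95-99.7 rule.
for multiplier, level in [(1, "68%"), (2, "95%"), (3, "99.7%")]:
    lower, upper, moe = approx_ci(p_hat, se, multiplier)
    print(f"Approximate {level} CI: {lower:.2f} to {upper:.2f} "
          f"(margin of error: {moe:.2f})")
```

All else being equal, a larger multiplier gives a larger margin of error, and hence a wider interval with a higher level of confidence.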

Confidence intervals tell us about the unknown population parameter, based on what we learn from one of the countless possible sample statistics.

26.2 About writing conclusions

When reporting a CI, include:

  1. the CI (including units of measurement, if relevant);
  2. the level of confidence for the CI (typically, a \(95\)% CI); and
  3. the value of the statistic (the parameter estimate) and the sample size.

If the CI is an approximate CI (e.g., based on using an approximate multiplier of \(2\)), this should also be clear.

Example 26.1 (Writing conclusions) In Sect. 25.5, the mean cadmium level of peanuts was estimated. The conclusion given was:

The sample mean cadmium concentration of peanuts is \(\bar{x} = 0.0768\) ppm (s.e.: \(0.00270\); \(n = 290\)), with an approximate \(95\)% CI from \(0.0714\) to \(0.0822\) ppm.

Each of the three elements above is given:

  1. the CI: \(0.0714\) to \(0.0822\) ppm;
  2. the level of confidence for the CI: \(95\)%; and
  3. sample summary information: \(\bar{x} = 0.0768\) ppm; s.e.: \(0.00270\); \(n = 290\).

In addition, the CI is flagged as an approximate \(95\)% CI.
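As a check, the limits quoted in this conclusion follow from the general form of a CI, using the approximate multiplier of \(2\):
\[ 0.0768 \pm (2 \times 0.00270) = 0.0768 \pm 0.0054, \]
so the margin of error is \(0.0054\) ppm, and the interval runs from \(0.0768 - 0.0054 = 0.0714\) ppm to \(0.0768 + 0.0054 = 0.0822\) ppm.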

26.3 About interpreting CIs

Interpreting CIs correctly takes care. The correct interpretation (Def. 24.3) of a \(95\)% CI is the following:

If samples of the same size were repeatedly taken many times, and the \(95\)% confidence interval computed for each sample, \(95\)% of these confidence intervals formed would contain the population parameter.

This is the idea shown in the animation in Sect. 24.4. In practice, this definition is unsatisfying, since we only ever have one sample. Furthermore, since the value of the parameter is unknown (after all, the reason for taking a sample was to estimate the value of the parameter), we don't know if our CI from our single sample includes the population parameter or not.
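The repeated-sampling idea can be sketched with a small simulation. This is only an illustration under stated assumptions: the population is taken to be normally distributed, and the values of `POP_MEAN` and `POP_SD` are hypothetical (in a real study, they are unknown). The function name `simulate_coverage` is also just a label for this sketch.

```python
import random
import statistics

# Assumed (hypothetical) population: normal, with a known mean and
# standard deviation. In practice these values are unknown.
POP_MEAN = 0.08
POP_SD = 0.05


def simulate_coverage(n=290, num_samples=1000, multiplier=2, seed=1):
    """Take many samples of size n, form an approximate CI from each,
    and return the proportion of CIs that contain the population mean."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(num_samples):
        sample = [rng.gauss(POP_MEAN, POP_SD) for _ in range(n)]
        x_bar = statistics.mean(sample)
        se = statistics.stdev(sample) / n ** 0.5  # standard error of the mean
        lower = x_bar - multiplier * se
        upper = x_bar + multiplier * se
        if lower <= POP_MEAN <= upper:
            hits += 1
    return hits / num_samples


coverage = simulate_coverage()
print(f"Proportion of CIs containing the population mean: {coverage:.3f}")
```

The proportion printed should be close to \(0.95\). Any individual interval in the simulation either contains the population mean or does not; the \(95\)% describes the long-run behaviour of the procedure.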

Two reasonable alternative interpretations for a \(95\)% CI are:

  • The \(95\)% CI gives a range of values of the unknown parameter that could plausibly (with \(95\)% confidence) have produced our observed value of the statistic.
  • There is a \(95\)% chance that our \(95\)% CI straddles the value of the population parameter.

These alternatives are reasonable and common interpretations.

Commonly, the CI is described as having a \(95\)% chance of containing the population parameter. This is not strictly correct (the CI either does or does not contain the value of the population parameter), but is a common and convenient paraphrase of the correct interpretation above.

I use this analogy: Most people say the sun rises in the east. This is incorrect; the sun doesn't rise at all. People say the sun rises in the east as a convenient way to explain that the earth rotates on its axis, and we see the sun each morning in the east. Similarly, most people interpret a CI as an interval with a certain chance of containing the value of the population parameter, even though it is technically incorrect.

Example 26.2 (Interpreting CIs) In Example 26.1, the approximate \(95\)% CI was from \(0.0714\) to \(0.0822\). The correct interpretation is:

If many samples of \(290\) peanuts were taken, and the approximate \(95\)% CI computed for each one, about \(95\)% of those CIs would contain the population mean.

We don't know if our CI from a single sample includes the value of \(\mu\), however. We might say:

This \(95\)% CI (from \(0.0714\) to \(0.0822\)) is likely to straddle the actual value of \(\mu\).

or

The range of values of \(\mu\) that could plausibly (with \(95\)% confidence) have produced \(\bar{x} = 0.0768\) is between \(0.0714\) and \(0.0822\).

In practice, the CI is usually interpreted as:

There is a \(95\)% chance that the population mean level of cadmium in peanuts is between \(0.0714\) and \(0.0822\).

This is not strictly correct, but is commonly used, and sufficient for our purposes.

26.4 About statistical validity

When constructing confidence intervals, the statistical validity conditions must be met to ensure the mathematics behind the calculations is sound. For instance, if the sampling distribution has an approximate normal distribution, the statistical validity conditions ensure that the approximation is sufficiently accurate for the \(68\)--\(95\)--\(99.7\) rule to apply. If these conditions are not met, the sampling distribution may not be close to a normal distribution, so the \(68\)--\(95\)--\(99.7\) rule (on which the CI is based) may not be appropriate, and the CI itself may be inappropriate. Of course, if the statistical validity conditions are close to being satisfied, then the resulting confidence interval will still be reasonably useful.
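To illustrate why the conditions matter, the sketch below computes how often an approximate \(95\)% CI for a proportion actually contains \(p\) in two hypothetical scenarios. It assumes the usual standard error for a sample proportion, \(\sqrt{\hat{p}(1 - \hat{p})/n}\), and that the number of successes follows a binomial distribution. The function name `exact_coverage` and the values of \(n\) and \(p\) are chosen only for illustration.

```python
from math import comb, sqrt


def exact_coverage(n, p, multiplier=2):
    """Probability that p_hat ± (multiplier × s.e.) contains p,
    summed over every possible number of successes in n trials."""
    coverage = 0.0
    for successes in range(n + 1):
        p_hat = successes / n
        se = sqrt(p_hat * (1 - p_hat) / n)
        lower = p_hat - multiplier * se
        upper = p_hat + multiplier * se
        if lower <= p <= upper:
            # Binomial probability of observing this number of successes.
            coverage += (comb(n, successes)
                         * p ** successes * (1 - p) ** (n - successes))
    return coverage


# Hypothetical scenarios (values not from the text).
large_sample = exact_coverage(n=290, p=0.30)  # many successes and failures expected
small_count = exact_coverage(n=50, p=0.02)    # only about one success expected

print(f"n = 290, p = 0.30: coverage = {large_sample:.3f}")
print(f"n = 50,  p = 0.02: coverage = {small_count:.3f}")
```

In the first scenario the coverage is close to the nominal \(95\)%. In the second it falls well short: any sample with no successes gives \(\hat{p} = 0\) and a standard error of zero, so the interval collapses to a single point and cannot contain \(p\).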

In addition to the statistical validity conditions, the internal validity and external validity of the study should be discussed (Fig. 26.1).

FIGURE 26.1: Four types of validities for studies.

CIs also require that the sample size is less than about \(10\)% of the population size; this is almost always the case.

26.5 Quick revision exercises

Are the following statements true or false?

  1. True or false: CIs always have \(95\)% confidence.
  2. True or false: Statistical validity concerns the generalisability of the results.
  3. True or false: CIs always include the value of the population parameter.
  4. True or false: All other things being equal, a \(95\)% CI is wider than a \(90\)% CI.
  5. True or false: The 'multiplier times the standard error' is called the margin of error.
  6. True or false: We are fairly sure (but not certain) that the CI includes the value of the statistic.

26.6 Exercises

Selected answers are available in App. E.

Exercise 26.1 A researcher was computing a \(95\)% CI for a single proportion to estimate the proportion of trees with apple scab (Hirst and Stedman 1962), and found \(\hat{p} = 0.314\) and \(\text{s.e.}(\hat{p}) = 0.091\). What is wrong with the following conclusion?

An approximate \(95\)% CI for the sample proportion is between \(0.223\) and \(0.405\).

Exercise 26.2 A researcher was computing a \(95\)% CI for a single proportion to estimate the proportion of trees with apple scab (Hirst and Stedman 1962), and found that \(\hat{p} = 0.314\) and \(\text{s.e.}(\hat{p}) = 0.091\). What would be wrong with the following conclusion?

This CI means we are \(95\)% confident that between \(22.3\) and \(40.5\) trees are infected with apple scab.

Exercise 26.3 A study of sodium intake in Thailand found the \(95\)% CI for the mean daily sodium intake for subjects with a secondary school education was \(3565\) to \(3903\) mg. What would be wrong with the following conclusion?

This CI means that approximately \(95\)% of the subjects had a daily sodium intake between \(3565\) and \(3903\) mg.

Exercise 26.4 A study of sodium intake in Thailand found the \(95\)% CI for the mean daily sodium intake for subjects with a secondary school education was \(3565\) to \(3903\) mg. What would be wrong with the following conclusion?

A \(95\)% CI for the sample mean daily sodium intake is between \(3565\) and \(3903\) mg.