23.9 Statistical validity conditions: Mean differences

As with any inferential procedure, these results apply under certain conditions. The conditions under which the CI is statistically valid for paired data are similar to those for one sample mean, rephrased for differences.

The CI computed above is statistically valid if one of these conditions is true:

The sample size of differences is at least 25; or
The sample size of differences is smaller than 25, and the population of differences has an approximate normal distribution.
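
As a minimal sketch, the two conditions above can be written as a small decision rule. The function name `check_statistical_validity`, its arguments, and the three return labels are hypothetical and chosen only for illustration. The threshold of 25 is the rough figure used in this section.

```python
def check_statistical_validity(n_differences, population_approx_normal=None, threshold=25):
    """Apply the statistical validity conditions for a CI for a mean difference.

    n_differences: number of paired differences in the sample.
    population_approx_normal: True, False, or None if this is not known.
    threshold: rough sample-size figure (25 here; some books use 30).
    """
    # Condition 1: the sample size of differences is at least the threshold.
    if n_differences >= threshold:
        return "satisfied"
    # Condition 2: a smaller sample, but the population of differences
    # has an approximate normal distribution.
    if population_approx_normal is None:
        return "unknown"
    return "satisfied" if population_approx_normal else "not satisfied"


print(check_statistical_validity(40))        # large sample
print(check_statistical_validity(10, True))  # small sample, normal population
print(check_statistical_validity(10))        # small sample, normality not known
```

The `"unknown"` label covers the common situation where the sample is small and the distribution of the differences in the population cannot be confirmed.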

The sample size of 25 is a rough figure here, and some books give other values (such as 30). This condition ensures that the sample mean differences have an approximate normal distribution, so that the 68–95–99.7 rule can be used. Provided the sample size is larger than about 25, this will be approximately true even if the differences in the population do not have a normal distribution. That is, when $n > 25$ the sample means generally have an approximate normal distribution, even if the data themselves don't have a normal distribution.
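
The idea that sample means are approximately normal for larger samples, even when the data are not, can be explored with a short simulation. The sketch below is illustrative only: the right-skewed (exponential) population, the function name `proportion_within_two_se`, the number of repetitions, and the seed are all assumptions, not part of the insulation example. It draws many samples of size $n$, computes each sample mean, and reports the proportion of sample means within two standard errors of the population mean. If the 68–95–99.7 rule applies, this proportion should be close to 0.95.

```python
import random
import statistics


def proportion_within_two_se(n, reps=5000, seed=1):
    """Proportion of simulated sample means within 2 standard errors of the population mean."""
    rng = random.Random(seed)
    # Assumed population: exponential with rate 1, which is strongly right-skewed.
    # Its mean and standard deviation are both 1.
    pop_mean, pop_sd = 1.0, 1.0
    std_error = pop_sd / n ** 0.5
    inside = 0
    for _ in range(reps):
        sample = [rng.expovariate(1.0) for _ in range(n)]
        if abs(statistics.fmean(sample) - pop_mean) <= 2 * std_error:
            inside += 1
    return inside / reps


# With n = 30 the proportion should be close to 0.95 despite the skewed population.
print(proportion_within_two_se(30))
```

Calling the function with a much smaller `n` (such as 5) gives a sense of how the approximation behaves when the sample is small and the population is far from normal.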

In addition to the statistical validity condition, the CI will be

internally valid if the study was well designed; and
externally valid if the sample is a simple random sample and the study is internally valid.

Example 23.2 (Statistical validity) For the insulation data, the sample size is small, so we require that the differences in the population follow a normal distribution. We don't know if they do (though the data, graphed in Fig. 23.1, don't seem to raise any obvious doubts). So the CI is possibly statistically valid, but we aren't sure.

In this case, then, the results may not be statistically valid; that is, the CI limits that we calculated may be only approximately correct. (This doesn't mean the CI is useless!)