8 Sampling distributions

On completion of this module, students should be able to:

  • use appropriate techniques to determine the sampling distributions of \(t\), \(F\), \(\chi^2\) distribution (related to the normal distribution).
  • apply the concepts of the Central Limit Theorem in appropriate circumstances.
  • use the Central Limit Theorem to approximate binomial probabilities by normal probabilities in appropriate circumstances.

8.1 Introduction

This chapter introduces the application of distribution theory to the discipline of statistics. Recall: Distribution theory is about describing random variables using probability, and statistics is about data collection and extracting information from data. A simple example demonstrates how these two things are related.

Example 8.1 (Distribution of sample mean) A simple random sample (SRS) is taken from a population. The main interest is in the mean of the sample.

Typically, a population will be very large. However, for demonstration purposes suppose the population contains only \(N = 5\) elements:

\[ 2, 3, 3, 6, 8. \] The mean of this population is \(4.4\). Assume that the sample is of size \(n = 2\), and sampling is without replacement.

Simple random sampling refers to all possible samples of size \(n\) being equally likely. Since there are \(\binom{N}{n}\) samples of size \(n\) when sampling without replacement from \(N\) objects, \(\binom{5}{2} = 10\) equally-likely possible samples are possible in this example. For example, the sample \(\{2, 3\}\) has a probability of 1/10 of being selected, and the mean of this sample is 2.5.

All 10 possible samples of size \(n = 2\) can be listed, with the mean of each sample (Table 8.1). The sample mean varies from sample to sample. The mean observed for any one sample depends on which sample is taken.

From this table, the distribution of the sample mean itself can be determined (Table 8.2). This distribution is known as the sampling distribution of the sample mean, recognition that the distribution is based on sampling data.

The mean and variance of the sampling distribution are \(\mu_{\bar{X}} = 4.4\) and \(\sigma^2_{\bar{X}} = 1.89\).

This example can be repeated, assuming sampling randomly with replacement, when 25 equally-likely samples possible and possible. The mean and variance of the sampling distribution in this case are \(\mu_{\bar{X}} = 4.4\) and \(\sigma^2_{\bar{X}} = 2.52\) respectively.

Table 8.1: All possible samples of size \(n = 2\) from the population
Sample Mean Probability
2, 3 2.5 1/10
2, 3 2.5 1/10
2, 6 4.0 1/10
2, 8 5.0 1/10
3, 3 3.0 1/10
3, 6 4.5 1/10
3, 8 5.5 1/10
3, 6 4.5 1/10
3, 8 5.5 1/10
6, 8 7.0 1/10
Table 8.2: Sampling distribution of the mean
Mean 2.5 3.0 4.0 4.5 5.0 5.5 7.0
Probability 0.2 0.1 0.1 0.2 0.1 0.2 0.1

In principle, the distribution of any statistic could be found (for a mean, median, variance, range, etc.) using the approach in Example 8.1. In practice, populations are large and the number of possible samples huge, infinite, or the exact elements of the population difficult or impossible to list.

In terms of distribution theory, Table 8.2 is just the distribution of a random variable, since a sample mean is a random variable. Statistically, this distribution describes what to expect for the sample mean when we randomly sample from a population.

Such knowledge is essential in extracting information from samples. Observe that the mean of the sampling distribution in the example is the same as the mean of the population itself: \(\mu = (2 + 3 + 3 + 6 + 8 )/5 = 4.4\). Also, the variance of the sampling distribution is smaller than the variance of the population (\(\sigma^2 = 5.04\)). The exact relationship between \(\sigma^2_{\bar{X}}\) and \(\sigma^2\) is expanded on in Theorem 8.1.

8.2 Sampling distributions

Definition 8.1 (Statistic) A statistic is a real-valued function of the observations in a sample.

If \(X_1, X_2, \dots, X_n\) represents a numerical sample of size \(n\) from some population, then \(T = g(X_1, X_2, \dots, X_n)\) represents a statistic. Examples include the sample mean, sample standard deviation, sample variance and sample proportion. The individual observations in a sample are also statistics. Hence, for example, the minimum value in a sample (usually denoted \(\min\) or \(X_{[1]}\)), the maximum value, and the range \(\max - \min\) are statistics. Furthermore, the first observation in a sample \(X_1\) is a statistic, or even just the number 2 (or any other number), which are a degenerate function of the observations.

A random variable is a numeric variable which is fully determined by the values of all, some or even no members of the sample.

Example 8.2 (Sampling distribution of a statistic) The sampling distribution of a statistic is the theoretical probability distribution of the statistic associated with all possible samples of a particular size drawn from a particular population.

Notice that this definition is not confined to random samples. However, in practically all applications the sample involved is assumed to be random in some sense. Random sampling imposes a probability structure on the possible values of the statistic, allowing the sampling distribution to be defined.

We use \(\bar{X}\) to represent the mean of a sample whose value is as yet unknown (and depends on which sample is selected). The symbol \(\overline{x}\) is used to denoted the sample mean, computed from the data found in a particular sample.

8.2.1 Random sampling

For our purposes a random sample is defined as follows:

Definition 8.2 (Random sample) The set of random variables \(X_1, X_2,\dots,X_n\) is said to be a random sample of size \(n\) from a population with df \(F_X(x)\) if the \(X_i\) are identically and independently distributed (iid) with df \(F_X(x)\).

The definition uses the distribution function to accommodate both discrete and continuous random variables.

This definition of a random sample is standard in theoretical statistics. In statistical practice, however, many sampling designs are called “random”, such as simple random sampling, stratified sampling, and systematic sampling. These sampling methods produce ‘random samples’, even though the samples produced don’t necessarily satisfy Definition 8.2. Usually the context makes it clear whether the strict or loose meaning of ‘random sample’ is intended.

When a sample of size \(n\) is assumed to be chosen ‘at random’ without replacement from a population of size \(N\), this sample does not constitute a random sample in the sense of Definition 8.2. Even though each member of the sample has the same distribution as the population, the members of the sample are not independent. For example, consider the second element \(X_2\) of the samples in Example 8.1. Here, \(\Pr(X_2 = 2) = 0.2\) but that \(\Pr(X_2 = 2 \mid X_1 = 2) = 0\). Therefore \(\Pr(X_2 = 2 \mid X_1 = 2) \ne\Pr(X_2 = 2)\) and so \(X_2\) and \(X_1\) are dependent.

Sampling at random without replacement approximates a random sample if the sample size \(n\) is small compared to the population size \(N\). In this case, the impact of removing some members of the population has minimal impact on the distribution of the population remaining, and the observations will be approximately independent. Populations sizes are commonly much larger than sample sizes.

In Example 8.1, if the sampling is performed with replacement then \(\Pr(X_2 = 2 \mid X_1 = 2) = \Pr(X_2 = 2) = \Pr(X_1 = 2)\). Similar statements can be made for all sample members and all values of the population. In this case, the sample is a random sample according to Definition 8.2.

Sampling at random with replacement gives rise to a random sample, and sampling at random without replacement approximates a random sample provided \(n << N\). Sampling without replacement when the sample size is not much smaller than the population size does not give rise to a random sample. This situation is often referred to as sampling from a finite population.

Sampling at random with replacement is also known as repeated sampling. A typical example is a sequence of independent trials (Bernoulli or otherwise) such as occur when a coin or die is tossed repeatedly. Each trial involves random sampling from a population, which in the case of a coin comprises H and T with equal probability.

Sampling from a continuous distribution is really a theoretical concept in which the population is assumed uncountably infinite. Consequently, sampling at random in this situation is assumed to produce a random sample.

8.2.2 The sampling distribution of the mean

Some powerful results exist describing the sampling distribution of the mean of a random sample.

Theorem 8.1 (Sampling distribution of the mean) If \(X_1, X_2, \dots, X_n\) is a random sample of size \(n\) from a population with mean \(\mu\) and variance \(\sigma^2\), then the sample mean \(\bar{X}\) has a sampling distribution with mean \(\mu\) and variance \(\sigma^2 / n\).

Proof. This is a application of Corollary 6.1 with \(a_i = 1/n\).

Theorem 8.10 applies to a random sample as defined in Sect. 8.2.1. The remarkable feature of Theorem 8.1 is that the distribution of \(X\) is not inmportant.

In Theorem 8.1, the distribution of the population from which the sample is drawn is irrelevant.

Theorem 8.1 applies to samples randomly selected with replacement from a finite population as described in 8.2. In the case where sample are taken without replacement, \(\mu_{\bar{X}}\) still equals \(\mu\) (as this result does not depend on the observations being independent), but \[\begin{equation} \text{var}(\bar{X}) = \frac{\sigma^2}{n}\frac{(N - n)}{(N - 1)} \end{equation}\] which is smaller than \(\sigma^2/n\) by the factor \((N - n)/(N - 1)\). This means that the standard deviation of \(\bar{X}\) is smaller than \(\sigma/\sqrt{n}\) by a factor of \(\sqrt{(N - n)/(N - 1)}\). (This factor was also seen in Exercise 4.19.)

Definition 8.3 (Finite Population Correction (FPC)) The finite population correction (FPC) factor is

\[ \sqrt{ \frac{N - n}{N - 1} }. \]

Example 8.3 (CLT when sampling without replacement) In Example 8.1, \(\mu = 4.4\) and \(\sigma^2 = 5.04\) for the population. The FPC is

\[ \text{FPC} = \sqrt{ \frac{5 - 2}{5 - 1} } = 0.8660\dots. \] If sampling is done without replacement, the sampling distribution of the mean has mean \(\mu_{\bar{X}} = \mu = 4.4\) and variance \(\text{var}(\bar{X}) = 1.89 = \frac{5.04}{2} \times \text{FPC}^2\), in agreement with the note following Theorem 8.10.

If sampling is done with replacement, then \(\mu_{\bar{X}} = \mu = 4.4\) and \(\text{var}(\bar{X}) = 2.52 = 5.04/2\), agreement with the theorem.

Theorem 8.10 describes the location and spread of the sampling distribution of the mean, but not the shape of the sampling distribution. Remarkably, the shape of the sampling distribution only depends on the shape of the population distribution when the sample is small (Sect. 8.4).

8.3 Sampling distributions related to the normal distribution

8.3.1 General results

The normal distribution is used to model many naturally occurring phenomena. Consequently, it is useful for studying the sampling distributions of statistics of random samples from normal populations.

First, we present a useful result.

Theorem 8.2 (Linear combinations) Let \(X_1, X_2, \dots, X_n\) be a set of independent random variables where \(X_i\sim N(\mu_i,\sigma^2_i)\). Define the linear combination \(Y\) as

\[ Y = a_1 X_1 + a_2 X_2 + \dots + a_nX_n. \] Then \(Y \sim N(\Sigma a_i\mu_i, \Sigma a^2_i\sigma^2_i)\).

Proof. In Theorem 5.2, the mgf of the random variable \(X_i\) was shown to be

\[ M_{X_i} (t) = \exp\{\mu_i t + \frac{1}{2} \sigma^2_i t^2\}\quad\text{for $i = 1, 2, \dots, n$}. \] So, for a constant \(a_i\) (using Theorem 3.1 with \(\beta = 0\), \(\alpha = a_i\)):

\[\begin{align*} M_{a_i X_i}(t) &= M_{X_i}(a_it)\\ &= \exp\{\mu_i a_i t + \frac{1}{2} \sigma^2_i a^2_i t^2\}. \end{align*}\] Since the mgf of a sum of independent random variables is equal to the product of their mgfs, then

\[\begin{align*} M_Y(t) &= \prod^n_{i = 1} \exp\left( \mu_i a_i t + \frac{1}{2} \sigma^2_i a^2_i t^2 \right)\\ &= \exp\left( t\Sigma a_i\mu_i +\frac{1}{2} t^2 \Sigma a^2_i \sigma^2_i \right). \end{align*}\] This is the mgf of a normal random variable with mean \(\text{E}(Y) = \Sigma a_i\mu_i\) and variance \(\text{var}(Y) = \Sigma a^2_i\sigma^2_i\).

The sampling distribution of the sum and mean of a random sample from a normal population follows directly from Theorem 8.2.

Corollary 8.1 (Sum and mean of a random sample) Let \(X_1, X_2, \dots, X_n\) be a random sample of size \(n\) from \(N(\mu,\sigma^2)\). Define the sum \(S\) and mean \(\bar{X}\) respectively as

\[\begin{align*} S &= X_1 + X_2 + \dots + X_n\\ \bar{X} &= (X_1+X_2 + \dots + X_n)/n \end{align*}\] Then \(S\sim N(n\mu, n\sigma^2)\) and \(\bar{X}\sim N(\mu, \sigma^2/n)\).

Proof. Exercise.

Corollary 8.1 relies only on the population from which the sample is drawn having a normal distribution, and on the properties of expectation. It is the basis for inference about the population mean of a normal distribution with known variance.

Example 8.4 (Sums of rvs) Suppose envelopes are collected into packs of 25 by weighing them. The weight of an envelope is distributed normally with mean 3g and standard deviation 0.6g. Any weighed pack of envelopes is declared as containing 25 if it weighs between 70 and 80g. What is the probability that a pile of 25 will not be counted as such?

Let random variable \(X_i\) be the weight of the \(i\)th envelope; then \(X_i \sim N(3, 0.36)\). Let \(Y = X_1 + X_2 + \dots + X_{25}\); then \(Y \sim N(75, 9)\). We require \(1 - \Pr(70 < Y < 80)\); see Fig. 8.1. Since the normal distribution is symmetric, proceed:

\[\begin{align*} 1 - \Pr(70 < Y < 80) &= 2\times \Pr\left( Z < \frac{70 - 75}{\sqrt{9}} \right)\\ &= 2\times \Phi(-1.667) \\ &= 0.10. \end{align*}\] In R:

2 * ( pnorm( -1.667 ) )
#> [1] 0.0955144
The envelope question. Left panel: The distribution of the weight of envelopes; right panel: The distribution of the weight of packs of 25 envelopes.

Figure 8.1: The envelope question. Left panel: The distribution of the weight of envelopes; right panel: The distribution of the weight of packs of 25 envelopes.

Example 8.5 (CLT) The IQ’s for a large population of 10 year-old boys (assumed normally distributed) were determined and found to have a mean of 110 and a variance of 144. How large a sample would have to be taken in order to have a probability of 0.9 that the mean IQ of the sample would not differ from the expected value 110 by more than 5?

Let \(X_i\) be the IQ of the \(i\)th boy; then \(X_i \sim N(110, 144)\). Consider a sample of size \(n\) and let \(\bar{X} = \sum^n_{i = 1}X_i/n\); then \(\bar{X}\sim N(110, 144/n)\). Then

\[ \Pr(|\bar{X} - 110|\leq 5) = 0.90. \] That is

\[ \Pr\left(\frac{|\bar{X} - 110|\sqrt{n}}{12} \leq \frac{5\sqrt{n}}{12}\right) = 0.90 \] hence

\[ \Pr(Z \leq 5\sqrt{n} /12) = 0.90. \] Using R:

z <- qnorm(0.95)
z
#> [1] 1.644854

n <- (z * 12 / 5) ^ 2
n
#> [1] 15.58393

The smallest size sample, then, would be a sample of \(n = 16\).

Example 8.6 (CLT) A certain product involves a plunger fitting into a cylindrical tube. The diameter of the plunger can be considered a normal random variable with mean 2.1cm and standard deviation \(0.1\)cm. The inside diameter of the cylindrical tube is a normal random variable with mean 2.3cm and standard deviation \(0.05\)cm. For a plunger and tube chosen randomly from a day’s production run, find the probability that the plunger will not fit into the cylinder.

Let \(X\) and \(Y\) be the diameter of the plunger and cylinder respectively. Then \(X\sim N\big(2.1, (0.1)^2\big)\) and \(Y\sim N\big( 2.3, (0.05)^2\big)\), and we seek \(\Pr(Y < X)\).

The distribution of \(Y - X\) is \(N(2.3 - 2.1, 0.0025 + 0.01)\) so that,

\[\begin{align*} \Pr(Y - X < 0) &= \Pr\left(Z <\frac{0 - 0.2}{\sqrt{0.0125}}\right) \quad\text{where $Z\sim N(0,1)$}\\ &= \Pr(Z < -1.78)\\ &= 0.0375. \end{align*}\] In R:

pnorm( -1.78 )
#> [1] 0.03753798

8.3.2 The \(\chi^2\) distribution

Having studied the sampling distribution of the mean, we now study the sampling distribution of the variance in a normal population. First, a useful result which is a variation of Theorem 7.4.

Theorem 8.3 (Chi-square distribution) Let \(X_1, X_2, \dots, X_n\) be a random sample of size \(n\) from \(N(\mu, \sigma^2)\). Then \(\sum_{i = 1}^n \left(\displaystyle{\frac{X_i - \mu}{\sigma}}\right)^2\) has a chi-square distribution with \(n\) degrees of freedom.

Proof. If \(Z_i = \displaystyle{\frac{X_i - \mu}{\sigma}}\), then

\[ \sum_{i = 1}^n \left(\frac{X_i - \mu}{\sigma}\right)^2 = \sum_{i = 1}^n Z_i^2. \] But \(Z_i\) is the standardised version of \(X_i\) and has a standard normal \(N(0, 1)\) distribution. Also, the random variables \(Z_i\) are independent because the \(X_i\) are independent (\(i = 1, 2, \dots, n\)). The required result follows directly from Theorem 7.4.

The sample variance is given by

\[\begin{equation} S^2 = \frac{1}{n - 1}\sum_{i = 1}^n (X_i - \bar{X})^2; \tag{8.1} \end{equation}\] the sampling distribution of the sample variance follows.

Theorem 8.4 (Distribution of the sample variance) Let \(X_1, X_2, \dots, X_n\) be a random sample of size \(n\) from \(N(\mu, \sigma^2)\). Then

\[ \frac{(n - 1)S^2}{\sigma^2} \sim \chi^2(n - 1). \]

Proof. Only a partial proof is provided. Using (8.1),

\[ \frac{(n - 1)S^2}{\sigma^2} = \sum_{i = 1}^n \left(\frac{X_i - \bar{X}}{\sigma}\right)^2, \] which looks much like the conditions for Theorem 8.3. The difference is that \(\bar{X}\) replaces \(\mu\). This difference is important: \(\mu\) is a fixed value that does not vary, whereas \(\bar{X}\) is a random variable.

Writing

\[\begin{align*} \sum_{i = 1}^n \left( \frac{X_i - \bar{X}}{\sigma} \right)^2 &= \sum_{i = 1}^n \left( \frac{X_i - \bar{X} + \mu - \mu}{\sigma} \right)^2\\ &= \sum_{i = 1}^n \left( \frac{X_i - \mu}{\sigma} \right)^2 - n\frac{(\bar{X} - \mu)^2}{\sigma^2}. \end{align*}\] (Check the algebra!) Rewrite using \(T = U + V\), where

\[\begin{align*} T &= \sum_{i = 1}^n \left( \frac{X_i - \mu}{\sigma} \right)^2,\\ U &= n\frac{(\bar{X} - \mu)^2}{\sigma^2}, \quad\text{and}\\ V &= \sum_{i = 1}^n \left(\frac{X_i - \bar{X}}{\sigma}\right)^2. \end{align*}\] Then see that:

  • \(T \sim \chi^2(n)\) by Theorem 8.3; and
  • \(V \sim \chi^2(1)\) because \(V\) is the square of a standard normal random variable, since \(\bar{X}\sim N(\mu, \sigma^2/n)\).

Therefore the mgf of \(T\) is \((1 - 2t)^{-n}\), and the mgf of \(V\) is \((1 - 2t)^{-1}\). So, provided \(T\) and \(V\) are independent, the mgf of \(T\) is the product of the mgfs of \(U\) and \(V\). The mgf of \(U\) is therefore

\[ M_U(t) = \frac{M_T(t)}{M_V(t)} = (1 - 2t)^{n - 1} \] which is the mgf of the chi-square distribution with \(n - 1\) df. This is a partial proof, because the independence of \(U\) and \(V\) has not been shown.

Example 8.7 (Distribution of sample variance) A random sample of size \(n = 10\) is selected from the \(N(20, 5)\) distribution. What is the probability that the variance of this sample exceeds 10?

By Theorem 8.4,

\[ \frac{(n - 1)S^2}{\sigma^2} = \frac{9 S^2}{5} \sim \chi^2(9). \] Therefore

\[ \Pr(S^2 > 10) = \Pr\left( \frac{9S^2}{5} > \frac{9\times10}{5} \right) = \Pr(X^2 > 18), \] where \(X^2 \sim \chi^2(9)\). Using R, \(\Pr(S^2 > 10) = 0.03517\):

1 - pchisq(18, df = 9)
#> [1] 0.03517354
pchisq(18, df = 9, lower.tail = FALSE)   # Alternatively
#> [1] 0.03517354

8.3.3 The \(t\) distribution

The basis of the independence assumed in the proof of Theorem 8.4 is the surprising result that the mean and variance of a random sample from a normal population are independent random variables.

Theorem 8.5 (Sample mean and variance: Independent) Let \(X_1, X_2,\dots, X_n\) be a random sample of size \(n\) from \(N(\mu, \sigma^2)\). Then the sample mean

\[ \bar{X} = \frac1n\sum_{i=1}^n X_i \] and sample variance

\[ S^2 = \frac{1}{n - 1}\sum_{i=1}^n (X_i - \bar{X})^2 \] are independent.

Proof. This proof is not given.

Another distribution that derives from sampling a normal population is the \(t\)-distribution.

Definition 8.4 (T-Distribution) A continuous random variable \(X\) with probability density function

\[\begin{equation} f_X(x) = \frac{\Gamma((\nu+1)/2)}{\sqrt{\pi \nu}\,\Gamma(\nu/2)} \left(1 + \frac{x^2}{\nu}\right)^{-(\nu + 1)/2} \quad\text{for all $x$} \end{equation}\] is said to have a \(t\) distribution with parameter \(\nu > 0\). The parameter \(\nu\) is called the degrees of freedom. We write \(X \sim t(\nu)\) (or \(t_\nu\)).

The pdf of the \(t\) distribution is very similar to the standard normal distribution (Fig. 8.2), in that it is bell-shaped and symmetric about zero. However, the variance is greater than one and is dependent on \(\nu\).

The four R functions for working with the \(t\)-distribution have the form [dpqr]t(df), where df is the degrees of freedom (see Appendix B).

Some $t$-distributions (with normal distributions in dashed lines), with mean 0 and variance 1

Figure 8.2: Some \(t\)-distributions (with normal distributions in dashed lines), with mean 0 and variance 1

Theorem 8.6 (Properties of the $T$-distribution) If \(X\sim t(\nu)\) then

  • For \(\nu > 1\), \(\text{E}(X) = 0\).
  • For \(\nu > 2\), \(\text{var}(X) = \displaystyle{\frac{\nu}{\nu - 2}}\).
  • The mgf does not exist because only the first \(\nu - 1\) moments exist.

Proof. This proof is not given.

Although we won’t prove it, as \(\nu\to\infty\) the \(t\) distribution converges to the standard normal (which can be seen in Fig. 8.2). In addition, from Theorem 8.6, we see that as \(\nu \to \infty\), \(\text{var}(X) \to 1\) as for the normal distribution.

The usefulness of the \(t\)-distribution derives from the following result.

Theorem 8.7 ($T$-scores) Let \(X_1, X_2, \dots, X_n\) be a random sample of size \(n\) from \(N(\mu, \sigma^2)\). Then the random variable

\[ T = \frac{\bar{X} - \mu}{S/\sqrt{n}} \] follows a \(t(n - 1)\) distribution where \(\bar{X}\) is the sample variance and \(S^2\) is the sample variance as defined in (8.1).

Proof. We give a partial proof only. The statistic \(T\) can be re-expressed as

\[\begin{align*} T &= \frac{\bar{X} - \mu}{S/\sqrt{n}}\\ &= \left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}\right) \frac{1}{\sqrt{\frac{(n - 1)S^2}{\sigma^2}/(n - 1)}}\\ &= \frac{Z}{Y/(n - 1)}, \end{align*}\] where \(Z\) is a \(N(0, 1)\) variable and \(Y\) is a chi-square variable with \((n - 1)\) df.

Notice that \(T\) represents a standardised version of the sample mean. Because of this, the \(T\) statistic is important in statistical inference. You probably have used in is hypothesis testing, for example.

Example 8.8 (Calculating $T$) A random sample \(21, 18, 16, 24, 16\) is drawn from a normal population with mean of 20.

  1. What is the value of \(T\) for this sample?
  2. In random samples from this population, what is the probability that \(T\) is less than the value found above?

From the sample: \(\overline{x} = 19.0\) and \(s^2 = 12.0\). Therefore \(t = \frac{19.0 - 20}{\sqrt{12.0/5}} = -0.645\). (Notice lower-case symbols are used for specific values of statistics, and upper-case symbols for the random variables.)

Interest here is in \(\Pr(T < -0.645)\) where \(T\sim t(4)\). From R, the answer is 0.277:

pt(-0.645, df = 4)
#> [1] 0.2770289

8.3.4 The \(F\)-distribution

Definition 8.5 introduces the \(F\)-distribution, which describes the ratio of two sample variances from normal populations. The \(F\)-distribution is used in inferences comparing two sample variances. This distribution is also used in analysis of variance, a common technique used to test the equality of several means.

Definition 8.5 ($F$-distribution) A continuous random variable \(X\) with probability density function

\[\begin{equation} f_X(x) = \frac{\Gamma\big( (\nu_1 + \nu_2)/2 \big) \nu_1^{\nu_1/2} \nu_2^{\nu_2/2} x^{(\nu_1/2) - 1}} {\Gamma(\nu_1/2)\Gamma(\nu_2/2)(\nu_1 x + \nu_2)^{(\nu_1 + \nu_2)/2}}\quad\text{for $x > 0$} \end{equation}\] is said to have an \(F\) distribution with parameters \(\nu_1 > 0\) and \(\nu_2 > 0\). The parameters are known respectively as the numerator and denominator degrees of freedom. We write \(X \sim F(\nu_1,\nu_2)\).

Some \(F\)-distributions are shown in Fig. 8.3. The basic properties of the \(F\) distribution are as follows.

Theorem 8.8 (Properties of $F$-distribution) If \(X\sim F(\nu_1, \nu_2)\) then

  • For \(\nu_2 > 2\), \(\text{E}(F) = \displaystyle{\frac{\nu_2}{\nu_2-2}}\).
  • For \(\nu_2 > 4\), \(\text{var}(F) = \displaystyle{\frac{2\nu_2^2(\nu_1 + \nu_2-2)} {\nu_1(\nu_2 - 2)^2(\nu_2 - 4)}}\).
  • The mgf does not exist.

Proof. Not covered.

Some $F$-distributions

Figure 8.3: Some \(F\)-distributions

Theorem 8.9 ($F$-distribution) Let \(X_1, X_2, \dots, X_{n_1}\) be a random sample of size \(n_1\) from \(N(\mu_1, \sigma_1^2)\) and \(Y_1, Y_2,\dots, Y_{n_2}\) be an independent random sample of size \(n_2\) from \(N(\mu_2, \sigma_2^2)\). Then the random variable

\[ F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} \] follows a \(F(n_1 - 1, n_2 - 1)\) distribution.

Proof. We give a partial proof only. The \(F\) statistic can be rewritten as

\[ F = \frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2} = \frac{U_1/\nu_1}{U_2/\nu_2} \] where

\[\begin{align*} U_1 &= \displaystyle{\frac{(n_1 - 1)S_1^2}{\sigma_1^2}} \nu_1 &&= n_1 - 1, \\ U_2 &= \displaystyle{\frac{(n_2 - 1)S_2^2}{\sigma_2^2}}, \nu_2 &&= n_2 - 1. \end{align*}\] \(U_1\) and \(U_2\) have chi-square distributions. The pdf of \(\frac{U_1/\nu_1}{U_2/\nu_2}\) and the rest of the proof is not given.

The four R functions for working with the \(F\)-distribution have the form [dpqr]f(df1, df2), where df1\({} = \nu_1\) and df2\({} = \nu2\) (see Appendix B).

Example 8.9 ($F$-distribution probabilities) Suppose \(X\sim F(2, 10)\). Find:

  • \(\Pr(X < 1)\)
  • \(x\) such that \(\Pr(X > x) = 0.01\)

Using R:

pf(1, 
   df1 = 2, 
   df2 = 10)
#> [1] 0.5981224
qf(0.99, 
   df1 = 2, 
   df2 = 10)
#> [1] 7.559432

8.4 The Central Limit Theorem

Sampling distributions for various statistics of interest are well-developed when sampling is from a normal distribution (Sect. 8.3). Although these results are important, what if we do not know the distribution of the population from which the random sample is drawn (as is usually the case)?

In Sect. 8.2.2 general results are given describing the mean and variance of the sample mean which hold for any population distribution, but do not say anything about the distribution. The main result is contained in the following theorem, called The Central Limit Theorem (or CLT).

The Central Limit Theorem is one of the most important theorems in statistics.

Theorem 8.10 (Central Limit Theorem (CLT)) Let \(X_1, X_2, \dots, X_n\) be a random sample from a distribution with mean \(\mu\) and variance \(\sigma^2\). Then the random variable

\[ Z_n = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} \] converges in distribution to a standard normal variable as \(n\to\infty\).

Proof. Let the random variable \(X_i\) \((i = 1, \dots, n)\) have mgf \(M_{X_i}(t)\). Then

\[ M_{X_i}(t) = 1 + \mu'_1t + \mu'_2\frac{t^2}{2!} + \dots . \] But

\[ \bar{X} = \frac{1}{n} X_1 + \dots + \frac{1}{n} X_n \] so

\[ M_{\bar{X}}(t) = \prod^n_{i = 1} M_{X_i/n}(t) = \left[ M_{X_i} (t/n)\right]^n. \] Now \(\displaystyle{Z_n = \frac{\sqrt{n}}{\sigma}\bar{X} - \frac{\sqrt{n}\mu}{\sigma}}\), which is of the form \(Y = aX + b\) so

\[\begin{align*} M_{Z_n}(t) &= e^{-\sqrt{n}\mu t/\sigma}M_{\bar{X}} (\sqrt{n}t/\sigma )\\ &= e^{-\sqrt{n}\mu t / \sigma} \left[ M_{X_i}\left(\frac{\sqrt{n}t}{\sigma n} \right)\right]^n\\ & = e^{-\sqrt{n}\mu t /\sigma}\left[ M_{X_i}(t / \sigma \sqrt{n})\right]^n \end{align*}\] Then

\[\begin{align*} \log M_{Z_n}(t) &= \frac{-\sqrt{n}\mu t}{\sigma} + n\log \left[ 1 + \frac{\mu'_1 t}{\sigma\sqrt{n}} + \frac{\mu'_2 t^2}{2! \, n\sigma^2} + \frac{\mu'_3 t^3}{3! \, n\sqrt{n}\sigma^3} + \dots \right] \\ &= \frac{-\sqrt{n}\mu t}{\sigma} + n\left[ \mu'_1 \frac{t}{\sigma \sqrt{n}} + \frac{\mu'_2 t^2}{2n\sigma^2} + \frac{\mu'_3t^3}{6n\sqrt{n}\sigma^3} + \dots \right]\\ & \quad \text{} - \frac{n}{2} \left[ \frac{\mu'_1t}{\sigma\sqrt{n}} + \dots \right]^2 + \frac{n}{3} \left[\phantom{\frac{n}{3}}\dots\phantom{\frac{n}{3}}\right]^3 - \dots \\ &= \frac{\mu'_2 t^2}{2\sigma^2} - \frac{(\mu'_1)^2 t^2}{2\sigma^2} + \text{ terms in $(1 / \sqrt{n}$ etc}. \end{align*}\] Now,

\[ \lim_{n\to \infty} \log M_{Z_n}(t) = \frac{(\mu'_2- (\mu'_1)^2)}{\sigma^2}\frac{t^2}{2} = \frac{t^2}{2} \] because the terms in \((1/\sqrt{n})\), etc. tend to zero, and \(\mu'_2 - (\mu'_1)^2 = \sigma^2\). Thus \(\displaystyle{ \lim_{n\to\infty} M_{Z_n}(t) = e^{\frac{1}{2} t^2}}\), which is the mgf of a \(N(0, 1)\) random variable. So \(Z_n\) converges in distribution to a standard normal.

This is a partial proof since the concept of convergence in distribution has not been defined, and mgf of \(X_i\) is assumed to exists, which does not need to be the case. However, the arguments used in the proof are powerful and worth understanding.

‘Convergence in distribution’ means that \(Z_n\) approaches normality as \(n\to \infty\). So when \(n\) is ‘large’, \(Z_n\) is expected to approximate a standard normal distribution. Transforming \(Z_n\) back to the sample mean, \(\bar{X}\) can be expected to approximate a \(N(\mu, \sigma^2/n)\) distribution.

In practice, the distribution of \(\bar{X}\) is sufficiently close to that of a normal distribution when \(n\) is larger than about 25 in most situations. However, if the population distribution is severely skewed, larger samples sizes may be necessary for the approximate to be adequate in practice.

More directly, the Central Limit Theorem states that, for any random variable \(X\), the distribution of the sample means will have an:

  • approximate normal distribution,
  • with mean \(\mu\) (the mean of \(X\)), and
  • with a variance of \(\sigma^2/n\) (where \(\sigma^2\) is the variance of \(X\)).

The approximation improves as the sample size \(n\) gets larger.

This applies for any random variable \(X\), whatever its distribution. For many distributions that are not highly skewed, the approximation is reasonable for sample sizes larger than about 20 to 30. If the data comes from a normal distribution, the distribution of the sample mean always has a normal distribution for any \(n\).

Example 8.10 (CLT) A soft-drink vending machine is set so that the amount of drink dispensed is a random variable with a mean of 200mL and a standard deviation of 15mL. What is the probability that the average amount dispensed in a random sample of size 36 is at least 204mL?

Let \(X\) be the amount of drink dispensed in mL. The distribution of \(X\) is not known. However, the mean and standard deviation of \(X\) are given. The distribution of the sample mean (average), \(\bar{X}\), can be approximated by the normal distribution with mean of \(\mu = 200\) and standard error of \(\sigma_{\bar{X}} = \sigma/\sqrt{n} = 15/\sqrt{36} = 15/6\), according to the CLT. That is, \(\bar{X} \sim N(200, (15/6)^2)\). Now

\[ \Pr(\bar{X} \ge 204) \approx P_N\left(Z \ge \frac{204 - 200}{15/6}\right) = \Pr(Z \ge 1.6). \] In R:

1 - pnorm(1.6)
#> [1] 0.05479929

Hence, \(\Pr(\bar{X}\ge 204) \approx 0.0548\), or about \(5.5\)%. (Here, \(P_N(A)\) is used to denote the probability of event \(A\) involving a random variable assumed to be normal in distribution.)

Example 8.11 (Throwing dice) Consider the experiment of throwing a fair die \(n\) times where we observe the sum of the faces showing. For \(n = 12\), find the probability that the sum of the faces is at least 52.

Let the random variable \(X_i\) be the number showing on the \(i\)th throw. Then, define \(Y = X_1 + \dots + X_{12}\). We seek \(\Pr(Y\geq 52)\).

In order to use Theorem 8.10, note that the event ‘\(Y\geq 52\)’ is equivalent to ‘\(\bar{X}\geq 52/12\)’, where \(\bar{X} = Y/12\) is the mean number showing from the 12 tosses.

Since the distribution of each \(X_i\) is rectangular with \(\Pr(X_i = x) = 1/6\) (for \(x = 1, 2, \dots, 6\)), then \(\text{E}(X_i) = 7/2\) and \(\text{var}(X_i) = 35/12\).

It follows that \(\text{E}(\bar{X}) = 7/2\) and \(\text{var}(\bar{X}) = 35/(12^2)\). Then from Theorem 8.10,

\[\begin{align*} \Pr(Y\geq 52) & \simeq \Pr(\bar{X}\geq 52/12)\\ &= P_N \left(Z\geq \frac{52/12 - 7/2}{\sqrt{35/144}}\right)\\ &= 1 - \Phi(1.690)\\ &= 0.0455 \end{align*}\] The probability is approximately 4.5%.

Example 8.11 can also be solved using a generalisation of the Central Limit Theorem. This generalisation indicates why the normal distribution plays such an important role in statistical theory. It states that a large class of random variable’s converge in distribution to the standardized normal.

Theorem 8.11 (Convergence to standard normal distribution) Let \(X_1, X_2, \dots, X_n\) be a sequence of independent random variable’s with \(\text{E}(X_i) = \mu_i\), \(\text{var}(X_i) = \sigma^2_i\). Define \(Y = a_1X_1 + a_2X_2 + \dots + a_nX_n\).

Then under certain general conditions, including \(n\) large, \(Y\) is distributed approximately \(N(\sum_i a_i\mu_i, \sum_i a^2_i\sigma^2_i)\).

Proof. Not given.

Theorem 8.11 makes no assumptions about the distribution of \(X_i\). If the \(X_i\) are normally distributed, then \(Y = \sum_{i = 1}^n a_i X_i\) will have a normal distribution for any \(n\), large or small.

Example 8.12 (CLT (Voltages)) Suppose we have a number of independent noise voltages \(V_i\) (for \(i = 1, 2, \dots, n\)). Let \(V\) be the sum of the voltages, and suppose each \(V_i\) is distributed \(U(0, 10)\). For \(n = 20\), find \(\Pr(V > 105)\).

This is an example of Theorem 8.11 with each \(a_i = 1\). To find \(\Pr(V > 105)\), the distribution of \(V\) must be known.

Since \(\text{E}(V_i) = 5\) and \(\text{var}(V_i) = 100/12\), from Theorem 8.11 \(V\) has an approximate normal distribution with mean \(20\times 5 = 100\) and variance \(20\times 100/12\). That is, \(\displaystyle{\frac{V - 100}{10\sqrt{5/3}}}\) is distributed \(N(0, 1)\) approximately. So

\[ \Pr(V > 105) \approx P_N\left (Z > \frac{105 - 100}{12.91}\right ) = 1 - \Phi(0.387) = 0.352. \] The probability is approximately 35%.

8.5 The normal approximation to the binomial

The normal approximation to the binomial distribution (Sect. 5.3.4 can be seen as an application of the Central Limit Theorem. The essential observation is that a sample proportion is a sample mean. Consider a sequence of independent Bernoulli trials resulting in the random sample \(X_1, X_2, \dots, X_n\) where

\[ X_i = \begin{cases} 0 & \text{if failure};\\ 1 & \text{if success} \end{cases} \] denotes whether or not the \(i\)th trial is a success. Then the sum

\[ Y = \sum_{i = 1}^n X_i \] represents the number of successes in the \(n\) trials and

\[ \bar{X} = \frac{1}{n}\sum_{i = 1}^n X_i = \frac{Y}{n} \] is a sample mean representing the proportion or fraction of trials which are successful. In this context, \(\bar{X}\) is usually denoted by the sample proportion \(\widehat{p}\).

Note that \(\text{E}(X_i) = p\) and \(\text{var}(X_i) = p(1 - p)\). Therefore

\[ \text{E}(Y) = np \quad\text{and}\quad \text{var}(Y) = np(1 - p) \] and

\[ \text{E}(\bar{X}) = p \quad\text{and}\quad \text{var}(\bar{X}) = \frac{p(1 - p)}{n}. \] Theorems 8.10 and 8.11 are applicable to \(\bar{X}\) and \(Y\) respectively. Hence

\[ \bar{X} = \widehat{p} \sim N\left(p, \frac{p(1 - p)}{n}\right)\text{ approximately} \] and

\[ Y = n\widehat{p} \sim N(np, np(1 - p))\text{ approximately}. \]

8.6 Exercises

Exercise 8.1 Consider the FPC.

  1. What happens if the sample size is the same as the population size? Explain.
  2. Show that, for large \(n\), the FPC is approximately \(\sqrt{ 1 - n/N}\).

Exercise 8.2 A random sample of size 81 is taken from a population with mean 128 and standard deviation 6.3. What is the probability that the sample mean will not fall between 126.6 and 129.4?

Exercise 8.3 Let \(Y_1\), \(Y_2\), \(\dots\), \(Y_n\) be \(n\) independent random variables, each with pdf

\[ f_Y(y) = \begin{cases} 3y^2 & \text{for $0\le y \le 1$};\\ 0 & \text{otherwise}. \end{cases} \] Determine the probability that the sample mean will be within one standard deviation of the population mean, using the Central Limit Theorem.

Exercise 8.4 Suppose the weights of eggs in a dozen carton have a weight that is normally distributed with mean \(59\) and variance \(0.7\).

  1. Find the probability that, in a sample of \(20\) cartons, the sample mean weight will exceed \(59.5\) grams.
  2. Find the probability that a sample of twelve eggs will produce a sample variance of greater than \(1\).

Exercise 8.5 In a carton of a dozen eggs, the number broken has a Poisson distribution with mean \(0.2\).

  1. Find the probability that, in a sample of 20 cartons, the sample mean of the number of broken eggs per carton is more than one. (Use the Central Limit Theorem.)
  2. Find the probability that, in any single carton, the probability that more than one egg is broken.

Exercise 8.6 The random variable \(M\) has the following probability density function
\[ f_M(m) = \begin{cases} 3m^2 & \text{for $0 < m < 1$};\\ 0 & \text{otherwise}. \end{cases} \] A random sample of size \(9\) is taken from the distribution, and the sample mean \(\overline{M}\) is computed.

  1. Compute the mean of \(M\).
  2. Compute the variance of \(M\).
  3. State the approximate distribution of \(\overline{M}\) including the parameters.
  4. Compute the probability that the sample mean will be within \(0.1\) of the true mean.

Exercise 8.7 Generate a random sample of size \(n = 9\) from a \(N(10, 36)\) distribution hundreds of times. Obtain the mean and variance of \(\sum X_i\) and \(\overline{X}\) for each sample. Verify they are approximately normally distributed as expected.

Exercise 8.8 Consider the random variable \(A\), defined as
\[ A = \frac{Z}{\sqrt{W/\nu}}, \] where \(Z\) has a standard normal distribution independent of \(W\), which has a \(\chi^2\) distribution with \(\nu\) degrees of freedom. Also, consider the random variable \(B\), defined as
\[ B = \frac{W_1/\nu_1}{W_2/\nu_2}, \] where \(W_1\) and \(W_2\) are independent \(\chi^2\) variables with \(\nu_1\) and \(\nu_2\) degrees of freedom respectively.

  1. Write down (do not derive) the distribution of \(A\), including the parameters of the distribution.
  2. Write down (do not derive) the distribution of \(B\), including the parameters of the distribution.
  3. Deduce that the distribution of \(A^2\) is a special case of the distribution of \(B\), and state the values of the parameters for which this is true.

Exercise 8.9 A manufacturing plant produces 2 tonnes of waste product on a given day, with a standard deviation of 0.2 tonnes per day. Find the probability that, over a 20 day period, the plant produces less than 25 tonnes of waste if daily productions can be assumed independent.

Exercise 8.10 Let \(Y\) be the change in depth of a river from one day to the next measured (in cms) at a specific location. Assume \(Y\) is uniformly distributed for \(y \in [-70, 70]\).

  1. Find the probability that the mean change in depth for a period of 30 days will be greater than 10cms.
  2. Use simulation to estimate the probability above.

Exercise 8.11 The number of deaths per year due to typhoid fever is assumed to have a Poisson distribution with rate \(\lambda = 4.1\) per year.

  1. If deaths from year to year can be assumed to be independent, what is the distribution over a 20 year period?
  2. Find the probability that there will be more than 100 deaths due to typhoid fever in period of 20 years.

Exercise 8.12 The probability that a cell is a lymphocyte is 0.2.

  1. Write down an exact expression for the probability that in a sample containing 150 cells that at least 40 are lymphocytes. Evaluate this expression using R.
  2. Write down an approximate expression for this probability, and evaluate it.

Exercise 8.13 Suppose the probability of a person aged 80+ years dying after receiving influenza vaccine is 0.006. In a sample of 200 persons aged 80+ years:

  1. Write down an exact expression for the probability that more than 5 will die after vaccination for influenza. Evaluate this expression using R.
  2. Write down an approximate expression for this probability and evaluate it.
  3. If 4 persons died in a sample of 200, what conclusion would you make about the probability of dying after vaccination? Justify your answer.

Exercise 8.14 Illustrate the Central Limit Theorem for the uniform distribution on \([-1, 1]\) by simulation and repeated sampling, for various sample sizes.

Exercise 8.15 Illustrate the Central Limit Theorem for the exponential distribution with mean \(1\), by simulation and repeated sampling, for various sample sizes.

Exercise 8.16 Demonstrate the Central Limit Theorem using R:

  1. For the uniform distribution, where \(-1 < x < 1\) (a symmetric distribution).
  2. For the exponential distribution with parameter 1 (a skewed distribution).

Exercise 8.17 The random variable \(M\) has the probability density function

\[ f_M(m) = \begin{cases} 3m^2 & \text{for $0 < m < 1$}\\ 0 & \text{otherwise} \end{cases} \] A random sample of size \(9\) is taken from the distribution, and the sample mean \(\overline{M}\) is computed.

  1. Compute the mean of \(M\).
  2. Compute the variance of \(M\).
  3. State the approximate distribution of \(\overline{M}\) including the parameters of the distribution.
  4. Compute the probability that the sample mean will be within \(0.1\) of the true mean.