# Chapter 2 Introduction to statistical inference

## 2.1 Basic definitions

As seen in Example 1.3, from a frequentist perspective we can state that there is an underlying probability law that drives each of the described random experiments. The interest when studying a random phenomenon lies precisely in this underlying distribution.

Usually, for certain kinds of experiments and random variables, experience provides some information about the underlying distribution \(F\) (identified here with the c.d.f.). In many situations, \(F\) is not exactly known, but we may know the form of \(F\) up to a parameter \(\theta\) that drives its shape. In other words, we may know that \(F\) belongs to the *parametric* family of distributions \(\{F(\,\cdot\, ;\theta): \ \theta\in\Theta\}\) indexed by a certain parameter \(\theta\in\Theta.\)

If our objective is to know more about \(F\) or, equivalently, about its driving parameter \(\theta,\) we can perform independent realizations of the experiment and then process the information in a certain way in order to *infer* knowledge about \(F\) or \(\theta.\) This is the process of performing *inference* about \(F\) or, equivalently, about \(\theta.\) The fact that we perform independent repetitions of the experiment means that, for each repetition of the experiment, we have an associated r.v. with the same distribution \(F.\) This is the concept of a *simple random sample*.

**Definition 2.1 (Simple random sample)** A *simple random sample* (s.r.s.) of size \(n\) of a r.v. \(X\) with distribution \(F\) is a collection of r.v.’s \((X_1,X_2,\ldots,X_n)\) that are *independent and identically distributed* (i.i.d.) with distribution \(F.\)

Therefore, a s.r.s. is a *random vector* \(X=(X_1,X_2,\ldots,X_n)\) that is defined over the measurable product space \((\mathbb{R}^n,\mathcal{B}^n).\) Since the r.v.’s are independent, the c.d.f. of the sample (or of the vector \(X\)) is
\[
F_X(x_1,x_2,\ldots,x_n)=F(x_1)F(x_2)\cdots F(x_n).
\]

**Example 2.1** We analyze next the underlying probability functions that appear in Example 1.3.

Consider the family of p.m.f.’s given by \[ \mathbb{P}(X=1)=\theta, \quad \mathbb{P}(X=0)=1-\theta,\quad \theta\in [0,1]. \] The p.m.f. of experiment a in Example 1.3 belongs to this family, for \(\theta=0.5.\)

From previous experience, it is known that the r.v. that measures the number of events happening in a given time interval has a p.m.f. that belongs to \(\{p(\,\cdot \,;\lambda):\ \lambda\in\mathbb{R}^+\},\) where

\[\begin{align} p(x;\lambda)=\frac{\lambda^x e^{-\lambda}}{x!}, \ x=0,1,2,3,\ldots\tag{2.1} \end{align}\] This is the p.m.f. of the Poisson distribution with parameter \(\lambda\) introduced in Exercise 1.12. Setting \(\lambda=4\) and \(x=0,1,2,3,\ldots\) in (2.1), we obtain the following probabilities:

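\(x\) | \(0\) | \(1\) | \(2\) | \(3\) | \(4\) | \(5\) | \(6\) | \(7\) | \(8\) | \(\geq 9\) |
---|---|---|---|---|---|---|---|---|---|---|
\(p(x;4)\) | \(0.018\) | \(0.073\) | \(0.147\) | \(0.195\) | \(0.195\) | \(0.156\) | \(0.104\) | \(0.060\) | \(0.030\) | \(0.021\) |

Observe that these probabilities (the last column gives \(\mathbb{P}(X\geq 9)\)) are similar to the relative frequencies given in Table 1.2.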
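As a quick check, the tabulated probabilities can be reproduced in R with `dpois` (the p.m.f.) and `ppois` (the c.d.f.). A minimal sketch, where the last entry is the upper-tail probability \(\mathbb{P}(X\geq 9)\):

```r
lambda <- 4
# p(x; 4) for x = 0, 1, ..., 8
probs <- dpois(0:8, lambda = lambda)
# P(X >= 9) = 1 - P(X <= 8)
tail_prob <- ppois(8, lambda = lambda, lower.tail = FALSE)
round(c(probs, tail_prob), 3)
```

Note that \(p(3;4)=p(4;4),\) since \(p(x;\lambda)/p(x-1;\lambda)=\lambda/x\) equals \(1\) when \(\lambda=x=4.\)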

From previous experience, it is known that the r.v. that measures the weight of a person has a normal distribution \(\mathcal{N}(\mu,\sigma^2),\) whose p.d.f. is \[ f(x;\mu,\sigma^2)=\frac{1}{\sqrt{2\pi\sigma^2}}\exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\},\ \mu\in\mathbb{R}, \ \sigma^2\in\mathbb{R}^+. \] Indeed, setting \(\mu=49\) and \(\sigma=5,\) the following probabilities are obtained for the intervals:

\(I\) | \((-\infty,35]\) | \((35,45]\) | \((45,55]\) | \((55,65]\) | \((65,\infty)\) |
---|---|---|---|---|---|
\(\mathbb{P}(X\in I)\) | \(0.003\) | \(0.209\) | \(0.673\) | \(0.114\) | \(0.001\) |

Observe the similarity with Table 1.3.
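The interval probabilities can be checked in the same spirit with `pnorm`, as differences of the normal c.d.f. at the interval endpoints. The tabulated values are reproduced with mean \(49\) and standard deviation \(5\):

```r
# Endpoints of the intervals in the table
breaks <- c(-Inf, 35, 45, 55, 65, Inf)
# P(a < X <= b) = F(b) - F(a) for X ~ N(49, 5^2)
interval_probs <- diff(pnorm(breaks, mean = 49, sd = 5))
round(interval_probs, 3)
```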

**Definition 2.2 (Statistic)** A *statistic* \(T\) is any measurable function \(T:(\mathbb{R}^n,\mathcal{B}^n)\rightarrow(\mathbb{R}^k,\mathcal{B}^k),\) where \(k\) is the *dimension* of the statistic.

**Example 2.2** The following are examples of statistics:

- \(T_1(X_1,\ldots,X_n)=\frac{1}{n}\sum_{i=1}^n X_i.\)
- \(T_2(X_1,\ldots,X_n)=\frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})^2.\)
- \(T_3(X_1,\ldots,X_n)=\min\{X_1,\ldots,X_n\}\triangleq X_{(1)}.\)
- \(T_4(X_1,\ldots,X_n)=\max\{X_1,\ldots,X_n\}\triangleq X_{(n)}.\)
- \(T_5(X_1,\ldots,X_n)=\sum_{i=1}^n \log X_i.\)
- \(T_6(X_1,\ldots,X_n)=(X_{(1)},X_{(n)}).\)

All the statistics have dimension \(k=1,\) except \(T_6\) with \(k=2.\)

From the definition, we can see that a statistic is a r.v., or a random vector if \(k>1.\) The distribution induced by the statistic \(T\) is called the *sampling distribution of \(T\)*, since it depends on the distribution of the sample.

**Example 2.3** A coin is tossed independently three times. Let \(X_i,\) \(i=1,2,3,\) be the r.v. that measures the outcome of the \(i\)-th toss:
\[
X_i=\begin{cases}
1 & \text{if "heads" is the outcome of the $i$-th toss},\\
0 & \text{if "tails" is the outcome of the $i$-th toss}.
\end{cases}.
\]
If we do not know whether the coin is fair (“heads” and “tails” may not be equally likely), then the p.m.f. of \(X_i\) is given by
\[
\mathbb{P}(X_i=1)=\theta,\quad \mathbb{P}(X_i=0)=1-\theta, \quad \theta\in\Theta=[0,1].
\]
The three r.v.’s are independent. Therefore, the probability of the sample \((X_1,X_2,X_3)\) is the product of the individual probabilities of \(X_i,\) that is,
\[
\mathbb{P}(X_1=x_1,X_2=x_2,X_3=x_3)=\mathbb{P}(X_1=x_1)\mathbb{P}(X_2=x_2)\mathbb{P}(X_3=x_3).
\]
The following table collects the values of the p.m.f. of the sample for each value of the sample.

\((x_1,x_2,x_3)\) | \(p(x_1,x_2,x_3)\) |
---|---|
\((1,1,1)\) | \(\theta^3\) |
\((1,1,0)\) | \(\theta^2(1-\theta)\) |
\((1,0,1)\) | \(\theta^2(1-\theta)\) |
\((0,1,1)\) | \(\theta^2(1-\theta)\) |
\((1,0,0)\) | \(\theta(1-\theta)^2\) |
\((0,1,0)\) | \(\theta(1-\theta)^2\) |
\((0,0,1)\) | \(\theta(1-\theta)^2\) |
\((0,0,0)\) | \((1-\theta)^3\) |

We can define the sum statistic as \(T_1(X_1,X_2,X_3)=\sum_{i=1}^3 X_i.\) Now, from the previous table, we can compute the sampling distribution of \(T_1\) as the p.m.f. given by

\(t_1=T_1(x_1,x_2,x_3)\) | \(p(t_1)\) |
---|---|
\(3\) | \(\theta^3\) |
\(2\) | \(3\theta^2(1-\theta)\) |
\(1\) | \(3\theta(1-\theta)^2\) |
\(0\) | \((1-\theta)^3\) |

Another possible statistic is the sample mean, \(T_2(X_1,X_2,X_3)=\bar X,\) whose sampling distribution is

\(t_2=T_2(x_1,x_2,x_3)\) | \(p(t_2)\) |
---|---|
\(1\) | \(\theta^3\) |
\(2/3\) | \(3\theta^2(1-\theta)\) |
\(1/3\) | \(3\theta(1-\theta)^2\) |
\(0\) | \((1-\theta)^3\) |
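The tables above can be obtained mechanically by enumerating the \(2^3=8\) possible samples and adding up the joint p.m.f. over the samples that share the same value of the statistic. A minimal sketch in R, for the illustrative (hypothetical) value \(\theta=0.3\); since \(T_1\) counts the number of heads, its p.m.f. should coincide with the binomial one computed by `dbinom`:

```r
theta <- 0.3  # hypothetical value of the unknown parameter, for illustration only
# All 2^3 = 8 possible samples (x1, x2, x3)
samples <- expand.grid(x1 = 0:1, x2 = 0:1, x3 = 0:1)
# Joint p.m.f. of each sample: product of theta^x * (1 - theta)^(1 - x)
joint_pmf <- apply(samples, 1, function(x) prod(theta^x * (1 - theta)^(1 - x)))
# Sampling distribution of T1 = X1 + X2 + X3 (T2 = T1 / 3 has the same probabilities)
t1 <- rowSums(samples)
pmf_t1 <- tapply(joint_pmf, t1, sum)  # indexed by t1 = 0, 1, 2, 3
pmf_t1
dbinom(0:3, size = 3, prob = theta)   # (1 - theta)^3, ..., theta^3
```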

**Example 2.4** Let \((X_1,\ldots,X_n)\) be a s.r.s. of a r.v. with exponential distribution of parameter \(1/\theta,\) that is, with p.d.f.
\[
f(x;\theta)=\left\{\begin{array}{ll}
\theta e^{-\theta x} & \text{if} \ x\geq 0,\\
0 & \text{if} \ x<0.
\end{array}\right.
\]
We will obtain the sampling distribution of the sum statistic \(T(X_1,\ldots,X_n)=\sum_{i=1}^n X_i.\)

First, observe that the exponential distribution of parameter \(1/\theta\) is a particular case of the gamma \(\Gamma(\alpha,\beta)\) distribution (introduced in Example 1.11) that corresponds to \(\alpha=1\) and \(\beta=1/\theta.\) Recall that the density of a \(\Gamma(\alpha,\beta)\) is given by \[ f(x;\alpha,\beta)=\frac{1}{\Gamma(\alpha) \beta^{\alpha}}\, x^{\alpha-1} e^{-x/\beta}, \ 0<x<\infty, \ \alpha>0, \ \beta>0. \] From Exercise 1.17 we know that the sum of \(n\) independent r.v.’s \(\Gamma(\alpha,\beta)\) is a \(\Gamma(n\alpha,\beta).\) Therefore, the distribution of the sum statistic \(T(X_1,\ldots,X_n)=\sum_{i=1}^n X_i\) is a \(\Gamma(n,1/\theta).\)
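The result can be checked by simulation. The sketch below uses the illustrative (hypothetical) values \(n=5\) and \(\theta=2\), and relies on the fact that R's `rexp` is parametrized by the rate (here \(\theta\)) while `pgamma` accepts a `scale` argument (here \(1/\theta\)), matching the \(\Gamma(\alpha,\beta)\) parametrization used above:

```r
set.seed(42)
n <- 5; theta <- 2   # hypothetical values, for illustration only
# 10^4 samples of size n from the exponential with p.d.f. theta * exp(-theta * x)
x <- matrix(rexp(1e4 * n, rate = theta), ncol = n)
t_sum <- rowSums(x)  # realizations of T = X_1 + ... + X_n
# Empirical c.d.f. of T vs. the Gamma(n, 1/theta) c.d.f. at a few points
q <- c(1, 2.5, 4)
ecdf(t_sum)(q)
pgamma(q, shape = n, scale = 1 / theta)
```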

**Example 2.5 (Sampling distribution of the minimum)** Let \((X_1,\ldots,X_n)\) be a s.r.s. of a r.v. with c.d.f. \(F.\) The sampling distribution of the statistic \(T(X_1,\ldots,X_n)=X_{(1)}\) follows from the c.d.f. of \(T\):
\[\begin{align*}
1-F_{T}(t) &=\mathbb{P}_T(T>t)=\mathbb{P}_{(X_1,\ldots,X_n)}(X_{(1)}> t)=
\mathbb{P}_{(X_1,\ldots,X_n)}(X_1> t,\ldots,X_n> t) \\
&=\prod_{i=1}^n \mathbb{P}_{X_i}(X_i> t)=\prod_{i=1}^n\left[1-F_{X_i}(t)\right]=\left[1-F(t)\right]^n.
\end{align*}\]
Hence, the c.d.f. of the minimum is \(F_T(t)=1-\left[1-F(t)\right]^n.\)
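For instance, for a \(\mathcal{U}(0,1)\) population we have \(F(t)=t\) for \(0\leq t\leq 1,\) so the derivation above gives \(\mathbb{P}(X_{(1)}\leq t)=1-(1-t)^n.\) A minimal simulation sketch in R, with the illustrative (hypothetical) choice \(n=4\):

```r
set.seed(1)
n <- 4  # hypothetical sample size, for illustration only
# 10^4 samples of size n from U(0, 1); X_(1) is the row-wise minimum
mins <- apply(matrix(runif(1e4 * n), ncol = n), 1, min)
t_grid <- c(0.1, 0.25, 0.5)
ecdf(mins)(t_grid)            # empirical c.d.f. of the minimum
1 - (1 - punif(t_grid))^n     # 1 - [1 - F(t)]^n
```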

## 2.2 Sampling distributions in normal populations

Many r.v.’s arising in biology, sociology, or economics are (approximately) normally distributed with mean \(\mu\in\mathbb{R}\) and variance \(\sigma^2\in \mathbb{R}^+.\) This is due to the Central Limit Theorem, a key result in statistical inference stating that the accumulated effect of a large number of independent r.v.’s behaves approximately as a normal distribution. Because of this, and the tractability of normal variables, in statistical inference it is usually assumed that the distribution of a r.v. belongs to the normal family of distributions \(\{\mathcal{N}(\mu,\sigma^2):\ \mu\in\mathbb{R},\ \sigma^2\in\mathbb{R}^+\},\) where the mean \(\mu\) and the variance \(\sigma^2\) are unknown.

In order to perform inference about \((\mu,\sigma^2),\) a s.r.s. \((X_1,\ldots,X_n)\) of \(\mathcal{N}(\mu,\sigma^2)\) is considered. We can compute several statistics using this sample, but we will pay special attention to the ones whose values tend to be “similar” to the value of the unknown parameters \((\mu,\sigma^2).\) A statistic of this kind is precisely an *estimator*. Different kinds of estimators exist depending on the criterion employed to define the “similarity” between the estimator and the parameter to be estimated. The observed value or realization of an estimator is referred to as an *estimate*.

The *sample mean* \(\bar{X}\) and *sample variance* \(S^2\) estimators play an important role in statistical inference, since both are “good” estimators of \(\mu\) and \(\sigma^2,\) respectively. As a consequence, it is important to obtain their sampling distributions in order to understand their random behavior. We will do so under the assumption of normal populations.

### 2.2.1 Sampling distribution of the sample mean

**Theorem 2.1 (Distribution of \(\bar{X}\))** Let \((X_1,\ldots,X_n)\) be a s.r.s. of size \(n\) of a r.v. \(\mathcal{N}(\mu,\sigma^2).\) Then, the *sample mean* \(\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i\) satisfies
\[\begin{align}
\bar{X}\sim
\mathcal{N}\left(\mu,\frac{\sigma^2}{n}\right). \tag{2.2}
\end{align}\]

*Proof* (Proof of Theorem 2.1). The proof is simple, since a linear combination of independent normal r.v.’s is again normal, so \(\bar{X}\) is normal. Therefore, it only remains to compute its mean and variance. The mean follows directly from the linearity of the expectation, requiring neither normality nor independence:
\[
\mathbb{E}[\bar{X}]=\frac{1}{n}\sum_{i=1}^n \mathbb{E}[X_i] =\frac{1}{n}n\mu=\mu.
\]
The variance is obtained by relying on the hypothesis of independence:
\[
\mathbb{V}\mathrm{ar}[\bar{X}]=\frac{1}{n^2}\sum_{i=1}^n \mathbb{V}\mathrm{ar}[X_i]
=\frac{1}{n^2}n\sigma^2=\frac{\sigma^2}{n}.
\]
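Theorem 2.1 can also be illustrated by simulation: drawing many samples of size \(n\) from a normal population and computing their means, the empirical mean and variance of those means should be close to \(\mu\) and \(\sigma^2/n.\) A sketch in R with the illustrative (hypothetical) values \(\mu=10,\) \(\sigma=2,\) and \(n=9\):

```r
set.seed(123)
mu <- 10; sigma <- 2; n <- 9   # hypothetical values, for illustration only
# 10^4 sample means, each from a s.r.s. of size n from N(mu, sigma^2)
xbar <- rowMeans(matrix(rnorm(1e4 * n, mean = mu, sd = sigma), ncol = n))
c(mean(xbar), var(xbar))  # simulated mean and variance of the sample mean
c(mu, sigma^2 / n)        # theoretical values from (2.2)
```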

**Example 2.6** It is known that the weight of liquid that a machine fills into a bottle follows a normal distribution with *unknown* mean \(\mu\) and standard deviation \(\sigma=1\) (the units are ounces). A s.r.s. of \(n=9\) filled bottles is drawn from one day's production of the filling machine, and their weights are measured in ounces. We want to know the probability that the sample mean is within \(0.3\) ounces of the true mean \(\mu.\)

If \(X_1,\ldots,X_9\) is the s.r.s. that contains the measurements of the nine bottles, then \(X_i\sim\mathcal{N}(\mu,\sigma^2),\) \(i=1,\ldots,n,\) where \(n=9\) and \(\sigma^2=1.\) Then, by Theorem 2.1, we have that \[ \bar{X}\sim\mathcal{N}\left(\mu,\frac{\sigma^2}{n}\right) \] or, equivalently, \[ Z=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim\mathcal{N}(0,1). \] The desired probability is then

\[\begin{align*} \mathbb{P}(|\bar{X}-\mu|\leq 0.3) &=\mathbb{P}(-0.3\leq \bar X-\mu\leq 0.3)\\ &=\mathbb{P}\left(-\frac{0.3}{\sigma/\sqrt{n}}\leq \frac{\bar X-\mu}{\sigma/\sqrt{n}}\leq \frac{0.3}{\sigma/\sqrt{n}}\right) \\ &=\mathbb{P}(-0.9\leq Z\leq 0.9)\\ &=\mathbb{P}(Z>-0.9)-\mathbb{P}(Z>0.9)\\ &=1-\mathbb{P}(Z>0.9)-\mathbb{P}(Z>0.9) \\ &=1-2\mathbb{P}(Z>0.9)\\ &=1-2(0.1841)=0.6318. \end{align*}\]

The upper-tail probabilities \(\mathbb{P}_Z(Z>k)=1-\mathbb{P}_Z(Z\leq k)\) are given in the \(\mathcal{N}(0,1)\) probability tables. Nowadays, they can also be computed right away with any statistical software. For example, in R they are obtained with the `pnorm` function:

```
# Computation of P(Z > k)
k <- 0.9
1 - pnorm(k) # 1 - P(Z <= k)
## [1] 0.1840601
pnorm(k, lower.tail = FALSE) # Alternatively
## [1] 0.1840601
```

**Example 2.7** Consider the situation of Example 2.6. How many observations must be included in the sample so that the difference between \(\bar{X}\) and \(\mu\) is smaller than \(0.3\) ounces with a probability of \(0.95\)?

The answer is given by the sample size \(n\) that satisfies \[ \mathbb{P}(|\bar{X}-\mu|\leq 0.3)=\mathbb{P}(-0.3\leq \bar{X}-\mu\leq 0.3)=0.95 \] or, equivalently, since \(\sigma=1,\) \[ \mathbb{P}(-0.3\sqrt{n}\leq \sqrt{n}(\bar{X}-\mu)\leq 0.3\sqrt{n})=\mathbb{P}(-0.3\sqrt{n}\leq Z\leq 0.3\sqrt{n})=0.95. \]

For a given \(0<\alpha<1,\) the upper \(\alpha/2\)-quantile \(z_{\alpha/2}\) of \(Z\sim\mathcal{N}(0,1)\) is defined by \(\mathbb{P}(Z>z_{\alpha/2})=\alpha/2,\) so that, by the symmetry of the \(\mathcal{N}(0,1),\)
\[
\mathbb{P}(-z_{\alpha/2}\leq Z\leq z_{\alpha/2})=1-2\mathbb{P}(Z> z_{\alpha/2})=1-\alpha.
\]
Setting \(\alpha=0.05,\) we can easily compute \(z_{\alpha/2}\cong 1.96\) in R through the `qnorm` function:

```
alpha <- 0.05
qnorm(1 - alpha / 2) # LOWER (1 - beta)-quantile = UPPER beta-quantile
## [1] 1.959964
qnorm(alpha / 2, lower.tail = FALSE) # Alternatively, lower.tail = FALSE
## [1] 1.959964
# computes the upper quantile and lower.tail = TRUE (the default) computes the
# lower quantile
```

Therefore, we set \(0.3\sqrt{n}=1.96\) and solve for \(n,\) which results in \[ n=\left(\frac{1.96}{0.3}\right)^2=42.68. \] Then, if we take \(n=43,\) we have that \[ \mathbb{P}(|\bar{X}-\mu|\leq 0.3)>0.95. \]
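The same computation can be carried out in R, using the exact quantile instead of the rounded \(1.96\) and rounding \(n\) up with `ceiling`. The last line checks that the resulting sample size attains the required probability (recall that \(\sigma=1\)):

```r
delta <- 0.3; sigma <- 1; alpha <- 0.05
z <- qnorm(1 - alpha / 2)
# Smallest integer n such that delta * sqrt(n) / sigma >= z_{alpha/2}
n_min <- ceiling((z * sigma / delta)^2)
n_min
# P(|Xbar - mu| <= delta) = 2 * P(Z <= delta * sqrt(n) / sigma) - 1
2 * pnorm(delta * sqrt(n_min) / sigma) - 1
```

This gives \(n=43,\) as above.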

### 2.2.2 Sampling distribution of the sample variance

The *sample variance* is given by
\[
S^2=\frac{1}{n}\sum_{i=1}^n (X_i-\bar{X})^2=\frac{1}{n}\sum_{i=1}^n X_i^2-{\bar{X}}^2.
\]
The *sample quasivariance* will also play an important role in inference. It is defined by simply replacing \(n\) by \(n-1\) in the factor of \(S^2\):
\[
S'^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar
X)^2=\frac{n}{n-1}S^2=\frac{1}{n-1}\sum_{i=1}^n
X_i^2-\frac{n}{n-1}{\bar{X}}^2.
\]
Before establishing the sampling distributions of \(S^2\) and \(S'^2,\) we first obtain their expectations. To that end, we start by decomposing the variability of the sample with respect to its expectation \(\mu\) as follows:
\[
\sum_{i=1}^n(X_i-\mu)^2=\sum_{i=1}^n(X_i-\bar{X})^2+n(\bar{X}-\mu)^2
\]
Taking expectations, we have
\[
n\sigma^2=n\mathbb{E}[S^2]+n\frac{\sigma^2}{n},
\]
and then, solving for the expectation,

\[\begin{align} \mathbb{E}[S^2]=\frac{(n-1)}{n}\,\sigma^2.\tag{2.3} \end{align}\]

Therefore,

\[\begin{align} \mathbb{E}[S'^2]=\frac{n}{n-1}\mathbb{E}[S^2]=\sigma^2.\tag{2.4} \end{align}\]

Note that this computation does not use the assumption that the sample comes from a normal distribution; hence it holds for \(S^2\) and \(S'^2\) irrespective of the underlying distribution (as long as its variance is finite). It also shows that \(S^2\) is *not* “pointing” towards \(\sigma^2\) but to a slightly smaller quantity, whereas \(S'^2\) is “pointing” directly to \(\sigma^2.\) This observation is related to the *bias* of an estimator and will be treated in detail in Section 3.1.
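Since (2.3) and (2.4) do not require normality, they can be illustrated with a skewed population. The sketch below assumes (hypothetically) an exponential population with rate \(1\), so \(\sigma^2=1,\) and \(n=5\). Note that R's `var` computes the quasivariance \(S'^2\) (denominator \(n-1\)), so \(S^2\) is obtained by rescaling:

```r
set.seed(2024)
n <- 5; reps <- 5e4   # hypothetical sample size and number of replicates
# Each row is a s.r.s. of size n from Exp(rate = 1), whose variance is sigma^2 = 1
x <- matrix(rexp(n * reps, rate = 1), ncol = n)
quasivar <- apply(x, 1, var)          # S'^2 (R's var() divides by n - 1)
samplevar <- (n - 1) / n * quasivar   # S^2 = (n - 1) / n * S'^2
# Averages approximate E[S^2] = (n - 1) / n = 0.8 and E[S'^2] = 1
c(mean(samplevar), mean(quasivar))
```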

In order to obtain the sampling distributions of \(S^2\) and \(S'^2,\) we first need the sampling distribution of the statistic \(\sum_{i=1}^n X_i^2\) when the sample is generated from a \(\mathcal{N}(0,1),\) which turns out to be a chi-square distribution.

**Definition 2.3 (Chi-square distribution)** A r.v. has *chi-square distribution* with \(\nu\in \mathbb{N}\) *degrees of freedom*, denoted as \(\chi_{\nu}^2,\) if its distribution coincides with the gamma distribution of shape \(\alpha=\nu/2\) and scale \(\beta=2.\) In other words,
\[
\chi_{\nu}^2=\Gamma (\nu/2,2),
\]
with p.d.f. given by
\[
f_{\chi^2_{\nu}}(x)=\frac{1}{\Gamma(\nu/2) 2^{\nu/2}}\, x^{\nu/2-1}
e^{-x/2}, \ x>0, \ \nu\in\mathbb{N}.
\]

The mean and the variance of a chi-square with \(\nu\) degrees of freedom are \[ \mathbb{E}[\chi_{\nu}^2]=\nu, \quad \mathbb{V}\mathrm{ar}[\chi_{\nu}^2]=2\nu. \] We can observe that a chi-square r.v., as any gamma r.v., is always positive. Also, its expectation and variance grow linearly with the degrees of freedom \(\nu.\) When \(\nu>2,\) the p.d.f. attains its global maximum at \(\nu-2.\) If \(\nu=1\) or \(\nu=2,\) the p.d.f. is monotonically decreasing. These facts are illustrated in Figure 2.1.
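These properties are easy to verify numerically. The following sketch, for the illustrative value \(\nu=5,\) compares the p.d.f. of Definition 2.3 written by hand (`f_chisq` is our own hypothetical helper, not part of R) with R's `dchisq` and `dgamma`, computes the mean and variance by numerical integration, and locates the maximum of the p.d.f. with `optimize`:

```r
nu <- 5  # illustrative degrees of freedom
# p.d.f. of Definition 2.3, written by hand (hypothetical helper, for comparison only)
f_chisq <- function(x, nu) x^(nu / 2 - 1) * exp(-x / 2) / (gamma(nu / 2) * 2^(nu / 2))
x <- c(0.5, 2, 7.3)
cbind(f_chisq(x, nu), dchisq(x, df = nu), dgamma(x, shape = nu / 2, scale = 2))
# Mean and variance by numerical integration: should be nu and 2 * nu
m1 <- integrate(function(y) y * dchisq(y, df = nu), 0, Inf)$value
m2 <- integrate(function(y) y^2 * dchisq(y, df = nu), 0, Inf)$value
c(mean = m1, var = m2 - m1^2)
# Location of the maximum of the p.d.f.: should be nu - 2 = 3
mode_num <- optimize(dchisq, interval = c(0, 20), df = nu, maximum = TRUE)$maximum
mode_num
```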

The next two propositions are key for obtaining the sampling distribution of \(\sum_{i=1}^n X_i^2,\) given in Corollary 2.1.

**Proposition 2.1** If \(X\sim \mathcal{N}(0,1),\) then \(X^2\sim \chi_1^2.\)

*Proof* (Proof of Proposition 2.1). We compute the c.d.f. of the r.v. \(X^2.\) Since \(X\sim \mathcal{N}(0,1)\) has a p.d.f. that is symmetric about zero, the change of variable \(u=x^2\) and the identity \(\Gamma(1/2)\,2^{1/2}=\sqrt{2\pi}\) give
\[\begin{align*}
F_{X^2}(y) &=\mathbb{P}_{X^2}(X^2\leq y)=\mathbb{P}\left(-\sqrt{y}\leq X \leq\sqrt{y}\right)=2\mathbb{P}\left(0\leq X \leq \sqrt{y}\right)\\
&=2\int_{0}^{\sqrt{y}}
\frac{1}{\sqrt{2\pi}} e^{-x^2/2}\,\mathrm{d}x =\int_{0}^y \frac{1}{\sqrt{2\pi}}
e^{-u/2} u^{-1/2} \,\mathrm{d}u\\
&=F_{\Gamma(1/2,2)}(y)=F_{\chi^2_1}(y), \ y>0.
\end{align*}\]
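A numerical check of Proposition 2.1: the probability \(\mathbb{P}(-\sqrt{y}\leq X\leq\sqrt{y})\) computed from the \(\mathcal{N}(0,1)\) c.d.f. should coincide with the \(\chi^2_1\) c.d.f. at \(y\) (the evaluation points below are arbitrary):

```r
y <- c(0.1, 1, 3.84)
# P(X^2 <= y) = P(-sqrt(y) <= X <= sqrt(y)) for X ~ N(0, 1)
p_normal <- pnorm(sqrt(y)) - pnorm(-sqrt(y))
# c.d.f. of the chi-square with 1 degree of freedom
p_chisq <- pchisq(y, df = 1)
cbind(y, p_normal, p_chisq)
```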

**Proposition 2.2 (Additive property of the chi-square)** If \(X_1\sim \chi_n^2\) and \(X_2\sim \chi_m^2\) are independent, then
\[
X_1+X_2\sim \chi_{n+m}^2.
\]

*Proof* (Proof of Proposition 2.2). The proof follows directly from the additive property of the gamma distribution (see Exercise 1.17): given independent \(X_1\sim \Gamma(\alpha_1,\beta)\) and \(X_2\sim \Gamma(\alpha_2,\beta),\) then
\[
X_1+X_2 \sim \Gamma(\alpha_1+\alpha_2,\beta).
\]
The chi-square distribution is a particular case of the gamma, so the proof follows immediately: \(\chi_n^2=\Gamma(n/2,2)\) and \(\chi_m^2=\Gamma(m/2,2)\) share the scale \(\beta=2,\) and \(\Gamma((n+m)/2,2)=\chi_{n+m}^2.\)

**Corollary 2.1** Let \(X_1,\ldots,X_n\) be independent r.v.’s distributed as \(\mathcal{N}(0,1).\) Then,
\[
\sum_{i=1}^n X_i^2\sim \chi_n^2.
\]

The last result is sometimes employed for directly defining the chi-square r.v. with \(\nu\) degrees of freedom as the sum of \(\nu\) independent squared \(\mathcal{N}(0,1)\) r.v.’s. In this way, the degrees of freedom represent the number of terms in the sum.
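Corollary 2.1 can be illustrated by simulation, for instance with \(\nu=6\) as in the next example: summing \(\nu\) squared standard normals many times, the empirical mean, variance, and c.d.f. should be close to \(\nu,\) \(2\nu,\) and the \(\chi^2_\nu\) c.d.f. A sketch in R (the seed and number of replicates are arbitrary choices):

```r
set.seed(7)
nu <- 6
# 10^4 realizations of Z_1^2 + ... + Z_nu^2, with Z_i i.i.d. N(0, 1)
ss <- rowSums(matrix(rnorm(1e4 * nu), ncol = nu)^2)
c(mean(ss), var(ss))       # approx. nu = 6 and 2 * nu = 12
q <- qchisq(c(0.25, 0.5, 0.95), df = nu)
ecdf(ss)(q)                # approx. 0.25, 0.5, 0.95
```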

**Example 2.8** If \((Z_1,\ldots,Z_6)\) is a s.r.s. of a standard normal, find a number \(b\) such that
\[
\mathbb{P}\left(\sum_{i=1}^6 Z_i^2\leq b\right)=0.95.
\]
We know from Corollary 2.1 that
\[
\sum_{i=1}^6 Z_i^2\sim \chi_6^2.
\]
Then, \(b\cong 12.59\) is the upper \(\alpha\)-quantile of a \(\chi^2_{\nu},\) denoted as \(\chi^2_{\nu;\alpha},\) with \(\alpha=0.05\) and \(\nu=6.\) The quantile \(\chi^2_{6;0.05}\) can be computed either by looking it up in the chi-square probability tables or, more simply, by calling the `qchisq` function in R:

```
alpha <- 0.05
qchisq(1 - alpha, df = 6) # df stands for the degrees of freedom
## [1] 12.59159
qchisq(alpha, df = 6, lower.tail = FALSE) # Alternatively
## [1] 12.59159
```

The final result of this section is the famous Fisher’s Theorem, which delivers the sampling distribution of \(S^2\) and \(S'^2.\)

**Theorem 2.2 (Fisher's Theorem)** If \((X_1,\ldots,X_n)\) is a s.r.s. of a \(\mathcal{N}(\mu,\sigma^2)\) r.v., then \(S^2\) and \(\bar{X}\) are *independent*, and
\[
\frac{nS^2}{\sigma^2}=\frac{(n-1)S'^2}{\sigma^2}\sim\chi_{n-1}^2.
\]

*Proof* (Proof of Theorem 2.2). We apply Theorem 2.6 given in the Appendix for \(p=1,\) in such a way that
\[
nS^2=\sum_{i=1}^n X_i^2-(\sqrt{n}\bar{X})^2=\sum_{i=1}^n X_i^2-
(c_1 X^\prime)^2,
\]
for \(c_1=(1/\sqrt{n},\ldots,1/\sqrt{n})\) and \(X=(X_1,\ldots,X_n).\) Therefore, by that theorem, \(S^2\) is
independent of \(\bar{X},\) and \(\frac{nS^2}{\sigma^2}\sim \chi_{n-1}^2.\)
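Fisher's Theorem can be illustrated by simulation. The sketch below (illustrative, hypothetical values \(n=8,\) \(\mu=3,\) \(\sigma=2\)) checks that \((n-1)S'^2/\sigma^2\) has approximately the quantiles of a \(\chi^2_{n-1},\) and that the sample correlation between \(\bar X\) and \(S'^2\) is close to zero. A near-zero correlation is only consistent with independence, not a proof of it:

```r
set.seed(99)
n <- 8; mu <- 3; sigma <- 2   # hypothetical values, for illustration only
x <- matrix(rnorm(1e4 * n, mean = mu, sd = sigma), ncol = n)
xbar <- rowMeans(x)
s2_quasi <- apply(x, 1, var)            # S'^2 for each sample
stat <- (n - 1) * s2_quasi / sigma^2    # equals n S^2 / sigma^2
q <- qchisq(c(0.1, 0.5, 0.9), df = n - 1)
ecdf(stat)(q)                           # approx. 0.1, 0.5, 0.9
cor(xbar, s2_quasi)                     # approx. 0
```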

**Example 2.9** Assume that we have a s.r.s. made of 10 bottles from the filling machine of Example 2.6. Find a pair of values \(b_1\) and \(b_2\) such that
\[
\mathbb{P}(b_1\leq S'^2\leq b_2)=0.90.
\]

We know from Theorem 2.2 that \(\frac{(n-1)S'^2}{\sigma^2}\sim\chi_{n-1}^2.\) Therefore, multiplying by \(n-1=9\) and dividing by \(\sigma^2=1\) inside the previous probability, we get \[\begin{align*} \mathbb{P}(b_1\leq S'^2\leq b_2)&=\mathbb{P}\left(\frac{(n-1)b_1}{\sigma^2}\leq \frac{(n-1)S'^2}{\sigma^2} \leq \frac{(n-1)b_2}{\sigma^2}\right)\\ &=\mathbb{P}(9b_1\leq \chi_9^2 \leq 9b_2). \end{align*}\] Set \(a_1=9b_1\) and \(a_2=9b_2.\) One possibility is to select:

- \(a_1\) such that the cumulative probability to its left (right) is \(0.05\) (\(0.95\)). This corresponds to the upper \((1-\alpha/2)\)-quantile, \(\chi^2_{\nu;1-\alpha/2},\) with \(\nu=9\) and \(\alpha=0.10\) (because \(1-\alpha=0.90\)).
- \(a_2\) such that the cumulative probability to its right is \(0.05.\) This corresponds to the upper \(\alpha/2\)-quantile, \(\chi^2_{\nu;\alpha/2}.\)

Note that, unlike in the situation of Example 2.7, the p.d.f. of a chi-square is *not* symmetric, and hence \(\chi^2_{\nu;1-\alpha}\neq -\chi^2_{\nu;\alpha}\) (for the normal we had \(z_{1-\alpha}= -z_{\alpha},\) and therefore we only needed \(z_{\alpha}\)).

We can compute \(a_1\) and \(a_2\) by employing the function `qchisq`:

```
alpha <- 0.10
qchisq(1 - alpha / 2, df = 9, lower.tail = FALSE) # a1
## [1] 3.325113
qchisq(alpha / 2, df = 9, lower.tail = FALSE) # a2
## [1] 16.91898
```

Then, \(a_1\cong3.325\) and \(a_2\cong16.919,\) so the asked values are \(b_1\cong3.325/9=0.369\) and \(b_2\cong 16.919/9=1.88.\)
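The values \(b_1\) and \(b_2\) can be checked by simulating many samples of \(n=10\) bottles with \(\sigma=1\) and counting how often \(S'^2\) falls in \([b_1,b_2].\) The mean \(\mu=12\) below is a hypothetical choice, since the distribution of \(S'^2\) does not depend on it:

```r
set.seed(10)
n <- 10; sigma <- 1; mu <- 12   # mu is hypothetical and does not affect S'^2
b <- qchisq(c(0.05, 0.95), df = n - 1) * sigma^2 / (n - 1)  # (b1, b2)
b
x <- matrix(rnorm(1e4 * n, mean = mu, sd = sigma), ncol = n)
s2_quasi <- apply(x, 1, var)                    # S'^2 for each sample
mean(s2_quasi >= b[1] & s2_quasi <= b[2])       # approx. 0.90
```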

### 2.2.3 Student’s \(t\) distribution

**Definition 2.4 (Student's \(t\) distribution)** Let \(X\sim \mathcal{N}(0,1)\) and \(Y\sim \chi_{\nu}^2\) be independent r.v.’s. The distribution of the r.v.
\[
T=\frac{X}{\sqrt{Y/\nu}}
\]
is the *Student’s \(t\)* distribution with \(\nu\) degrees of freedom.

The p.d.f. of the Student’s \(t\) distribution is (see Exercise 2.17)
\[
f_{T}(t)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu
\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2},
\ t\in \mathbb{R}.
\]
We can see that the density is symmetric with respect to zero. When \(\nu>1,\) its expectation is \(\mathbb{E}[T]=0\) and, when \(\nu>2,\) its variance is \(\mathbb{V}\mathrm{ar}[T]=\nu/(\nu-2)>1.\) This means that, for \(\nu>2,\) \(T\) has a larger variability than the standard normal. However, the differences between a \(t_{\nu}\) and a \(\mathcal{N}(0,1)\) vanish as \(\nu\to\infty,\) as can be seen in Figure 2.2.
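Both facts can be checked numerically with R's `dt` and `qt`. The sketch computes \(\mathbb{V}\mathrm{ar}[T]\) by integration for the illustrative value \(\nu=5\) (where \(\nu/(\nu-2)=5/3\)) and shows how the upper \(0.025\)-quantile of \(t_\nu\) approaches that of the \(\mathcal{N}(0,1)\) as \(\nu\) grows:

```r
nu <- 5
# Var[T] = E[T^2], since E[T] = 0; compare with nu / (nu - 2)
v <- integrate(function(x) x^2 * dt(x, df = nu), -Inf, Inf)$value
c(v, nu / (nu - 2))
# Upper 0.025-quantiles of t_df for increasing df, and of N(0, 1)
q_t <- sapply(c(2, 5, 30, 1000), function(df) qt(0.975, df = df))
q_t
qnorm(0.975)
```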

**Theorem 2.3 (Student's Theorem)** Let \((X_1,\ldots,X_n)\) be a s.r.s. of a \(\mathcal{N}(\mu,\sigma^2)\) r.v. Let \(\bar{X},\) \(S^2,\) and \(S'^2\) denote the sample mean, variance, and quasivariance. Then,
\[
T=\frac{\bar{X}-\mu}{S'/\sqrt{n}}\sim t_{n-1},
\]
and the statistic \(T\) is referred to as the *(Student’s) \(T\) statistic*.

*Proof* (Proof of Theorem 2.3). From Theorem 2.1, we can deduce that
\[\begin{align}
\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim \mathcal{N}(0,1).\tag{2.5}
\end{align}\]
On the other hand, by Theorem 2.2 we know that
\[\begin{align}
\frac{(n-1)S'^2}{\sigma^2}\sim \chi_{n-1}^2,\tag{2.6}
\end{align}\]
and that (2.6) is independent of (2.5). Therefore, dividing (2.5) by the square root of (2.6) divided
by its degrees of freedom, we obtain a r.v. with Student’s \(t\) distribution:
\[
T=\frac{\sqrt{n}\, \frac{\bar
X-\mu}{\sigma}}{\sqrt{\frac{(n-1)S'^2}{\sigma^2}/(n-1)}}=\frac{\bar
X-\mu}{S'/\sqrt{n}}\sim t_{n-1}.
\]

**Example 2.10** The resistance to electric tension of a certain kind of wire is normally distributed with mean \(\mu\) and variance \(\sigma^2,\) both unknown. Six segments of the wire are selected at random and their resistances \(X_1,\ldots,X_6\) are measured. The population mean \(\mu\) and variance \(\sigma^2\) can be estimated by \(\bar{X}\) and \(S'^2.\) Find the approximate probability that the difference between \(\bar{X}\) and \(\mu\) is less than \(2S'/\sqrt{n}\) units in absolute value.

We want to compute the probability
\[\begin{align*}
\mathbb{P}&\left(-\frac{2S'}{\sqrt{n}}\leq \bar{X}-\mu\leq
\frac{2S'}{\sqrt{n}}\right)=\mathbb{P}\left(-2\leq \sqrt{n}\frac{\bar{X}-\mu}{S'}\leq
2\right)\\
&=\mathbb{P}(-2\leq T\leq 2)=1-2\mathbb{P}(T\leq -2).
\end{align*}\]
From Theorem 2.3, we know that \(T\sim t_{5}.\) The probabilities \(\mathbb{P}(t_\nu\leq x)\) can be computed with `pt(x, df = nu)`:

```
pt(-2, df = 5)
## [1] 0.05096974
```

Therefore, the probability is \(1-2(0.051)=0.898.\)
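A simulation sketch of Example 2.10. The probability does not depend on \(\mu\) and \(\sigma,\) so the values below are hypothetical. The proportion of samples with \(|\bar X-\mu|\leq 2S'/\sqrt{n}\) should be close to \(1-2\,\mathbb{P}(t_5\leq -2)\):

```r
set.seed(5)
n <- 6; mu <- 50; sigma <- 4   # hypothetical values, for illustration only
x <- matrix(rnorm(1e4 * n, mean = mu, sd = sigma), ncol = n)
xbar <- rowMeans(x)
s_quasi <- sqrt(apply(x, 1, var))                  # S' for each sample
mean(abs(xbar - mu) <= 2 * s_quasi / sqrt(n))      # simulated probability
1 - 2 * pt(-2, df = n - 1)                         # value from Theorem 2.3
```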

### 2.2.4 Snedecor’s \(\mathcal{F}\) distribution

**Definition 2.5 (Snedecor’s \(\mathcal{F}\) distribution)** Let \(X_1\) and \(X_2\) be chi-square r.v.’s with \(\nu_1\) and \(\nu_2\) degrees of freedom, respectively. If \(X_1\) and \(X_2\) are independent, then the r.v.
\[
F=\frac{X_1/\nu_1}{X_2/\nu_2}
\]
is said to have a *Snedecor’s \(\mathcal{F}\)* distribution with \(\nu_1\) and \(\nu_2\) degrees of freedom, which is represented as \(\mathcal{F}_{\nu_1,\nu_2}.\)

*Remark*. It can be seen that the \(\mathcal{F}_{1,\nu}\) coincides with \(t_{\nu}^2.\)
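The remark can be checked numerically: if \(T\sim t_\nu,\) then \(\mathbb{P}(T^2\leq x)=\mathbb{P}(-\sqrt{x}\leq T\leq\sqrt{x}),\) which should match the \(\mathcal{F}_{1,\nu}\) c.d.f.; equivalently, the upper \(0.05\)-quantile of \(\mathcal{F}_{1,\nu}\) is the square of the upper \(0.025\)-quantile of \(t_\nu.\) A sketch with the illustrative value \(\nu=7\):

```r
nu <- 7
x <- c(0.5, 2, 5.59)
pf(x, df1 = 1, df2 = nu)                      # c.d.f. of F_{1, nu}
pt(sqrt(x), df = nu) - pt(-sqrt(x), df = nu)  # P(T^2 <= x) for T ~ t_nu
c(qf(0.95, df1 = 1, df2 = nu), qt(0.975, df = nu)^2)
```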

**Theorem 2.4 (Sampling distribution of the ratio of quasivariances)** Let \((X_1,\ldots,X_{n_1})\) be a s.r.s. from a \(\mathcal{N}(\mu_1,\sigma_1^2)\) and let \(S_1'^2\) be its sample quasivariance. Let \((Y_1,\ldots,Y_{n_2})\) be another s.r.s., independent from the previous one, from a \(\mathcal{N}(\mu_2,\sigma_2^2)\) and with sample quasivariance \(S_2'^2.\) Then,
\[
F=\frac{S_1'^2/\sigma_1^2}{S_2'^2/\sigma_2^2}\sim
\mathcal{F}_{n_1-1,n_2-1}.
\]

*Proof* (Proof of Theorem 2.4). The proof is straightforward from the independence of both samples, the application of Theorem 2.2 and the definition of Snedecor’s \(\mathcal{F}\) distribution, since
\[
F=\frac{\frac{(n_1-1)S_1'^2}{\sigma_1^2}/(n_1-1)}{
\frac{(n_2-1)S_2'^2}{\sigma_2^2}/(n_2-1)}=\frac{S_1'^2/\sigma_1^2}{S_2'^2/\sigma_2^2}\sim
\mathcal{F}_{n_1-1,n_2-1}.
\]

**Example 2.11** If we take two independent s.r.s.’s of sizes \(n_1=6\) and \(n_2=10\) from two normal populations with the same (but unknown) variance \(\sigma^2,\) find the number \(b\) such that
\[
\mathbb{P}\left(\frac{S_1'^2}{S_2'^2}\leq b\right)=0.95.
\]

We have that \[ \mathbb{P}\left(\frac{S_1'^2}{S_2'^2}\leq b\right)=0.95\iff \mathbb{P}\left(\frac{S_1'^2}{S_2'^2}> b\right)=0.05. \]

By Theorem 2.4, and since \(\sigma_1^2=\sigma_2^2=\sigma^2,\) we know that \[ \frac{S_1'^2/\sigma_1^2}{S_2'^2/\sigma_2^2}=\frac{S_1'^2}{S_2'^2}\sim\mathcal{F}_{5,9}. \]

Therefore, we look for the upper \(\alpha\)-quantile \(\mathcal{F}_{\nu_1,\nu_2;\alpha}\) such that \(\mathbb{P}(\mathcal{F}_{\nu_1,\nu_2}>\mathcal{F}_{\nu_1,\nu_2;\alpha})=\alpha,\) for \(\alpha=0.05,\) \(\nu_1=5,\) and \(\nu_2=9.\) This can be obtained with the function `qf`, which provides \(b\):

```
qf(0.05, df1 = 5, df2 = 9, lower.tail = FALSE)
## [1] 3.481659
```
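A simulation sketch of Example 2.11, assuming a hypothetical common standard deviation \(\sigma=3\) and zero means (neither affects the distribution of the ratio \(S_1'^2/S_2'^2\) here). The proportion of simulated ratios below \(b\) should be close to \(0.95\):

```r
set.seed(11)
n1 <- 6; n2 <- 10; sigma <- 3   # sigma is a hypothetical common value
b <- qf(0.05, df1 = n1 - 1, df2 = n2 - 1, lower.tail = FALSE)
s1 <- apply(matrix(rnorm(1e4 * n1, sd = sigma), ncol = n1), 1, var)  # S_1'^2
s2 <- apply(matrix(rnorm(1e4 * n2, sd = sigma), ncol = n2), 1, var)  # S_2'^2
mean(s1 / s2 <= b)   # approx. 0.95
```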

## 2.3 The Central Limit Theorem

The Central Limit Theorem (CLT) is a cornerstone result in Statistics, if not *the* cornerstone result. It states that, once standardized, the sample mean \(\bar{X}\) of i.i.d. r.v.’s \(X_1,\ldots,X_n\) with finite variance has a sampling distribution that converges to a \(\mathcal{N}(0,1)\) as \(n\to\infty.\) The beauty of this result is that it holds *irrespective* of the original distribution of \(X_1,\ldots,X_n.\)

**Theorem 2.5 (Central Limit Theorem)** Let \(X_1,\ldots,X_n\) be i.i.d. r.v.’s with expectation \(\mathbb{E}[X_i]=\mu\) and variance \(\mathbb{V}\mathrm{ar}[X_i]=\sigma^2<\infty.\) Then, the c.d.f. of the r.v.
\[
U_n=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}
\]
converges to the c.d.f. of a \(\mathcal{N}(0,1)\) as \(n\to\infty.\)

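*Proof* (Proof of Theorem 2.5). We give the proof under the additional assumption that the m.g.f. of each \(X_i\) exists in a neighborhood of zero (the general case is usually proved with characteristic functions). We compute the m.g.f. of the r.v. \(U_n.\) Since \(X_1,\ldots,X_n\) are i.i.d.,
\[\begin{align*}
M_{U_n}(s) &=\mathbb{E}\left[ \exp\left\{s\sum_{i=1}^n\frac{X_i -\mu}{\sigma\sqrt{n}} \right\}\right]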
= \prod_{i=1}^n \mathbb{E}\left[ \exp\left\{s\, \frac{X_i -\mu}{\sigma\sqrt{n}}\right\}\right]\\
&=\left(\mathbb{E}\left[\exp\left\{\frac{s}{\sqrt{n}}
Z\right\}\right]\right)^n=\left[M_{Z}\left(\frac{s}{\sqrt{n}}\right)\right]^n,
\end{align*}\]
where \(Z=(X_1-\mu)/\sigma,\) with mean \(\mathbb{E}[Z]=0\) and variance
\(\mathbb{E}[Z^2]=1.\) We perform a Taylor expansion of \(M_{Z}(s/\sqrt{n})\) about \(s=0\):
\[
M_Z\left(\frac{s}{\sqrt{n}}\right)=M_{Z}(0)+M_{Z}^{(1)}(0)\frac{\left(s/\sqrt{n}\right)}{1!}
+M_{Z}^{(2)}(0)\frac{\left(s/\sqrt{n}\right)^2}{2!}+R(s/\sqrt{n}),
\]
where \(M_{Z}^{(k)}(0)\) is the \(k\)-th derivative of the m.g.f. evaluated at \(s=0\) and the remainder term \(R\) is a function that satisfies
\[
\lim_{n\rightarrow \infty} \frac{R\left(s/\sqrt{n}\right)}{(s/\sqrt{n})^2}=0.
\]
Since the derivatives evaluated at zero are equal to the moments, we have:
\[\begin{align*}
M_{Z}\left(\frac{s}{\sqrt{n}}\right) &=\mathbb{E}\left[e^0\right]+\mathbb{E}[Z]\frac{s}{\sqrt{n}}
+\mathbb{E}\left[Z^2\right]\frac{s^2}{2n}+R\left(\frac{s}{\sqrt{n}}\right) \\
&=1+\frac{s^2}{2n}+R\left(\frac{s}{\sqrt{n}}\right) \\
&=1+\frac{1}{n}\left[\frac{s^2}{2}+nR\left(\frac{s}{\sqrt{n}}\right)\right].
\end{align*}\]
Now,
\[\begin{align*}
\lim_{n\rightarrow \infty} M_{U_n}(s)&=\lim_{n\rightarrow \infty}
\left[M_{Z}\left(\frac{s}{\sqrt{n}}\right)\right]^n\\
&=\lim_{n\rightarrow
\infty}\left\{1+\frac{1}{n}\left[\frac{s^2}{2}+nR\left(\frac{s}{\sqrt{n}}\right)\right]\right\}^n.
\end{align*}\]
In order to compute the limit, observe that
\[
\lim_{n\rightarrow \infty}nR\left(\frac{s}{\sqrt{n}}\right)=\lim_{n\rightarrow
\infty} s^2 \frac{R(s/\sqrt{n})}{(s/\sqrt{n})^2}=0,
\]
and hence
\[
\lim_{n\rightarrow
\infty}\left[\frac{s^2}{2}+nR\left(\frac{s}{\sqrt{n}}\right)\right]=\frac{s^2}{2}.
\]
Therefore,
\[
\lim_{n\rightarrow \infty}M_{U_n}(s)=e^{s^2/2}=M_{\mathcal{N}(0,1)}(s).
\]
The application of Theorem 1.3 ends the proof.

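*Remark*. As a rule of thumb, the approximation given by the CLT is usually considered adequate for sample sizes larger than \(n=30,\) although the sample size actually required depends on how skewed or heavy-tailed the distribution of the \(X_i\)’s is.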

*Remark*. If \(X_1,\ldots,X_n\) are normal r.v.’s, then \(U_n\) is *exactly* distributed as a \(\mathcal{N}(0,1).\)
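The CLT, and the rough nature of the \(n=30\) rule of thumb, can be illustrated by simulation. The sketch below standardizes sample means of a skewed \(\mathrm{Exp}(1)\) population (mean and variance equal to \(1\); a hypothetical choice) and reports, for several \(n,\) the largest gap between the empirical c.d.f. of \(U_n\) and \(\Phi\) over a grid of points; the gap tends to shrink as \(n\) grows:

```r
set.seed(3)
z_grid <- seq(-2, 2, by = 0.5)
sizes <- c(5, 30, 200)
max_gap <- sapply(sizes, function(n) {
  # 10^4 sample means of size n from Exp(rate = 1): mu = 1, sigma = 1
  xbar <- rowMeans(matrix(rexp(1e4 * n, rate = 1), ncol = n))
  u <- (xbar - 1) / (1 / sqrt(n))   # realizations of U_n
  max(abs(ecdf(u)(z_grid) - pnorm(z_grid)))
})
setNames(round(max_gap, 3), paste0("n=", sizes))
```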

**Example 2.12** The grades in the admission exams of a given university have a mean of \(\mu=60\) points (out of \(100\) points) and a variance of \(\sigma^2=64,\) both known from long-term records. A specific generation of \(n=100\) students had a mean of \(58\) points. Can we state that these students have a significantly lower performance? Or is this a reasonable deviation from the average performance, merely due to randomness? In order to answer, compute the probability that the sample mean is at most \(58\) when \(n=100.\)

Let \(\bar{X}\) be the sample mean of the s.r.s. of \(n=100\) students grades. By the CLT we know that \[ Z=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\cong \mathcal{N}(0,1). \] Then, we are interested in computing the probability \[ \mathbb{P}(\bar{X}\leq 58)=\mathbb{P}\left(Z\leq \frac{58-60}{\sqrt{64/100}}\right)=\mathbb{P}(Z\leq -2.5)\cong 0.0062, \] which has been obtained with

```
pnorm(-2.5)
## [1] 0.006209665
pnorm(58, mean = 60, sd = sqrt(64/100)) # Alternatively
## [1] 0.006209665
```

Observe that this probability is very small. This suggests that it is highly unlikely that this generation of students belongs to the population with mean \(\mu=60.\) It is more likely that its true mean is smaller than \(60.\)

**Example 2.13** The random waiting time at a cash register of a supermarket follows a distribution with mean \(1.5\) minutes and variance \(1.\) What is the approximate probability of serving \(n=100\) clients in less than \(2\) hours?

If \(X_i\) denotes the waiting time of the \(i\)-th client, then we want to compute \[ \mathbb{P}\left(\sum_{i=1}^{100} X_i\leq 120\right)=\mathbb{P}\left(\bar{X}\leq 1.2\right). \]

As the sample size is large, the CLT entails that \(\bar{X}\cong\mathcal{N}(1.5, 1/100),\) so
\[
\mathbb{P}\left(\bar{X}\leq 1.2\right)=\mathbb{P}\left(\frac{\bar X-1.5}{1/\sqrt{100}} \leq \frac{1.2-1.5}{1/\sqrt{100}}\right)\cong \mathbb{P}(Z\leq -3).
\]
Employing `pnorm`, we can compute \(\mathbb{P}(Z\leq -3)\) right away:

```
pnorm(-3)
## [1] 0.001349898
```
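Example 2.13 does not specify the distribution of the waiting times, so in general the CLT approximation cannot be compared with an exact value. Purely as an illustration, assume a hypothetical gamma model with shape \(2.25\) and scale \(2/3\) (mean \(1.5\) and variance \(1\)). By the additive property of the gamma (Exercise 1.17), the total time of \(100\) clients is then \(\Gamma(225,2/3)\) and the exact probability is available:

```r
# Hypothetical model for a single waiting time: Gamma(shape = 2.25, scale = 2/3)
shape <- 2.25; scale <- 2 / 3; n <- 100
c(mean = shape * scale, var = shape * scale^2)   # 1.5 and 1
# Exact P(X_1 + ... + X_n <= 120), since the sum is Gamma(n * shape, scale)
exact <- pgamma(120, shape = n * shape, scale = scale)
# CLT approximation: the sum is approximately N(n * 1.5, n * 1)
clt <- pnorm(120, mean = n * 1.5, sd = sqrt(n * 1))
c(exact = exact, clt = clt)
```

Both probabilities are very small, although this far in the tail they can differ noticeably in relative terms.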

The CLT employs a type of convergence for sequences of r.v.’s that we make precise next.

**Definition 2.6 (Convergence in distribution) **A sequence of r.v.’s \(X_1,X_2,\ldots\) *converges in distribution* to a r.v. \(X\) if
\[
\lim_{n\rightarrow \infty} F_{X_n}(x)=F_{X}(x),
\]
for all the points \(x\in\mathbb{R}\) where \(F_{X}(x)\) is continuous. The convergence in distribution is denoted by
\[
X_n \stackrel{d}{\longrightarrow} X.
\]

The thesis of the CLT can be rewritten in terms of this notation as follows: \[ U_n=\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\stackrel{d}{\longrightarrow} \mathcal{N}(0,1). \]
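To see this convergence in distribution at work, the sketch below takes Bernoulli observations with an arbitrarily chosen success probability \(p=0.3\) and evaluates the exact c.d.f. of \(U_n\) at the arbitrary point \(x=1\) for increasing \(n.\) Since \(n\bar{X}\sim\mathrm{Bin}(n,p),\) we have \(F_{U_n}(x)=\mathbb{P}\big(n\bar{X}\leq np+x\sqrt{np(1-p)}\big),\) which `pbinom` gives exactly. This value is then compared with the limit \(\Phi(1).\)

```r
p <- 0.3  # arbitrary success probability chosen for illustration
x <- 1    # point at which the c.d.f.'s are compared
ns <- c(10, 100, 1000, 10000)

# Exact c.d.f. of U_n = (Xbar - p) / sqrt(p * (1 - p) / n) at x,
# using that n * Xbar ~ Bin(n, p)
F_Un <- sapply(ns, function(n) {
  pbinom(floor(n * p + x * sqrt(n * p * (1 - p))), size = n, prob = p)
})

# Distance to the limiting N(0, 1) c.d.f. at x
err <- abs(F_Un - pnorm(x))
data.frame(n = ns, F_Un = F_Un, Phi = pnorm(x), abs_error = err)
```

Because \(U_n\) is discrete, the error need not decrease monotonically in \(n\). The position of the jump of `floor()` relative to \(x\) changes with \(n\). The definition only guarantees that the error eventually tends to zero.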

The diagram below summarizes the **available sampling distributions for the sample mean**:

\[ \!\!\!\!\!\!\!\!(X_1,\ldots,X_n)\left\{\begin{array}{ll} \sim \mathcal{N}(\mu,\sigma^2) & \implies \bar{X}\sim \mathcal{N}(\mu,\sigma^2/n) \\ \nsim \mathcal{N}(\mu,\sigma^2) & \left\{\begin{array}{ll} n< 30 & \implies \text{?} \\ n\geq 30 & \implies \bar{X}\cong \mathcal{N}(\mu,\sigma^2/n) \end{array}\right. \end{array}\right. \]

**Example 2.14 (Normal approximation to the binomial) **Let \(Y\) be a binomial r.v. of size \(n\in\mathbb{N}\) and success probability \(p\in[0,1]\) (see Example 1.12), and denote this distribution by \(\mathrm{Bin}(n,p).\) It holds that \(Y=\sum_{i=1}^n X_i,\) where \(X_i\sim \mathrm{Ber}(p),\) \(i=1,\ldots,n,\) with \(\mathrm{Ber}(p)=\mathrm{Bin}(1,p)\) and \(X_1,\ldots,X_n\) independent.

It is easy to see that \[ \mathbb{E}[X_i]=p, \quad \mathbb{V}\mathrm{ar}[X_i]=p(1-p). \] Applying the CLT, we have that \[ \bar{X}=\frac{Y}{n}\cong \mathcal{N}\left(p,\frac{p(1-p)}{n}\right), \] which implies that \[\begin{align} \mathrm{Bin}(n,p)\cong \mathcal{N}(np,np(1-p)). \tag{2.7} \end{align}\]

This approximation works well even when \(n\) is not very large (say, \(n<30\)), as long as \(p\) is “not very close to zero or one”, which is often translated as requiring that both \(np>5\) and \(n(1-p)>5\) hold.

A *continuity correction* is often employed for improving the accuracy in the computation of binomial probabilities. Then, if \(\tilde{Y}\) denotes the approximating \(\mathcal{N}(np,np(1-p))\) r.v., better approximations are obtained with

\[\begin{align*} \mathbb{P}(Y\leq m) & \cong \mathbb{P}(\tilde{Y}\leq m+0.5), \\ \mathbb{P}(Y\geq m) & \cong \mathbb{P}(\tilde{Y}\geq m-0.5), \\ \mathbb{P}(Y=m) &\cong \mathbb{P}(m-0.5\leq \tilde{Y}\leq m+0.5), \end{align*}\]

where \(m\in\mathbb{N}\) and \(m\leq n.\)
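The three rules above can be collected in a small R function. The name `binom_cc` and its `type` argument are hypothetical and were chosen only for this sketch. The function evaluates `pnorm` at the corrected endpoints of the approximating \(\mathcal{N}(np,np(1-p))\) distribution. The usage line compares it with the exact binomial probability for an arbitrarily chosen case, \(\mathbb{P}(Y\geq 15)\) with \(Y\sim\mathrm{Bin}(40,0.3).\)

```r
# Hypothetical helper (name and arguments chosen for this sketch): normal
# approximation with continuity correction for Y ~ Bin(n, p)
binom_cc <- function(m, n, p, type = c("leq", "geq", "eq")) {
  type <- match.arg(type)
  mu <- n * p                     # mean of the approximating normal
  sigma <- sqrt(n * p * (1 - p))  # its standard deviation
  switch(type,
    leq = pnorm(m + 0.5, mean = mu, sd = sigma),                     # P(Y <= m)
    geq = pnorm(m - 0.5, mean = mu, sd = sigma, lower.tail = FALSE), # P(Y >= m)
    eq  = pnorm(m + 0.5, mean = mu, sd = sigma) -
      pnorm(m - 0.5, mean = mu, sd = sigma)                          # P(Y = m)
  )
}

# Usage: P(Y >= 15) for Y ~ Bin(40, 0.3), approximate versus exact
c(approx = binom_cc(15, n = 40, p = 0.3, type = "geq"),
  exact = pbinom(14, size = 40, prob = 0.3, lower.tail = FALSE))
```

The function is vectorized in `m` because `pnorm` is vectorized. For example, `binom_cc(0:40, 40, 0.3, "eq")` approximates the whole probability mass function at once.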

**Example 2.15 **Let \(Y\sim\mathrm{Bin}(25,0.4).\) Using the normal distribution, compute the probability that \(Y\leq 8\) and that \(Y=8.\)

\(\mathbb{P}(Y\leq 8)\) can be approximated as

\[ \mathbb{P}(Y\leq 8)\cong \mathbb{P}(\tilde{Y}\leq 8.5)=\mathbb{P}(\mathcal{N}(10,6)\leq 8.5), \]

since \(np=10\) and \(np(1-p)=6,\) and its numerical value is

```
pnorm(8.5, mean = 10, sd = sqrt(6))
## [1] 0.2701457
```

The probability of \(Y=8\) is approximated as

\[ \mathbb{P}(Y=8) \cong \mathbb{P}(7.5\leq \tilde{Y}\leq 8.5)=\mathbb{P}(7.5\leq \mathcal{N}(10,6)\leq 8.5) \] and its numerical value is

```
pnorm(8.5, mean = 10, sd = sqrt(6)) - pnorm(7.5, mean = 10, sd = sqrt(6))
## [1] 0.1164286
```
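The binomial probabilities of Example 2.15 are also available exactly through `pbinom` and `dbinom`. This makes it possible to check the quality of both normal approximations directly. A minimal sketch:

```r
mu <- 25 * 0.4                 # np = 10
sigma <- sqrt(25 * 0.4 * 0.6)  # sqrt(np(1 - p)) = sqrt(6)

# P(Y <= 8): exact binomial versus continuity-corrected normal approximation
exact_leq <- pbinom(8, size = 25, prob = 0.4)
approx_leq <- pnorm(8.5, mean = mu, sd = sigma)

# P(Y = 8): exact binomial versus continuity-corrected normal approximation
exact_eq <- dbinom(8, size = 25, prob = 0.4)
approx_eq <- pnorm(8.5, mean = mu, sd = sigma) - pnorm(7.5, mean = mu, sd = sigma)

rbind(leq = c(exact = exact_leq, approx = approx_leq),
      eq  = c(exact = exact_eq, approx = approx_eq))
```

Both absolute errors should be small here even though \(n=25<30.\) In this case, \(np=10,\) \(n(1-p)=15\) and \(np(1-p)=6\) are all above \(5.\)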

**Example 2.16 **A candidate for city mayor believes that he/she may win the elections if he/she obtains at least \(55\%\) of the votes in district D. In addition, he/she assumes that \(50\%\) of all the voters in the city support him/her. If \(n=100\) voters vote in district D (consider them as a s.r.s. of the voters of the city), what is the probability that the candidate wins at least \(55\%\) of the votes?

Let \(Y\) be the number of voters in district D that support the candidate. \(Y\) is distributed as a \(\mathrm{Bin}(100,0.5)\) and we want to know the probability \[ \mathbb{P}(Y/n\geq 0.55)=\mathbb{P}(Y\geq 55), \] when \(p,\) the true probability that a randomly chosen voter votes for the candidate, equals \(0.5.\)

The probability
\[
\mathbb{P}(\mathrm{Bin}(n, p) \leq m)=1-\mathbb{P}(\mathrm{Bin}(n, p) > m)
\]
can be computed right away with `pbinom(m, size = n, prob = p)`. The desired probability is
\[
\mathbb{P}(\mathrm{Bin}(100, 0.5) \geq 55)=1-\mathbb{P}(\mathrm{Bin}(100, 0.5) < 55),
\]
which can be computed as:

```
1 - pbinom(54, size = 100, prob = 0.5)
## [1] 0.1841008
pbinom(54, size = 100, prob = 0.5, lower.tail = FALSE) # Alternatively
## [1] 0.1841008
```

The previous value was the *exact* probability for a binomial. Let’s see now how close the CLT approximation is. Because of the CLT,
\[
\mathbb{P}(Y/n\geq 0.55)\cong\mathbb{P}\left(\mathcal{N}\left(0.5, \frac{0.25}{100}\right)\geq 0.55\right)
\]
whose numerical value is

```
1 - pnorm(0.55, mean = 0.5, sd = sqrt(0.25 / 100))
## [1] 0.1586553
```

If the continuity correction is employed, the probability is approximated by \[ \mathbb{P}(Y\geq 0.55n)\cong\mathbb{P}\left(\tilde{Y}\geq 0.55n-0.5\right)=\mathbb{P}\left(\mathcal{N}\left(50, 25\right)\geq 54.5\right), \] which takes the value

```
1 - pnorm(54.5, mean = 50, sd = sqrt(25))
## [1] 0.1840601
```

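As illustrated, the continuity correction offers a notable improvement even for \(n=100\) and \(p=0.5\). The absolute error with respect to the exact probability drops from about \(0.025\) without it to about \(0.00004\) with it.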
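The three probabilities of Example 2.16 can be gathered in a single table that compares their absolute errors with the exact binomial value. A minimal sketch that reuses the calls shown above:

```r
n <- 100
p <- 0.5
# Exact binomial probability P(Y >= 55)
exact <- pbinom(54, size = n, prob = p, lower.tail = FALSE)
# CLT on the proportion Y/n, without continuity correction
plain <- pnorm(0.55, mean = p, sd = sqrt(p * (1 - p) / n), lower.tail = FALSE)
# CLT on the count Y, with continuity correction
cc <- pnorm(54.5, mean = n * p, sd = sqrt(n * p * (1 - p)), lower.tail = FALSE)

probs <- c(exact = exact, plain = plain, continuity_corrected = cc)
data.frame(prob = probs, abs_error = abs(probs - exact))
```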

**Example 2.17 (Normal approximation to additive distributions) **The additive property of the binomial is key for obtaining its normal approximation by the CLT seen in Example 2.14. On the one hand, if \(X_1,\ldots,X_n\) are i.i.d. \(\mathrm{Bin}(1,p),\) then \(n\bar{X}\) is \(\mathrm{Bin}(n,p).\) On the other hand, the CLT thesis can also be expressed as

\[\begin{align} n\bar{X}\cong\mathcal{N}(n\mu,n\sigma^2) \tag{2.8} \end{align}\]

as \(n\to\infty.\) Equating both results, with \(\mu=p\) and \(\sigma^2=p(1-p),\) we obtain the approximation (2.7).

Different additive properties are also satisfied by other distributions, as we saw in Exercise 1.16 (Poisson distribution) and Exercise 1.17 (gamma distribution). In addition, the chi-squared distribution, as a particular case of the gamma, also has an additive property (Proposition 2.2) on the degrees of freedom \(\nu.\) Therefore, we can obtain normal asymptotic approximations for these distributions as well.

For the Poisson, we know that if \(X_1,\ldots,X_n\) is a s.r.s. of a \(\mathrm{Pois}(\lambda),\) then

\[\begin{align} n\bar{X}=\sum_{i=1}^nX_i\sim\mathrm{Pois}(n\lambda).\tag{2.9} \end{align}\]

Since the expectation and the variance of a \(\mathrm{Pois}(\lambda)\) are both equal to \(\lambda,\) equating (2.8) and (2.9) when \(n\to\infty\) yields

\[ \mathrm{Pois}(n\lambda)\cong \mathcal{N}(n\lambda,n\lambda). \]

Note that this approximation is equivalent to saying that when the intensity parameter \(\lambda\) tends to infinity, then \(\mathrm{Pois}(\lambda)\) is approximately a \(\mathcal{N}(\lambda,\lambda).\)
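The Poisson approximation can be checked numerically by comparing `ppois` with `pnorm` for increasing \(\lambda.\) The sketch below evaluates both c.d.f.’s at an arbitrarily chosen point, \(\lfloor\lambda+1.5\sqrt{\lambda}\rfloor.\) Since the Poisson is integer-valued, the sketch also applies the same \(+0.5\) continuity correction used for the binomial. This correction is an extra choice made for the sketch and is not stated in the text.

```r
lambdas <- c(5, 50, 500)  # arbitrary, increasing intensities
res <- t(sapply(lambdas, function(lambda) {
  q <- floor(lambda + 1.5 * sqrt(lambda))  # arbitrary evaluation point above the mean
  c(lambda = lambda,
    q = q,
    exact = ppois(q, lambda = lambda),
    normal = pnorm(q + 0.5, mean = lambda, sd = sqrt(lambda)))  # continuity corrected
}))
res <- cbind(res, abs_error = abs(res[, "exact"] - res[, "normal"]))
res
```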

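Following similar arguments for the chi-squared distribution (a \(\chi^2_\nu\) is the sum of \(\nu\) independent \(\chi^2_1\) r.v.’s, each with expectation \(1\) and variance \(2\)), it is easy to see that, when \(\nu\to\infty,\)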

\[ \chi^2_\nu\cong \mathcal{N}(\nu,2\nu). \]
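A similar numerical check for the chi-squared approximation compares `pchisq` with `pnorm` two standard deviations above the mean, at \(q=\nu+2\sqrt{2\nu}\) (an arbitrary choice of evaluation point). The approximation is expected to improve slowly, since the skewness of \(\chi^2_\nu,\) \(\sqrt{8/\nu},\) only decays at rate \(\nu^{-1/2}.\)

```r
nus <- c(10, 100, 1000, 10000)
# Evaluation point two standard deviations above the mean of chi^2_nu
q <- nus + 2 * sqrt(2 * nus)
exact <- pchisq(q, df = nus)
normal <- pnorm(q, mean = nus, sd = sqrt(2 * nus))  # equals pnorm(2) for every nu
err <- abs(exact - normal)
data.frame(nu = nus, exact = exact, normal = normal, abs_error = err)
```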

## Appendix

The next theorem is a generalization of Fisher’s Theorem and is of key importance in statistical inference.

**Theorem 2.6 **Let \(\mathbf{X}=(X_1,\ldots,X_n)\) be a row vector of independent r.v.’s \(\mathcal{N}(0,\sigma^2).\) We define the linear combinations
\[
Z_i=\mathbf{c}_i \mathbf{X}^\prime, \ i=1,\ldots,p,
\]
where the row vectors \(\mathbf{c}_i\in \mathbb{R}^n\) are orthonormal, that is
\[
\mathbf{c}_i\mathbf{c}_j^\prime=\left\{\begin{array}{ll}
0 & \text{if}\ i\neq j,\\
1 & \text{if} \ i=j.
\end{array}\right.
\]
Then,
\[
Y=\mathbf{X}\mathbf{X}^\prime-\sum_{i=1}^p Z_i^2
\]
is independent from \(Z_1,\ldots,Z_p\) and, in addition,
\[
\frac{Y}{\sigma^2}\sim \chi_{n-p}^2.
\]

*Proof* (Proof of Theorem 2.6). Select \(n-p\) vectors \(\mathbf{c}_{p+1},\ldots,\mathbf{c}_n\) so that \(\{\mathbf{c}_1,\ldots,\mathbf{c}_n\}\) forms an orthonormal basis in \(\mathbb{R}^n.\) Define the \(n\times n\) matrix
\[
\mathbf{C}=\left[\begin{array}{c}
\mathbf{c}_1 \\
\vdots \\
\mathbf{c}_n
\end{array}\right],
\]
that verifies \(\mathbf{C}\mathbf{C}^\prime=\mathbf{I}_n,\) because its rows are orthonormal. Since \(\mathbf{C}\) is square, this means that \(\mathbf{C}^\prime=\mathbf{C}^{-1}\) and, as a consequence, also \(\mathbf{C}^\prime \mathbf{C}=\mathbf{I}_n.\) Here \(\mathbf{I}_n\) denotes the identity matrix of size \(n.\)

Define the row vector \[ \mathbf{Z}=\mathbf{X}\mathbf{C}^\prime=(\mathbf{X}\mathbf{c}_1^\prime,\ldots, \mathbf{X}\mathbf{c}_p^\prime,\ldots, \mathbf{X}\mathbf{c}_n^\prime)=(Z_1,\ldots, Z_p,\ldots, Z_n). \] Then: \[ \mathbb{E}[\mathbf{Z}]=\mathbf{0}, \ \mathbb{V}\mathrm{ar}[\mathbf{Z}]=\mathbf{C} \mathbb{V}\mathrm{ar}[\mathbf{X}] \mathbf{C}^\prime=\sigma^2 \mathbf{C}\mathbf{C}^\prime =\sigma^2 \mathbf{I}_n. \] Therefore, \(Z_1,\ldots,Z_n\) are jointly normal, because \(\mathbf{Z}\) is a linear transformation of the normal vector \(\mathbf{X}\). They are also uncorrelated, so they are independent. Moreover, multiplying \(\mathbf{Z}=\mathbf{X}\mathbf{C}^\prime\) on the right by \(\mathbf{C}\) and using \(\mathbf{C}^\prime\mathbf{C}=\mathbf{I}_n,\) we have \(\mathbf{X}=\mathbf{Z}\mathbf{C}.\) Considering this and employing that \(\mathbf{C}\mathbf{C}^\prime=\mathbf{I}_n,\) we get \[ \mathbf{X}\mathbf{X}^\prime=\mathbf{Z}\mathbf{C}\mathbf{C}^\prime \mathbf{Z}^\prime=\mathbf{Z}\mathbf{Z}^\prime. \] Replacing this equality in the definition of \(Y,\) it follows that \[ Y=\mathbf{Z}\mathbf{Z}^\prime-\sum_{i=1}^p Z_i^2=\sum_{i=1}^n Z_i^2-\sum_{i=1}^p Z_i^2=\sum_{i=p+1}^n Z_i^2. \] Therefore, \(Y=\sum_{i=p+1}^n Z_i^2\) is independent of \(Z_1,\ldots,Z_p.\) Also, by Corollary 2.1 it follows that \[ \frac{Y}{\sigma^2}=\sum_{i=p+1}^n \frac{Z_i^2}{\sigma^2}\sim \chi_{n-p}^2. \]
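Theorem 2.6 can be illustrated by Monte Carlo. The sketch below makes the arbitrary choices \(n=10,\) \(p=3\) and \(\sigma=2,\) and builds \(p\) orthonormal row vectors from the QR decomposition of a random matrix. It then checks two things. First, \(Y/\sigma^2\) should have a mean and variance close to those of a \(\chi^2_{n-p}\) (that is, \(n-p\) and \(2(n-p)\)). Second, \(Y\) should be empirically uncorrelated with each \(Z_i\) and each \(Z_i^2.\) Near-zero correlations are only consistent with independence; they do not prove it.

```r
set.seed(42)
n <- 10; p <- 3; sigma <- 2  # arbitrary sizes and scale for the check
B <- 10000                    # Monte Carlo replicates

# p orthonormal row vectors c_1, ..., c_p: rows of an orthogonal matrix from a QR decomposition
Q <- qr.Q(qr(matrix(rnorm(n * n), n, n)))
C_p <- t(Q)[1:p, , drop = FALSE]  # p x n, with C_p %*% t(C_p) = I_p

# Each row of X is one realization of the row vector (X_1, ..., X_n), X_i ~ N(0, sigma^2)
X <- matrix(rnorm(B * n, sd = sigma), nrow = B)
Z <- X %*% t(C_p)                 # B x p matrix with entries Z_i = c_i X'
Y <- rowSums(X^2) - rowSums(Z^2)  # Y = X X' - sum_i Z_i^2

# Y / sigma^2 should behave like a chi-squared with n - p degrees of freedom
c(mean = mean(Y / sigma^2), target_mean = n - p,
  var = var(Y / sigma^2), target_var = 2 * (n - p))
# Independence implies zero correlation between Y and each Z_i (and each Z_i^2)
cor(Y, Z)
cor(Y, Z^2)
```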

## Exercises

**Exercise 2.1 (Sampling distribution of the maximum) **Let \((X_1,\ldots,X_n)\) be a s.r.s. of a r.v. \(X\) with c.d.f. \(F_X.\) Prove that the c.d.f. of the statistic \(T(X_1,\ldots,X_n)=X_{(n)}\) is \(F_T(t)=\left[F_X(t)\right]^n.\)

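**Solution**. Since \(X_{(n)}\leq t\) holds if and only if \(X_i\leq t\) for all \(i=1,\ldots,n,\) the independence and the identical distribution of the sample give: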
\[\begin{align*}
F_{T}(t) &=\mathbb{P}_{T}(T\leq t)\\
&=\mathbb{P}_{(X_1,\ldots,X_n)}(X_1\leq t,\ldots,X_n\leq t)\\
&=\prod_{i=1}^n \mathbb{P}_{X_i}(X_i\leq t) \\
&=\prod_{i=1}^n F_{X_i}(t)=\left[F_X(t)\right]^n.
\end{align*}\]
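The result can be checked by simulation. As an illustration, the sketch assumes \(X\sim\mathcal{U}(0,1),\) for which \(F_X(t)=t\) on \([0,1],\) so the c.d.f. of \(X_{(n)}\) should be \(t^n.\) The sample size \(n=5\) and the evaluation points are arbitrary.

```r
set.seed(1)
n <- 5      # sample size (arbitrary)
B <- 10000  # number of simulated samples
# Each row is a s.r.s. of size n from U(0, 1), for which F_X(t) = t on [0, 1]
U <- matrix(runif(B * n), nrow = B)
M <- apply(U, 1, max)  # realizations of T = X_(n)

t_grid <- c(0.5, 0.8, 0.95)
empirical <- sapply(t_grid, function(t) mean(M <= t))  # empirical c.d.f. of T
theoretical <- t_grid^n                                # [F_X(t)]^n
cbind(t = t_grid, empirical = empirical, theoretical = theoretical)
```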

**Exercise 2.2 **Let \(X\) be the r.v. that describes the number of days a patient is in an intensive care unit after an operation. It is known that the distribution of \(X\) is

| \(r\) | \(1\) | \(2\) | \(3\) |
|---|---|---|---|
| \(\mathbb{P}(X=r)\) | \(0.3\) | \(0.4\) | \(0.3\) |

Find:

- The mean of the population.
- The standard deviation of the population.
- Let \(X_1\) and \(X_2\) be a s.r.s. of two patients. Find the distribution of the sample mean from the joint distribution of \(X_1\) and \(X_2.\)

**Exercise 2.3 **The monthly savings (in euros) of a student follow a normal distribution with mean \(\mu=35\) and standard deviation \(\sigma=5.\) Sixteen students were selected at random, with \(\bar{X}\) being the sample mean of their measured savings.

- What is the distribution of \(\bar{X}\)?
- Compute the probability that \(\bar{X}\) is larger than \(37.\)
- Compute the probability that \(\bar{X}\) is between \(33.4\) and \(33.6.\)

**Exercise 2.4 **Several government offices believe that the salary increments of employees in the banking sector follow a normal distribution with standard deviation \(3.37.\) A sample of \(n=16\) employees from the sector is taken. Find the probability that the sample standard deviation is:

- smaller than \(1.99;\)
- larger than \(2.89.\)

**Exercise 2.5 **Assuming that the births of boys and girls are equally likely, find the probability that in the next \(200\) births:

- less than \(40\%\) of them are boys;
- between \(43\%\) and \(57\%\) are girls;
- more than \(54\%\) are boys.

**Exercise 2.6 **A tobacco company claims that the average nicotine content of the tobacco used in its cigarettes is \(0.6\) mg per cigarette. An independent organization measures the nicotine content of \(16\) of its cigarettes and finds an average nicotine content of \(0.75\) mg and a standard deviation of \(0.175\) mg. If the nicotine content is assumed to be a normal r.v., what is the probability that, for *any* sample of the same size, an average nicotine content equal to or larger than that of the first sample is obtained?

**Exercise 2.7 **The daily heating expenses of two similar-sized company departments follow normal distributions with an average expense of \(10\) euros for both departments, and standard deviations of \(1\) euro for the first and \(1.5\) euros for the second. To audit the expenses, they are measured at both departments on \(10\) days chosen at random. Compute:

- The probability that, over the \(10\) days, the average expense of the first department exceeds the average expense of the second by at least \(10\) euros.
- The probability that the sample variance of the first department is smaller than two times the sample variance of the second.

**Exercise 2.8 **The lifetime of certain electronic components follows a normal distribution with mean \(1600\) hours and standard deviation \(400.\)

- Given a s.r.s. of \(16\) components, find the probability that \(\bar{X}\geq 1500\) hours.
- Given a s.r.s. of \(16\) components, what is the number of hours \(h\) such that the probability that \(\bar{X}\geq h\) is \(0.15.\)
- Given a s.r.s. of \(16\) components, what is the number of hours \(h\) such that the probability that \(S'\geq h\) is \(0.10.\)
- Given a s.r.s. of \(121\) components, find the probability that at least half of the sample components have a lifetime longer than \(1500\) hours.
- Find the sample size required to ensure that, with probability \(0.92,\) the average lifetime of the sample is larger than \(1500\) hours.

**Exercise 2.9 **Given a s.r.s. of size \(10\) from a normal distribution with standard deviation \(2,\) compute the probability that the sample and population means differ by more than \(0.5\) units. Compute the sample size required to ensure that, with probability \(0.9,\) the sample and population means differ by less than \(0.1\) units.

**Exercise 2.10 **The effectiveness (measured in days) of a certain drug follows a normal distribution with mean \(14.\) The drug is given to \(16\) patients and the observed quasi-standard deviation in the sample is \(1.4\) days. The minimum average effectiveness required for its commercialization is \(13\) days. Determine:

- The probability that the average effectiveness does not attain the required minimum.
- The probability that the variance is underestimated by more than \(20\%.\)
- Does this probability increase or decrease with the sample size?
- The sample size such that this probability is \(0.05.\)
- A reason why there is so much concern about variance estimation.

**Exercise 2.11 **The bearing balls of a given manufacturer weigh \(0.5\) grams on average and have a standard deviation of \(0.02\) grams. Find the probability that two batches of \(1000\) balls differ in weight by more than \(2\) grams.

**Exercise 2.12 **Fifty people have simulated samples of size three from a r.v. \(\mathcal{N}(\mu, \sigma^2).\) Each of the samples gave the value of the statistic
\[
\hat{\mu}=\frac{X_{1}+3X_{2}-X_{3}}{5},
\]
that is going to be used as an estimator of \(\mu.\) The fifty values of \(\hat{\mu}\) are represented in a histogram, and it turns out that the normal distribution that best fits the data has mean \(-1.68\) and standard deviation \(1.59.\) With this information, is it possible to deduce the distribution \(\mathcal{N}(\mu, \sigma^2)\) from which the data came?

**Exercise 2.13 **A factory produces a certain chemical product whose amount of impurities has to be controlled. To that end, \(20\) batches of the product are examined. If the standard deviation of the percentage of impurities is above \(2.5\%,\) then the production chain will have to be carefully examined. It is assumed that the percentage of impurities is normally distributed.

- What is the probability that the production chain will have to be carefully examined, if the population standard deviation is \(2\%\)?
- What is the probability that the average percentage of impurities in the sample is above \(5\%,\) if the average population percentage is \(1\%\)?

**Exercise 2.14 **Let \(X\) and \(Y\) be two independent r.v.’s, distributed as \(\mathrm{Exp}(\lambda).\) What is the distribution of the ratio \(X/Y\)?

**Exercise 2.15 **Let \((X_1,\ldots,X_n)\) be a s.r.s. of a r.v. distributed as \(\mathcal{N}(\mu,\sigma^2).\) Show that:

\[\begin{align*} \mathbb{E}[S^2]&=\frac{n-1}{n}\sigma^2, & \mathbb{V}\mathrm{ar}[S^2]&=\frac{2(n-1)}{n^2}\sigma^4,\\ \mathbb{E}[S'^2]&=\sigma^2, & \mathbb{V}\mathrm{ar}[S'^2]&=\frac{2}{n-1}\sigma^4. \end{align*}\]
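The four identities can be checked by Monte Carlo before proving them. The sketch below uses the arbitrary values \(n=5,\) \(\mu=1\) and \(\sigma=2.\) The quasivariance \(S'^2\) uses the denominator \(n-1\) (the same convention as R's `var()`) and is computed row-wise; then \(S^2=\frac{n-1}{n}S'^2.\)

```r
set.seed(123)
n <- 5; sigma <- 2  # arbitrary choices for the check
mu <- 1             # the identities do not depend on mu
B <- 1e5            # Monte Carlo replicates
X <- matrix(rnorm(B * n, mean = mu, sd = sigma), nrow = B)

# Quasivariance S'^2 (denominator n - 1) of each simulated sample, and S^2 = (n - 1) / n * S'^2
S2q <- rowSums((X - rowMeans(X))^2) / (n - 1)
S2 <- (n - 1) / n * S2q

sim <- c(E_S2 = mean(S2), Var_S2 = var(S2), E_S2q = mean(S2q), Var_S2q = var(S2q))
theory <- c(E_S2 = (n - 1) / n * sigma^2, Var_S2 = 2 * (n - 1) / n^2 * sigma^4,
            E_S2q = sigma^2, Var_S2q = 2 / (n - 1) * sigma^4)
rbind(simulated = sim, theoretical = theory)
```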

**Exercise 2.16 **An environmental protection agency is interested in establishing norms for the amount of permissible chemical products in lakes and rivers. A commonly employed toxicity metric is the quantity of any pollutant that will kill half of the test specimens in a given time interval (usually \(96\) hours for fish). This metric is denoted as LC50 (lethal concentration that kills \(50\%\) of the test specimens). It has been observed in many studies that \(\log(\mathrm{LC50})\) has a normal distribution. Let \(S_1'^2\) be the sample quasivariance of a s.r.s. of \(10\) values of \(\log(\mathrm{LC50})\) for copper and let \(S_2'^2\) be the sample quasivariance of a s.r.s. of \(8\) values of \(\log(\mathrm{LC50})\) for lead. Both samples were obtained from the same fish species. Assume that the population variance for the copper measurements is twice that for lead. Assuming that \(S_1'^2\) and \(S_2'^2\) are independent, find two numbers \(a\) and \(b\) such that
\[
\mathbb{P}\left(a\leq \frac{S_1'^2}{S_2'^2}\leq b\right)=0.90.
\]

**Exercise 2.17 **Let \(X\sim \mathcal{N}(0,\sigma^2)\) and \(Y\sim \chi_{\nu}^2\) be two independent r.v.’s. Show that the density of the r.v. defined as
\[
T=\frac{X}{\sqrt{\sigma^2 Y/\nu}}
\]
is
\[
f_{T}(t)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu
\pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2},
\ t\in \mathbb{R}.
\]

**Solution**. We know that \(Y\sim \chi_{\nu}^2,\) so the density of \(Y\) is

\[ f_{Y}(y)=\frac{1}{\Gamma(\nu/2)2^{\nu/2}}\, y^{\nu/2-1}e^{-y/2}, \ y>0. \]

Also, if \(Z=X/\sigma\sim\mathcal{N}(0,1),\) then

\[ T=\frac{X}{\sqrt{\sigma^2 Y/\nu}}=\frac{Z}{\sqrt{Y/\nu}}, \]

so the distribution of \(T\) does not depend on \(\sigma.\)

Applying a transformation of variables, we obtain the density of \(U=\sigma^2 Y\):

\[\begin{align*} f_{U}(u) &=f_{Y}\left(u/\sigma^2\right)\left|\frac{\partial y}{\partial u}\right|\\ &=\frac{1}{\Gamma(\nu/2) 2^{\nu/2}} \frac{u^{\nu/2-1}}{(\sigma^2)^{\nu/2-1}}\, e^{-u/(2\sigma^2)}\, \frac{1}{\sigma^2} \\ &=\frac{1}{\Gamma(\nu/2) 2^{\nu/2} \sigma^{\nu}} \, u^{\nu/2-1} e^{-u/(2\sigma^2)},\ u>0. \end{align*}\]

Performing now the transformation \(V=\sqrt{U/\nu},\) we obtain

\[\begin{align*} f_{V}(v) &=f_{U}(\nu v^2)\left|\frac{\partial u}{\partial v}\right|\\ &=\frac{1}{\Gamma(\nu/2) 2^{\nu/2} \sigma^{\nu}}\, \nu^{\nu/2-1}v^{\nu-2}\, e^{-\nu v^2/(2\sigma^2)} \, 2\nu v \\ &=\frac{1}{\Gamma(\nu/2) 2^{\nu/2-1} \sigma^{\nu}}\, \nu^{\nu/2}v^{\nu-1}\, e^{-\nu v^2/(2\sigma^2)}. \end{align*}\]

\(X\) and \(V\) are independent, so their joint density is

\[\begin{align*} f_{X,V}(x,v) &=\frac{1}{\sqrt{2\pi \sigma^2}} e^{-x^2/(2\sigma^2)} \cdot \frac{1}{\Gamma(\nu/2) 2^{\nu/2-1} \sigma^{\nu}}\, \nu^{\nu/2}v^{\nu-1}\, e^{-\nu v^2/(2\sigma^2)} \\ &=\frac{1}{\sqrt{2\pi \sigma^2}\Gamma(\nu/2) 2^{\nu/2-1} \sigma^{\nu}}\, \nu^{\nu/2}v^{\nu-1}\, e^{-(x^2+\nu v^2)/(2\sigma^2)}\\ &=Kv^{\nu-1}\, e^{-(x^2+\nu v^2)/(2\sigma^2)} \end{align*}\]

with

\[ K=\frac{\nu^{\nu/2}} {\sqrt{2\pi \sigma^2}\Gamma(\nu/2) 2^{\nu/2-1} \sigma^{\nu}}. \]

We perform next the change of variables

\[ T=X/W,\quad W=V, \]

which is equivalent to

\[ X=WT,\quad V=W. \]

The derivatives of \((X,V)\) with respect to \((T,W)\) are

\[ \frac{\partial X}{\partial T}=W, \quad \frac{\partial X}{\partial W}=T,\quad \frac{\partial V}{\partial T}=0, \quad \frac{\partial V}{\partial W}=1. \]

Then, the Jacobian is

\[ \left|\frac{\partial (X,V)}{\partial (T,W)}\right|=\left|\begin{array}{ll} W & T \\ 0 & 1 \end{array} \right|=W>0. \]

The joint density of \((W,T)\) is therefore

\[\begin{align*} f_{W,T}(w,t) &=f_{X,V}(wt,w)\,w\\ &=K\, w^{\nu-1} e^{-(w^2t^2+\nu w^2)/(2\sigma^2)}\, w \\ &=K\, w^{\nu} e^{-(t^2+\nu)w^2/(2\sigma^2)}. \end{align*}\]

Integrating with respect to \(w\) and performing the change of variables \(s=w^2,\) we get the marginal density of \(T\):

\[\begin{align*} f_{T}(t) &=K\int_{0}^{\infty} w^{\nu} e^{-(t^2+\nu)w^2/(2\sigma^2)} \,\mathrm{d}w\\ &=K\int_{0}^{\infty} (w^2)^{\nu/2} e^{-(t^2+\nu)w^2/(2\sigma^2)} \,\mathrm{d}w\\ &=K\int_{0}^{\infty} s^{\nu/2} e^{-(t^2+\nu)s/(2\sigma^2)} \frac{1}{2}s^{-1/2} \,\mathrm{d}s\\ &=\frac{K}{2}\int_{0}^{\infty} s^{(\nu-1)/2} e^{-(t^2+\nu)s/(2\sigma^2)} \,\mathrm{d}s. \end{align*}\]

The previous integrand contains the kernel of the p.d.f. of a \(\Gamma\left((\nu+1)/2,2\sigma^2/(t^2+\nu)\right)\) distribution. Therefore, the integral has a closed form in terms of the normalizing constant of the gamma distribution, yielding

\[\begin{align*} f_{T}(t) &=\frac{K}{2} \Gamma\left(\frac{\nu+1}{2}\right) \frac{(2\sigma^2)^{(\nu+1)/2}}{(t^2+\nu)^{(\nu+1)/2}}\\ &=\frac{\Gamma\left(\frac{\nu+1}{2}\right)} {\sqrt{\nu \pi}\,\Gamma\left(\frac{\nu}{2}\right)} \frac{(t^2+\nu)^{-(\nu+1)/2}}{\nu^{-(\nu+1)/2}} \\ &=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu \pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2}, \ t\in \mathbb{R}. \end{align*}\]
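The derived density can be checked numerically against R's built-in Student's \(t\) density `dt`. The fact that \(\sigma\) cancels can also be checked by simulating the ratio \(X/V\) with \(V=\sqrt{\sigma^2Y/\nu}\) used in the change of variables above. The values \(\nu=5\) and \(\sigma=3\) are arbitrary choices for this sketch.

```r
# Density obtained in the solution
f_T <- function(t, nu) {
  gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2)) * (1 + t^2 / nu)^(-(nu + 1) / 2)
}

nu <- 5  # arbitrary degrees of freedom
t_grid <- seq(-4, 4, by = 0.5)
max_diff <- max(abs(f_T(t_grid, nu) - dt(t_grid, df = nu)))      # agreement with dt()
total_mass <- integrate(f_T, lower = -Inf, upper = Inf, nu = nu)$value  # should be 1

# Monte Carlo: X ~ N(0, sigma^2), Y ~ chi^2_nu independent, ratio X / sqrt(sigma^2 * Y / nu)
set.seed(2024)
sigma <- 3
B <- 1e5
X <- rnorm(B, sd = sigma)
Y <- rchisq(B, df = nu)
T_sim <- X / sqrt(sigma^2 * Y / nu)
c(max_diff = max_diff, total_mass = total_mass,
  P_sim = mean(T_sim <= 1), P_theory = pt(1, df = nu))
```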

When \(\nu=1\) the expectation does *not* exist! The same happens for the variance when \(\nu=1,2.\)