7 Limit Theorems

Figure 3.1: "AreUnormal" by Enrico Chavez
7.1 Markov's and Chebyshev's Inequalities
We start with two inequalities that provide upper bounds on probabilities and that play an important role in establishing the convergence results we will see later in this chapter.
7.1.1 Markov's inequality
Example 5.2 (Markov's Inequality) Q. On the A2 highway (in the Luzern Canton), the speed limit is 80 Km/h. Most drivers do not drive that fast, and the average speed on the highway is 70 Km/h. If Z denotes a randomly chosen driver's speed, what is the probability that such a person is driving faster than the speed limit?
A. Since we do not know the whole distribution of Z, but only limited information (namely E[Z]=70 Km/h), we have to resort to Markov's inequality, which states that \Pr\left(h(Z)\geq \zeta\right)\leq E[h(Z)]/\zeta for any non-negative function h and any constant \zeta>0. Using () with h(z)=z and \zeta=80, we obtain an upper bound for the probability:
P(Z \geq 80) \leq \frac{70}{80} = 0.875.
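A quick numerical sanity check of this bound is sketched below. The exponential distribution used here is purely a hypothetical stand-in with mean 70 Km/h; the actual distribution of Z is unknown, and Markov's bound must hold for any non-negative distribution with that mean.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical speed distribution (illustration only): Exponential with mean 70 Km/h.
# Markov's bound must hold for ANY non-negative distribution with E[Z] = 70.
speeds = rng.exponential(scale=70.0, size=1_000_000)

empirical = np.mean(speeds >= 80)   # Monte Carlo estimate of P(Z >= 80)
markov_bound = 70 / 80              # E[Z]/80 = 0.875

print(f"empirical P(Z >= 80) = {empirical:.3f} <= Markov bound {markov_bound:.3f}")
```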
Remark. Note that an equivalent expression (Chebyshev's inequality) is given by \begin{equation} \Pr \left( \left\vert Z-\mu_Z\right\vert \geq r\sigma_Z\right) \leq \frac{1}{r^{2}}. \label{Eq. C2} \end{equation}
Put in words, this inequality says that the probability that a random variable lies more than r standard deviations away from its mean value is bounded above by 1/r^2.
Proof. Chebyshev's inequality is, in turn, a special case of Markov's inequality.
- Chebyshev's inequality now follows as a direct corollary of Markov's inequality on taking h(z)=(z-\mu_Z)^2 and \zeta=r^2\sigma_Z^2.
Example 7.1 (Chebyshev's Inequality) Q. On the A2 highway (in the Luzern Canton), the speed limit is 80 Km/h. Most drivers do not drive that fast, and the average speed on the highway is 70 Km/h. If Z denotes a randomly chosen driver's speed, what is the probability that such a person is driving faster than the speed limit?
A. Since we do not know the whole distribution of Z, but only limited information (namely E[Z]=70 Km/h and V(Z)=9 (Km/h)^2), we have to resort to Chebyshev's inequality to give an upper bound for the probability. Thus,
\begin{eqnarray*} P( Z \geq 80) &=& P( Z - E[Z]\geq 80 - 70) \\ &\leq& P(\vert Z-E[Z] \vert \geq 10) = P\left( \frac{\vert Z-E[Z] \vert }{\sqrt{V(Z)}}\geq \frac{10}{\sqrt{9}}\right) \end{eqnarray*}
Using (), with r=\frac{10}{3} and \sigma_Z= 3, we finally get
\begin{eqnarray*} P( Z \geq 80) \leq P\left(\Big\vert Z-E[Z] \Big\vert \geq \left(\frac{10}{3}\right)\cdot 3\right) \leq \frac{1}{\frac{10^2}{3^2}} = \frac{9}{100} = 0.09\,. \end{eqnarray*}
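This bound, too, can be checked with a minimal simulation sketch. The Normal(70, 3^2) distribution below is only a hypothetical choice with the stated mean and variance, not the actual distribution of Z; Chebyshev's bound must hold whatever the true distribution is.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical distribution with E[Z] = 70 and V(Z) = 9 (illustration only).
speeds = rng.normal(loc=70.0, scale=3.0, size=1_000_000)

empirical = np.mean(np.abs(speeds - 70.0) >= 10.0)  # P(|Z - E[Z]| >= 10)
chebyshev_bound = 9.0 / 10.0**2                     # sigma_Z^2 / 10^2 = 0.09

print(f"empirical P(|Z-70| >= 10) = {empirical:.5f} <= bound {chebyshev_bound:.2f}")
```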
Remark. Chebyshev's inequality can be rewritten in a different way. Indeed, for any random variable Z with mean \mu_Z and variance \sigma_Z^2<\infty \begin{equation} \Pr \left( \left\vert Z-\mu_Z\right\vert \geq \varepsilon \right) \leq \frac{E[(Z-\mu_Z)^2]}{\varepsilon^{2}} = \frac{\sigma_Z^2}{\varepsilon^{2}}. \label{Eq. C3} \end{equation}
It is easy to check that Eq. () coincides with Eq. () by setting, in Eq. (),
\varepsilon = r \sigma_Z.
Do the check as an exercise!
7.2 Sequences of Random Variables
Definition 5.2 A sequence of random variables is an ordered list of random variables of the form
\begin{equation*} S_{1},S_{2},...,S_{n},... \end{equation*}
where, in an abstract sense, the sequence is infinitely long.
We would like to say something about how these random variables behave as n gets larger and larger (i.e. as n tends towards infinity, denoted by n\rightarrow\infty).
The study of such limiting behaviour is commonly called a study of 'asymptotics', after the word asymptote used in standard calculus.
7.2.1 Example: Bernoulli Trials and their sum
Let \tilde Z denote a dichotomous random variable with \tilde Z\sim \mathcal{B}(p). A sequence of Bernoulli trials provides us with a sequence of values \tilde Z_{1},\tilde Z_{2},...,\tilde Z_{n},..., where each \tilde{Z}_{i} is such that
\begin{eqnarray*} \Pr("Success")=\Pr \left( \tilde{Z}_{i}=1\right) = p & \text{and} & \Pr("Failure")=\Pr \left( \tilde Z_{i}=0\right) = 1-p \end{eqnarray*}
Now let S_n=\sum_{s=1}^n \tilde Z_s, the number of 'Successes' in the first n Bernoulli trials. This yields a new sequence of random variables
\begin{eqnarray*} S_{1} &=& \tilde Z_{1} \\ S_{2} &=&\left( \tilde Z_{1}+ \tilde Z_{2}\right)\\ &&\vdots \\ S_{n} &=&\left( \tilde Z_{1}+ \tilde Z_{2}+\cdots + \tilde Z_{n}\right) = \sum_{i=1}^n \tilde Z_i \end{eqnarray*}
This new sequence is such that S_n\sim B(n,p) for each n.
Now consider the sequence
{P}_n=S_n/n, \quad n=1,2,\ldots,
which corresponds to the proportion of 'Successes' in the first n Bernoulli trials.
It is natural to ask how the behaviour of {P}_n is related to the true probability of a 'Success' (p).
Specifically, the open question at this point is:
"Do these results imply that {P}_n collapses onto the true p as n increases, and if so, in what way?"
To gain a clue, let us consider the simulated values of {P}_n.
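For instance, the following short simulation tracks the running proportion {P}_n along one sequence of Bernoulli trials; the value p = 0.3 is an arbitrary illustrative choice.

```python
import numpy as np

rng = np.random.default_rng(2)

p = 0.3          # illustrative choice of the true 'Success' probability
n_max = 10_000

# One long run of Bernoulli trials; P_n is the running proportion of 'Successes'.
trials = rng.binomial(1, p, size=n_max)
P_n = np.cumsum(trials) / np.arange(1, n_max + 1)

for n in (10, 100, 1_000, 10_000):
    print(f"n = {n:6d}   P_n = {P_n[n - 1]:.4f}   (true p = {p})")
```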
7.2.2 Example: Bernoulli Trials and limit behaviour
So, informally, we can say that a sequence of random variables X_{1},X_{2},...,X_{n},... converges if the probability distribution of X_{n} becomes more and more concentrated around a single point as n tends to infinity.
7.3 Convergence in Probability (\overset{p}{\rightarrow })
More formally,
Definition 5.3 A sequence of random variables X_{1},X_{2},...,X_{n},... is said to converge in probability to a number \alpha if for any arbitrary constant \varepsilon >0
\begin{equation*} \lim_{n\rightarrow \infty }\Pr \left( \left\vert X_{n}-\alpha \right\vert >\varepsilon \right) =0 \end{equation*}
If this is the case, we write X_{n}\overset{p}{\rightarrow }\alpha or p\lim X_{n}=\alpha.
A sequence of random variables X_{1},X_{2},...,X_{n},... is said to converge in probability to a random variable X if for any arbitrary constant \varepsilon >0
\begin{equation*} \lim_{n\rightarrow \infty }\Pr \left( \left\vert X_{n}-X \right\vert >\varepsilon \right) =0\,, \end{equation*}
written X_{n}\overset{p}{\rightarrow }X or p\lim(X_{n}-X)=0.
7.3.1 Operational Rules for \overset{p}{\rightarrow }
Let us itemize some rules. To this end, let a be any (non-random) number.
- If X_{n}\overset{p}{\rightarrow }\alpha then aX_{n}\overset{p}{\rightarrow }a\alpha and a+X_{n}\overset{p}{\rightarrow }a+\alpha.
- If X_{n}\overset{p}{\rightarrow }X then aX_{n}\overset{p}{\rightarrow }aX and a+X_{n}\overset{p}{\rightarrow }a+X.
- If X_{n}\overset{p}{\rightarrow }\alpha and Y_{n}\overset{p}{\rightarrow }\gamma then X_{n}Y_{n}\overset{p}{\rightarrow }\alpha \gamma and X_{n}+Y_{n}\overset{p}{\rightarrow }\alpha +\gamma.
- If X_{n}\overset{p}{\rightarrow }X and Y_{n}\overset{p}{\rightarrow }Y then X_{n}Y_{n}\overset{p}{\rightarrow }X Y and X_{n}+Y_{n}\overset{p}{\rightarrow }X +Y.
- Let g\left( x\right) be any (non-random) continuous function. If X_{n}\overset{p}{\rightarrow }\alpha then g\left( X_{n}\right) \overset{p}{\rightarrow }g\left( \alpha \right), and if X_{n}\overset{p}{\rightarrow }X then g\left( X_{n}\right) \overset{p}{\rightarrow }g\left( X \right).
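As a minimal illustration of the last rule, the sketch below takes X_n to be the mean of n Uniform(0,1) draws (so X_n \overset{p}{\rightarrow} \alpha = 0.5, anticipating the law of large numbers below) and applies the continuous function g(x)=e^x; the particular distribution and function are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# X_n = mean of n Uniform(0,1) draws, so X_n ->p alpha = 0.5; by the
# continuous-mapping rule, g(X_n) = exp(X_n) ->p g(alpha) = exp(0.5).
for n in (10, 1_000, 100_000):
    X_n = rng.uniform(size=n).mean()
    print(f"n = {n:6d}   exp(X_n) = {np.exp(X_n):.4f}   exp(0.5) = {np.exp(0.5):.4f}")
```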
Suppose X_{1},X_{2},...,X_{n},... is a sequence of independent random variables with common distribution F_X(x) and moments \mu_r=E[X^r]. At any given point along the sequence, X_{1},X_{2},...,X_{n} constitutes a simple random sample of size n.
For each fixed sample size n, the rth sample moment is (using an obvious notation) \begin{equation*} M_{(r,n)}=\frac{1}{n}\left( X_{1}^r+X_{2}^r+\cdots +X_{n}^r\right)=\frac{1}{n}\sum_{s=1}^nX_s^r\,, \end{equation*} and we know that E[M_{(r,n)}]=\mu_r\quad\text{and}\quad Var(M_{(r,n)})=\frac{1}{n}(\mu_{2r}-\mu_r^2)\,.
Now consider the sequence of sample moments M_{(r,1)},M_{(r,2)},...,M_{(r,n)},... or, equivalently, \{M_{(r,n)}\}_{n\geq 1}.
7.3.2 Convergence of Sample Moments as a motivation...
The distribution of M_{(r,n)} (which is unknown because F_X(x) has not been specified) is thus concentrated around \mu_r for all n, with a variance which tends to zero as n increases.
So the distribution of M_{(r,n)} becomes more and more concentrated around \mu_r as n increases, and therefore we might expect that \begin{equation*} M_{(r,n)}\overset{p}{\rightarrow }\mu_r. \end{equation*}
In fact, this result follows from what is known as the Weak Law of Large Numbers (WLLN).
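Before stating the WLLN, here is a small numerical illustration of this convergence. As an arbitrary example it uses the second sample moment (r=2) of Exponential(1) draws, for which \mu_2=E[X^2]=2.

```python
import numpy as np

rng = np.random.default_rng(4)

# Second sample moment M_(2,n) of Exponential(1) draws; here mu_2 = E[X^2] = 2.
r = 2
for n in (10, 1_000, 100_000):
    x = rng.exponential(scale=1.0, size=n)
    print(f"n = {n:6d}   M_(r,n) = {np.mean(x**r):.4f}   mu_2 = 2")
```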
7.3.3 The Weak Law of Large Numbers (WLLN)
Proposition 7.3 Let X_{1},X_{2},...,X_{n},... be a sequence of independent random variables with common probability distribution F_X(x), and let Y=h(X) be such that \begin{eqnarray*} E[Y]=E\left[ h(X)\right] &=&\mu_Y \\ Var(Y)=Var\left( h(X)\right) &=&\sigma_Y ^{2}<\infty\,. \end{eqnarray*} Set \overline{Y}_n=\frac{1}{n}\sum_{s=1}^nY_s\quad\text{where}\quad Y_s=h(X_s)\,,\quad s=1,\ldots,n\,. Then for any two numbers \varepsilon and \delta satisfying \varepsilon>0 and 0<\delta<1
\Pr \left( \left\vert \overline{Y}_{n}-\mu_Y \right\vert<\varepsilon \right)\geq 1-\delta
for all n>\sigma_Y^2/(\varepsilon^2\delta). Choosing both \varepsilon and \delta to be arbitrarily small implies that p\lim_{n\rightarrow\infty}(\overline{Y}_{n}-\mu_Y)=0, or equivalently \overline{Y}_{n}\overset{p}{\rightarrow }\mu_Y.
7.3.4 The WLLN and Chebyshev's Inequality
- First note that E[\overline{Y}_n]=\mu_Y and Var(\overline{Y}_n)=\sigma_Y^2/n.
- Now, according to Chebyshev's inequality \begin{eqnarray*} \Pr \left( |\overline{Y}_{n}-\mu_Y| <\varepsilon\right) &\geq &1-\frac{E\left[ \left( \overline{Y}_{n}-\mu_Y \right) ^{2}\right] }{\varepsilon^{2}} \\ &=&1-\frac{\sigma_Y ^{2}/n}{\varepsilon^{2}} \\ &=&1-\frac{\sigma_Y ^{2}}{n\varepsilon^{2}}\geq 1-\delta \end{eqnarray*} for all n>\sigma_Y^2/(\varepsilon^2\delta).
- Thus the WLLN is proven, provided we can verify Chebyshev's inequality.
- Note that by considering the limit as n\rightarrow \infty we also have \begin{equation*} \lim_{n\rightarrow \infty }\Pr \left( \left\vert \overline{Y}_{n}-\mu_Y\right\vert <\varepsilon\right) \geq \lim_{n\rightarrow \infty }\left( 1-\frac{\sigma_Y^{2}}{n\varepsilon^{2}}\right) =1\,, \end{equation*} again implying that \left( \overline{Y}_{n}-\mu_Y \right) \overset{p}{\rightarrow }0.
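The sample-size bound used in the proof is easy to illustrate numerically. The sketch below takes, as an arbitrary example, Y \sim Exponential(1) (so \mu_Y=1 and \sigma_Y^2=1), \varepsilon=0.1 and \delta=0.05; Chebyshev's inequality then guarantees the stated coverage once n>\sigma_Y^2/(\varepsilon^2\delta)=2000. The bound is conservative, so the empirical coverage is typically much higher than 1-\delta.

```python
import numpy as np

rng = np.random.default_rng(5)

# Y ~ Exponential(1): mu_Y = 1, sigma_Y^2 = 1.  Chebyshev's bound guarantees
# Pr(|Ybar_n - mu_Y| < eps) >= 1 - delta for all n > sigma_Y^2 / (eps^2 * delta).
eps, delta = 0.1, 0.05
n = int(np.ceil(1.0 / (eps**2 * delta))) + 1   # smallest n strictly above the bound

reps = 2_000
means = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
coverage = np.mean(np.abs(means - 1.0) < eps)

print(f"n = {n}, empirical Pr(|Ybar_n - 1| < {eps}) = {coverage:.4f} >= {1 - delta}")
```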
A related mode of convergence is convergence in distribution: a sequence X_{1},X_{2},...,X_{n},... converges in distribution to a random variable X, written X_{n}\overset{D}{\rightarrow }X, if F_{X_{n}}(x)\rightarrow F_X(x) at every point x where F_X is continuous. Some operational rules for \overset{D}{\rightarrow } are:
- If p\lim_{n\rightarrow\infty}(X_n-X)=0 then X_{n}\overset{D}{\rightarrow }X.
- Let a be any real number. If X_{n}\overset{D}{\rightarrow }X, then aX_{n}\overset{D}{\rightarrow }aX.
- If Y_{n}\overset{p}{\rightarrow }\phi and X_{n}\overset{D}{\rightarrow }X, then Y_{n}X_{n}\overset{D}{\rightarrow }\phi X and Y_{n}+X_{n}\overset{D}{\rightarrow }\phi +X.
- If X_{n}\overset{D}{\rightarrow }X and g\left( x\right) is any continuous function, then g\left( X_{n}\right) \overset{D}{\rightarrow }g\left( X\right).
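As an illustrative sketch (not an example from the text), consider the studentised mean: \sqrt{n}(\overline{X}_n-\mu)/S=(\sigma/S)\cdot\sqrt{n}(\overline{X}_n-\mu)/\sigma, where S is the sample standard deviation. Since \sigma/S\overset{p}{\rightarrow}1 and, anticipating Theorem 7.1 below, the second factor \overset{D}{\rightarrow}N(0,1), the third rule above (Slutsky's theorem) gives \sqrt{n}(\overline{X}_n-\mu)/S\overset{D}{\rightarrow}N(0,1). A quick simulation check, using Exponential(1) samples as an arbitrary choice:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

# Studentised means of Exponential(1) samples (mu = sigma = 1); by Slutsky's
# theorem they should be approximately N(0,1) for large n.
n, reps = 200, 20_000
x = rng.exponential(scale=1.0, size=(reps, n))
t_stat = np.sqrt(n) * (x.mean(axis=1) - 1.0) / x.std(axis=1, ddof=1)

for q in (0.05, 0.5, 0.95):
    print(f"q = {q:.2f}   empirical {np.quantile(t_stat, q):+.3f}"
          f"   N(0,1) {stats.norm.ppf(q):+.3f}")
```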
Example 7.2 Suppose X_{1},X_{2},...,X_{n},... is a sequence of independent random variables where X_n\sim B(n,p) with probability of 'Success' p.
We already know that, if p=\lambda/n, where \lambda>0 is fixed, then as n goes to infinity, F_{X_{n}}\left( x\right) converges to the probability distribution of a Poisson\left( \lambda \right) random variable. So, X_{n}\overset{D}{\rightarrow }X, where X\sim \text{Poisson}(\lambda).
Now consider another case. If p is fixed, the probability distribution of \begin{equation*} Y_{n}=\frac{X_{n}-np}{\sqrt{np\left( 1-p\right) }} \end{equation*} converges, as n goes to infinity, to that of a standard Normal random variable [Theorem of De Moivre-Laplace]. So, Y_{n}\overset{D}{\rightarrow }Y, where Y\sim N(0,1).
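Both limits are easy to inspect numerically. The sketch below compares a few Binomial probabilities with their Poisson and Normal approximations; the particular values of \lambda, n and p are arbitrary illustrative choices.

```python
import numpy as np
from scipy import stats

# Poisson limit: X_n ~ B(n, lambda/n) with lambda fixed and n large.
lam, n = 3.0, 1_000
k = np.arange(8)
print("Binomial pmf:", np.round(stats.binom.pmf(k, n, lam / n), 4))
print("Poisson  pmf:", np.round(stats.poisson.pmf(k, lam), 4))

# De Moivre-Laplace: with p fixed, Y_n = (X_n - np)/sqrt(np(1-p)) is approx. N(0,1).
p, n = 0.3, 5_000
y = 0.5                                   # point at which both CDFs are compared
print("P(Y_n <= 0.5) ~", stats.binom.cdf(n * p + y * np.sqrt(n * p * (1 - p)), n, p))
print("Phi(0.5)      =", stats.norm.cdf(y))
```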
Example 3.5 Let us consider a sequence of continuous r.v.'s X_1, X_2, ..., X_n,..., where X_n has range (0, n] and CDF F_{X_n} (x) = 1- \left( 1- \frac{x}{n} \right)^n, \ \ 0<x\leq n. Then, as n \to \infty, the limiting support is (0,\infty), and \forall x >0, we have F_{X_n} (x) \to F_X(x) = 1 - e^{-x}, which is the CDF of an exponential r.v. (at all continuity points).
So, we conclude that X_n converges in distribution to an exponential r.v., that is X_n \overset{D}{\rightarrow } X, \quad X \sim \text{Exp}(1).
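The convergence of the CDFs is easy to verify numerically, since F_{X_n}(x)=1-(1-x/n)^n\to 1-e^{-x}; here is a minimal check at a single (arbitrarily chosen) point x:

```python
import numpy as np

x = 1.5
for n in (10, 100, 10_000):
    F_n = 1 - (1 - x / n)**n          # CDF of X_n at x (valid here since x <= n)
    print(f"n = {n:6d}   F_Xn({x}) = {F_n:.6f}")
print(f"limit        F_X({x}) = {1 - np.exp(-x):.6f}")
```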
The following theorem, the Central Limit Theorem, is often said to be one of the most important results in probability and statistics. Its significance lies in the fact that it allows accurate probability approximations to be made without knowledge of the underlying distribution!
Theorem 7.1 Let X_{1},X_{2},...,X_{n},... be a sequence of independent random variables with common probability distribution F_X(x), and let Y=h(X) be such that \begin{eqnarray*} E[Y]=E\left[ h(X)\right] &=&\mu_Y \\ Var(Y)=Var\left( h(X)\right) &=&\sigma_Y ^{2}<\infty\,. \end{eqnarray*} Set \overline{Y}_n=\frac{1}{n}\sum_{s=1}^nY_s\quad\text{where}\quad Y_s=h(X_s)\,,\quad s=1,\ldots,n\,. Then (under quite general regularity conditions)
\begin{equation*} \frac{\sqrt{n}\left( \overline{Y}_{n}-\mu_Y \right) }{\sigma_Y }\overset{D}{\rightarrow }N\left( 0,1\right). \end{equation*}
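A minimal simulation sketch of Theorem 7.1, with the arbitrary illustrative choice h(x)=x^2 and X\sim\text{Uniform}(0,1), so that \mu_Y=1/3 and \sigma_Y^2=4/45:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Y = h(X) = X^2 with X ~ Uniform(0,1): mu_Y = 1/3, sigma_Y^2 = 4/45.
mu_Y, sigma_Y = 1.0 / 3.0, np.sqrt(4.0 / 45.0)
n, reps = 500, 10_000

y = rng.uniform(size=(reps, n))**2
z = np.sqrt(n) * (y.mean(axis=1) - mu_Y) / sigma_Y   # standardised sample means

# The standardised means should be approximately N(0,1): compare a few quantiles.
for q in (0.05, 0.5, 0.95):
    print(f"q = {q:.2f}   empirical {np.quantile(z, q):+.3f}"
          f"   N(0,1) {stats.norm.ppf(q):+.3f}")
```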