1.6 Limit Theorems
1.6.1 Central Limit Theorem
https://en.wikipedia.org/wiki/Galton_board
Theorem 1.2 Let X_1, X_2, \cdots, X_n be a sequence of independent and identically distributed (i.i.d.) random variables with finite mean \mu and finite variance \sigma^2.
Let \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i be the sample mean and Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} be the standardized sample mean.
Then, as n approaches infinity, the distribution of Z_n converges to a standard normal distribution, i.e.,
\lim_{n \to \infty} P(Z_n \leq z) = \Phi(z) \quad \text{for all } z,
where \Phi(z) is the cumulative distribution function of the standard normal distribution.
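A minimal simulation sketch of the theorem in Python with NumPy; the Exponential(1) distribution (so \mu = \sigma = 1), the seed, and the sample sizes are illustrative assumptions, not part of the theorem itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# i.i.d. draws from a clearly non-normal distribution: Exponential(1),
# which has mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n_reps = 10_000  # number of simulated values of Z_n for each n

for n in (5, 30, 500):
    x = rng.exponential(scale=1.0, size=(n_reps, n))
    z = (x.mean(axis=1) - mu) / (sigma / np.sqrt(n))
    # As n grows, the mean, standard deviation, and P(Z_n <= 1.96) should
    # approach the standard-normal values 0, 1, and about 0.975.
    print(n, z.mean().round(3), z.std().round(3), (z <= 1.96).mean().round(3))
```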
Special case: when each X_i is Bernoulli(p), so that \sum_{i=1}^{n} X_i \sim \text{Binomial}(n, p), the CLT reduces to the de Moivre-Laplace theorem:
https://en.wikipedia.org/wiki/De_Moivre%E2%80%93Laplace_theorem
https://en.wikipedia.org/wiki/Illustration_of_the_central_limit_theorem
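A short sketch of the de Moivre-Laplace special case, assuming SciPy is available; the values n = 200, p = 0.3, and the cutoff 70 are arbitrary choices for the example.

```python
import numpy as np
from scipy.stats import binom, norm

# De Moivre-Laplace: for X ~ Binomial(n, p), (X - np) / sqrt(np(1 - p)) is
# approximately N(0, 1) for large n.  Compare an exact binomial tail
# probability with its normal approximation.
n, p = 200, 0.3
mean, sd = n * p, np.sqrt(n * p * (1 - p))

exact = binom.cdf(70, n, p)                # P(X <= 70), exact
approx = norm.cdf((70 + 0.5 - mean) / sd)  # normal approximation with continuity correction
print(exact.round(4), approx.round(4))     # the two values should be close
```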
1.6.2 Probability inequalities
1.6.2.1 Markov inequality
Let X be a non-negative random variable. Then for any a>0,
\begin{equation*} P(X \geq a) \leq \frac{E[X]}{a} \end{equation*}
Let I be the indicator random variable for the event \{X \geq a\}, defined as:
I = \begin{cases} 1 & \text{if } X \geq a \\ 0 & \text{if } X < a \end{cases}
Since X is non-negative, we have X \geq aI for all possible values of X. Taking the expectation of both sides:
E[X] \geq E[aI] = aE[I] = aP(X \geq a)
Dividing both sides by a gives the desired result:
P(X \geq a) \leq \frac{E[X]}{a}
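A quick numerical check of the bound in Python with NumPy; the Exponential distribution with E[X] = 2 and the thresholds a are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Non-negative random variable: X ~ Exponential with E[X] = 2.
x = rng.exponential(scale=2.0, size=100_000)

for a in (2.0, 4.0, 8.0):
    empirical = (x >= a).mean()  # Monte Carlo estimate of P(X >= a)
    bound = x.mean() / a         # Markov bound E[X] / a
    print(a, empirical.round(4), bound.round(4))  # empirical <= bound
```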
1.6.2.2 Chebyshev inequality
Let X be a random variable with finite mean \mu and variance \sigma^2. Then for any k>0,
P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}
Apply Markov’s inequality to the non-negative random variable (X - \mu)^2 with a = k^2\sigma^2:
P((X - \mu)^2 \geq k^2\sigma^2) \leq \frac{E[(X - \mu)^2]}{k^2\sigma^2}
Since E[(X - \mu)^2] = \sigma^2 and the event \{(X - \mu)^2 \geq k^2\sigma^2\} is the same as \{|X - \mu| \geq k\sigma\}, we have:
P(|X - \mu| \geq k\sigma) \leq \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}
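A numerical sketch of the bound in Python with NumPy; the Exponential(1) distribution (mean 1, standard deviation 1) and the values of k are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# X ~ Exponential(1), so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
x = rng.exponential(scale=1.0, size=100_000)

for k in (1.5, 2.0, 3.0):
    empirical = (np.abs(x - mu) >= k * sigma).mean()  # P(|X - mu| >= k*sigma)
    print(k, empirical.round(4), round(1 / k**2, 4))  # empirical <= 1/k^2
```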
1.6.2.3 Jensen inequality
Let f be a convex function and X be a random variable. Then:
E[f(X)] \geq f(E[X])
For a convex function f, there exists a supporting hyperplane at any point x_0 (for differentiable f this is the tangent line; for non-differentiable f a subgradient plays the same role). This means that for all x:
f(x) \geq f(x_0) + f'(x_0)(x - x_0)
Let x_0 = E[X]. Then:
f(X) \geq f(E[X]) + f'(E[X])(X - E[X])
Taking the expectation of both sides:
\begin{split} E[f(X)] &\geq E[f(E[X])] + E[f'(E[X])(X - E[X])] \\ &= f(E[X]) + f'(E[X])E[X - E[X]] \\ &= f(E[X]) \end{split}
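A Monte Carlo sketch of the inequality in Python with NumPy, assuming the convex function f(x) = e^x and X standard normal; these choices are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Convex f(x) = exp(x), X ~ N(0, 1).  Jensen: E[f(X)] >= f(E[X]).
# For a standard normal, E[exp(X)] = exp(1/2) ~ 1.649 while exp(E[X]) = 1.
x = rng.normal(size=1_000_000)
print(np.exp(x).mean().round(3))  # Monte Carlo estimate of E[exp(X)], ~1.649
print(np.exp(x.mean()).round(3))  # exp of the estimated E[X], ~1.000
```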
Application: the following two regression specifications look similar but, because of Jensen's inequality, estimate different quantities (a numerical sketch follows below).
Log-transformed OLS: take the logarithm of Y and fit OLS, which models E[\log(Y)] = X \beta.
Poisson GLM: \log(E[Y]) = X \beta.
Since \log is concave, Jensen's inequality reverses direction and gives E[\log(Y)] \leq \log(E[Y]), so the log-OLS model targets the mean of \log(Y) while the Poisson GLM targets the log of the mean.
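A rough illustration in Python, assuming statsmodels and NumPy are available; the data-generating process, the coefficients (1, 0.5), and the +1 shift inside the log are arbitrary choices made for this example.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

# Simulated data with a known log-linear mean: E[Y | x] = exp(1 + 0.5 x).
n = 5_000
x = rng.normal(size=n)
y = rng.poisson(lam=np.exp(1.0 + 0.5 * x))
X = sm.add_constant(x)

# Poisson GLM models log(E[Y | x]) = X beta, so it recovers roughly (1, 0.5).
glm = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# OLS on log(Y) models E[log(Y) | x], a different quantity; by Jensen's
# inequality E[log Y] <= log E[Y], so the intercept in particular falls below
# the log-mean.  (log(0) is undefined, hence the +1 shift used here, which
# further changes the estimand.)
ols = sm.OLS(np.log(y + 1.0), X).fit()

print(glm.params.round(3))  # close to (1, 0.5)
print(ols.params.round(3))  # noticeably different coefficients
```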
1.6.3 Law of large numbers
1.6.3.1 Weak Law of Large Numbers (WLLN)
Theorem 1.3 Let X_1,X_2,\cdots be a sequence of independent and identically distributed (i.i.d.) random variables with finite mean \mu=E[X_i].
Then, for any \varepsilon > 0,
\lim_{n \to \infty} P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| \geq \varepsilon\right) = 0
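A simulation sketch of the WLLN in Python with NumPy; the Exponential(1) distribution (\mu = 1), the tolerance \varepsilon = 0.05, and the number of replications are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# X_i ~ Exponential(1), so mu = 1.  Estimate P(|X_bar_n - mu| >= eps)
# by simulating many independent sample means for each n.
mu, eps, n_reps = 1.0, 0.05, 1_000

for n in (10, 100, 1_000, 10_000):
    xbar = rng.exponential(scale=1.0, size=(n_reps, n)).mean(axis=1)
    print(n, (np.abs(xbar - mu) >= eps).mean().round(3))  # tends to 0 as n grows
```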