1.6 Limit Theorems
1.6.1 Central Limit Theorem
Physical demonstration: https://en.wikipedia.org/wiki/Galton_board
Theorem 1.2 Let \(X_1,X_2,\cdots\) be a sequence of independent and identically distributed (i.i.d.) random variables with finite mean \(\mu\) and finite variance \(\sigma^2 > 0\).
Let \(\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i\) be the sample mean and \(Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}\) the standardized sample mean.
Then, as \(n\) approaches infinity, the distribution of \(Z_n\) converges to a standard normal distribution, i.e.,
\[ \lim_{n \to \infty} P(Z_n \leq z) = \Phi(z) \]
where \(\Phi(z)\) is the cumulative distribution function of the standard normal distribution.
Special case (binomial): https://en.wikipedia.org/wiki/De_Moivre%E2%80%93Laplace_theorem
Illustration: https://en.wikipedia.org/wiki/Illustration_of_the_central_limit_theorem
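As a quick illustration, here is a minimal simulation sketch (assuming NumPy is available; the Exponential(1) population is an arbitrary choice with \(\mu = \sigma = 1\)) that standardizes sample means and compares the empirical distribution of \(Z_n\) with \(\Phi(z)\):

```python
import math
import numpy as np

rng = np.random.default_rng(0)

# Population: Exponential(1), which has mean mu = 1 and variance sigma^2 = 1.
mu, sigma = 1.0, 1.0
n, reps = 500, 20_000

# Draw `reps` independent samples of size n and standardize each sample mean.
samples = rng.exponential(scale=1.0, size=(reps, n))
z_n = (samples.mean(axis=1) - mu) / (sigma / math.sqrt(n))

def std_normal_cdf(z):
    """Phi(z), the standard normal CDF, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# The empirical P(Z_n <= z) should be close to Phi(z) when n is large.
for z in (-1.0, 0.0, 1.0, 2.0):
    print(f"z = {z:+.1f}  empirical = {np.mean(z_n <= z):.4f}  Phi(z) = {std_normal_cdf(z):.4f}")
```

Even though the exponential population is skewed, the empirical probabilities move toward \(\Phi(z)\) as \(n\) grows.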
1.6.2 Probability inequalities
1.6.2.1 Markov inequality
Let \(X\) be a non-negative random variable. Then for any \(a>0\),
\[\begin{equation*} P(X \geq a) \leq \frac{E[X]}{a} \end{equation*}\]
Let \(I\) be the indicator random variable for the event \(\{X \geq a\}\), defined as: \[ I = \begin{cases} 1 & \text{if } X \geq a \\ 0 & \text{if } X < a \end{cases} \] If \(X \geq a\), then \(aI = a \leq X\); if \(X < a\), then \(aI = 0 \leq X\) because \(X\) is non-negative. Hence \(X \geq aI\) in every case. Taking the expectation of both sides:
\[ E[X] \geq E[aI] = aE[I] = aP(X \geq a) \] Dividing both sides by \(a\) gives the desired result:
\[ P(X \geq a) \leq \frac{E[X]}{a} \]
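A minimal numerical sanity check of the bound, sketched with an Exponential(1) variable so that \(E[X] = 1\) (the distribution is an assumption made only for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# X ~ Exponential(1) is non-negative with E[X] = 1.
x = rng.exponential(scale=1.0, size=100_000)

for a in (1.0, 2.0, 5.0):
    empirical = np.mean(x >= a)   # estimate of P(X >= a)
    bound = x.mean() / a          # Markov bound E[X] / a
    print(f"a = {a}:  P(X >= a) ~ {empirical:.4f}  <=  E[X]/a ~ {bound:.4f}")
```

For this example the true tail probability is \(e^{-a}\), so the bound \(1/a\) is quite loose, which is typical of Markov's inequality.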
1.6.2.2 Chebyshev inequality
Let \(X\) be a random variable with finite mean \(\mu\) and variance \(\sigma^2\). Then for any \(k>0\),
\[ P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2} \]
Apply Markov’s inequality with \(a = k^2\sigma^2\) to the non-negative random variable \((X-\mu)^2\):
\[ P((X - \mu)^2 \geq k^2\sigma^2) \leq \frac{E[(X - \mu)^2]}{k^2\sigma^2} \]
Since \(E[(X-\mu)^2] = \sigma^2\) and the events \(\{|X-\mu| \geq k\sigma\}\) and \(\{(X-\mu)^2 \geq k^2\sigma^2\}\) are identical, we have:
\[ P(|X - \mu| \geq k\sigma) \leq \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2} \]
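The same kind of sketch for Chebyshev's bound, here with a Uniform(0, 1) variable (an arbitrary choice) whose mean \(1/2\) and variance \(1/12\) are known in closed form:

```python
import numpy as np

rng = np.random.default_rng(2)

# X ~ Uniform(0, 1): mu = 1/2, sigma^2 = 1/12.
x = rng.uniform(0.0, 1.0, size=100_000)
mu, sigma = 0.5, np.sqrt(1.0 / 12.0)

for k in (1.5, 2.0, 3.0):
    empirical = np.mean(np.abs(x - mu) >= k * sigma)  # estimate of P(|X - mu| >= k*sigma)
    print(f"k = {k}:  P(|X - mu| >= k*sigma) ~ {empirical:.4f}  <=  1/k^2 = {1.0 / k**2:.4f}")
```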
1.6.2.3 Jensen inequality
Let \(f\) be a convex function and \(X\) be a random variable. Then:
\[ E[f(X)] \geq f(E[X]) \]
For a convex function \(f\) (assumed differentiable here for simplicity), the tangent line at any point \(x_0\) is a supporting line. This means that for all \(x\):
\[ f(x) \geq f(x_0) + f'(x_0)(x - x_0) \]
Let \(x_0 = E[X]\). Then:
\[ f(X) \geq f(E[X]) + f'(E[X])(X - E[X]) \]
Taking the expectation of both sides:
\[ \begin{split} E[f(X)] &\geq E[f(E[X])] + E[f'(E[X])(X - E[X])] \\ &= f(E[X]) + f'(E[X])E[X - E[X]] \\ &= f(E[X]) \end{split} \]
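For a concrete case, take the convex function \(f(x) = x^2\); Jensen's inequality then reads \(E[X^2] \geq (E[X])^2\), which is equivalent to \(\operatorname{Var}(X) \geq 0\). The sketch below checks this numerically (the exponential population with scale 2 is an arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(3)

# Any distribution works; an exponential with scale 2 is used only as an example.
x = rng.exponential(scale=2.0, size=100_000)

# Convex f(x) = x^2: Jensen gives E[f(X)] >= f(E[X]).
lhs = np.mean(x ** 2)     # E[f(X)]
rhs = np.mean(x) ** 2     # f(E[X])
print(f"E[X^2] = {lhs:.4f}  >=  (E[X])^2 = {rhs:.4f}")
```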
1.6.3 Law of large numbers
1.6.3.1 Weak Law of Large Numbers (WLLN)
Theorem 1.3 Let \(X_1,X_2,\cdots\) be a sequence of independent and identically distributed (i.i.d.) random variables with finite mean \(\mu=E[X_i]\).
Then, for any \(\varepsilon > 0\),
\[ \lim_{n \to \infty} P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| \geq \varepsilon\right) = 0 \]
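The following sketch (a Bernoulli(0.3) sequence is an arbitrary choice, with \(\mu = 0.3\)) estimates \(P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| \geq \varepsilon\right)\) for increasing \(n\) and shows it shrinking toward 0:

```python
import numpy as np

rng = np.random.default_rng(4)

# X_i ~ Bernoulli(0.3), so mu = 0.3; track deviations of the sample mean.
mu, eps, reps = 0.3, 0.05, 5_000

for n in (10, 100, 1_000, 10_000):
    means = rng.binomial(1, mu, size=(reps, n)).mean(axis=1)
    prob = np.mean(np.abs(means - mu) >= eps)  # estimate of P(|mean - mu| >= eps)
    print(f"n = {n:>6}:  P(|mean - mu| >= {eps}) ~ {prob:.4f}")
```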