1.6 Limit Theorems
1.6.1 Central Limit Theorem
https://en.wikipedia.org/wiki/Galton_board
Theorem 1.2 Let X_1, X_2, \cdots, X_n be a sequence of independent and identically distributed (i.i.d.) random variables with finite mean \mu and finite variance \sigma^2.
Let \bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i be the sample mean and Z_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} be the standardized sample mean.
Then, as n approaches infinity, the distribution of Z_n converges to a standard normal distribution, i.e.,
\lim_{n \to \infty} P(Z_n \leq z) = \Phi(z) \quad \text{for all } z,
where \Phi(z) is the cumulative distribution function of the standard normal distribution.
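A minimal simulation sketch of the theorem in Python with NumPy; the Exponential(1) distribution (so \mu = \sigma = 1), the seed, and the sample sizes are illustrative assumptions, not part of the theorem itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# i.i.d. draws from a clearly non-normal distribution: Exponential(1),
# which has mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
n_reps = 10_000  # number of simulated values of Z_n for each n

for n in (5, 30, 500):
    x = rng.exponential(scale=1.0, size=(n_reps, n))
    z = (x.mean(axis=1) - mu) / (sigma / np.sqrt(n))
    # As n grows, the mean, standard deviation, and P(Z_n <= 1.96) should
    # approach the standard-normal values 0, 1, and about 0.975.
    print(n, z.mean().round(3), z.std().round(3), (z <= 1.96).mean().round(3))
```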
Special case: when each X_i is Bernoulli(p), so that \sum_{i=1}^{n} X_i \sim \text{Binomial}(n, p), the CLT reduces to the de Moivre-Laplace theorem:
https://en.wikipedia.org/wiki/De_Moivre%E2%80%93Laplace_theorem
https://en.wikipedia.org/wiki/Illustration_of_the_central_limit_theorem
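A short sketch of the de Moivre-Laplace special case, assuming SciPy is available; the values n = 200, p = 0.3, and the cutoff 70 are arbitrary choices for the example.

```python
import numpy as np
from scipy.stats import binom, norm

# De Moivre-Laplace: for X ~ Binomial(n, p), (X - np) / sqrt(np(1 - p)) is
# approximately N(0, 1) for large n.  Compare an exact binomial tail
# probability with its normal approximation.
n, p = 200, 0.3
mean, sd = n * p, np.sqrt(n * p * (1 - p))

exact = binom.cdf(70, n, p)                # P(X <= 70), exact
approx = norm.cdf((70 + 0.5 - mean) / sd)  # normal approximation with continuity correction
print(exact.round(4), approx.round(4))     # the two values should be close
```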
1.6.2 Probability inequalities
1.6.2.1 Markov inequality
Let X be a non-negative random variable. Then for any a>0,
\begin{equation*} P(X \geq a) \leq \frac{E[X]}{a} \end{equation*}
Let I be the indicator random variable for the event \{X \geq a\}, defined as:
I = \begin{cases} 1 & \text{if } X \geq a \\ 0 & \text{if } X < a \end{cases}
Since X is non-negative, we have X \geq aI for all possible values of X. Taking the expectation of both sides:
E[X] \geq E[aI] = aE[I] = aP(X \geq a)
Dividing both sides by a gives the desired result:
P(X \geq a) \leq \frac{E[X]}{a}
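A quick numerical check of the bound in Python with NumPy; the Exponential distribution with E[X] = 2 and the thresholds a are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Non-negative random variable: X ~ Exponential with E[X] = 2.
x = rng.exponential(scale=2.0, size=100_000)

for a in (2.0, 4.0, 8.0):
    empirical = (x >= a).mean()  # Monte Carlo estimate of P(X >= a)
    bound = x.mean() / a         # Markov bound E[X] / a
    print(a, empirical.round(4), bound.round(4))  # empirical <= bound
```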
1.6.2.2 Chebyshev inequality
Let X be a random variable with finite mean \mu and variance \sigma^2. Then for any k>0,
P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}
Apply Markov’s inequality to the non-negative random variable (X - \mu)^2 with a = k^2\sigma^2:
P((X - \mu)^2 \geq k^2\sigma^2) \leq \frac{E[(X - \mu)^2]}{k^2\sigma^2}
Since E[(X - \mu)^2] = \sigma^2 and the event \{(X - \mu)^2 \geq k^2\sigma^2\} is the same as \{|X - \mu| \geq k\sigma\}, we have:
P(|X - \mu| \geq k\sigma) \leq \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}
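A numerical sketch of the bound in Python with NumPy; the Exponential(1) distribution (mean 1, standard deviation 1) and the values of k are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

# X ~ Exponential(1), so mu = 1 and sigma = 1.
mu, sigma = 1.0, 1.0
x = rng.exponential(scale=1.0, size=100_000)

for k in (1.5, 2.0, 3.0):
    empirical = (np.abs(x - mu) >= k * sigma).mean()  # P(|X - mu| >= k*sigma)
    print(k, empirical.round(4), round(1 / k**2, 4))  # empirical <= 1/k^2
```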
1.6.2.3 Jensen inequality
Let f be a convex function and X be a random variable. Then:
E[f(X)] \geq f(E[X])
For a convex function f, there exists a supporting hyperplane at any point x_0 (for differentiable f this is the tangent line; for non-differentiable f a subgradient plays the same role). This means that for all x:
f(x) \geq f(x_0) + f'(x_0)(x - x_0)
Let x_0 = E[X]. Then:
f(X) \geq f(E[X]) + f'(E[X])(X - E[X])
Taking the expectation of both sides:
\begin{split} E[f(X)] &\geq E[f(E[X])] + E[f'(E[X])(X - E[X])] \\ &= f(E[X]) + f'(E[X])E[X - E[X]] \\ &= f(E[X]) \end{split}
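A Monte Carlo sketch of the inequality in Python with NumPy, assuming the convex function f(x) = e^x and X standard normal; these choices are only for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)

# Convex f(x) = exp(x), X ~ N(0, 1).  Jensen: E[f(X)] >= f(E[X]).
# For a standard normal, E[exp(X)] = exp(1/2) ~ 1.649 while exp(E[X]) = 1.
x = rng.normal(size=1_000_000)
print(np.exp(x).mean().round(3))  # Monte Carlo estimate of E[exp(X)], ~1.649
print(np.exp(x.mean()).round(3))  # exp of the estimated E[X], ~1.000
```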
Application: the following two regression specifications look similar but, because of Jensen's inequality, estimate different quantities (a numerical sketch follows below).
Log-transformed OLS: take the logarithm of Y and fit OLS, which models E[\log(Y)] = X \beta.
Poisson GLM: \log(E[Y]) = X \beta.
Since \log is concave, Jensen's inequality reverses direction and gives E[\log(Y)] \leq \log(E[Y]), so the log-OLS model targets the mean of \log(Y) while the Poisson GLM targets the log of the mean.
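A rough illustration in Python, assuming statsmodels and NumPy are available; the data-generating process, the coefficients (1, 0.5), and the +1 shift inside the log are arbitrary choices made for this example.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)

# Simulated data with a known log-linear mean: E[Y | x] = exp(1 + 0.5 x).
n = 5_000
x = rng.normal(size=n)
y = rng.poisson(lam=np.exp(1.0 + 0.5 * x))
X = sm.add_constant(x)

# Poisson GLM models log(E[Y | x]) = X beta, so it recovers roughly (1, 0.5).
glm = sm.GLM(y, X, family=sm.families.Poisson()).fit()

# OLS on log(Y) models E[log(Y) | x], a different quantity; by Jensen's
# inequality E[log Y] <= log E[Y], so the intercept in particular falls below
# the log-mean.  (log(0) is undefined, hence the +1 shift used here, which
# further changes the estimand.)
ols = sm.OLS(np.log(y + 1.0), X).fit()

print(glm.params.round(3))  # close to (1, 0.5)
print(ols.params.round(3))  # noticeably different coefficients
```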
1.6.3 Law of large numbers
1.6.3.1 Weak Law of Large Numbers (WLLN)
Theorem 1.3 Let X_1,X_2,\cdots be a sequence of independent and identically distributed (i.i.d.) random variables with finite mean \mu=E[X_i].
Then, for any \varepsilon > 0,
\lim_{n \to \infty} P\left(\left|\frac{1}{n}\sum_{i=1}^{n} X_i - \mu\right| \geq \varepsilon\right) = 0
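A simulation sketch of the WLLN in Python with NumPy; the Exponential(1) distribution (\mu = 1), the tolerance \varepsilon = 0.05, and the number of replications are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(5)

# X_i ~ Exponential(1), so mu = 1.  Estimate P(|X_bar_n - mu| >= eps)
# by simulating many independent sample means for each n.
mu, eps, n_reps = 1.0, 0.05, 1_000

for n in (10, 100, 1_000, 10_000):
    xbar = rng.exponential(scale=1.0, size=(n_reps, n)).mean(axis=1)
    print(n, (np.abs(xbar - mu) >= eps).mean().round(3))  # tends to 0 as n grows
```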