Chapter 2 Special Distributions, Order Statistics, Convergence (Lecture on 01/09/2020)

Continuing from Chapter 1, there are two more important distributions, namely the Student’s t distribution and the F distribution.

The intuition behind the Student’s t distribution is that we want to assess the variability of \(\bar{X}\) as an estimate of \(\mu\) when \(\sigma\) is unknown. Suppose \(X_1,\cdots,X_n\) are a random sample from \(N(\mu,\sigma^2)\); then from Theorem 1.4, \(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1)\), which can be used as a basis for inference. However, if \(\sigma\) is unknown, a natural idea is to substitute \(S\) for it and consider \(\frac{\bar{X}-\mu}{S/\sqrt{n}}\). \[\begin{equation} \frac{\bar{X}-\mu}{S/\sqrt{n}}=\frac{(\bar{X}-\mu)/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}} \tag{2.1} \end{equation}\] Notice that the numerator of (2.1) is a \(N(0,1)\) r.v. and the denominator is, by Theorem 1.4, distributed as \(\sqrt{\chi_{n-1}^2/(n-1)}\) and independent of the numerator. This leads to the Student’s t distribution.

Definition 2.1 (Student’s t Distribution) Let \(X_1,\cdots,X_n\) be a random sample from a \(N(\mu,\sigma^2)\) distribution. Then the quantity \[\begin{equation} \frac{\bar{X}-\mu}{S/\sqrt{n}} \tag{2.2} \end{equation}\] has a t distribution with \(n-1\) degrees of freedom. Equivalently, a r.v. \(T\) has a t distribution with \(p\) degrees of freedom, written as \(T\sim t_p\), if it has pdf \[\begin{equation} f_T(t)=\frac{\Gamma(\frac{p+1}{2})}{\Gamma(\frac{p}{2})}\frac{1}{(p\pi)^{1/2}}\frac{1}{(1+t^2/p)^{(p+1)/2}} \end{equation}\]
The derivation of the pdf of the t distribution is straightforward. Simply apply the transformation \(t=\frac{u}{\sqrt{v/p}}\) and \(\omega=v\) to independent r.v.s \(U\sim N(0,1)\) and \(V\sim\chi_p^2\), then integrate out \(\omega\) to obtain the marginal pdf of \(T\).
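As a quick sanity check on this construction, here is a minimal Monte Carlo sketch (not part of the lecture; the choice \(p=5\), the seed, and the use of numpy/scipy are illustrative assumptions) comparing quantiles of \(U/\sqrt{V/p}\) with the corresponding \(t_p\) quantiles.

```python
# Minimal Monte Carlo sketch: U / sqrt(V/p), with U ~ N(0,1) independent of
# V ~ chi^2_p, should match the t_p distribution of Definition 2.1.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
p = 5                                  # degrees of freedom (arbitrary choice)
n_sim = 200_000

U = rng.standard_normal(n_sim)         # U ~ N(0,1)
V = rng.chisquare(df=p, size=n_sim)    # V ~ chi^2_p, independent of U
T = U / np.sqrt(V / p)                 # should be t_p by Definition 2.1

# Compare empirical quantiles with scipy's t_p quantiles
qs = [0.05, 0.25, 0.5, 0.75, 0.95]
print(np.quantile(T, qs))
print(stats.t.ppf(qs, df=p))
```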

The t distribution has no mgf! It does not have moments of all orders: if there are \(p\) degrees of freedom, then only the first \(p-1\) moments exist. We have the following property for the \(t_p\) distribution.

Lemma 2.1 If \(T_p\sim t_p\), then \[\begin{equation} \begin{split} &ET_p=0,\quad \forall p>1\\ &Var(T_p)=\frac{p}{p-2},\quad \forall p>2 \end{split} \tag{2.3} \end{equation}\] (This is from part (a) of Exercise 5.18 in Casella and Berger (2002))

Proof. For the mean, using the definition of the mean we have \[\begin{equation} ET_p=\int_{-\infty}^{+\infty}t\cdot\frac{\Gamma(\frac{p+1}{2})}{\Gamma(\frac{p}{2})}\frac{1}{(p\pi)^{1/2}}\frac{1}{(1+t^2/p)^{(p+1)/2}}dt \tag{2.4} \end{equation}\] Noticing that the integrand of (2.4) is an odd function and that the integral converges absolutely for \(p>1\), the integral is 0 whenever \(p>1\).

As for the variance, notice that \(T_p=\frac{U}{\sqrt{V/p}}\) with independent \(U\sim N(0,1)\) and \(V\sim\chi_p^2\). Thus, \[\begin{equation} Var(T_p)=E(T_p^2)=pE(U^2)E(V^{-1})=\frac{p}{p-2},\quad \forall p>2 \tag{2.5} \end{equation}\] where we used the result that the expectation of the inverse chi-squared distribution with \(p\) degrees of freedom is \(\frac{1}{p-2}\).
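The following small numerical check (again not from the lecture; the degrees of freedom and simulation size are arbitrary) illustrates Lemma 2.1 and the inverse chi-squared expectation used in (2.5).

```python
# Quick check of Lemma 2.1 and E(1/chi^2_p) = 1/(p-2) via simulation.
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
p, n_sim = 6, 500_000

T = rng.standard_t(df=p, size=n_sim)   # T ~ t_p
V = rng.chisquare(df=p, size=n_sim)    # V ~ chi^2_p

print(T.mean(), T.var())               # should be close to 0 and p/(p-2)
print(p / (p - 2), stats.t.var(p))     # exact value and scipy's closed form
print(np.mean(1 / V), 1 / (p - 2))     # E(1/chi^2_p) = 1/(p-2)
```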

For the F distribution, the intuition is to compare the variability of two populations, \(N(\mu_1,\sigma_1^2)\) and \(N(\mu_2,\sigma^2_2)\). The quantity of interest is \(\frac{\sigma_1^2}{\sigma_2^2}\), whose information is contained in \(\frac{S_1^2}{S^2_2}\). The F distribution is the distribution of (2.6), which allows us to compare these two ratios. \[\begin{equation} \frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2}=\frac{S^2_1/\sigma_1^2}{S^2_2/\sigma_2^2} \tag{2.6} \end{equation}\] Notice from (2.6) that an F random variable is the ratio of two independent scaled chi-squared random variables.

Definition 2.2 (F Distribution) Let \(X_1,\cdots,X_n\) be a random sample from \(N(\mu_X,\sigma_X^2)\), and let \(Y_1,\cdots,Y_m\) be an independent random sample from \(N(\mu_Y,\sigma_Y^2)\). Then the r.v. \(F=\frac{S_X^2/\sigma_X^2}{S_Y^2/\sigma_Y^2}\) has an F distribution with \(n-1\) and \(m-1\) degrees of freedom. Equivalently, if \(U\sim\chi^2_n\) and \(V\sim\chi^2_m\) are independent, then \(\frac{U/n}{V/m}\sim F(n,m)\). A random variable \(F\) with \(p\) and \(q\) degrees of freedom has pdf \[\begin{equation} f_F(x)=\frac{\Gamma(\frac{p+q}{2})}{\Gamma(\frac{p}{2})\Gamma(\frac{q}{2})}(\frac{p}{q})^{p/2}\frac{x^{\frac{p}{2}-1}}{[1+(p/q)x]^{(p+q)/2}},\quad 0<x<\infty \tag{2.7} \end{equation}\]
Example 2.1 (Expectation of F distribution) We compute the expectation of \(F_{n-1,m-1}\) as follows \[\begin{equation} \begin{split} EF_{n-1,m-1}&=E(\frac{\chi^2_{n-1}/(n-1)}{\chi^2_{m-1}/(m-1)})\\ &=E(\frac{\chi^2_{n-1}}{n-1})E(\frac{m-1}{\chi^2_{m-1}})\\ &=(\frac{n-1}{n-1})(\frac{m-1}{m-3})=\frac{m-1}{m-3} \end{split} \tag{2.8} \end{equation}\] where the second equality uses the independence of the two chi-squared random variables. Thus, for \(m>3\), we have \(EF_{n-1,m-1}=\frac{m-1}{m-3}\). Also from (2.8), dropping the expectation, for reasonably large \(m\) we have \(\frac{S_X^2/S_Y^2}{\sigma_X^2/\sigma_Y^2}\approx\frac{m-1}{m-3}\approx 1\), as one would expect.
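A quick Monte Carlo check of this calculation might look as follows; it assumes scipy's F parameterization \(F(\text{dfn},\text{dfd})\) corresponds to \(F_{n-1,m-1}\) here, and the sample sizes are arbitrary.

```python
# Numerical sanity check of E F_{n-1,m-1} = (m-1)/(m-3) for m > 3.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, m = 8, 12                            # arbitrary sample sizes, m > 3
dfn, dfd = n - 1, m - 1

F = rng.f(dfnum=dfn, dfden=dfd, size=500_000)
print(F.mean())                         # Monte Carlo estimate of E F_{n-1,m-1}
print((m - 1) / (m - 3))                # (m-1)/(m-3) from (2.8)
print(stats.f.mean(dfn, dfd))           # scipy's closed-form mean, dfd/(dfd-2)
```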

Theorem 2.1 (Properties of F Distribution) a. If \(X\sim F_{p,q}\), then \(1/X\sim F_{q,p}\).

  b. If \(X\sim t_q\), then \(X^2\sim F_{1,q}\).

  c. If \(X\sim F_{p,q}\), then \(\frac{(p/q)X}{1+(p/q)X}\sim Beta(p/2,q/2)\).

(This is from Exercise 5.17 and Exercise 5.18 in Casella and Berger (2002))

Proof. a. By definition, \(X=\frac{U/p}{V/q}\) with independent \(U\sim\chi^2_p\) and \(V\sim\chi^2_q\). Therefore, \(1/X=\frac{V/q}{U/p}\) follows \(F_{q,p}\) by definition.

  b. By definition, \(X=\frac{U}{\sqrt{V/q}}\) with independent \(U\sim N(0,1)\) and \(V\sim\chi_q^2\). Since \(U^2\sim\chi^2_1\), \(X^2=\frac{U^2/1}{V/q}\) follows \(F_{1,q}\) by definition.

  c. This follows by a change of variables: let \(Y=\frac{(p/q)X}{1+(p/q)X}\), so that \(x=\frac{q}{p}\frac{y}{1-y}\); substituting into (2.7) with the corresponding Jacobian yields the \(Beta(p/2,q/2)\) pdf.
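The three properties can also be checked numerically. The sketch below is illustrative only; the parameter values and seed are arbitrary, and it compares simulated quantiles and moments with the distributions claimed in Theorem 2.1.

```python
# Simulation sketch of Theorem 2.1 (a)-(c).
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p, q, n_sim = 4, 9, 300_000

X_f = rng.f(p, q, size=n_sim)                 # X ~ F_{p,q}
X_t = rng.standard_t(df=q, size=n_sim)        # X ~ t_q

# (a) 1/X should be F_{q,p}: compare a few quantiles
print(np.quantile(1 / X_f, [0.25, 0.5, 0.75]))
print(stats.f.ppf([0.25, 0.5, 0.75], q, p))

# (b) X^2 for X ~ t_q should be F_{1,q}
print(np.quantile(X_t**2, [0.25, 0.5, 0.75]))
print(stats.f.ppf([0.25, 0.5, 0.75], 1, q))

# (c) (p/q)X / (1 + (p/q)X) should be Beta(p/2, q/2)
W = (p / q) * X_f / (1 + (p / q) * X_f)
print(W.mean(), stats.beta.mean(p / 2, q / 2))
```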
Definition 2.3 (Order Statistics) The order statistics of a random sample \(X_1,\cdots,X_n\) are the sample values placed in ascending order, denoted by \(X_{(1)},\cdots,X_{(n)}\). It follows that \(X_{(1)}\leq\cdots\leq X_{(n)}\).
Some commonly used statistics based on order statistics are the sample range \(R:=X_{(n)}-X_{(1)}\) and the sample median, defined by \[\begin{equation} M=\left\{ \begin{aligned} &X_{((n+1)/2)} & n\,odd \\ &(X_{(n/2)}+X_{(n/2+1)})/2 & n\,even \end{aligned} \right. \tag{2.9} \end{equation}\] The sample median is more robust than the sample mean. For \(0\leq p\leq1\), the (100p)th sample percentile is the observation such that approximately \(np\) of the observations are less than it. The lower quartile is the 25th percentile and the upper quartile is the 75th percentile; their difference is termed the interquartile range, which is also a measure of dispersion.
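As a small illustration, the snippet below computes the order statistics, sample range, sample median, and interquartile range of a simulated sample; note that numpy's default quantile interpolation is only one of several conventions consistent with the percentile definition above.

```python
# Order statistics, sample range, median, and IQR of a simulated sample.
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(loc=0.0, scale=1.0, size=11)        # a random sample, n = 11

x_ord = np.sort(x)                                 # X_(1) <= ... <= X_(n)
sample_range = x_ord[-1] - x_ord[0]                # R = X_(n) - X_(1)
median = np.median(x)                              # matches (2.9) for odd/even n
iqr = np.quantile(x, 0.75) - np.quantile(x, 0.25)  # upper minus lower quartile

print(x_ord, sample_range, median, iqr)
```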
Theorem 2.2 (PDF of order statistics) Let \(X_1,\cdots,X_n\) be a random sample from a discrete distribution with \(Pr(X=x_i)=p_i\), where \(x_1<x_2<\cdots\). Define \(P_0=0,P_i=\sum_{j=1}^ip_j\). Then \[\begin{align} &Pr(X_{(j)}\leq x_i)=\sum_{k=j}^n{n \choose k}P_i^k(1-P_i)^{n-k} \tag{2.10}\\ &P(X_{(j)}=x_i)=\sum_{k=j}^n{n \choose k}[P_i^k(1-P_i)^{n-k}-P_{i-1}^k(1-P_{i-1})^{n-k}] \tag{2.11} \end{align}\] If X has continuous cdf \(F_X\) and pdf \(f_X\), then \[\begin{equation} f_{X_{(j)}}(x)=\frac{n!}{(j-1)!(n-j)!}f_{X}(x)[F_X(x)]^{j-1}[1-F_X(x)]^{n-j} \tag{2.12} \end{equation}\]

Proof. For fixed \(i\), let \(Y\) be the random variable that counts the number of \(X_1,\cdots,X_n\) that are less than or equal to \(x_i\). Then \(Y\sim Bin(n,P_i)\). The event \(\{X_{(j)}\leq x_i\}\) is equivalent to \(\{Y\geq j\}\), so (2.10) is just the binomial probability \(P(Y\geq j)=P(X_{(j)}\leq x_i)\). Equation (2.11) is just the difference \[\begin{equation} P(X_{(j)}=x_i)=P(X_{(j)}\leq x_i)-P(X_{(j)}\leq x_{i-1}) \tag{2.13} \end{equation}\] with the exception of the case \(i=1\), where \(P(X_{(j)}=x_1)=P(X_{(j)}\leq x_1)\).

For the continuous case, \(Y\sim Bin(n,F_X(x))\). Thus \[\begin{equation} F_{X_{(j)}}(x)=P(Y\geq j)=\sum_{k=j}^n{n \choose k}[F_X(x)]^k[1-F_X(x)]^{n-k} \tag{2.14} \end{equation}\] and the pdf of \(X_{(j)}\) is obtained by differentiating the cdf: \[\begin{equation} \begin{split} f_{X_{(j)}}(x)&=\frac{d}{dx}F_{X_{(j)}}(x)\\ &=\sum_{k=j}^n{n \choose k}\Big(k[F_X(x)]^{k-1}[1-F_X(x)]^{n-k}f_X(x)\\ &-(n-k)[F_X(x)]^k[1-F_X(x)]^{n-k-1}f_X(x)\Big)\\ &={n \choose j}j[F_X(x)]^{j-1}[1-F_X(x)]^{n-j}f_X(x)\\ &+\sum_{k=j+1}^n{n \choose k}k[F_X(x)]^{k-1}[1-F_X(x)]^{n-k}f_X(x)\\ &-\sum_{k=j}^{n-1}{n \choose k}(n-k)[F_X(x)]^k[1-F_X(x)]^{n-k-1}f_X(x)\\ &=\frac{n!}{(j-1)!(n-j)!}f_X(x)[F_X(x)]^{j-1}[1-F_X(x)]^{n-j}\\ &+\sum_{k=j}^{n-1}{n \choose {k+1}}(k+1)[F_X(x)]^k[1-F_X(x)]^{n-k-1}f_X(x)\\ &-\sum_{k=j}^{n-1}{n \choose k}(n-k)[F_X(x)]^k[1-F_X(x)]^{n-k-1}f_X(x) \end{split} \tag{2.15} \end{equation}\] Noting that \[\begin{equation} {n \choose {k+1}}(k+1)=\frac{n!}{k!(n-k-1)!}={n \choose k}(n-k) \tag{2.16} \end{equation}\] the last two terms of (2.15) cancel and we are left with (2.12).
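A numerical check of (2.12) is easy for a Uniform(0,1) sample, since in that case (2.12) reduces to the \(Beta(j,n-j+1)\) density. The sketch below (illustrative choices of \(n\), \(j\), and simulation size) compares the simulated j-th order statistic with that Beta law.

```python
# Check of (2.12) for Uniform(0,1) samples: X_(j) ~ Beta(j, n-j+1).
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, j, n_sim = 10, 3, 200_000

samples = rng.uniform(size=(n_sim, n))
x_j = np.sort(samples, axis=1)[:, j - 1]      # j-th order statistic of each sample

print(x_j.mean(), j / (n + 1))                # Beta(j, n-j+1) mean
print(np.quantile(x_j, 0.9), stats.beta.ppf(0.9, j, n - j + 1))
```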
Definition 2.4 (Convergence in Probability) A sequence of random variables \(X_1,X_2,\cdots\) converges in probability to a random variable \(X\) if, for every \(\epsilon>0\), \(\lim_{n\to\infty}P(|X_n-X|\geq\epsilon)=0\), or equivalently, \(\lim_{n\to\infty}P(|X_n-X|<\epsilon)=1\).
Note that this definition does not require independence! Convergence in probability is also referred to as weak convergence.
Theorem 2.3 (Weak Law of Large Numbers) Let \(X_1,X_2,\cdots\) be i.i.d. random variables with \(EX_i=\mu\) and \(Var(X_i)=\sigma^2<\infty\). Define \(\bar{X}_n=\frac{1}{n}\sum_{i=1}^nX_i\). Then \(\bar{X}_n\) converges in probability to \(\mu\), i.e. \[\begin{equation} \lim_{n\to\infty}P(|\bar{X}_n-\mu|<\epsilon)=1,\quad \forall \epsilon>0 \tag{2.17} \end{equation}\]
Proof. By Chebyshev’s inequality, \[\begin{equation} P(|\bar{X}_n-\mu|\geq\epsilon)=P((\bar{X}_n-\mu)^2\geq\epsilon^2)\leq\frac{E(\bar{X}_n-\mu)^2}{\epsilon^2}=\frac{Var(\bar{X}_n)}{\epsilon^2}=\frac{\sigma^2}{n\epsilon^2} \tag{2.18} \end{equation}\] Hence, \(P(|\bar{X}_n-\mu|<\epsilon)=1-P(|\bar{X}_n-\mu|\geq\epsilon)\geq1-\frac{\sigma^2}{n\epsilon^2}\to1\) as \(n\to\infty\).
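The sketch below (not from the lecture; the normal population, \(\epsilon\), and sample sizes are arbitrary) compares the simulated tail probability \(P(|\bar{X}_n-\mu|\geq\epsilon)\) with the Chebyshev bound \(\sigma^2/(n\epsilon^2)\) from (2.18).

```python
# Simulated tail probability versus the Chebyshev bound in (2.18).
import numpy as np

rng = np.random.default_rng(5)
mu, sigma, eps = 1.0, 2.0, 0.25
n_sim = 10_000

for n in [50, 200, 800]:
    xbar = rng.normal(mu, sigma, size=(n_sim, n)).mean(axis=1)
    emp = np.mean(np.abs(xbar - mu) >= eps)   # simulated tail probability
    bound = sigma**2 / (n * eps**2)           # Chebyshev bound from (2.18)
    print(n, emp, bound)
```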
The property that a sequence of the “same” sample quantity approaches a constant as \(n\to\infty\) is known as consistency.

Theorem 2.4 Suppose \(X_1,X_2,\cdots\) converges in probability to a random variable \(X\) and \(h\) is a continuous function. Then \(h(X_1),h(X_2),\cdots\) converges in probability to \(h(X)\).

(This is from Exercise 5.39 in Casella and Berger (2002))
Proof. Since \(h\) is a continuous function, for any \(\epsilon>0\) there exists some \(\delta>0\) such that \(|x_n-x|<\delta\) implies \(|h(x_n)-h(x)|<\epsilon\). Since \(X_1,X_2,\cdots\) converges in probability to \(X\), we have \(P(|X_n-X|<\delta)\to1\) as \(n\to\infty\) for every \(\delta>0\). Thus, for any \(\epsilon>0\), there exists \(\delta>0\) such that \(1\geq P(|h(X_n)-h(X)|<\epsilon)\geq P(|X_n-X|<\delta)\to1\) as \(n\to\infty\). Thus, \(P(|h(X_n)-h(X)|<\epsilon)\to1\), as desired. (Strictly speaking, \(\delta\) may depend on the realized value of \(X\); the argument can be made rigorous by restricting to a compact set of probability arbitrarily close to 1, on which \(h\) is uniformly continuous.)
Definition 2.5 (Almost Sure Convergence) A sequence of random variables \(X_1,X_2,\cdots\) converges almost surely to a r.v. \(X\) if for every \(\epsilon>0\), \[\begin{equation} P(\lim_{n\to\infty}|X_n-X|<\epsilon)=1 \tag{2.19} \end{equation}\]
Almost sure convergence is sometimes also referred to as convergence with probability 1 or strong convergence; strong in the sense that it is stronger than convergence in probability. That is, a sequence of random variables can converge in probability while NOT converging almost surely. For example, let the sample space \(S\) be the closed interval \([0,1]\) with the uniform probability distribution. Define \(X_1(s)=s+I_{[0,1]}(s)\), \(X_2(s)=s+I_{[0,\frac{1}{2}]}(s)\), \(X_3(s)=s+I_{[\frac{1}{2},1]}(s)\), \(X_4(s)=s+I_{[0,\frac{1}{3}]}(s)\), \(X_5(s)=s+I_{[\frac{1}{3},\frac{2}{3}]}(s)\), \(X_6(s)=s+I_{[\frac{2}{3},1]}(s)\), etc., and let \(X(s)=s\). Then \(X_n\) converges to \(X\) in probability, since the lengths of the indicator intervals shrink to 0, but not almost surely, since for every \(s\) the value \(X_n(s)\) equals \(s+1\) for infinitely many \(n\).
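The sketch below simulates this “moving blocks” example; the enumeration of the indicator intervals is one possible indexing of the sequence described above.

```python
# "Moving blocks" example: intervals of length 1/k sweep across [0,1], so
# P(|X_n - X| >= eps) = 1/k -> 0 (convergence in probability), yet for every
# fixed s the event |X_n(s) - X(s)| >= eps recurs once per block, so X_n(s)
# does not converge for any s (no almost sure convergence).
import numpy as np

def interval(n):
    """Return (a, b) for the n-th indicator interval, n = 1, 2, ..."""
    k, start = 1, 1
    while start + k <= n:          # find the block k containing index n
        start += k
        k += 1
    i = n - start                  # position inside block k (0-based)
    return i / k, (i + 1) / k

rng = np.random.default_rng(6)
s = rng.uniform()                  # one fixed sample point s
hits = [n for n in range(1, 200) if interval(n)[0] <= s <= interval(n)[1]]
print(hits)                        # hits keep occurring: no a.s. convergence
print([1 / k for k in (1, 5, 20)]) # P(|X_n - X| >= eps) within block k is 1/k
```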
Theorem 2.5 (Strong Law of Large Numbers) Let \(X_1,X_2,\cdots\) be i.i.d. random variables with \(EX_i=\mu\) and \(Var(X_i)=\sigma^2<\infty\). Define \(\bar{X}_n=\frac{1}{n}\sum_{i=1}^nX_i\). Then \(\bar{X}_n\) converges to \(\mu\) almost surely, i.e. \[\begin{equation} P(\lim_{n\to\infty}|\bar{X}_n-\mu|<\epsilon)=1,\quad \forall \epsilon>0 \tag{2.20} \end{equation}\]
In both the Weak and Strong Laws of Large Numbers we assumed a finite variance, which is not actually required; the only moment condition needed is \(E|X_i|<\infty\). See theoretical probability books for details.
Definition 2.6 (Convergence in Distribution) A sequence of random variables \(X_1,X_2,\cdots\) converges in distribution to a r.v. \(X\) if \[\begin{equation} \lim_{n\to\infty}F_{X_n}(x)=F_X(x) \tag{2.21} \end{equation}\] at all points \(x\) where \(F_X(x)\) is continuous.
Convergence in distribution is actually convergence of cdfs. It is fundamentally different from convergence in probability and almost sure convergence, which concern convergence of the random variables themselves.

Theorem 2.6 If the sequence of random variables \(X_1,X_2,\cdots\) converges in probability to a random variable \(X\), the sequence also converges in distribution to \(X\).

(This is Exercise 5.40 in Casella and Berger (2002))

Proof. First we prove the following lemma: for any random variables \(X,Y\) on a sample space \(S\), any real number \(a\) and any \(\epsilon>0\), we have \(P(Y\leq a)\leq P(X\leq a+\epsilon)+P(|Y-X|>\epsilon)\). To prove it, denote \(S_1:=\{s\in S:Y(s)\leq a\}\), \(S_2:=\{s\in S: |Y(s)-X(s)|\leq\epsilon\}\) and \(S_3:=\{s\in S: X(s)\leq a+\epsilon\}\). Since \(Y\leq a\) and \(|Y-X|\leq\epsilon\) together imply \(X\leq a+\epsilon\), we have \(S_1\subset S_2^c\cup S_3\), and thus \(P(Y\leq a)\leq P(X\leq a+\epsilon)+P(|Y-X|>\epsilon)\). The lemma is proved.

Then for any fixed \(t\) at which \(F_X\) is continuous and any \(\epsilon>0\), it follows from the lemma that \[\begin{align} &P(X \leq t-\epsilon)\leq P(X_n\leq t)+P(|X_n-X|>\epsilon) \tag{2.22}\\ &P(X_n \leq t)\leq P(X\leq t+\epsilon)+P(|X_n-X|>\epsilon) \tag{2.23} \end{align}\] Therefore, \(P(X \leq t-\epsilon)-P(|X_n-X|>\epsilon)\leq P(X_n \leq t)\leq P(X\leq t+\epsilon)+P(|X_n-X|>\epsilon)\). Letting \(n\to\infty\), we obtain \(P(X \leq t-\epsilon)\leq \liminf_{n\to\infty}F_{X_n}(t)\leq\limsup_{n\to\infty}F_{X_n}(t)\leq P(X\leq t+\epsilon)\) for every \(\epsilon>0\). Since \(F_X\) is continuous at \(t\), letting \(\epsilon\to0\) gives \(\lim_{n\to\infty}F_{X_n}(t)=F_X(t)\), as desired.

One special case in which the converse of Theorem 2.6 holds is stated below.

Theorem 2.7 The sequence of random variables \(X_1,X_2,\cdots\) converges in probability to a constant \(\mu\) iff the sequence also converges in distribution to \(\mu\). That is, \[\begin{equation} P(|X_n-\mu|>\epsilon)\to 0\quad\forall\epsilon>0\iff P(X_n\leq x)\to\left\{ \begin{aligned} &0 & if \, x<\mu \\ &1 & if \, x>\mu \end{aligned} \right. \tag{2.24} \end{equation}\]

(This is Exercise 5.41 in Casella and Berger (2002))

Proof. \((\Longrightarrow)\) Fix \(x\neq\mu\) and set \(\epsilon=|x-\mu|>0\). If \(x>\mu\), then the set \(S_1:=\{s\in S: |X_n(s)-\mu|\leq\epsilon\}\) is contained in the set \(S_2:=\{s\in S: X_n(s)\leq x\}\). Therefore, \(1\geq P(X_n\leq x)\geq P(|X_n-\mu|\leq\epsilon)\to1\) as \(n\to\infty\). On the other hand, if \(x<\mu\), then the set \(S_1^*:=\{s\in S: |X_n(s)-\mu|\geq\epsilon\}\) contains the set \(S_2\), which indicates \(0\leq P(X_n\leq x)\leq P(|X_n-\mu|\geq\epsilon)\to0\) as \(n\to\infty\). Hence, we have proved the \(\Longrightarrow\) part.

\((\Longleftarrow)\) For any \(\epsilon>0\), it follows that \[\begin{equation} \begin{split} 0&\leq P(|X_n-\mu|>\epsilon)\\ &\leq P(X_n-\mu<-\epsilon)+P(X_n-\mu>\epsilon)\\ &=P(X_n<\mu-\epsilon)+P(X_n>\mu+\epsilon)\\ &=P(X_n<\mu-\epsilon)+1-P(X_n\leq\mu+\epsilon) \end{split} \tag{2.25} \end{equation}\] Since \(\mu-\epsilon<\mu\), we have \(P(X_n<\mu-\epsilon)\leq P(X_n\leq\mu-\epsilon)\to0\), and since \(\mu+\epsilon>\mu\), we have \(P(X_n\leq\mu+\epsilon)\to1\) as \(n\to\infty\). Hence \(P(|X_n-\mu|>\epsilon)\to 0\), as desired.
Theorem 2.8 (Central Limit Theorem) Let \(X_1,X_2,\cdots\) be a sequence of i.i.d. random variables whose mgfs exist in a neighborhood of 0. Let \(EX_i=\mu\) and \(Var(X_i)=\sigma^2>0\). (Both are finite since the mgf exists.) Let \(G_n(x)\) denote the cdf of \(\frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}\). Then for any \(-\infty<x<\infty\), \[\begin{equation} \lim_{n\to\infty}G_n(x)=\int_{-\infty}^x\frac{1}{\sqrt{2\pi}}e^{-\frac{y^2}{2}}dy \tag{2.26} \end{equation}\] that is, \(\frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}\) converges in distribution to a standard normal random variable.

Proof. We show this via mgfs: for \(|t|<h\), the mgf of \(\sqrt{n}(\bar{X}_n-\mu)/\sigma\) converges to \(e^{t^2/2}\), the mgf of a standard normal random variable.

Define \(Y_i=\frac{X_i-\mu}{\sigma}\) and let \(M_Y(t)\) denote the common mgf of the \(Y_i\)s. Since \[\begin{equation} \frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}=\frac{1}{\sqrt{n}}\sum_{i=1}^nY_i \tag{2.27} \end{equation}\] it follows from the properties of mgfs that \[\begin{equation} \begin{split} M_{\sqrt{n}(\bar{X}_n-\mu)/\sigma}(t)&=M_{\sum_{i=1}^nY_i/\sqrt{n}}(t)\\ &=M_{\sum_{i=1}^nY_i}(\frac{t}{\sqrt{n}})\\ &=[M_Y(\frac{t}{\sqrt{n}})]^n \end{split} \tag{2.28} \end{equation}\] We now expand \(M_Y(t/\sqrt{n})\) in a Taylor series around 0. We have \[\begin{equation} M_Y(t/\sqrt{n})=\sum_{k=0}^{\infty}M_Y^{(k)}(0)\frac{(t/\sqrt{n})^k}{k!} \tag{2.29} \end{equation}\] where \(M_Y^{(k)}(0)=(d^k/dt^k)M_Y(t)|_{t=0}\). Using the facts that \(M_Y^{(0)}(0)=1\), \(M_Y^{(1)}(0)=EY=0\) and \(M_Y^{(2)}(0)=EY^2=Var(Y)=1\), we have \[\begin{equation} M_Y(\frac{t}{\sqrt{n}})=1+\frac{(t/\sqrt{n})^2}{2!}+R_Y(\frac{t}{\sqrt{n}}) \tag{2.30} \end{equation}\] For fixed \(t\neq0\), \(R_Y(\frac{t}{\sqrt{n}})\) consists of terms in \(\frac{t}{\sqrt{n}}\) of order higher than 2, so \[\begin{equation} \lim_{n\to\infty}\frac{R_Y(\frac{t}{\sqrt{n}})}{(\frac{t}{\sqrt{n}})^2}=0 \tag{2.31} \end{equation}\] Since \(t\) is fixed, we also have \[\begin{equation} \lim_{n\to\infty}\frac{R_Y(\frac{t}{\sqrt{n}})}{(\frac{1}{\sqrt{n}})^2}=\lim_{n\to\infty}nR_Y(\frac{t}{\sqrt{n}})=0 \tag{2.32} \end{equation}\] which is also true at \(t=0\). Thus, using the standard limit \((1+\frac{a_n}{n})^n\to e^a\) whenever \(a_n\to a\), for any fixed \(t\) we have \[\begin{equation} \begin{split} \lim_{n\to\infty}(M_Y(\frac{t}{\sqrt{n}}))^n&=\lim_{n\to\infty}[1+\frac{(t/\sqrt{n})^2}{2!}+R_Y(\frac{t}{\sqrt{n}})]^n\\ &=\lim_{n\to\infty}[1+\frac{1}{n}(\frac{t^2}{2}+nR_Y(\frac{t}{\sqrt{n}}))]^n\\ &=e^{t^2/2} \end{split} \tag{2.33} \end{equation}\] as desired.

The CLT describes the limiting distribution of the sample mean. It can be shown that the only two required conditions are independence and finite variance, and normality emerges in the limit regardless of the underlying distribution. The CLT thus lets us use the normal distribution to approximate other distributions, although the quality of the approximation varies from case to case.
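As an illustration of this approximation, the sketch below (arbitrary exponential population and sample sizes, not from the lecture) compares quantiles of the standardized sample mean with standard normal quantiles.

```python
# CLT illustration: quantiles of the standardized mean of an Exp(1) sample
# approach standard normal quantiles as n grows.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
mu, sigma = 1.0, 1.0                      # mean and sd of Exp(1)
n_sim = 20_000

for n in [5, 30, 200]:
    xbar = rng.exponential(scale=1.0, size=(n_sim, n)).mean(axis=1)
    z = np.sqrt(n) * (xbar - mu) / sigma  # sqrt(n)(Xbar - mu)/sigma
    print(n, np.quantile(z, [0.05, 0.5, 0.95]))
print(stats.norm.ppf([0.05, 0.5, 0.95]))  # limiting N(0,1) quantiles
```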

We conclude this chapter with a useful theorem, stated without proof.

Theorem 2.9 (Slutsky Theorem) If \(X_n\to X\) in distribution and \(Y_n\to a\) in probability, where \(a\) is a constant, then

  1. \(Y_nX_n\to aX\) in distribution;

  2. \(X_n+Y_n\to X+a\) in distribution.
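A standard application of Slutsky's theorem is the studentized mean: \(\sqrt{n}(\bar{X}_n-\mu)/S_n=Y_nX_n\) with \(X_n=\sqrt{n}(\bar{X}_n-\mu)/\sigma\to N(0,1)\) in distribution and \(Y_n=\sigma/S_n\to1\) in probability. The sketch below (arbitrary Exp(1) population and \(n\); not from the lecture) illustrates this numerically.

```python
# Slutsky application: the studentized mean sqrt(n)(Xbar - mu)/S_n is
# approximately N(0,1) for large n even for a non-normal population.
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
mu, n, n_sim = 1.0, 400, 20_000

x = rng.exponential(scale=1.0, size=(n_sim, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)                  # sample standard deviation S_n
t_stat = np.sqrt(n) * (xbar - mu) / s      # studentized mean

print(np.quantile(t_stat, [0.05, 0.5, 0.95]))
print(stats.norm.ppf([0.05, 0.5, 0.95]))   # close for large n
```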

References

Casella, George, and Roger Berger. 2002. Statistical Inference. 2nd ed. Belmont, CA: Duxbury Resource Center.