Chapter 2 Special Distributions, Order Statistics, Convergence (Lecture on 01/09/2020)
Continuing from Chapter 1, there are two more important distributions: the Student’s t distribution and the F distribution.
The intuition behind the Student’s t distribution is that we want to assess the variability of \(\bar{X}\) as an estimate of \(\mu\) when \(\sigma\) is unknown. Suppose \(X_1,\cdots,X_n\) are a random sample from \(N(\mu,\sigma^2)\). Then from Theorem 1.4, \(\frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1)\), which can be used as a basis for inference. However, if \(\sigma\) is unknown, a natural idea is to substitute \(S\) for it and consider \(\frac{\bar{X}-\mu}{S/\sqrt{n}}\). \[\begin{equation} \frac{\bar{X}-\mu}{S/\sqrt{n}}=\frac{(\bar{X}-\mu)/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}} \tag{2.1} \end{equation}\] Notice that the numerator of (2.1) is a \(N(0,1)\) r.v. and the denominator is, by Theorem 1.4, the square root of a \(\chi_{n-1}^2/(n-1)\) r.v. that is independent of the numerator. This leads to the Student’s t distribution.
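This construction can be checked with a quick simulation. The following is a minimal sketch, assuming `numpy` and `scipy` are available; the sample size, number of replications, and population parameters are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, mu, sigma = 10, 100_000, 2.0, 3.0

# Draw many normal samples and form the statistic (X_bar - mu) / (S / sqrt(n)).
x = rng.normal(mu, sigma, size=(reps, n))
t_stat = (x.mean(axis=1) - mu) / (x.std(axis=1, ddof=1) / np.sqrt(n))

# Empirical quantiles should be close to those of the t_{n-1} distribution.
for q in (0.05, 0.5, 0.95):
    print(q, np.quantile(t_stat, q), stats.t.ppf(q, df=n - 1))
```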
The t distribution has no mgf! It does not have moments of all orders: with \(p\) degrees of freedom, only the moments of order less than \(p\) exist. We have the following properties for the \(t_p\) distribution: if \(T_p\sim t_p\), then \(ET_p=0\) for \(p>1\) and \(Var(T_p)=\frac{p}{p-2}\) for \(p>2\).
Proof. For the mean, using the definition of the mean we have \[\begin{equation} ET_p=\int_{-\infty}^{+\infty}t\cdot\frac{\Gamma(\frac{p+1}{2})}{\Gamma(\frac{p}{2})}\frac{1}{(p\pi)^{1/2}}\frac{1}{(1+t^2/p)^{(p+1)/2}}dt \tag{2.4} \end{equation}\] Notice that the integrand of (2.4) is an odd function and the integral converges absolutely when \(p>1\); therefore, the integral is 0 for \(p>1\).
As for the variance, notice that \(T_p=\frac{U}{\sqrt{V/p}}\) with independent \(U\sim N(0,1)\) and \(V\sim\chi_p^2\). Thus, \[\begin{equation} Var(T_p)=E(T_p^2)=pE(U^2)E(V^{-1})=\frac{p}{p-2},\quad \forall p>2 \tag{2.5} \end{equation}\] where we used the fact that the expectation of the inverse of a chi squared random variable with \(p\) degrees of freedom is \(\frac{1}{p-2}\).

For the F distribution, the intuition is to compare the variability of two populations \(N(\mu_1,\sigma_1^2)\) and \(N(\mu_2,\sigma^2_2)\). The quantity of interest is \(\frac{\sigma_1^2}{\sigma_2^2}\), whose information is contained in \(\frac{S_1^2}{S^2_2}\). The F distribution is the distribution of (2.6), which allows us to compare these two variance ratios. \[\begin{equation} \frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2}=\frac{S^2_1/\sigma_1^2}{S^2_2/\sigma_2^2} \tag{2.6} \end{equation}\] Notice from (2.6) that an F random variable is the ratio of two independent chi squared random variables, each divided by its degrees of freedom.
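Both facts lend themselves to a quick numerical check. The sketch below (degrees of freedom and replication count chosen arbitrarily) estimates the variance of simulated \(t_p\) draws and builds F draws as a ratio of scaled chi squared variables.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p, q, reps = 6, 8, 200_000

# Var(T_p) should be close to p / (p - 2).
u = rng.standard_normal(reps)
v = rng.chisquare(p, reps)
t_draws = u / np.sqrt(v / p)
print(t_draws.var(), p / (p - 2))

# A ratio of independent scaled chi-squares matches the F_{p,q} distribution.
f_draws = (rng.chisquare(p, reps) / p) / (rng.chisquare(q, reps) / q)
print(np.quantile(f_draws, 0.95), stats.f.ppf(0.95, p, q))
```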
Theorem 2.1 (Properties of F Distribution) a. If \(X\sim F_{p,q}\), then \(1/X\sim F_{q,p}\).
b. If \(X\sim t_q\), then \(X^2\sim F_{1,q}\).
c. If \(X\sim F_{p,q}\), then \(\frac{(p/q)X}{1+(p/q)X}\sim Beta(p/2,q/2)\).
Proof. a. By definition, \(X=\frac{U/p}{V/q}\) with independent \(U\sim\chi^2_p\) and \(V\sim\chi^2_q\). Therefore, \(1/X=\frac{V/q}{U/p}\) follows \(F_{q,p}\) by definition.
b. By definition, \(X=\frac{U}{\sqrt{V/q}}\) with independent \(U\sim N(0,1)\) and \(V\sim\chi_q^2\). Therefore, \(X^2=\frac{U^2/1}{V/q}\), and since \(U^2\sim\chi^2_1\), it follows \(F_{1,q}\) by definition.
c. It can be shown by a change of variables.
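As an informal check, all three properties can be verified by simulation. The sketch below (degrees of freedom chosen arbitrarily, assuming `numpy` and `scipy`) compares empirical quantiles of the transformed draws with the stated distributions.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p, q, reps = 5, 9, 200_000
qs = [0.25, 0.5, 0.9]

# (a) If X ~ F_{p,q} then 1/X ~ F_{q,p}.
x = (rng.chisquare(p, reps) / p) / (rng.chisquare(q, reps) / q)
print(np.quantile(1 / x, qs), stats.f.ppf(qs, q, p))

# (b) If X ~ t_q then X^2 ~ F_{1,q}.
t = rng.standard_t(q, reps)
print(np.quantile(t**2, qs), stats.f.ppf(qs, 1, q))

# (c) If X ~ F_{p,q} then (p/q)X / (1 + (p/q)X) ~ Beta(p/2, q/2).
b = (p / q) * x / (1 + (p / q) * x)
print(np.quantile(b, qs), stats.beta.ppf(qs, p / 2, q / 2))
```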
Proof. For fixed \(i\), let \(Y\) be the random variable that counts the number of \(X_1,\cdots,X_n\) that are less than or equal to \(x_i\). Then \(Y\sim Bin(n,P_i)\). The event \(\{X_{(j)}\leq x_i\}\) is equivalent to \(\{Y\geq j\}\), so (2.10) is just the binomial probability \(P(Y\geq j)=P(X_{(j)}\leq x_i)\). Equation (2.11) is just the difference \[\begin{equation} P(X_{(j)}=x_i)=P(X_{(j)}\leq x_i)-P(X_{(j)}\leq x_{i-1}) \tag{2.13} \end{equation}\] with the exception of the case \(i=1\), where \(P(X_{(j)}=x_1)=P(X_{(j)}\leq x_1)\).
For the continuous case, \(Y\sim Bin(n,F_X(x))\). Thus \[\begin{equation} F_{X_{(j)}}(x)=P(Y\geq j)=\sum_{k=j}^n{n \choose k}[F_X(x)]^k[1-F_X(x)]^{n-k} \tag{2.14} \end{equation}\] and the pdf of \(X_{(j)}\) is obtained by differentiating the cdf: \[\begin{equation} \begin{split} f_{X_{(j)}}(x)&=\frac{d}{dx}F_{X_{(j)}}(x)\\ &=\sum_{k=j}^n{n \choose k}\Big(k[F_X(x)]^{k-1}[1-F_X(x)]^{n-k}f_X(x)\\ &\quad-(n-k)[F_X(x)]^k[1-F_X(x)]^{n-k-1}f_X(x)\Big)\\ &={n \choose j}j[F_X(x)]^{j-1}[1-F_X(x)]^{n-j}f_X(x)\\ &\quad+\sum_{k=j+1}^n{n \choose k}k[F_X(x)]^{k-1}[1-F_X(x)]^{n-k}f_X(x)\\ &\quad-\sum_{k=j}^{n-1}{n \choose k}(n-k)[F_X(x)]^k[1-F_X(x)]^{n-k-1}f_X(x)\\ &=\frac{n!}{(j-1)!(n-j)!}f_X(x)[F_X(x)]^{j-1}[1-F_X(x)]^{n-j}\\ &\quad+\sum_{k=j}^{n-1}{n \choose {k+1}}(k+1)[F_X(x)]^k[1-F_X(x)]^{n-k-1}f_X(x)\\ &\quad-\sum_{k=j}^{n-1}{n \choose k}(n-k)[F_X(x)]^k[1-F_X(x)]^{n-k-1}f_X(x) \end{split} \tag{2.15} \end{equation}\] Noting that \[\begin{equation} {n \choose {k+1}}(k+1)=\frac{n!}{k!(n-k-1)!}={n \choose k}(n-k) \tag{2.16} \end{equation}\] the last two terms of (2.15) cancel and we are left with (2.12).
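The closed form in (2.12) is easy to check against simulation. The sketch below compares the formula for the \(j\)-th order statistic of a Uniform(0,1) sample with an empirical histogram density; the sample size and \(j\) are arbitrary choices.

```python
import numpy as np
from math import factorial

rng = np.random.default_rng(4)
n, j, reps = 6, 2, 200_000

# Empirical density of the j-th order statistic from Uniform(0, 1) samples.
samples = rng.random((reps, n))
jth = np.sort(samples, axis=1)[:, j - 1]
hist, edges = np.histogram(jth, bins=50, range=(0, 1), density=True)
mids = (edges[:-1] + edges[1:]) / 2

# Formula (2.12) with f_X = 1 and F_X(x) = x on (0, 1).
coef = factorial(n) / (factorial(j - 1) * factorial(n - j))
pdf_formula = coef * mids ** (j - 1) * (1 - mids) ** (n - j)

# Empirical and theoretical densities should agree up to Monte Carlo error.
print(np.max(np.abs(hist - pdf_formula)))
```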
Theorem 2.4 Suppose \(X_1,X_2,\cdots\) converges in probability to a random variable \(X\) and \(h\) is a continuous function. Then \(h(X_1),h(X_2),\cdots\) converges in probability to \(h(X)\). (This is Exercise 5.39 in Casella and Berger (2002).)

Theorem 2.6 If the sequence of random variables \(X_1,X_2,\cdots\) converges in probability to a random variable \(X\), then the sequence also converges in distribution to \(X\).
(This is Exercise 5.40 in Casella and Berger (2002).)

Proof. First we prove the following lemma: for any random variables \(X,Y\) on a sample space \(S\), any real number \(a\), and any \(\epsilon>0\), we have \(P(Y\leq a)\leq P(X\leq a+\epsilon)+P(|Y-X|>\epsilon)\). To prove it, denote \(S_1:=\{s\in S:Y(s)\leq a\}\), \(S_2:=\{s\in S: |Y(s)-X(s)|\leq\epsilon\}\) and \(S_3:=\{s\in S: X(s)\leq a+\epsilon\}\). Since \(Y\leq a\) and \(|Y-X|\leq\epsilon\) together imply \(X\leq a+\epsilon\), we have \(S_1\subset S_2^c\cup S_3\) and thus \(P(Y\leq a)\leq P(X\leq a+\epsilon)+P(|Y-X|>\epsilon)\). The lemma is proved.
Then for any fixed \(t\) at which \(F_X\) is continuous and any \(\epsilon>0\), it follows from the lemma that \[\begin{align} &P(X \leq t-\epsilon)\leq P(X_n\leq t)+P(|X_n-X|>\epsilon) \tag{2.22}\\ &P(X_n \leq t)\leq P(X\leq t+\epsilon)+P(|X_n-X|>\epsilon) \tag{2.23} \end{align}\] Therefore, \(P(X \leq t-\epsilon)-P(|X_n-X|>\epsilon)\leq P(X_n \leq t)\leq P(X\leq t+\epsilon)+P(|X_n-X|>\epsilon)\). Letting \(n\to\infty\), we have \(P(X \leq t-\epsilon)\leq \liminf_{n\to\infty}F_{X_n}(t)\leq\limsup_{n\to\infty}F_{X_n}(t)\leq P(X\leq t+\epsilon)\) for every \(\epsilon>0\). Since by assumption \(F_X\) is continuous at \(t\), letting \(\epsilon\to0\) gives \(\lim_{n\to\infty}F_{X_n}(t)=F_X(t)\), as desired.
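Theorem 2.6 can also be illustrated numerically. The sketch below takes \(X_n=X+Z_n/n\) with standard normal noise \(Z_n\) (an arbitrary example of convergence in probability) and checks that both \(P(|X_n-X|>\epsilon)\) and the gap between the cdfs shrink as \(n\) grows.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
reps, eps, t = 200_000, 0.1, 0.5

for n in (1, 10, 100):
    x = rng.standard_normal(reps)               # X ~ N(0, 1)
    x_n = x + rng.standard_normal(reps) / n     # X_n = X + Z_n / n
    prob_far = np.mean(np.abs(x_n - x) > eps)   # -> 0: convergence in probability
    cdf_gap = abs(np.mean(x_n <= t) - stats.norm.cdf(t))  # -> 0: convergence in distribution
    print(n, prob_far, cdf_gap)
```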
One special case in which the converse of Theorem 2.6 holds is stated below.

Theorem 2.7 The sequence of random variables \(X_1,X_2,\cdots\) converges in probability to a constant \(\mu\) iff the sequence also converges in distribution to \(\mu\). That is, \[\begin{equation} P(|X_n-\mu|>\epsilon)\to 0\quad\forall\epsilon>0\iff P(X_n\leq x)\to\left\{ \begin{aligned} &0 & if \, x<\mu \\ &1 & if \, x>\mu \end{aligned} \right. \tag{2.24} \end{equation}\]
(This is Exercise 5.41 in Casella and Berger (2002).)

Proof. \((\Longrightarrow)\) Set \(\epsilon=|x-\mu|>0\). If \(x>\mu\), then the set \(S_1:=\{s\in S: |X_n(s)-\mu|\leq\epsilon\}\) is contained in the set \(S_2:=\{s\in S: X_n(s)\leq x\}\). Therefore, \(1\geq P(X_n\leq x)\geq P(|X_n-\mu|\leq\epsilon)\to1\) as \(n\to\infty\). On the other hand, if \(x<\mu\), then the set \(S_1^*:=\{s\in S: |X_n(s)-\mu|\geq\epsilon\}\) contains the set \(S_2\), which gives \(0\leq P(X_n\leq x)\leq P(|X_n-\mu|\geq\epsilon)\to0\) as \(n\to\infty\). This proves the \(\Longrightarrow\) direction.
\((\Longleftarrow)\) For any \(\epsilon>0\), it follows that \[\begin{equation} \begin{split} 0&\leq P(|X_n-\mu|>\epsilon)\\ &\leq P(X_n-\mu<-\epsilon)+P(X_n-\mu>\epsilon)\\ &=P(X_n<\mu-\epsilon)+P(X_n>\mu+\epsilon)\\ &=P(X_n<\mu-\epsilon)+1-P(X_n\leq\mu+\epsilon) \end{split} \tag{2.25} \end{equation}\] Since \(\mu-\epsilon<\mu\) and \(\mu+\epsilon>\mu\), we have \(P(X_n<\mu-\epsilon)\leq P(X_n\leq\mu-\epsilon)\to0\) and \(P(X_n\leq\mu+\epsilon)\to1\) as \(n\to\infty\). Therefore \(P(|X_n-\mu|>\epsilon)\to 0\), as desired.

We next prove the Central Limit Theorem (CLT): if \(X_1,X_2,\cdots\) are iid with mean \(\mu\), variance \(\sigma^2<\infty\), and an mgf that exists in a neighborhood of 0, then \(\sqrt{n}(\bar{X}_n-\mu)/\sigma\) converges in distribution to \(N(0,1)\).

Proof. We argue via mgfs; that is, we show that for \(|t|<h\) the mgf of \(\sqrt{n}(\bar{X}_n-\mu)/\sigma\) converges to \(e^{t^2/2}\), the mgf of a standard normal random variable.
Define \(Y_i=\frac{X_i-\mu}{\sigma}\) and let \(M_Y(t)\) denote the common mgf of the \(Y_i\)'s. Since \[\begin{equation} \frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}=\frac{1}{\sqrt{n}}\sum_{i=1}^nY_i \tag{2.27} \end{equation}\] it follows from the properties of mgfs that \[\begin{equation} \begin{split} M_{\sqrt{n}(\bar{X}_n-\mu)/\sigma}(t)&=M_{\sum_{i=1}^nY_i/\sqrt{n}}(t)\\ &=M_{\sum_{i=1}^nY_i}(\frac{t}{\sqrt{n}})\\ &=[M_Y(\frac{t}{\sqrt{n}})]^n \end{split} \tag{2.28} \end{equation}\] We now expand \(M_Y(t/\sqrt{n})\) in a Taylor series around 0: \[\begin{equation} M_Y(t/\sqrt{n})=\sum_{k=0}^{\infty}M_Y^{(k)}(0)\frac{(t/\sqrt{n})^k}{k!} \tag{2.29} \end{equation}\] where \(M_Y^{(k)}(0)=(d^k/dt^k)M_Y(t)|_{t=0}\). Using the facts that \(M_Y^{(0)}(0)=1\), \(M_Y^{(1)}(0)=EY=0\) and \(M_Y^{(2)}(0)=EY^2=Var(Y)=1\), we have \[\begin{equation} M_Y(\frac{t}{\sqrt{n}})=1+\frac{(t/\sqrt{n})^2}{2!}+R_Y(\frac{t}{\sqrt{n}}) \tag{2.30} \end{equation}\] For fixed \(t\neq0\), the remainder \(R_Y(\frac{t}{\sqrt{n}})\) contains only terms in \(\frac{t}{\sqrt{n}}\) of order higher than 2, so \[\begin{equation} \lim_{n\to\infty}\frac{R_Y(\frac{t}{\sqrt{n}})}{(\frac{t}{\sqrt{n}})^2}=0 \tag{2.31} \end{equation}\] Since \(t\) is fixed, we also have \[\begin{equation} \lim_{n\to\infty}\frac{R_Y(\frac{t}{\sqrt{n}})}{(\frac{1}{\sqrt{n}})^2}=\lim_{n\to\infty}nR_Y(\frac{t}{\sqrt{n}})=0 \tag{2.32} \end{equation}\] which is also true at \(t=0\). Thus, for any fixed \(t\) we have \[\begin{equation} \begin{split} \lim_{n\to\infty}(M_Y(\frac{t}{\sqrt{n}}))^n&=\lim_{n\to\infty}[1+\frac{(t/\sqrt{n})^2}{2!}+R_Y(\frac{t}{\sqrt{n}})]^n\\ &=\lim_{n\to\infty}[1+\frac{1}{n}(\frac{t^2}{2}+nR_Y(\frac{t}{\sqrt{n}}))]^n\\ &=e^{t^2/2} \end{split} \tag{2.33} \end{equation}\] where the last step uses the standard result that \((1+a_n/n)^n\to e^a\) whenever \(a_n\to a\). This completes the proof.

The CLT describes the limiting distribution of the sample mean. It can be shown that the essential requirements are only independence and finite variance, and the limit is still normal. The CLT allows us to use the normal distribution to approximate other distributions, though the quality of the approximation varies from case to case.
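A minimal simulation illustrates both the statement and the case-by-case quality of the approximation; the exponential population below is an arbitrary (skewed) choice, and the sample sizes are illustrative only.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
reps = 100_000
mu, sigma = 1.0, 1.0  # mean and standard deviation of an Exponential(1) population

for n in (5, 30, 200):
    x = rng.exponential(1.0, size=(reps, n))
    z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma
    # Compare a tail probability of the standardized mean with the normal value.
    print(n, np.mean(z > 1.645), 1 - stats.norm.cdf(1.645))
```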
We conclude this chapter with a useful theorem, stated without proof.
Theorem 2.9 (Slutsky Theorem) If \(X_n\to X\) in distribution and \(Y_n\to a\) in probability, where \(a\) is a constant, then
a. \(Y_nX_n\to aX\) in distribution;
b. \(X_n+Y_n\to X+a\) in distribution.
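A classical application of Slutsky’s theorem is the studentized mean: since \(S_n\to\sigma\) in probability and \(\sqrt{n}(\bar{X}_n-\mu)/\sigma\to N(0,1)\) in distribution by the CLT, part a gives \(\sqrt{n}(\bar{X}_n-\mu)/S_n\to N(0,1)\) in distribution. The sketch below checks this numerically for an arbitrary non-normal population (Uniform(0,1)).

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, reps = 200, 100_000
mu = 0.5  # mean of a Uniform(0, 1) population

x = rng.random((reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - mu) / x.std(axis=1, ddof=1)

# Empirical quantiles should be close to standard normal quantiles.
for q in (0.05, 0.5, 0.95):
    print(q, np.quantile(z, q), stats.norm.ppf(q))
```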
References
Casella, George, and Roger Berger. 2002. Statistical Inference. 2nd ed. Belmont, CA: Duxbury Resource Center.