Chapter 2 Special Distributions, Order Statistics, Convergence (Lecture on 01/09/2020)
Continuing from Chapter 1, there are two more important distributions, namely the Student's t distribution and the F distribution.
The intuition behind the Student's t distribution is that we want to quantify the variability of \bar{X} as an estimate of \mu when \sigma is unknown. Suppose X_1,\cdots,X_n is a random sample from N(\mu,\sigma^2). Then from Theorem 1.4, \frac{\bar{X}-\mu}{\sigma/\sqrt{n}}\sim N(0,1), which can be used as a basis for inference. However, if \sigma is unknown, a natural idea is to substitute S for it and consider \frac{\bar{X}-\mu}{S/\sqrt{n}}:
\begin{equation}
\frac{\bar{X}-\mu}{S/\sqrt{n}}=\frac{(\bar{X}-\mu)/(\sigma/\sqrt{n})}{\sqrt{S^2/\sigma^2}}
\tag{2.1}
\end{equation}
Notice that the numerator of (2.1) is a N(0,1) r.v. and the denominator is, by Theorem 1.4, distributed as \sqrt{\chi^2_{n-1}/(n-1)} and independent of the numerator. This leads to the Student's t distribution.
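As a quick numerical illustration of (2.1) (a minimal sketch, not part of the lecture; the sample size n=10, the population parameters, and the use of numpy/scipy are my own arbitrary choices), one can simulate the studentized mean from normal data and compare its quantiles with those of the t_{n-1} distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps, mu, sigma = 10, 100_000, 2.0, 3.0   # illustrative values only

# Simulate the studentized mean (Xbar - mu) / (S / sqrt(n)) many times
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)                    # sample standard deviation S
t_stat = (xbar - mu) / (s / np.sqrt(n))

# Empirical quantiles should be close to the t_{n-1} quantiles
probs = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
print(np.quantile(t_stat, probs))
print(stats.t(df=n - 1).ppf(probs))
```

With \sigma known, replacing S by \sigma in the same sketch would instead reproduce the N(0,1) quantiles.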
The t distribution has no mgf! It does not have moments of all orders: if there are p degrees of freedom, then only the first p-1 moments exist. We have the following property for the t_p distribution: if T_p\sim t_p, then ET_p=0 for p>1 and Var(T_p)=\frac{p}{p-2} for p>2.
Proof. For the mean, using the definition of the mean we have
\begin{equation}
ET_p=\int_{-\infty}^{+\infty}t\cdot\frac{\Gamma(\frac{p+1}{2})}{\Gamma(\frac{p}{2})}\frac{1}{(p\pi)^{1/2}}\frac{1}{(1+t^2/p)^{(p+1)/2}}\,dt
\tag{2.4}
\end{equation}
Noticing that the integrand of (2.4) is an odd function, the integral is 0 when p>1 (the condition p>1 ensures the integral converges, so the symmetry argument applies).
As for the variance, notice that T_p=\frac{U}{\sqrt{V/p}} with independent U\sim N(0,1) and V\sim\chi^2_p. Thus
\begin{equation}
Var(T_p)=E(T_p^2)=pE(U^2)E(V^{-1})=\frac{p}{p-2},\quad\forall p>2
\tag{2.5}
\end{equation}
where we used the result that the expectation of the inverse chi-squared distribution with p degrees of freedom is \frac{1}{p-2}.

For the F distribution, the intuition is to compare the variability of two populations N(\mu_1,\sigma_1^2) and N(\mu_2,\sigma_2^2). The quantity of interest is \frac{\sigma_1^2}{\sigma_2^2}, whose information is contained in \frac{S_1^2}{S_2^2}. The F distribution gives the distribution of (2.6), which allows one to compare the two ratios:
\begin{equation}
\frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2}=\frac{S_1^2/\sigma_1^2}{S_2^2/\sigma_2^2}
\tag{2.6}
\end{equation}
Notice from (2.6) that the F distribution is the ratio of two independent scaled chi-squared random variables.
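Both facts are easy to check by simulation. The following sketch (again not from the notes; the degrees of freedom p, q and the number of replications are arbitrary choices) verifies the variance p/(p-2) of a t_p variable and builds an F variate as the ratio of independent scaled chi-squares as in (2.6):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p, q, reps = 5, 7, 200_000   # illustrative degrees of freedom

# Variance of t_p should be close to p / (p - 2) for p > 2
t_samples = stats.t(df=p).rvs(size=reps, random_state=rng)
print(t_samples.var(), p / (p - 2))

# (U/p) / (V/q) with independent chi-squares should follow F_{p,q}
u = stats.chi2(df=p).rvs(size=reps, random_state=rng)
v = stats.chi2(df=q).rvs(size=reps, random_state=rng)
ratio = (u / p) / (v / q)
print(stats.kstest(ratio, stats.f(dfn=p, dfd=q).cdf))   # large p-value expected
```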
Theorem 2.1 (Properties of F Distribution) a. If X\sim F_{p,q}, then 1/X\sim F_{q,p}.
b. If X\sim t_q, then X^2\sim F_{1,q}.
c. If X\sim F_{p,q}, then \frac{(p/q)X}{1+(p/q)X}\sim Beta(p/2,q/2).
Proof. a. By definition, X=\frac{U/p}{V/q} with independent U\sim\chi^2_p and V\sim\chi^2_q. Therefore, 1/X=\frac{V/q}{U/p} follows F_{q,p} by definition.
b. By definition, X=\frac{U}{\sqrt{V/q}} with independent U\sim N(0,1) and V\sim\chi^2_q. Therefore, X^2=\frac{U^2/1}{V/q} follows F_{1,q} by definition.
c. This can be shown by a variable transformation; a numerical check of all three properties is sketched below.
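Here is the numerical check mentioned above, a minimal sketch (the values of p, q and the use of Kolmogorov-Smirnov comparisons are my own choices, not from the lecture) of the three parts of Theorem 2.1:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
p, q, reps = 4, 9, 200_000   # illustrative degrees of freedom

# a. If X ~ F_{p,q}, then 1/X ~ F_{q,p}
x = stats.f(dfn=p, dfd=q).rvs(size=reps, random_state=rng)
print(stats.kstest(1 / x, stats.f(dfn=q, dfd=p).cdf))

# b. If X ~ t_q, then X^2 ~ F_{1,q}
t = stats.t(df=q).rvs(size=reps, random_state=rng)
print(stats.kstest(t ** 2, stats.f(dfn=1, dfd=q).cdf))

# c. If X ~ F_{p,q}, then (p/q)X / (1 + (p/q)X) ~ Beta(p/2, q/2)
w = (p / q) * x / (1 + (p / q) * x)
print(stats.kstest(w, stats.beta(a=p / 2, b=q / 2).cdf))
```

Large Kolmogorov-Smirnov p-values in all three cases are consistent with the stated distributions.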
Proof. For fixed i, let Y be a random variable that counts the number of X_1,\cdots,X_n that are less than or equal to x_i. Then it follows that Y\sim Bin(n,P_i). The event \{X_{(j)}\leq x_i\} is equivalent to \{Y\geq j\}, so (2.10) is just the binomial probability P(Y\geq j)=P(X_{(j)}\leq x_i). Equation (2.11) is just the difference
\begin{equation}
P(X_{(j)}=x_i)=P(X_{(j)}\leq x_i)-P(X_{(j)}\leq x_{i-1})
\tag{2.13}
\end{equation}
with the exception of the case i=1, where P(X_{(j)}=x_1)=P(X_{(j)}\leq x_1).
For the continuous case, Y\sim Bin(n,F_X(x)). Thus
\begin{equation}
F_{X_{(j)}}(x)=P(Y\geq j)=\sum_{k=j}^n{n \choose k}[F_X(x)]^k[1-F_X(x)]^{n-k}
\tag{2.14}
\end{equation}
and the pdf of X_{(j)} is obtained by differentiating the cdf:
\begin{equation}
\begin{split}
f_{X_{(j)}}(x)&=\frac{d}{dx}F_{X_{(j)}}(x)\\
&=\sum_{k=j}^n{n \choose k}\left(k[F_X(x)]^{k-1}[1-F_X(x)]^{n-k}f_X(x)-(n-k)[F_X(x)]^k[1-F_X(x)]^{n-k-1}f_X(x)\right)\\
&={n \choose j}j[F_X(x)]^{j-1}[1-F_X(x)]^{n-j}f_X(x)\\
&\quad+\sum_{k=j+1}^n{n \choose k}k[F_X(x)]^{k-1}[1-F_X(x)]^{n-k}f_X(x)\\
&\quad-\sum_{k=j}^{n-1}{n \choose k}(n-k)[F_X(x)]^k[1-F_X(x)]^{n-k-1}f_X(x)\\
&=\frac{n!}{(j-1)!(n-j)!}f_X(x)[F_X(x)]^{j-1}[1-F_X(x)]^{n-j}\\
&\quad+\sum_{k=j}^{n-1}{n \choose {k+1}}(k+1)[F_X(x)]^k[1-F_X(x)]^{n-k-1}f_X(x)\\
&\quad-\sum_{k=j}^{n-1}{n \choose k}(n-k)[F_X(x)]^k[1-F_X(x)]^{n-k-1}f_X(x)
\end{split}
\tag{2.15}
\end{equation}
Noting that
\begin{equation}
{n \choose {k+1}}(k+1)=\frac{n!}{k!(n-k-1)!}={n \choose k}(n-k)
\tag{2.16}
\end{equation}
the last two terms of (2.15) cancel out and we are left with (2.12). A quick simulation check of (2.14) and (2.12) is sketched after the next theorem.

Theorem 2.4 Suppose X_1,X_2,\cdots converges in probability to a random variable X and h is a continuous function. Then h(X_1),h(X_2),\cdots converges in probability to h(X).
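As promised above, here is a quick simulation check of (2.14) and (2.12), a sketch under my own choices (a Uniform(0,1) sample with n=8 and j=3); for the uniform case F_X(x)=x and (2.12) reduces to the Beta(j, n-j+1) density:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
n, j, reps = 8, 3, 100_000   # illustrative choices

# Simulate the j-th order statistic of n iid Uniform(0,1) draws
samples = np.sort(rng.uniform(size=(reps, n)), axis=1)
xj = samples[:, j - 1]

# Check the cdf formula (2.14) at a point x: sum_{k=j}^n C(n,k) x^k (1-x)^{n-k}
x = 0.4
k = np.arange(j, n + 1)
print(np.sum(stats.binom.pmf(k, n, x)), np.mean(xj <= x))

# For Uniform(0,1), the pdf (2.12) is the Beta(j, n-j+1) density
print(stats.kstest(xj, stats.beta(a=j, b=n - j + 1).cdf))
```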
(Theorem 2.4 is Exercise 5.39 in Casella and Berger (2002).)

Theorem 2.6 If the sequence of random variables X_1,X_2,\cdots converges in probability to a random variable X, then the sequence also converges in distribution to X.
(Theorem 2.6 is Exercise 5.40 in Casella and Berger (2002).)

Proof. First we need to prove the following lemma: for any random variables X,Y on a sample space S, any real number a and any \epsilon>0, we have P(Y\leq a)\leq P(X\leq a+\epsilon)+P(|Y-X|>\epsilon). To prove it, denote S_1:=\{s\in S:Y(s)\leq a\}, S_2:=\{s\in S: |Y(s)-X(s)|\leq\epsilon\} and S_3:=\{s\in S: X(s)\leq a+\epsilon\}. Since Y\leq a and |Y-X|\leq\epsilon together imply X\leq a+\epsilon, we have S_1\subset S_2^c\cup S_3 and thus P(Y\leq a)\leq P(X\leq a+\epsilon)+P(|Y-X|>\epsilon). The lemma is proved.
Then for any fixed t at which F_X is continuous and any \epsilon>0, it follows from the lemma that
\begin{align}
&P(X \leq t-\epsilon)\leq P(X_n\leq t)+P(|X_n-X|>\epsilon) \tag{2.22}\\
&P(X_n \leq t)\leq P(X\leq t+\epsilon)+P(|X_n-X|>\epsilon) \tag{2.23}
\end{align}
Therefore, P(X \leq t-\epsilon)-P(|X_n-X|>\epsilon)\leq P(X_n \leq t)\leq P(X\leq t+\epsilon)+P(|X_n-X|>\epsilon). Letting n\to\infty, we have P(X \leq t-\epsilon)\leq \lim_{n\to\infty}F_{X_n}(t)\leq P(X\leq t+\epsilon) for every \epsilon>0. Since by assumption F_X is continuous at t, letting \epsilon\to0 we finally have \lim_{n\to\infty}F_{X_n}(t)=F_X(t), as desired.

A special case in which the converse of Theorem 2.6 holds is stated below.
Theorem 2.7 The sequence of random variables X_1,X_2,\cdots converges in probability to a constant \mu iff the sequence also converges in distribution to \mu. That is,
\begin{equation}
P(|X_n-\mu|>\epsilon)\to 0\quad\forall\epsilon>0\iff P(X_n\leq x)\to\left\{
\begin{aligned}
&0 & &\text{if } x<\mu \\
&1 & &\text{if } x>\mu
\end{aligned}
\right.
\tag{2.24}
\end{equation}
(Theorem 2.7 is Exercise 5.41 in Casella and Berger (2002).)

Proof. (\Longrightarrow) Set \epsilon=|x-\mu|>0. If x>\mu, then the set S_1:=\{s\in S: |X_n(s)-\mu|\leq\epsilon\} is contained in the set S_2:=\{s\in S: X_n(s)\leq x\}. Therefore, 1\geq P(X_n\leq x)\geq P(|X_n-\mu|\leq\epsilon)\to1 as n\to\infty. On the other hand, if x<\mu, then the set S_1^*:=\{s\in S: |X_n(s)-\mu|\geq\epsilon\} contains the set S_2, which indicates 0\leq P(X_n\leq x)\leq P(|X_n-\mu|\geq\epsilon)\to0 as n\to\infty. Hence we have proved the \Longrightarrow part.
(\Longleftarrow) For any \epsilon>0, it follows that
\begin{equation}
\begin{split}
0&\leq P(|X_n-\mu|>\epsilon)\\
&\leq P(X_n-\mu<-\epsilon)+P(X_n-\mu>\epsilon)\\
&=P(X_n<\mu-\epsilon)+P(X_n>\mu+\epsilon)\\
&=P(X_n<\mu-\epsilon)+1-P(X_n\leq\mu+\epsilon)
\end{split}
\tag{2.25}
\end{equation}
Since \mu-\epsilon<\mu, P(X_n<\mu-\epsilon)\leq P(X_n\leq\mu-\epsilon)\to0 by assumption, and since \mu+\epsilon>\mu, P(X_n\leq\mu+\epsilon)\to1. Therefore P(|X_n-\mu|>\epsilon)\to 0 as n\to\infty, as desired.

Proof (Central Limit Theorem). We show via mgfs that, for |t|<h, the mgf of \sqrt{n}(\bar{X}_n-\mu)/\sigma converges to e^{t^2/2}, the mgf of a standard normal random variable.
Define Y_i=\frac{X_i-\mu}{\sigma} and let M_Y(t) denote the common mgf of the Y_i's. Since
\begin{equation}
\frac{\sqrt{n}(\bar{X}_n-\mu)}{\sigma}=\frac{1}{\sqrt{n}}\sum_{i=1}^nY_i
\tag{2.27}
\end{equation}
the properties of mgfs give
\begin{equation}
\begin{split}
M_{\sqrt{n}(\bar{X}_n-\mu)/\sigma}(t)&=M_{\sum_{i=1}^nY_i/\sqrt{n}}(t)\\
&=M_{\sum_{i=1}^nY_i}(\frac{t}{\sqrt{n}})\\
&=[M_Y(\frac{t}{\sqrt{n}})]^n
\end{split}
\tag{2.28}
\end{equation}
We now expand M_Y(t/\sqrt{n}) in a Taylor series around 0. We have
\begin{equation}
M_Y(t/\sqrt{n})=\sum_{k=0}^{\infty}M_Y^{(k)}(0)\frac{(t/\sqrt{n})^k}{k!}
\tag{2.29}
\end{equation}
where M_Y^{(k)}(0)=(d^k/dt^k)M_Y(t)|_{t=0}. Using the facts that M_Y^{(0)}(0)=1, M_Y^{(1)}(0)=EY=0 and M_Y^{(2)}(0)=Var(Y)=1, we have
\begin{equation}
M_Y(\frac{t}{\sqrt{n}})=1+\frac{(t/\sqrt{n})^2}{2!}+R_Y(\frac{t}{\sqrt{n}})
\tag{2.30}
\end{equation}
For fixed t\neq0, the remainder R_Y(\frac{t}{\sqrt{n}}) contains only terms in \frac{t}{\sqrt{n}} of order higher than 2, so
\begin{equation}
\lim_{n\to\infty}\frac{R_Y(\frac{t}{\sqrt{n}})}{(\frac{t}{\sqrt{n}})^2}=0
\tag{2.31}
\end{equation}
Since t is fixed, we also have
\begin{equation}
\lim_{n\to\infty}\frac{R_Y(\frac{t}{\sqrt{n}})}{(\frac{1}{\sqrt{n}})^2}=\lim_{n\to\infty}nR_Y(\frac{t}{\sqrt{n}})=0
\tag{2.32}
\end{equation}
which is also true at t=0. Thus, for any fixed t we have
\begin{equation}
\begin{split}
\lim_{n\to\infty}(M_Y(\frac{t}{\sqrt{n}}))^n&=\lim_{n\to\infty}[1+\frac{(t/\sqrt{n})^2}{2!}+R_Y(\frac{t}{\sqrt{n}})]^n\\
&=\lim_{n\to\infty}[1+\frac{1}{n}(\frac{t^2}{2}+nR_Y(\frac{t}{\sqrt{n}}))]^n\\
&=e^{t^2/2}
\end{split}
\tag{2.33}
\end{equation}
as desired.

The CLT describes the limiting distribution of the sample mean. It can be shown that the only essential requirements are independence and a finite variance (the mgf assumption used above can be removed by working with characteristic functions instead), and normality still emerges in the limit. The CLT shows that we can use the normal distribution to approximate the distribution of the sample mean, although how good the approximation is varies from case to case.
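To see the CLT in action, the following sketch (my own illustration, with an Exponential(1) population whose mean and variance are both 1, and arbitrary n and replication count) compares the standardized sample mean with the standard normal:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n, reps = 30, 100_000   # illustrative choices

# Exponential(1) population: mean 1, variance 1, clearly non-normal
x = rng.exponential(scale=1.0, size=(reps, n))
z = np.sqrt(n) * (x.mean(axis=1) - 1.0) / 1.0

# Quantiles of the standardized sample mean vs. N(0,1) quantiles
probs = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
print(np.quantile(z, probs))
print(stats.norm.ppf(probs))
```

Increasing n improves the agreement, and how quickly it improves depends on the population, which is exactly the case-by-case caveat above.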
We conclude this chapter with a useful theorem, stated without proof; a small simulation illustrating it is sketched after the theorem.
Theorem 2.9 (Slutsky's Theorem) If X_n\to X in distribution and Y_n\to a in probability, where a is a constant, then
- Y_nX_n\to aX in distribution;
- X_n+Y_n\to X+a in distribution.
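Below is the simulation mentioned before the theorem, a minimal sketch (my own choices: a Uniform(0,1) population with \mu=1/2) of the classic application of Slutsky's Theorem: since S\to\sigma in probability, \sigma/S\to1 in probability, and combining this with the CLT gives \sqrt{n}(\bar{X}_n-\mu)/S\to N(0,1) in distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
n, reps = 200, 100_000   # illustrative choices

# Uniform(0,1) population: mean 1/2, variance 1/12
x = rng.uniform(size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)

# CLT: sqrt(n)(Xbar - mu)/sigma -> N(0,1); Slutsky (with sigma/S -> 1 in
# probability) then gives sqrt(n)(Xbar - mu)/S -> N(0,1) as well.
z = np.sqrt(n) * (xbar - 0.5) / s
probs = np.array([0.05, 0.25, 0.5, 0.75, 0.95])
print(np.quantile(z, probs))
print(stats.norm.ppf(probs))
```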
References
Casella, George, and Roger Berger. 2002. Statistical Inference. 2nd ed. Belmont, CA: Duxbury Resource Center.