Chapter 1 Random Samples, Special Distributions (Lecture on 01/07/2020)
Often, the data collected in an experiment consist of several observations on a variable of interest. Random sampling is the model for data collection that is often used to describe this situation.
The random variables \(X_1,\cdots,X_n\) are called a random sample of size \(n\) from a population with pdf or pmf \(f(x)\) if they are mutually independent and each \(X_i\) has marginal pdf or pmf \(f(x)\). The joint pdf or pmf of \(X_1,\cdots,X_n\) is then given by
\[\begin{equation} f(x_1,\cdots,x_n)=f(x_1)f(x_2)\cdots f(x_n)=\prod_{i=1}^nf(x_i) \tag{1.1} \end{equation}\]
Since \(X_1,\cdots,X_n\) are identically distributed, all the marginal densities \(f(x)\) are the same function. Furthermore, if the population pdf or pmf is a member of a parametric family, with pdf or pmf given by \(f(x|\theta)\), then the joint pdf or pmf is
\[\begin{equation} f(x_1,\cdots,x_n|\theta)=\prod_{i=1}^nf(x_i|\theta) \tag{1.2} \end{equation}\]
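For instance, (1.2) can be evaluated directly as a product of marginal densities; the sketch below assumes an Exponential population with rate \(\theta\) (a choice made only for illustration) and uses scipy.stats.

```python
# Minimal sketch of (1.2): the joint density of an iid sample is the
# product of the marginal densities. Here the population is assumed to
# be Exponential with rate theta (a choice made for illustration only).
import numpy as np
from scipy import stats

theta = 2.0                                   # assumed rate parameter
rng = np.random.default_rng(0)
x = rng.exponential(scale=1 / theta, size=5)  # observed sample x_1,...,x_n

marginal = stats.expon(scale=1 / theta)
joint_pdf = np.prod(marginal.pdf(x))          # product form of (1.2)
log_joint = np.sum(marginal.logpdf(x))        # same computation on the log scale

print(joint_pdf, np.exp(log_joint))           # the two agree
```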
When a sample \(X_1,\cdots,X_n\) is drawn, some summary of the values is usually computed. Any well-defined summary may be expressed mathematically as a function \(T(X_1,\cdots,X_n)\) whose domain includes the sample space of the random vector \((X_1,\cdots,X_n)\). The function \(T\) may be real-valued or vector-valued, so the summary \(Y=T(X_1,\cdots,X_n)\) is itself a random variable (or random vector), called a statistic.
The only restriction on a statistic is that it cannot be a function of any parameter. The sample mean, the sample variance, and the sample standard deviation \(S=\sqrt{S^2}\) are commonly used statistics that provide good summaries of the sample: \[\begin{equation} \bar{X}=\frac{1}{n}\sum_{i=1}^nX_i \tag{1.3} \end{equation}\] \[\begin{equation} S^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2 \tag{1.4} \end{equation}\]
The sample mean minimizes the total quadratic distance, i.e. \[\begin{equation} \min_{a}\sum_{i=1}^n(x_i-a)^2=\sum_{i=1}^n(x_i-\bar{x})^2 \tag{1.6} \end{equation}\] Equation (1.6) is easily proved by the classic trick of adding and subtracting \(\bar{x}\) inside the parentheses, expanding, and then applying another classic property of the sample mean: \[\begin{equation} \sum_{i=1}^n(x_i-\bar{x})=0 \tag{1.7} \end{equation}\] Another useful identity relating the sample mean and variance is \[\begin{equation} (n-1)s^2=\sum_{i=1}^nx_i^2-n\bar{x}^2 \tag{1.8} \end{equation}\]
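The three identities above are easy to confirm numerically; the sketch below (plain numpy, with a made-up sample) checks each of them.

```python
# Quick numerical check of (1.6)-(1.8) on an arbitrary sample.
# The sample below is made up purely for illustration.
import numpy as np

x = np.array([1.2, 3.5, 2.2, 5.0, 4.1])
n = len(x)
xbar = x.mean()
s2 = x.var(ddof=1)                       # sample variance with the 1/(n-1) factor

# (1.6): the quadratic loss a -> sum (x_i - a)^2 is minimized at a = xbar
grid = np.linspace(x.min(), x.max(), 1001)
losses = ((x[:, None] - grid[None, :]) ** 2).sum(axis=0)
print(grid[losses.argmin()], xbar)       # the grid minimizer is ~ xbar

# (1.7): deviations from the mean sum to zero (up to floating-point error)
print(np.sum(x - xbar))

# (1.8): (n-1)s^2 = sum x_i^2 - n xbar^2
print((n - 1) * s2, np.sum(x ** 2) - n * xbar ** 2)
```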
Lemma 1.1 Let \(X_1,\cdots,X_n\) be a random sample from a population and let \(g(x)\) be a function such that \(Eg(X_1)\) and \(Var(g(X_1))\) exist. Then \[\begin{equation} E(\sum_{i=1}^n g(X_i))=nEg(X_1) \quad\text{and}\quad Var(\sum_{i=1}^n g(X_i))=nVar(g(X_1)) \tag{1.9} \end{equation}\]

Proof. The first part of (1.9) follows easily from the linearity of expectation. To prove the second part, note that \[\begin{equation} \begin{split} Var(\sum_{i=1}^n g(X_i))&=E[\sum_{i=1}^n g(X_i)-E(\sum_{i=1}^n g(X_i))]^2\\ &=E[\sum_{i=1}^n(g(X_i)-E g(X_i))]^2 \end{split} \tag{1.10} \end{equation}\]
When the square in (1.10) is expanded, there are \(n\) terms of the form \((g(X_i)-E g(X_i))^2, i=1,\cdots,n\), and each of them has expectation \(Var(g(X_i))=Var(g(X_1))\). The remaining cross terms are all of the form \((g(X_i)-E g(X_i))(g(X_j)-E g(X_j)), i\neq j\), and each has expectation \(Cov(g(X_i),g(X_j))=0\) because \(X_i\) and \(X_j\) are independent. Hence \(Var(\sum_{i=1}^n g(X_i))=nVar(g(X_1))\).

Theorem 1.1 Let \(X_1,\cdots,X_n\) be a random sample from a population with mean \(\mu\) and variance \(\sigma^2<\infty\). Then
- (a) \(E\bar{X}=\mu\)
- (b) \(Var(\bar{X})=\frac{\sigma^2}{n}\)
- (c) \(ES^2=\sigma^2\)
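Before turning to the proof, here is a quick Monte Carlo illustration of all three statements; the Exponential population, sample size, and number of replications below are arbitrary choices made only for illustration.

```python
# Monte Carlo illustration of Theorem 1.1 with an Exponential population,
# chosen only for illustration: mean mu = 2, variance sigma^2 = 4.
import numpy as np

rng = np.random.default_rng(1)
mu, sigma2, n, reps = 2.0, 4.0, 10, 200_000

samples = rng.exponential(scale=mu, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

print(xbar.mean(), mu)             # (a) E(Xbar) ~ mu
print(xbar.var(), sigma2 / n)      # (b) Var(Xbar) ~ sigma^2 / n
print(s2.mean(), sigma2)           # (c) E(S^2) ~ sigma^2
```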
Proof. For (a), let \(g(X_i)=X_i/n\), so that \(Eg(X_i)=\mu/n\), and apply Lemma 1.1.
For (b), with the same choice of \(g\), \(Var(g(X_i))=\sigma^2/n^2\), so by Lemma 1.1, \(Var(\bar{X})=n\cdot\frac{\sigma^2}{n^2}=\frac{\sigma^2}{n}\).
Finally, for (c), we have \[\begin{equation} \begin{split} ES^2&=E(\frac{1}{n-1}[\sum_{i=1}^nX_i^2-n\bar{X}^2])\\ &=\frac{1}{n-1}(nEX_1^2-nE\bar{X}^2)\\ &=\frac{1}{n-1}(n(\sigma^2+\mu^2)-n(\frac{\sigma^2}{n}+\mu^2))=\sigma^2 \end{split} \end{equation}\] where the last step uses the fact that \(EY^2=Var(Y)+(EY)^2\) for any random variable \(Y\).

Theorem 1.2 Let \(X_1,\cdots,X_n\) be a random sample from a population with pdf \(f_X(x)\), and let \(\bar{X}\) denote the sample mean. Then, regardless of whether the mgf of \(X\) exists, \[\begin{equation} f_{\bar{X}}(x)=nf_{X_1+\cdots+X_n}(nx) \tag{1.11} \end{equation}\]
Furthermore, if the mgf of \(X\) does exist, denoted \(M_X(t)\), then \[\begin{equation} M_{\bar{X}}(t)=[M_X(\frac{t}{n})]^n \tag{1.12} \end{equation}\]
(This theorem combines Exercise 5.5 and Theorem 5.2.7 in Casella and Berger (2002).)
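As a quick illustration of (1.12), the sketch below compares a Monte Carlo estimate of \(M_{\bar{X}}(t)\) with \([M_X(t/n)]^n\) for an Exponential(1) population, a choice made purely for illustration (its mgf is \(M_X(t)=1/(1-t)\) for \(t<1\)).

```python
# Monte Carlo check of (1.12) for an Exponential(1) population,
# where M_X(t) = 1 / (1 - t) for t < 1 (illustrative choice only).
import numpy as np

rng = np.random.default_rng(2)
n, reps, t = 5, 500_000, 0.4

xbar = rng.exponential(scale=1.0, size=(reps, n)).mean(axis=1)
mc_mgf = np.mean(np.exp(t * xbar))              # Monte Carlo E[exp(t * Xbar)]
theory = (1.0 / (1.0 - t / n)) ** n             # [M_X(t/n)]^n from (1.12)

print(mc_mgf, theory)                           # the two should be close
```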
For special distributions, the first and most important one to consider is the multivariate normal distribution (MVN for short).

Recall that the moment generating function and the characteristic function of a random vector \(X\) are defined as in (1.14) and (1.15), respectively: \[\begin{equation} M_X(\mathbf{t})=E(e^{\mathbf{t}^TX}) \tag{1.14} \end{equation}\] \[\begin{equation} \Phi_X(\mathbf{t})=E(e^{i\mathbf{t}^TX}) \tag{1.15} \end{equation}\] If \(X\) and \(Y\) are independent, then the mgf and the characteristic function satisfy \[\begin{align} &M_{X+Y}(\mathbf{t})=M_X(\mathbf{t}) \cdot M_Y(\mathbf{t}) \tag{1.16} \\ &\Phi_{X+Y}(\mathbf{t})=\Phi_X(\mathbf{t}) \cdot \Phi_Y(\mathbf{t}) \tag{1.17} \end{align}\] Finally, the mgf and the characteristic function of a multivariate normal random vector \(X\sim N_p(\mathbf{\mu},\Sigma)\) are given by \[\begin{align} &M_{X}(\mathbf{t})=\exp(\mathbf{t}^T\mathbf{\mu}+\frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}) \tag{1.18} \\ &\Phi_{X}(\mathbf{t})=\exp(i\mathbf{t}^T\mathbf{\mu}-\frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}) \tag{1.19} \end{align}\]
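As a sanity check on (1.18) and (1.19), specializing to the univariate case \(p=1\) with mean \(\mu\) and \(\Sigma=\sigma^2\) recovers the familiar normal mgf and characteristic function:
\[ M_X(t)=\exp\left(\mu t+\tfrac{1}{2}\sigma^2 t^2\right), \qquad \Phi_X(t)=\exp\left(i\mu t-\tfrac{1}{2}\sigma^2 t^2\right). \]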
Theorem 1.3 Suppose \(X\sim N_p(\mathbf{\mu},\Sigma)\). Then for any matrix \(B\in\mathbb{R}^{k\times p}\) with rank \(k\leq p\), the random vector \(Y=BX\) satisfies \(Y\sim N_k(B\mathbf{\mu},B\Sigma B^T)\).
(This theorem is Theorem 4.4a in Rencher and Schaalje (2007).)

Proof. The mgf of \(Y\) is, by definition,
\[\begin{equation} M_Y(\mathbf{t})=E(e^{\mathbf{t}^TY})=E(e^{\mathbf{t}^TBX})=M_X(B^T\mathbf{t}) \tag{1.20} \end{equation}\] From (1.18) we have the form of \(M_X(\mathbf{t})\), therefore \[\begin{equation} M_Y(\mathbf{t})=\exp(\mathbf{t}^TB\mathbf{\mu}+\frac{1}{2}\mathbf{t}^TB\Sigma B^T\mathbf{t}) \tag{1.21} \end{equation}\] This is the mgf of a \(N_k(B\mathbf{\mu},B\Sigma B^T)\) distribution, and since the mgf uniquely determines the distribution, the theorem is proved.
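Theorem 1.3 is also easy to illustrate by simulation; in the sketch below the particular \(\mathbf{\mu}\), \(\Sigma\), and \(B\) are arbitrary choices, and the sample mean and covariance of \(Y=BX\) should be close to \(B\mathbf{\mu}\) and \(B\Sigma B^T\).

```python
# Simulation check of Theorem 1.3: if X ~ N_p(mu, Sigma) and Y = BX,
# then Y ~ N_k(B mu, B Sigma B^T). mu, Sigma, B are arbitrary examples.
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 2.0, -1.0]])            # k = 2, p = 3, rank 2

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ B.T                                 # each row is y = B x

print(Y.mean(axis=0), B @ mu)               # sample mean vs B mu
print(np.cov(Y, rowvar=False))              # sample covariance ...
print(B @ Sigma @ B.T)                      # ... vs B Sigma B^T
```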
Notice that the chi-squared distribution can be viewed as a Gamma distribution with shape parameter \(\alpha=\frac{p}{2}\) and rate parameter \(\beta=\frac{1}{2}\). Therefore, the mean and variance of \(X\sim\chi_p^2\) are \(\alpha/\beta=p\) and \(\alpha/\beta^2=2p\), respectively.

Lemma 1.2 If \(\chi_p^2\) denotes a chi-squared r.v. with \(p\) degrees of freedom, then
- If \(Z\sim N(0,1)\), then \(Z^2\sim\chi_1^2\).
- If \(X_1,\cdots,X_n\) are independent and \(X_i\sim\chi_{p_i}^2\), then \(X_1+\cdots+X_n\sim\chi_{p_1+\cdots+p_n}^2\).
The proof is straightforward and is usually covered in a first probability course.
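These two facts, together with the Gamma connection noted above, are easy to confirm numerically; the sketch below uses scipy.stats and is purely illustrative.

```python
# Numerical illustration of Lemma 1.2 and the chi-squared / Gamma link.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
reps = 500_000

# Z ~ N(0,1)  =>  Z^2 ~ chi^2_1 (compare a tail probability)
z2 = rng.standard_normal(reps) ** 2
print(np.mean(z2 > 2.0), stats.chi2(df=1).sf(2.0))

# Independent chi^2_3 and chi^2_5 add up to chi^2_8
s = rng.chisquare(df=3, size=reps) + rng.chisquare(df=5, size=reps)
print(s.mean(), s.var(), 8, 2 * 8)          # mean p and variance 2p for p = 8

# chi^2_p is Gamma(shape = p/2, rate = 1/2), i.e. scale = 2
x = np.linspace(0.5, 20, 5)
print(stats.chi2(df=8).pdf(x))
print(stats.gamma(a=4, scale=2).pdf(x))     # identical densities
```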
We conclude this chapter with a theorem about the properties of \(\bar{X}\) and \(S^2\) under the additional assumption of normality.
Theorem 1.4 Let \(X_1,\cdots,X_n\) be a random sample from a \(N(\mu,\sigma^2)\) distribution, and let \(\bar{X}\) and \(S^2\) be the sample mean and sample variance defined in (1.3) and (1.4). Then
- (a) \(\bar{X}\) and \(S^2\) are independent.
- (b) \(\bar{X}\) has a \(N(\mu,\sigma^2/n)\) distribution.
- (c) \((n-1)S^2/\sigma^2\) has a chi-squared distribution with \(n-1\) degrees of freedom.
Proof. We can assume, without loss of generality, that \(\mu=0\) and \(\sigma=1\).
For (a), notice that \(S^2\) can be expressed in terms of the \(n-1\) deviations \(X_2-\bar{X},\cdots,X_n-\bar{X}\):
\[\begin{equation}
\begin{split}
S^2&=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2\\
&=\frac{1}{n-1}((X_1-\bar{X})^2+\sum_{i=2}^n(X_i-\bar{X})^2)\\
&=\frac{1}{n-1}((\sum_{i=2}^n(X_i-\bar{X}))^2+\sum_{i=2}^n(X_i-\bar{X})^2)
\end{split}
\tag{1.23}
\end{equation}\]
where the last step uses the classic property of the sample mean given in (1.7), namely \(X_1-\bar{X}=-\sum_{i=2}^n(X_i-\bar{X})\). Thus, \(S^2\) can be written as a function of \((X_2-\bar{X},\cdots,X_n-\bar{X})\) alone. The joint pdf of the sample is
\[\begin{equation}
f(x_1,\cdots,x_n)=\frac{1}{(2\pi)^{n/2}}e^{-\frac{\sum_{i=1}^nx_i^2}{2}}
\tag{1.24}
\end{equation}\]
Define the transformation \(y_1=\bar{x},\ y_2=x_2-\bar{x},\cdots,y_n=x_n-\bar{x}\), whose inverse is \(x_1=y_1-\sum_{i=2}^ny_i\) and \(x_i=y_i+y_1\) for \(i\geq2\). The Jacobian determinant of the inverse transformation is
\[\begin{equation}
|J|=\begin{vmatrix}1 & -1 & -1 & \cdots &-1\\
1 & 1 & 0 & \cdots & 0\\
1 & 0 & 1 & \cdots & 0\\
\vdots & \vdots & \vdots & & \vdots\\
1 & 0 & 0 & \cdots & 1
\end{vmatrix}=n
\tag{1.25}
\end{equation}\]
The identity (1.25) can be proved by induction.
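As a sanity check on (1.25), the short numpy sketch below builds this matrix for several values of \(n\) and confirms that its determinant equals \(n\).

```python
# Sanity check of (1.25): the Jacobian determinant equals n.
import numpy as np

def jacobian(n):
    """Build the n x n matrix appearing in (1.25)."""
    J = np.eye(n)
    J[0, 1:] = -1.0       # first row: 1, -1, ..., -1
    J[1:, 0] = 1.0        # first column below the top: all ones
    return J

for n in range(2, 8):
    print(n, round(np.linalg.det(jacobian(n))))   # prints n each time
```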
Therefore, we have
\[\begin{equation}
\begin{split}
f(y_1,\cdots,y_n)&=\frac{n}{(2\pi)^{n/2}}e^{-\frac{(y_1-\sum_{i=2}^ny_i)^2}{2}}e^{-\frac{\sum_{i=2}^n(y_i+y_1)^2}{2}}\\
&=[(\frac{n}{2\pi})^{1/2}e^{-\frac{ny_1^2}{2}}][\frac{n^{1/2}}{(2\pi)^{(n-1)/2}}e^{-\frac{\sum_{i=2}^ny_i^2+(\sum_{i=2}^ny_i)^2}{2}}], \quad -\infty<y_i<\infty
\end{split}
\tag{1.26}
\end{equation}\]
Hence, the joint pdf factors into a function of \(y_1\) alone times a function of \((y_2,\cdots,y_n)\) alone, so \(\bar{X}\) is independent of \((X_2-\bar{X},\cdots,X_n-\bar{X})\), and therefore \(\bar{X}\) and \(S^2\) are independent.
For (b), define \(B=\frac{1}{n}(1,\cdots,1)\), then \(\bar{X}=B\mathbf{X}\) with \(\mathbf{X}=(X_1,\cdots,X_n)^T\). By Theorem 1.3 we have (b) as desired.
Finally, we prove (c) by induction. Denote the sample mean and sample variance based on the first \(k\) observations by \(\bar{X}_k\) and \(S^2_k\). These satisfy the recursion \[\begin{equation} (n-1)S_n^2=(n-2)S_{n-1}^2+(\frac{n-1}{n})(X_n-\bar{X}_{n-1})^2 \tag{1.27} \end{equation}\] whose proof is left as Exercise 1.1. First consider \(n=2\); from (1.27) we have \[\begin{equation} S_2^2=\frac{1}{2}(X_2-X_1)^2 \tag{1.28} \end{equation}\] Since \(\frac{X_2-X_1}{\sqrt{2}}\sim N(0,1)\), the chi-squared property in Lemma 1.2 gives \(S_2^2\sim\chi_1^2\). Proceeding with the induction, assume that for \(n=k\), \((k-1)S_k^2\sim\chi_{k-1}^2\). For \(n=k+1\), by (1.27), \[\begin{equation} kS_{k+1}^2=(k-1)S_k^2+(\frac{k}{k+1})(X_{k+1}-\bar{X}_k)^2 \tag{1.29} \end{equation}\] By the induction hypothesis, \((k-1)S_k^2\sim\chi_{k-1}^2\), so we only need to show that \((\frac{k}{k+1})(X_{k+1}-\bar{X}_k)^2\sim\chi_1^2\) and is independent of \(S_k^2\); then Lemma 1.2 gives the desired result.
The vector \((X_{k+1},\bar{X}_k)\) is independent of \(S_k^2\): \(X_{k+1}\) is independent of the first \(k\) observations, and \(\bar{X}_k\) is independent of \(S_k^2\) by part (a). Hence any function of this vector, in particular \((X_{k+1}-\bar{X}_k)^2\), is also independent of \(S_k^2\). Furthermore, \(X_{k+1}-\bar{X}_k\sim N(0,\frac{k+1}{k})\), because \(X_{k+1}\sim N(0,1)\) and \(\bar{X}_k\sim N(0,\frac{1}{k})\) are independent. Therefore \((\frac{k}{k+1})(X_{k+1}-\bar{X}_k)^2\sim\chi_1^2\), as desired.
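To close, here is a simulation sketch of Theorem 1.4 (the values of \(\mu\), \(\sigma\), and \(n\) below are arbitrary illustrative choices): the correlation between \(\bar{X}\) and \(S^2\) should be near zero, \(\bar{X}\) should behave like a \(N(\mu,\sigma^2/n)\) draw, and \((n-1)S^2/\sigma^2\) should match a \(\chi^2_{n-1}\) distribution.

```python
# Simulation check of Theorem 1.4 for a N(mu, sigma^2) sample
# (mu, sigma, n below are arbitrary illustrative choices).
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
mu, sigma, n, reps = 1.0, 2.0, 6, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

# (a) independence implies correlation ~ 0 (a necessary check only)
print(np.corrcoef(xbar, s2)[0, 1])

# (b) Xbar ~ N(mu, sigma^2 / n)
print(xbar.mean(), mu, xbar.var(), sigma**2 / n)

# (c) (n-1) S^2 / sigma^2 ~ chi^2_{n-1}: compare a tail probability
q = (n - 1) * s2 / sigma**2
print(np.mean(q > 8.0), stats.chi2(df=n - 1).sf(8.0))
```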
The following exercise shows the result used in the induction step of the proof of Theorem 1.4(c).

Exercise 1.1 Show the following:
- \(\bar{X}_n=\frac{X_{n}+(n-1)\bar{X}_{n-1}}{n}\)
- \((n-1)S^2_n=(n-2)S_{n-1}^2+(\frac{n-1}{n})(X_n-\bar{X}_{n-1})^2\)
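Both identities are easy to verify numerically on an arbitrary sample; the sketch below is only a sanity check, not a proof.

```python
# Numerical sanity check of the two identities in Exercise 1.1.
import numpy as np

rng = np.random.default_rng(6)
x = rng.normal(size=10)          # an arbitrary sample, n = 10
n = len(x)

xbar_n = x.mean()
xbar_prev = x[:-1].mean()        # mean of the first n-1 observations
s2_n = x.var(ddof=1)
s2_prev = x[:-1].var(ddof=1)

# Identity 1: recursion for the sample mean
print(xbar_n, (x[-1] + (n - 1) * xbar_prev) / n)

# Identity 2: recursion (1.27) for the sample variance
lhs = (n - 1) * s2_n
rhs = (n - 2) * s2_prev + ((n - 1) / n) * (x[-1] - xbar_prev) ** 2
print(lhs, rhs)
```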
References
Casella, George, and Roger Berger. 2002. Statistical Inference. 2nd ed. Belmont, CA: Duxbury Resource Center.
Rencher, Alvin, and Bruce Schaalje. 2007. Linear Models in Statistics. 2nd ed. John Wiley & Sons.