Chapter 1 Random Samples, Special Distributions (Lecture on 01/07/2020)
Often, the data collected in an experiment consist of several observations on a variable of interest. Random sampling is the model for data collection that is often used to describe this situation.
Let $X_1,\dots,X_n$ be mutually independent random variables, each with the same marginal pdf or pmf $f(x)$; we call $X_1,\dots,X_n$ a random sample of size $n$ from the population $f(x)$. The joint pdf or pmf of $X_1,\dots,X_n$ is then
$$f(x_1,\dots,x_n)=f(x_1)f(x_2)\cdots f(x_n)=\prod_{i=1}^n f(x_i)$$
Since $X_1,\dots,X_n$ are identically distributed, all the marginal densities $f(x)$ are the same function. Furthermore, if the population pdf or pmf is a member of a parametric family with pdf or pmf $f(x\mid\theta)$, then the joint pdf or pmf is
$$f(x_1,\dots,x_n\mid\theta)=\prod_{i=1}^n f(x_i\mid\theta)$$
When a sample $X_1,\dots,X_n$ is drawn, some summary of the values is usually computed. Any well-defined summary may be expressed mathematically as a function $T(x_1,\dots,x_n)$ whose domain includes the sample space of the random vector $(X_1,\dots,X_n)$. The function $T$ may be real-valued or vector-valued; the summary $Y=T(X_1,\dots,X_n)$ is therefore itself a random variable (or random vector), called a statistic.
The only restriction on a statistic is that it cannot be a function of the parameters. The sample mean, sample variance, and sample standard deviation are statistics that are often used and provide good summaries of the sample:
$$\bar{X}=\frac{1}{n}\sum_{i=1}^n X_i \tag{1.3}$$
$$S^2=\frac{1}{n-1}\sum_{i=1}^n (X_i-\bar{X})^2 \tag{1.4}$$
The sample mean minimizes the total quadratic deviation, i.e.
$$\min_a \sum_{i=1}^n (x_i-a)^2=\sum_{i=1}^n (x_i-\bar{x})^2 \tag{1.6}$$
Equation (1.6) can be proved easily by the classic trick of adding and subtracting $\bar{x}$ inside the parentheses and then applying another classic property of the sample mean:
$$\sum_{i=1}^n (x_i-\bar{x})=0 \tag{1.7}$$
Another useful identity relating the sample mean and the sample variance is
$$(n-1)s^2=\sum_{i=1}^n x_i^2-n\bar{x}^2$$
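These identities are easy to check numerically. The following is a minimal NumPy sketch (the simulated sample, its size, and the candidate values of $a$ are arbitrary choices, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=50)   # an arbitrary sample
n, xbar = x.size, x.mean()

# (1.7): deviations from the sample mean sum to zero
print(np.isclose(np.sum(x - xbar), 0.0))      # True

# (1.6): a = xbar minimizes sum((x - a)^2); compare against a few other candidates
for a in (xbar - 1.0, xbar + 0.5, xbar + 2.0):
    assert np.sum((x - a) ** 2) > np.sum((x - xbar) ** 2)

# identity (n - 1) s^2 = sum(x_i^2) - n * xbar^2
s2 = x.var(ddof=1)
print(np.isclose((n - 1) * s2, np.sum(x ** 2) - n * xbar ** 2))   # True
```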
Lemma 1.1 Let $X_1,\dots,X_n$ be a random sample from a population and let $g(x)$ be a function such that $Eg(X_1)$ and $\mathrm{Var}\,g(X_1)$ exist. Then
$$E\left(\sum_{i=1}^n g(X_i)\right)=nEg(X_1), \qquad \mathrm{Var}\left(\sum_{i=1}^n g(X_i)\right)=n\,\mathrm{Var}\,g(X_1) \tag{1.9}$$
Proof. The first part of (1.9) can easily be shown by the linearity of expectation. To prove the second part, note that
$$\mathrm{Var}\left(\sum_{i=1}^n g(X_i)\right)=E\left[\sum_{i=1}^n g(X_i)-E\left(\sum_{i=1}^n g(X_i)\right)\right]^2=E\left[\sum_{i=1}^n \big(g(X_i)-Eg(X_i)\big)\right]^2 \tag{1.10}$$
Notice that when the square in (1.10) is expanded, there are $n$ terms of the form $\big(g(X_i)-Eg(X_i)\big)^2$, $i=1,\dots,n$, each of which is just $\mathrm{Var}\,g(X_1)$. The remaining terms are all of the form $\big(g(X_i)-Eg(X_i)\big)\big(g(X_j)-Eg(X_j)\big)$ with $i\neq j$, and each has expectation $\mathrm{Cov}\big(g(X_i),g(X_j)\big)=0$ by independence.

Theorem 1.1 Let $X_1,\dots,X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2<\infty$. Then
- (a) $E\bar{X}=\mu$
- (b) $\mathrm{Var}(\bar{X})=\dfrac{\sigma^2}{n}$
- (c) $ES^2=\sigma^2$
Proof. For (a), let $g(X_i)=X_i/n$, so that $Eg(X_i)=\mu/n$, and then apply Lemma 1.1.
For (b), $\mathrm{Var}\,g(X_i)=\sigma^2/n^2$; then by Lemma 1.1, $\mathrm{Var}(\bar{X})=\dfrac{\sigma^2}{n}$.
Finally, for (c), we have
$$ES^2=E\left(\frac{1}{n-1}\left[\sum_{i=1}^n X_i^2-n\bar{X}^2\right]\right)=\frac{1}{n-1}\left(nEX_1^2-nE\bar{X}^2\right)=\frac{1}{n-1}\left(n(\sigma^2+\mu^2)-n\left(\frac{\sigma^2}{n}+\mu^2\right)\right)=\sigma^2$$
where the last step uses the fact that $EY^2=\mathrm{Var}(Y)+(EY)^2$ for any random variable $Y$.
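Theorem 1.1 is easy to illustrate with a quick Monte Carlo check. The sketch below (with an arbitrarily chosen normal population, sample size, and number of replications) compares the empirical mean and variance of $\bar{X}$ and the empirical mean of $S^2$ with the stated values:

```python
import numpy as np

rng = np.random.default_rng(1)
mu, sigma, n, reps = 1.5, 2.0, 10, 200_000

# draw `reps` independent samples of size n; any population with finite
# variance would work just as well
samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

print(xbar.mean())   # approx mu             (a)
print(xbar.var())    # approx sigma^2 / n    (b)
print(s2.mean())     # approx sigma^2        (c)
```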
Theorem 1.2 Let $X_1,\dots,X_n$ be a random sample from a population with pdf $f_X(x)$, and let $\bar{X}$ denote the sample mean. Then, regardless of whether the mgf of $X$ exists,
$$f_{\bar{X}}(x)=nf_{X_1+\cdots+X_n}(nx)$$
Furthermore, if the mgf of $X$ does exist, denoted $M_X(t)$, then
$$M_{\bar{X}}(t)=\left[M_X\!\left(\frac{t}{n}\right)\right]^n$$
(This theorem combines Exercise 5.5 and Theorem 5.2.7 in Casella and Berger (2002).)
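As an illustration of the mgf relation, the sketch below takes an exponential population, for which $M_X(t)=\lambda/(\lambda-t)$ for $t<\lambda$, and compares a Monte Carlo estimate of $E\big(e^{t\bar{X}}\big)$ with $[M_X(t/n)]^n$; the rate, sample size, and evaluation points are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(2)
lam, n, reps = 2.0, 5, 500_000

# population: Exponential with rate lam, whose mgf is lam / (lam - t) for t < lam
samples = rng.exponential(scale=1.0 / lam, size=(reps, n))
xbar = samples.mean(axis=1)

for t in (0.5, 1.0, 2.0):
    empirical = np.mean(np.exp(t * xbar))          # Monte Carlo E[exp(t * Xbar)]
    theoretical = (lam / (lam - t / n)) ** n       # [M_X(t / n)]^n from Theorem 1.2
    print(t, empirical, theoretical)               # the two columns should be close
```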
For special distributions, the first and most important one to consider is the multivariate normal distribution (MVN for short).
Recall that the moment generating function and the characteristic function of a random vector $X$ are defined as (1.14) and (1.15), respectively:
$$M_X(t)=E\left(e^{t^TX}\right) \tag{1.14}$$
$$\Phi_X(t)=E\left(e^{it^TX}\right) \tag{1.15}$$
If $X$ and $Y$ are independent, then the mgf and the characteristic function satisfy
$$M_{X+Y}(t)=M_X(t)\cdot M_Y(t), \qquad \Phi_{X+Y}(t)=\Phi_X(t)\cdot\Phi_Y(t)$$
Finally, the mgf and the characteristic function of a multivariate normally distributed random vector $X\sim N_p(\mu,\Sigma)$ are given by
$$M_X(t)=\exp\left(t^T\mu+\frac{1}{2}t^T\Sigma t\right) \tag{1.18}$$
$$\Phi_X(t)=\exp\left(it^T\mu-\frac{1}{2}t^T\Sigma t\right)$$
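A quick empirical check of the MVN characteristic function: the sketch below compares a Monte Carlo estimate of $E\big(e^{it^TX}\big)$ with the closed form above (the mean vector, covariance matrix, and evaluation point are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
X = rng.multivariate_normal(mu, Sigma, size=300_000)   # rows are draws of X

t = np.array([0.3, -0.7])                              # an arbitrary evaluation point
empirical = np.mean(np.exp(1j * X @ t))                # Monte Carlo estimate of E[exp(i t'X)]
theoretical = np.exp(1j * t @ mu - 0.5 * t @ Sigma @ t)
print(empirical, theoretical)                          # the two should be close
```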
Theorem 1.3 Suppose $X\sim N_p(\mu,\Sigma)$. Then for any matrix $B\in\mathbb{R}^{k\times p}$ with rank $k\leq p$, the vector $Y=BX$ satisfies $Y\sim N_k(B\mu,B\Sigma B^T)$.
(This theorem is Theorem 4.4a in Rencher and Schaalje (2007).)

Proof. The mgf of $Y$ is, by definition,
$$M_Y(t)=E\left(e^{t^TY}\right)=E\left(e^{t^TBX}\right)=M_X(B^Tt)$$
From (1.18) we have the form of $M_X(t)$; therefore
$$M_Y(t)=\exp\left(t^TB\mu+\frac{1}{2}t^TB\Sigma B^Tt\right)$$
which is the mgf of $N_k(B\mu,B\Sigma B^T)$, so the theorem is proved.
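Theorem 1.3 is also easy to check by simulation. The sketch below (with an arbitrary $\mu$, $\Sigma$, and full-rank $B$) compares the sample mean and sample covariance of $Y=BX$ with $B\mu$ and $B\Sigma B^T$:

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.0, 1.0, 2.0])
Sigma = np.array([[3.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
B = np.array([[1.0, -1.0, 0.0],
              [0.5,  0.5, 1.0]])                  # k = 2, p = 3, rank 2

X = rng.multivariate_normal(mu, Sigma, size=200_000)
Y = X @ B.T                                       # each row is Y = B X

print(Y.mean(axis=0), B @ mu)                     # sample mean  vs  B mu
print(np.cov(Y, rowvar=False), B @ Sigma @ B.T)   # sample cov   vs  B Sigma B^T
```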
Notice that the chi-squared distribution can be viewed as a Gamma distribution with shape parameter $\alpha=\frac{p}{2}$ and rate parameter $\beta=\frac{1}{2}$. Therefore, the mean and variance of $X\sim\chi^2_p$ are $\alpha/\beta=p$ and $\alpha/\beta^2=2p$, respectively.

Lemma 1.2 If $\chi^2_p$ denotes a chi-squared random variable with $p$ degrees of freedom, then
- If $Z\sim N(0,1)$, then $Z^2\sim\chi^2_1$.
- If $X_1,\dots,X_n$ are independent and $X_i\sim\chi^2_{p_i}$, then $X_1+\cdots+X_n\sim\chi^2_{p_1+\cdots+p_n}$.
The proof is straightforward and follows from standard results in a first probability course.
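Both parts of Lemma 1.2, as well as the mean and variance noted above, can be checked by simulation. The following SciPy sketch uses Kolmogorov-Smirnov distances as a rough goodness-of-fit measure (the degrees of freedom and replication counts are arbitrary choices):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
reps = 200_000

# first property: Z^2 for Z ~ N(0,1) should look like chi-squared with 1 df
z2 = rng.standard_normal(reps) ** 2
print(stats.kstest(z2, stats.chi2(df=1).cdf).statistic)      # small KS distance

# second property: chi-squared(2) + chi-squared(3), independent, should be chi-squared(5)
total = rng.chisquare(2, reps) + rng.chisquare(3, reps)
print(stats.kstest(total, stats.chi2(df=5).cdf).statistic)   # small KS distance

# mean and variance of chi-squared(p) are p and 2p
p = 7
x = rng.chisquare(p, reps)
print(x.mean(), x.var())                                     # approx 7 and 14
```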
We conclude this chapter with a theorem about the properties of $\bar{X}$ and $S^2$ under the additional assumption of normality.
Theorem 1.4 Let $X_1,\dots,X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution, and let $\bar{X}$ and $S^2$ be the sample mean and sample variance defined in (1.3) and (1.4). Then
- (a) $\bar{X}$ and $S^2$ are independent.
- (b) $\bar{X}$ has a $N(\mu,\sigma^2/n)$ distribution.
- (c) $(n-1)S^2/\sigma^2$ has a chi-squared distribution with $n-1$ degrees of freedom.
Proof. We can assume, without loss of generality, that μ=0 and σ=1.
For (a), notice that $S^2$ can be expressed in terms of the $n-1$ deviations $(X_2-\bar{X},\dots,X_n-\bar{X})$:
$$S^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2=\frac{1}{n-1}\left((X_1-\bar{X})^2+\sum_{i=2}^n(X_i-\bar{X})^2\right)=\frac{1}{n-1}\left(\left(\sum_{i=2}^n(X_i-\bar{X})\right)^2+\sum_{i=2}^n(X_i-\bar{X})^2\right)$$
where the last step uses the classic property of the sample mean given in (1.7). Thus $S^2$ can be written as a function of only $(X_2-\bar{X},\dots,X_n-\bar{X})$. The joint pdf of the sample is
$$f(x_1,\dots,x_n)=\frac{1}{(2\pi)^{n/2}}e^{-\sum_{i=1}^n x_i^2/2}$$
Define the variable transformation $y_1=\bar{x}$, $y_2=x_2-\bar{x}$, $\dots$, $y_n=x_n-\bar{x}$. The Jacobian of this transformation has determinant
$$|J|=\begin{vmatrix}1&-1&-1&\cdots&-1\\1&1&0&\cdots&0\\1&0&1&\cdots&0\\\vdots&\vdots&\vdots&&\vdots\\1&0&0&\cdots&1\end{vmatrix}=n \tag{1.25}$$
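As a quick numerical sanity check of (1.25), the sketch below builds this Jacobian matrix for a few values of $n$ and evaluates its determinant (the chosen values of $n$ are arbitrary):

```python
import numpy as np

# Jacobian of (x_1, ..., x_n) with respect to (y_1, ..., y_n) for the
# transformation y_1 = xbar, y_i = x_i - xbar (i >= 2)
for n in (2, 3, 5, 10):
    J = np.zeros((n, n))
    J[:, 0] = 1.0               # every x_i involves y_1
    J[0, 1:] = -1.0             # x_1 = y_1 - y_2 - ... - y_n
    J[1:, 1:] = np.eye(n - 1)   # x_i = y_1 + y_i for i >= 2
    print(n, np.linalg.det(J))  # approx n, as claimed in (1.25)
```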
The proof of (1.25) can be done by induction on $n$. Applying this transformation to the joint pdf, we therefore have
$$f(y_1,\dots,y_n)=\frac{n}{(2\pi)^{n/2}}e^{-\left(y_1-\sum_{i=2}^n y_i\right)^2/2}\,e^{-\sum_{i=2}^n(y_i+y_1)^2/2}=\left[\left(\frac{n}{2\pi}\right)^{1/2}e^{-ny_1^2/2}\right]\left[\frac{n^{1/2}}{(2\pi)^{(n-1)/2}}e^{-\left(\sum_{i=2}^n y_i^2+\left(\sum_{i=2}^n y_i\right)^2\right)/2}\right],\qquad -\infty<y_i<\infty$$
Hence the joint pdf factors into a function of $y_1$ alone times a function of $(y_2,\dots,y_n)$ alone, so $\bar{X}=Y_1$ and $S^2$, which is a function of $(Y_2,\dots,Y_n)$ only, are independent.
For (b), define $B=\frac{1}{n}(1,\dots,1)$, so that $\bar{X}=BX$ with $X=(X_1,\dots,X_n)^T$. By Theorem 1.3 we have (b) as desired.
Finally, for (c), we proceed by induction. Denote the sample mean and sample variance of the first $k$ observations by $\bar{X}_k$ and $S^2_k$. Then
$$(n-1)S_n^2=(n-2)S_{n-1}^2+\left(\frac{n-1}{n}\right)\left(X_n-\bar{X}_{n-1}\right)^2 \tag{1.27}$$
The proof of (1.27) is shown in Exercise 1.1. First consider $n=2$: from (1.27) we have
$$S_2^2=\frac{1}{2}(X_2-X_1)^2$$
and since $\frac{X_2-X_1}{\sqrt{2}}\sim N(0,1)$, the property of the chi-squared distribution in Lemma 1.2 gives $S_2^2\sim\chi^2_1$. Proceeding with the induction, assume that for $n=k$ we have $(k-1)S_k^2\sim\chi^2_{k-1}$. For $n=k+1$, by (1.27),
$$kS_{k+1}^2=(k-1)S_k^2+\left(\frac{k}{k+1}\right)\left(X_{k+1}-\bar{X}_k\right)^2$$
By the induction hypothesis $(k-1)S_k^2\sim\chi^2_{k-1}$, so we only need to show that $\left(\frac{k}{k+1}\right)\left(X_{k+1}-\bar{X}_k\right)^2\sim\chi^2_1$ and is independent of $S_k^2$; Lemma 1.2 then gives the desired result.
Since the vector $(X_{k+1},\bar{X}_k)$ is independent of $S_k^2$ ($X_{k+1}$ is independent of the first $k$ observations, and $\bar{X}_k$ is independent of $S_k^2$ by part (a)), so is any function of this vector, in particular $(X_{k+1}-\bar{X}_k)^2$. Furthermore, $X_{k+1}-\bar{X}_k\sim N\left(0,\frac{k+1}{k}\right)$, because $X_{k+1}\sim N(0,1)$ and $\bar{X}_k\sim N\left(0,\frac{1}{k}\right)$ and they are independent. Therefore $\left(\frac{k}{k+1}\right)\left(X_{k+1}-\bar{X}_k\right)^2\sim\chi^2_1$, as desired.
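All three parts of Theorem 1.4 can be illustrated by simulation. The sketch below checks that the correlation of $\bar{X}$ and $S^2$ is near zero (a necessary, though not sufficient, condition for independence), that $\bar{X}$ has the stated mean and variance, and that $(n-1)S^2/\sigma^2$ is close in distribution to $\chi^2_{n-1}$; the population parameters, sample size, and replication count are arbitrary choices:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
mu, sigma, n, reps = 0.0, 1.0, 8, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)
s2 = samples.var(axis=1, ddof=1)

# (a) correlation of Xbar and S^2 should be near zero (necessary for independence)
print(np.corrcoef(xbar, s2)[0, 1])

# (b) Xbar should be N(mu, sigma^2 / n)
print(xbar.mean(), xbar.var())                      # approx 0 and 1/8

# (c) (n - 1) S^2 / sigma^2 should be chi-squared with n - 1 degrees of freedom
scaled = (n - 1) * s2 / sigma ** 2
print(stats.kstest(scaled, stats.chi2(df=n - 1).cdf).statistic)   # small KS distance
```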
The following exercise states the two recursions used in the proof by induction of Theorem 1.4, part (c).

Exercise 1.1 Show the following:
$$\bar{X}_n=\frac{X_n+(n-1)\bar{X}_{n-1}}{n}$$
$$(n-1)S_n^2=(n-2)S_{n-1}^2+\left(\frac{n-1}{n}\right)\left(X_n-\bar{X}_{n-1}\right)^2$$
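Both recursions can also be verified numerically on any sequence of observations; a minimal sketch on an arbitrary simulated sequence:

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(size=20)          # an arbitrary sequence of observations

for n in range(2, x.size + 1):
    xbar_n, xbar_prev = x[:n].mean(), x[:n - 1].mean()
    # first recursion: running update of the sample mean
    assert np.isclose(xbar_n, (x[n - 1] + (n - 1) * xbar_prev) / n)
    # second recursion (1.27): running update of (n - 1) S_n^2
    s2_prev = x[:n - 1].var(ddof=1) if n > 2 else 0.0   # the (n - 2) S_{n-1}^2 term vanishes at n = 2
    lhs = (n - 1) * x[:n].var(ddof=1)
    rhs = (n - 2) * s2_prev + ((n - 1) / n) * (x[n - 1] - xbar_prev) ** 2
    assert np.isclose(lhs, rhs)
print("both recursions hold for n = 2, ..., 20")
```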
References
Casella, George, and Roger Berger. 2002. Statistical Inference. 2nd ed. Belmont, CA: Duxbury Resource Center.
Rencher, Alvin, and Bruce Schaalje. 2007. Linear Models in Statistics. 2nd ed. John Wiley & Sons, Ltd.