Chapter 1 Random Samples, Special Distributions (Lecture on 01/07/2020)

Often, the data collected in an experiment consist of several observations on a variable of interest. Random sampling is the model for data collection that is often used to describe this situation.

Definition 1.1 (Random Sample) The random variables $X_1, \ldots, X_n$ are called a random sample of size $n$ from the population $f(x)$ if $X_1, \ldots, X_n$ are mutually independent random variables and the marginal pdf or pmf of each $X_i$ is the same function $f(x)$. Alternatively, $X_1, \ldots, X_n$ are called independent and identically distributed random variables with pdf or pmf $f(x)$. This is commonly abbreviated to i.i.d. random variables.

The joint pdf or pmf of $X_1, \ldots, X_n$ is given by

$$f(x_1, \ldots, x_n) = f(x_1) f(x_2) \cdots f(x_n) = \prod_{i=1}^n f(x_i).$$

Since $X_1, \ldots, X_n$ are identically distributed, all the marginal densities $f(x)$ are the same function. Furthermore, if the population pdf or pmf is a member of a parametric family, with pdf or pmf given by $f(x \mid \theta)$, then the joint pdf or pmf is

$$f(x_1, \ldots, x_n \mid \theta) = \prod_{i=1}^n f(x_i \mid \theta).$$
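For example (a standard illustration, not specific to these notes), if the population is exponential with mean $\beta$, so that $f(x \mid \beta) = \frac{1}{\beta} e^{-x/\beta}$ for $x > 0$, then the joint pdf of the sample is
$$f(x_1, \ldots, x_n \mid \beta) = \prod_{i=1}^n \frac{1}{\beta} e^{-x_i/\beta} = \frac{1}{\beta^n} \exp\left(-\frac{1}{\beta}\sum_{i=1}^n x_i\right).$$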

Random sampling is also referred to as infinite population sampling. If $X_1, \ldots, X_n$ are sampled sequentially, the independence assumption implies that the observed value $x_1$ of $X_1$ does not influence the observed value $x_2$ of $X_2$: "removing" $x_1$ from an infinite population does not change the population.
When sampling is from a finite population, the result may or may not be a random sample. Sampling with replacement yields a random sample. Sampling without replacement does not, because it violates the independence assumption in Definition 1.1: $P(X_2 = y \mid X_1 = y) = 0$ while $P(X_2 = y \mid X_1 = x) \neq 0$ for $x \neq y$, so the conditional distribution of $X_2$ does depend on the value of $X_1$. Nevertheless, $X_1, \ldots, X_n$ are still identically distributed, which can be shown by the law of total probability. This kind of sampling is sometimes called simple random sampling. If the population size $N$ is large compared to the sample size $n$, the samples are nearly independent and probabilities can be approximated by assuming independence.
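As a quick check of the "identically distributed" claim (an added illustration), suppose two draws are made without replacement from a finite population of $N$ distinct units, each equally likely to be drawn first. By the law of total probability,
$$P(X_2 = y) = \sum_{x \neq y} P(X_2 = y \mid X_1 = x)\, P(X_1 = x) = (N-1) \cdot \frac{1}{N-1} \cdot \frac{1}{N} = \frac{1}{N},$$
which is the same distribution as that of $X_1$.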

When a sample $X_1, \ldots, X_n$ is drawn, some summary of the values is usually computed. Any well-defined summary may be expressed mathematically as a function $T(x_1, \ldots, x_n)$ whose domain includes the sample space of the random vector $(X_1, \ldots, X_n)$. The function $T$ may be real-valued or vector-valued; thus the summary is a random variable (or vector), $Y = T(X_1, \ldots, X_n)$.

Definition 1.2 (Statistic) Let $X_1, \ldots, X_n$ be a random sample of size $n$ from the population and let $T(x_1, \ldots, x_n)$ be a real-valued or vector-valued function whose domain includes the sample space of $(X_1, \ldots, X_n)$. Then the random variable or random vector $Y = T(X_1, \ldots, X_n)$ is called a statistic. The probability distribution of a statistic $Y$ is called the sampling distribution of $Y$.

The only restriction on a statistic is that it cannot be a function of any parameter. The sample mean, sample variance, and sample standard deviation are statistics that are often used and provide good summaries of the sample.

Definition 1.3 (Sample Mean) The sample mean is defined as
$$\bar{X} = \frac{X_1 + \cdots + X_n}{n} = \frac{1}{n}\sum_{i=1}^n X_i. \tag{1.3}$$
Definition 1.4 (Sample Variance and Standard Deviation) The sample variance is defined as
$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2, \tag{1.4}$$
and the sample standard deviation is defined as $S = \sqrt{S^2}$.
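As a minimal computational sketch (the data values below are arbitrary, chosen only for illustration), these quantities can be computed with numpy, where `ddof=1` selects the $1/(n-1)$ divisor of Definition 1.4:

```python
import numpy as np

# Arbitrary illustrative sample (not from the notes)
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

xbar = x.mean()          # sample mean, (1/n) * sum of x_i
s2 = x.var(ddof=1)       # sample variance S^2, uses the 1/(n-1) divisor
s = np.sqrt(s2)          # sample standard deviation

print(xbar, s2, s)
```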

The sample mean minimizes the total quadratic distance, i.e.,
$$\min_a \sum_{i=1}^n (x_i - a)^2 = \sum_{i=1}^n (x_i - \bar{x})^2. \tag{1.6}$$
Equation (1.6) can be proved easily by the classic trick of adding and subtracting $\bar{x}$ inside the parentheses and then applying another classic property of the sample mean:
$$\sum_{i=1}^n (x_i - \bar{x}) = 0. \tag{1.7}$$
Another useful identity involving the sample mean and variance is
$$(n-1)s^2 = \sum_{i=1}^n x_i^2 - n\bar{x}^2.$$
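Writing out the trick explicitly (an added step, using only (1.7)): for any $a$,
$$\sum_{i=1}^n (x_i - a)^2 = \sum_{i=1}^n \left(x_i - \bar{x} + \bar{x} - a\right)^2 = \sum_{i=1}^n (x_i - \bar{x})^2 + n(\bar{x} - a)^2 \;\ge\; \sum_{i=1}^n (x_i - \bar{x})^2,$$
since the cross term $2(\bar{x} - a)\sum_{i=1}^n (x_i - \bar{x})$ vanishes by (1.7); equality holds exactly when $a = \bar{x}$.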

Lemma 1.1 Let $X_1, \ldots, X_n$ be a random sample from a population and let $g(x)$ be a function such that $E(g(X_1))$ and $\mathrm{Var}(g(X_1))$ exist. Then
$$E\left(\sum_{i=1}^n g(X_i)\right) = n\, E(g(X_1)), \qquad \mathrm{Var}\left(\sum_{i=1}^n g(X_i)\right) = n\, \mathrm{Var}(g(X_1)). \tag{1.9}$$

Proof. The first part of (1.9) follows easily from the linearity of expectation. To prove the second part, note that
$$\mathrm{Var}\left(\sum_{i=1}^n g(X_i)\right) = E\left[\sum_{i=1}^n g(X_i) - E\left(\sum_{i=1}^n g(X_i)\right)\right]^2 = E\left[\sum_{i=1}^n \left(g(X_i) - E\,g(X_i)\right)\right]^2. \tag{1.10}$$

Notice that when the square in (1.10) is expanded, there are $n$ terms of the form $(g(X_i) - E\,g(X_i))^2$, $i = 1, \ldots, n$, and the expectation of each is just $\mathrm{Var}(g(X_1))$. The remaining terms are all of the form $(g(X_i) - E\,g(X_i))(g(X_j) - E\,g(X_j))$, $i \neq j$, and the expectation of each is $\mathrm{Cov}(g(X_i), g(X_j)) = 0$ by independence.

Theorem 1.1 Let $X_1, \ldots, X_n$ be a random sample from a population with mean $\mu$ and variance $\sigma^2 < \infty$. Then

    1. $E\bar{X} = \mu$
    2. $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$
    3. $E S^2 = \sigma^2$
(The sample mean and sample variance are unbiased estimators!)

Proof. For (a), let $g(X_i) = X_i/n$, so $E\,g(X_i) = \mu/n$; then apply Lemma 1.1.

For (b), $\mathrm{Var}(g(X_i)) = \sigma^2/n^2$; then by Lemma 1.1, $\mathrm{Var}(\bar{X}) = \frac{\sigma^2}{n}$.

Finally, for (c), we have
$$E S^2 = E\left(\frac{1}{n-1}\left[\sum_{i=1}^n X_i^2 - n\bar{X}^2\right]\right) = \frac{1}{n-1}\left(n\,E X_1^2 - n\,E\bar{X}^2\right) = \frac{1}{n-1}\left(n(\sigma^2 + \mu^2) - n\left(\frac{\sigma^2}{n} + \mu^2\right)\right) = \sigma^2,$$
where the last two equalities use the fact that $E Y^2 = \mathrm{Var}(Y) + (EY)^2$ for any random variable $Y$.
From (a) and (c) of Theorem 1.1, the sample mean and sample variance are unbiased estimators of the population mean and variance.
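As a quick numerical sanity check (not part of the original notes; the population, sample size, and replication count below are arbitrary choices), a short simulation illustrates Theorem 1.1:

```python
import numpy as np

# Monte Carlo check of E[Xbar]=mu, Var(Xbar)=sigma^2/n, E[S^2]=sigma^2
rng = np.random.default_rng(0)
mu, sigma, n, reps = 3.0, 2.0, 10, 200_000

samples = rng.normal(mu, sigma, size=(reps, n))
xbar = samples.mean(axis=1)          # sample mean of each replicate
s2 = samples.var(axis=1, ddof=1)     # sample variance of each replicate

print(xbar.mean())   # approx mu
print(xbar.var())    # approx sigma^2 / n
print(s2.mean())     # approx sigma^2
```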

Theorem 1.2 Let $X_1, \ldots, X_n$ be a random sample from a population with pdf $f_X(x)$, and let $\bar{X}$ denote the sample mean. Then, regardless of whether the mgf of $X$ exists,
$$f_{\bar{X}}(x) = n f_{X_1 + \cdots + X_n}(nx).$$

Furthermore, if the mgf of $X$ does exist, denoted $M_X(t)$, then
$$M_{\bar{X}}(t) = \left[M_X\left(\frac{t}{n}\right)\right]^n.$$

(This theorem combines Exercise 5.5 and Theorem 5.2.7 of Casella and Berger (2002).)
Proof. Let $Y = X_1 + \cdots + X_n$, so that $Y = n\bar{X}$. By the change-of-variables formula,
$$f_{\bar{X}}(x) = f_Y(nx)\left|\frac{dy}{dx}\right| = n f_Y(nx).$$
For the mgfs,
$$M_{\bar{X}}(t) = E\, e^{t\bar{X}} = E\, e^{t(X_1 + \cdots + X_n)/n} = E\, e^{(t/n)Y} = \left[M_X\left(\frac{t}{n}\right)\right]^n,$$
where the last step uses the i.i.d. property of the random sample.
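For example (a standard illustration), if the population is $N(\mu, \sigma^2)$, so that $M_X(t) = \exp(\mu t + \sigma^2 t^2/2)$, then
$$M_{\bar{X}}(t) = \left[\exp\left(\mu \frac{t}{n} + \frac{\sigma^2 t^2}{2n^2}\right)\right]^n = \exp\left(\mu t + \frac{\sigma^2 t^2}{2n}\right),$$
which is the mgf of a $N(\mu, \sigma^2/n)$ distribution.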
The convolution formula is also useful in finding the pdf of $\bar{X}$. If $X$ and $Y$ are independent random variables with pdfs $f_X(x)$ and $f_Y(y)$, then the pdf of $Z = X + Y$ is
$$f_Z(z) = \int_{-\infty}^{+\infty} f_X(\omega)\, f_Y(z - \omega)\, d\omega.$$
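For example, if $X$ and $Y$ are independent exponential(1) random variables, then for $z > 0$,
$$f_Z(z) = \int_0^z e^{-\omega}\, e^{-(z - \omega)}\, d\omega = z e^{-z},$$
which is the Gamma$(2, 1)$ density (the integration limits reflect where both densities are positive).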

Turning to special distributions, the first and most important one to consider is the multivariate normal distribution (MVN for short).

Definition 1.5 (Multivariate Normal Distribution) Let $\mu \in \mathbb{R}^p$ and let $\Sigma \in \mathbb{R}^{p \times p}$ be positive definite. A random vector $X \in \mathbb{R}^p$ has a $p$-variate normal distribution with mean $\mu$ and covariance matrix $\Sigma$ if it has pdf
$$f(x) = |2\pi\Sigma|^{-\frac{1}{2}} \exp\left[-\frac{1}{2}(x - \mu)^T \Sigma^{-1} (x - \mu)\right], \qquad x \in \mathbb{R}^p,$$
denoted $X \sim N_p(\mu, \Sigma)$.

Recall that the moment generating function and the characteristic function of a random vector $X$ are defined as (1.14) and (1.15), respectively:
$$M_X(t) = E\left(e^{t^T X}\right), \tag{1.14}$$
$$\Phi_X(t) = E\left(e^{i t^T X}\right). \tag{1.15}$$
If $X$ and $Y$ are independent, then the mgf and characteristic function satisfy
$$M_{X+Y}(t) = M_X(t) M_Y(t), \qquad \Phi_{X+Y}(t) = \Phi_X(t) \Phi_Y(t).$$
Finally, the mgf and characteristic function of a multivariate normal random vector $X \sim N_p(\mu, \Sigma)$ are given by
$$M_X(t) = \exp\left(t^T \mu + \frac{1}{2} t^T \Sigma t\right), \tag{1.18}$$
$$\Phi_X(t) = \exp\left(i t^T \mu - \frac{1}{2} t^T \Sigma t\right).$$
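As a small illustration (the mean vector, covariance matrix, and sample size below are arbitrary choices, not from the notes), one can sample from an MVN distribution with numpy and check the first two moments empirically:

```python
import numpy as np

rng = np.random.default_rng(1)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

# Rows of X are i.i.d. draws from N_2(mu, Sigma)
X = rng.multivariate_normal(mu, Sigma, size=100_000)

print(X.mean(axis=0))            # approx mu
print(np.cov(X, rowvar=False))   # approx Sigma
```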

Theorem 1.3 Suppose $X \sim N_p(\mu, \Sigma)$. Then for any matrix $B \in \mathbb{R}^{k \times p}$ with rank $k \leq p$, the vector $Y = BX$ satisfies $Y \sim N_k(B\mu, B\Sigma B^T)$.

(This theorem is Theorem 4.4a of Rencher and Schaalje (2007).)

Proof. The mgf of $Y$ is, by definition,

$$M_Y(t) = E\left(e^{t^T Y}\right) = E\left(e^{t^T B X}\right) = M_X(B^T t).$$
From (1.18) we have the form of $M_X(t)$; therefore
$$M_Y(t) = \exp\left(t^T B\mu + \frac{1}{2} t^T B \Sigma B^T t\right),$$
which is the mgf of $N_k(B\mu, B\Sigma B^T)$. Thus the theorem is proved.
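For example (foreshadowing Theorem 1.4 below), taking $B = \frac{1}{n}(1, \ldots, 1) \in \mathbb{R}^{1 \times n}$ and $X \sim N_n(\mu \mathbf{1}, \sigma^2 I)$ gives
$$\bar{X} = BX \sim N\left(\mu, \frac{\sigma^2}{n}\right),$$
since $B(\mu\mathbf{1}) = \mu$ and $B(\sigma^2 I)B^T = \sigma^2/n$.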
Definition 1.6 (Chi-square Distribution) Another important special distribution is the chi-square distribution, whose pdf is given by
$$f(x) = \frac{1}{\Gamma(p/2)\, 2^{p/2}}\, x^{\frac{p}{2} - 1} e^{-x/2}, \qquad x > 0,$$
where $p$ is called the degrees of freedom. The chi-square distribution with $p$ degrees of freedom is the distribution of the sum of squares of $p$ independent standard normal random variables.

Notice that the chi-square distribution can be viewed as a gamma distribution with shape parameter $\alpha = \frac{p}{2}$ and rate parameter $\beta = \frac{1}{2}$. Therefore, the mean and variance of $X \sim \chi^2_p$ are $\alpha/\beta = p$ and $\alpha/\beta^2 = 2p$, respectively.
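The gamma representation also yields the chi-square mgf (a standard fact used in the next lemma):
$$M_X(t) = \left(\frac{\beta}{\beta - t}\right)^{\alpha} = (1 - 2t)^{-p/2}, \qquad t < \tfrac{1}{2}.$$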

Lemma 1.2 Let $\chi^2_p$ denote a chi-squared random variable with $p$ degrees of freedom. Then

  1. If $Z \sim N(0, 1)$, then $Z^2 \sim \chi^2_1$.

  2. If $X_1, \ldots, X_n$ are independent and $X_i \sim \chi^2_{p_i}$, then $X_1 + \cdots + X_n \sim \chi^2_{p_1 + \cdots + p_n}$.

The proof is straightforward and comes from a first course in probability; a brief sketch is given below.
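A brief sketch of the standard argument: for (a), with $Z \sim N(0,1)$ and $z > 0$,
$$P(Z^2 \le z) = P(-\sqrt{z} \le Z \le \sqrt{z}) = 2\Phi(\sqrt{z}) - 1,$$
and differentiating in $z$ gives $f_{Z^2}(z) = \frac{1}{\sqrt{2\pi}}\, z^{-1/2} e^{-z/2}$, which is the $\chi^2_1$ density since $\Gamma(1/2) = \sqrt{\pi}$. For (b), independence gives
$$M_{X_1 + \cdots + X_n}(t) = \prod_{i=1}^n (1 - 2t)^{-p_i/2} = (1 - 2t)^{-(p_1 + \cdots + p_n)/2},$$
which is the mgf of a $\chi^2_{p_1 + \cdots + p_n}$ random variable.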

We conclude this chapter with a theorem about the properties of $\bar{X}$ and $S^2$ when we add a normality assumption.

Theorem 1.4 Let $X_1, \ldots, X_n$ be a random sample from a $N(\mu, \sigma^2)$ distribution, and let $\bar{X}$ and $S^2$ be the sample mean and sample variance defined in (1.3) and (1.4). Then

  1. $\bar{X}$ and $S^2$ are independent.

  2. $\bar{X}$ has a $N(\mu, \sigma^2/n)$ distribution.

  3. $(n-1)S^2/\sigma^2$ has a chi-squared distribution with $n-1$ degrees of freedom.

Proof. We can assume, without loss of generality, that $\mu = 0$ and $\sigma = 1$.

For (a), notice that $S^2$ can be expressed in terms of the $n-1$ deviations $(X_2 - \bar{X}, \ldots, X_n - \bar{X})$:
$$S^2 = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar{X})^2 = \frac{1}{n-1}\left((X_1 - \bar{X})^2 + \sum_{i=2}^n (X_i - \bar{X})^2\right) = \frac{1}{n-1}\left(\left(\sum_{i=2}^n (X_i - \bar{X})\right)^2 + \sum_{i=2}^n (X_i - \bar{X})^2\right),$$
where the last step uses the classic property of the sample mean given in (1.7), namely $X_1 - \bar{X} = -\sum_{i=2}^n (X_i - \bar{X})$. Thus $S^2$ can be written as a function of only $(X_2 - \bar{X}, \ldots, X_n - \bar{X})$. The joint pdf of the sample is
$$f(x_1, \ldots, x_n) = \frac{1}{(2\pi)^{n/2}}\, e^{-\sum_{i=1}^n x_i^2 / 2}.$$
Define the variable transformation $y_1 = \bar{x},\ y_2 = x_2 - \bar{x},\ \ldots,\ y_n = x_n - \bar{x}$. The determinant of the Jacobian of the inverse transformation ($x_1 = y_1 - \sum_{i=2}^n y_i$ and $x_i = y_1 + y_i$ for $i \geq 2$) is
$$|J| = \begin{vmatrix} 1 & -1 & -1 & \cdots & -1 \\ 1 & 1 & 0 & \cdots & 0 \\ 1 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & 1 \end{vmatrix} = n. \tag{1.25}$$
(The proof of (1.25) can be done by induction.) Therefore, we have
$$\begin{aligned} f(y_1, \ldots, y_n) &= \frac{n}{(2\pi)^{n/2}}\, e^{-\left(y_1 - \sum_{i=2}^n y_i\right)^2 / 2}\, e^{-\sum_{i=2}^n (y_1 + y_i)^2 / 2} \\ &= \left[\left(\frac{n}{2\pi}\right)^{1/2} e^{-n y_1^2 / 2}\right] \left[\frac{n^{1/2}}{(2\pi)^{(n-1)/2}}\, e^{-\left(\sum_{i=2}^n y_i^2 + \left(\sum_{i=2}^n y_i\right)^2\right)/2}\right], \qquad -\infty < y_i < \infty. \end{aligned}$$
Hence the joint pdf factors into a function of $y_1$ alone times a function of $(y_2, \ldots, y_n)$ alone, so $\bar{X}$ and $S^2$ are independent.

For (b), define $B = \frac{1}{n}(1, \ldots, 1)$; then $\bar{X} = BX$ with $X = (X_1, \ldots, X_n)^T$. By Theorem 1.3 we have (b) as desired.

Finally, for (c), we proceed by induction. Denote the sample mean and variance of the first $k$ observations by $\bar{X}_k$ and $S^2_k$. Then
$$(n-1)S^2_n = (n-2)S^2_{n-1} + \left(\frac{n-1}{n}\right)(X_n - \bar{X}_{n-1})^2. \tag{1.27}$$
The proof of (1.27) is given in Exercise 1.1. First consider $n = 2$: from (1.27) we have $S^2_2 = \frac{1}{2}(X_2 - X_1)^2$, and since $\frac{X_2 - X_1}{\sqrt{2}} \sim N(0, 1)$, the property of the chi-squared distribution in Lemma 1.2 gives $S^2_2 \sim \chi^2_1$. Proceeding with the induction, assume that for $n = k$ we have $(k-1)S^2_k \sim \chi^2_{k-1}$. For $n = k+1$, by (1.27),
$$k S^2_{k+1} = (k-1)S^2_k + \left(\frac{k}{k+1}\right)(X_{k+1} - \bar{X}_k)^2.$$
By the induction hypothesis $(k-1)S^2_k \sim \chi^2_{k-1}$, so we only need to show that $\left(\frac{k}{k+1}\right)(X_{k+1} - \bar{X}_k)^2 \sim \chi^2_1$ and is independent of $S^2_k$; then Lemma 1.2 gives the desired result.

The vector $(X_{k+1}, \bar{X}_k)$ is independent of $S^2_k$, and so is any function of this vector, in particular $(X_{k+1} - \bar{X}_k)^2$. Furthermore, $X_{k+1} - \bar{X}_k \sim N\left(0, \frac{k+1}{k}\right)$, because $X_{k+1} \sim N(0, 1)$ and $\bar{X}_k \sim N\left(0, \frac{1}{k}\right)$ are independent. Therefore $\left(\frac{k}{k+1}\right)(X_{k+1} - \bar{X}_k)^2 \sim \chi^2_1$, as desired.
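As a hedged numerical illustration of part (a) (the sample size, replication count, and the exponential comparison population are arbitrary choices, not from the notes), simulation shows the covariance between $\bar{X}$ and $S^2$ is essentially zero for normal data but clearly nonzero for a skewed population:

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 5, 200_000

def cov_xbar_s2(draws):
    # Empirical covariance between the sample mean and sample variance
    xbar = draws.mean(axis=1)
    s2 = draws.var(axis=1, ddof=1)
    return np.cov(xbar, s2)[0, 1]

print(cov_xbar_s2(rng.normal(0.0, 1.0, size=(reps, n))))     # approx 0
print(cov_xbar_s2(rng.exponential(1.0, size=(reps, n))))     # clearly nonzero
```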

The following exercise establishes the result used in the proof by induction of Theorem 1.4, part (c).

Exercise 1.1 Show the following

  1. $\bar{X}_n = \frac{X_n + (n-1)\bar{X}_{n-1}}{n}$

  2. $(n-1)S^2_n = (n-2)S^2_{n-1} + \left(\frac{n-1}{n}\right)(X_n - \bar{X}_{n-1})^2$

(This problem is revised from Exercise 5.15 of Casella and Berger (2002).)
Proof. By the definition of the sample mean, it is straightforward that
$$\bar{X}_n = \frac{\sum_{i=1}^n X_i}{n} = \frac{X_n + (n-1)\bar{X}_{n-1}}{n}.$$
For (b), we have
$$\begin{aligned} (n-2)S^2_{n-1} + \left(\frac{n-1}{n}\right)(X_n - \bar{X}_{n-1})^2 &= \sum_{i=1}^{n-1}(X_i - \bar{X}_{n-1})^2 + \frac{n-1}{n}(X_n - \bar{X}_{n-1})^2 \\ &= \sum_{i=1}^{n-1}(X_i - \bar{X}_n + \bar{X}_n - \bar{X}_{n-1})^2 + \frac{n-1}{n}(X_n - \bar{X}_{n-1})^2 \\ &= \sum_{i=1}^{n-1}(X_i - \bar{X}_n)^2 + (n-1)(\bar{X}_n - \bar{X}_{n-1})^2 + 2(\bar{X}_n - \bar{X}_{n-1})\sum_{i=1}^{n-1}(X_i - \bar{X}_n) + \frac{n-1}{n}(X_n - \bar{X}_{n-1})^2. \end{aligned} \tag{1.31}$$
To simplify (1.31), we need the following simple results:
$$\bar{X}_n - \bar{X}_{n-1} = \frac{X_n - \bar{X}_n}{n-1}, \qquad \sum_{i=1}^{n-1}(X_i - \bar{X}_n) = \bar{X}_n - X_n, \qquad X_n - \bar{X}_{n-1} = \frac{n(X_n - \bar{X}_n)}{n-1}. \tag{1.32}$$
Substituting (1.32) into (1.31), we obtain
$$\begin{aligned} (n-2)S^2_{n-1} + \left(\frac{n-1}{n}\right)(X_n - \bar{X}_{n-1})^2 &= \sum_{i=1}^{n-1}(X_i - \bar{X}_n)^2 + (n-1)\times\frac{(X_n - \bar{X}_n)^2}{(n-1)^2} + 2\left(\frac{X_n - \bar{X}_n}{n-1}\right)\times(\bar{X}_n - X_n) + \frac{n-1}{n}\times\left(\frac{n}{n-1}\right)^2\times(X_n - \bar{X}_n)^2 \\ &= \sum_{i=1}^{n-1}(X_i - \bar{X}_n)^2 + \left(\frac{1}{n-1} - \frac{2}{n-1} + \frac{n}{n-1}\right)(X_n - \bar{X}_n)^2 \\ &= \sum_{i=1}^n (X_i - \bar{X}_n)^2 = (n-1)S^2_n. \end{aligned}$$
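As a quick numerical check of part (b) (the sample below is an arbitrary simulated one, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.normal(size=20)
n = len(x)

# Verify (n-1) S_n^2 = (n-2) S_{n-1}^2 + ((n-1)/n) (X_n - Xbar_{n-1})^2
lhs = (n - 1) * x.var(ddof=1)
rhs = (n - 2) * x[:-1].var(ddof=1) + (n - 1) / n * (x[-1] - x[:-1].mean())**2
assert np.isclose(lhs, rhs)
```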

References

Casella, George, and Roger Berger. 2002. Statistical Inference. 2nd ed. Belmont, CA: Duxbury Resource Center.

Rencher, Alvin, and Bruce Schaalje. 2007. Linear Models in Statistics. 2nd ed. John Wiley & Sons.