Chapter 7 Central Limit Theorem and law of large numbers

7.1 Introduction

In this chapter we show why the Normal distribution, introduced in Section 5.7, is so important in probability and statistics. The Central Limit Theorem states that, under very weak conditions (satisfied by almost all the probability distributions you will meet), the sum $S_n$ of $n$ i.i.d. random variables, appropriately normalised, converges to a standard Normal distribution as $n \to \infty$. For finite but large $n$ (roughly $n \geq 50$), we can approximate $S_n$ by a Normal distribution and use that approximation to answer questions about $S_n$. In Section 7.2 we present the Central Limit Theorem and apply it to an example using exponential random variables. In Section 7.3 we explore how a continuous distribution (the Normal distribution) can be used to approximate sums of discrete random variables. Finally, in Section 7.4, we present the Law of Large Numbers, which states that the sample mean $S_n/n$ of $n$ observations converges to the population mean $\mu$ as $n$ increases, with the uncertainty in $S_n/n$ decreasing as $n$ grows. Both the Central Limit Theorem and the Law of Large Numbers will be important moving forward when considering statistical questions.

7.2 Statement of Central Limit Theorem

Before stating the Central Limit Theorem, we introduce some notation.

Convergence in distribution

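A sequence of random variables $Y_1, Y_2, \ldots$ is said to converge in distribution to a random variable $Y$ if, for all $y \in \mathbb{R}$ at which $P(Y \leq y)$ is continuous,
$$P(Y_n \leq y) \to P(Y \leq y) \quad \text{as } n \to \infty.$$

We write $Y_n \xrightarrow{D} Y$ as $n \to \infty$.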

Central Limit Theorem.
Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables (i.e. a random sample) with finite mean $\mu$ and finite variance $\sigma^2 > 0$. Let $S_n = X_1 + \cdots + X_n$. Then
$$\frac{S_n - n\mu}{\sigma\sqrt{n}} \xrightarrow{D} N(0,1) \quad \text{as } n \to \infty.$$
The central limit theorem is equivalent to
$$\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} N(0,1) \quad \text{as } n \to \infty,$$

where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the sample mean of $X_1, X_2, \ldots, X_n$.

Therefore, for large $n$ (writing $\approx$ for "is approximately distributed as"),
$$S_n \approx N(n\mu, n\sigma^2)$$
and
$$\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right).$$
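The approximation can be checked by simulation. The sketch below is our own illustration in Python (the notes themselves use R), assuming exponential summands with rate $\lambda = 4$, $n = 100$, 10,000 replications and an arbitrary seed. It repeatedly forms the standardised sum $(S_n - n\mu)/(\sigma\sqrt{n})$ and compares the proportion of values at most 1 with $\Phi(1)$, using only the standard library.

```python
import math
import random
from statistics import NormalDist


def standardised_sums(n, rate, reps, seed=1):
    """Simulate reps values of (S_n - n*mu) / (sigma*sqrt(n)) for i.i.d. Exponential(rate) X_i."""
    rng = random.Random(seed)
    mu = 1 / rate      # E[X_i] for an exponential with rate lambda
    sigma = 1 / rate   # sd(X_i) for an exponential with rate lambda
    values = []
    for _ in range(reps):
        s_n = sum(rng.expovariate(rate) for _ in range(n))
        values.append((s_n - n * mu) / (sigma * math.sqrt(n)))
    return values


z = standardised_sums(n=100, rate=4, reps=10_000)
empirical = sum(1 for v in z if v <= 1) / len(z)
print(f"simulated P(Z <= 1) = {empirical:.4f}")
print(f"Phi(1)              = {NormalDist().cdf(1):.4f}")
```

The two printed numbers should be close; how close depends on the seed and the number of replications.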

Suppose $X_1, X_2, \ldots, X_{100}$ are i.i.d. exponential random variables with parameter $\lambda = 4$.

  1. Find $P(S_{100} > 30)$.
  2. Find limits within which $\bar{X}$ will lie with probability $0.95$.
Solution.

  1. Since $X_1, X_2, \ldots, X_{100}$ are i.i.d. exponential random variables with parameter $\lambda = 4$, $E[X_i] = \frac{1}{4}$ and $\text{var}(X_i) = \frac{1}{16}$. Hence,

    $$E[S_{100}] = 100 \times \frac{1}{4} = 25; \qquad \text{var}(S_{100}) = 100 \times \frac{1}{16} = \frac{25}{4}.$$

    Since $n = 100$ is sufficiently large, $S_{100}$ is approximately normally distributed by the central limit theorem (CLT). Therefore,

    $$P(S_{100} > 30) = P\left(\frac{S_{100} - 25}{\sqrt{25/4}} > \frac{30 - 25}{\sqrt{25/4}}\right) \approx P(N(0,1) > 2) = 1 - P(N(0,1) \leq 2) = 0.0228.$$

    Given that $S_{100} = \sum_{i=1}^{100} X_i \sim \text{Gamma}(100, 4)$ (see Section 5.6.2), we can compute exactly $P(S_{100} > 30) = 0.0279$, which shows that the central limit theorem gives a reasonable approximation.

  2. Since $X_1, X_2, \ldots, X_{100}$ are i.i.d. exponential random variables with parameter $\lambda = 4$, $E[X_i] = \frac{1}{4}$ and $\text{var}(X_i) = \frac{1}{16}$. Therefore, $E[\bar{X}] = \frac{1}{4}$ and $\text{var}(\bar{X}) = \frac{1/16}{100} = \frac{1}{1600}$.

Since $n = 100$, $\bar{X}$ will be approximately normally distributed by the CLT, hence
$$0.95 = P(a < \bar{X} < b) = P\left(\frac{a - 1/4}{\sqrt{1/1600}} < \frac{\bar{X} - 1/4}{\sqrt{1/1600}} < \frac{b - 1/4}{\sqrt{1/1600}}\right) \approx P\left(\frac{a - 1/4}{\sqrt{1/1600}} < N(0,1) < \frac{b - 1/4}{\sqrt{1/1600}}\right).$$
There are infinitely many choices for $a$ and $b$, but a natural choice is $P(\bar{X} < a) = P(\bar{X} > b) = 0.025$. That is, we choose $a$ and $b$ so that $\bar{X}$ is equally likely to be less than $a$ or greater than $b$. Thus if $Z \sim N(0,1)$ and, for $0 < q < 1$, $z_q$ satisfies $P(Z < z_q) = q$, we have that
$$\frac{a - 1/4}{\sqrt{1/1600}} = z_{0.025} = -1.96, \qquad \frac{b - 1/4}{\sqrt{1/1600}} = z_{0.975} = 1.96.$$
Hence,
$$a = 0.25 - 1.96 \times \frac{1}{40} = 0.201, \qquad b = 0.25 + 1.96 \times \frac{1}{40} = 0.299.$$
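Both parts of this example can be reproduced numerically. The sketch below is in Python rather than the R used elsewhere in these notes and relies only on the standard library. For the exact Gamma tail it uses the standard identity $P(\text{Gamma}(k, \lambda) > t) = P(\text{Po}(\lambda t) \leq k - 1)$ for integer shape $k$; that route is our own choice and not necessarily the method of Section 5.6.2.

```python
import math
from statistics import NormalDist

rate, n = 4.0, 100
mu, sigma = 1 / rate, 1 / rate          # E[X_i] = 1/4 and sd(X_i) = 1/4

# Part 1: CLT approximation, S_100 approx N(n*mu, n*sigma^2) = N(25, 25/4)
clt = 1 - NormalDist(n * mu, sigma * math.sqrt(n)).cdf(30)

# Exact value: S_100 ~ Gamma(100, 4) and, for integer shape n,
# P(Gamma(n, rate) > t) = P(Poisson(rate * t) <= n - 1); terms summed in log space
m = rate * 30
exact = sum(math.exp(-m + k * math.log(m) - math.lgamma(k + 1)) for k in range(n))

# Part 2: symmetric 95% limits for X-bar approx N(mu, sigma^2 / n)
z = NormalDist().inv_cdf(0.975)         # z_{0.975}, about 1.96
se = sigma / math.sqrt(n)               # sqrt(1/1600) = 1/40
a, b = mu - z * se, mu + z * se

print(f"P(S_100 > 30): CLT {clt:.4f}, exact {exact:.4f}")
print(f"limits for X-bar: a = {a:.3f}, b = {b:.3f}")
```

The output should agree closely with the values $0.0228$, $0.0279$, $0.201$ and $0.299$ obtained above.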


7.3 Central limit theorem for discrete random variables

The central limit theorem can be applied to sums of discrete random variables as well as continuous random variables. Let $X_1, X_2, \ldots$ be i.i.d. copies of a discrete random variable $X$ with $E[X] = \mu$ and $\text{var}(X) = \sigma^2$. Further suppose that the support of $X$ is contained in the non-negative integers $\{0, 1, \ldots\}$. (This covers all the discrete distributions we have seen: binomial, negative binomial, Poisson and discrete uniform.)

Let $Y_n \sim N(n\mu, n\sigma^2)$. Then the central limit theorem states that, for large $n$, $S_n \approx Y_n$. However, there will exist $x \in \{0, 1, \ldots\}$ such that
$$P(S_n = x) > 0 \quad \text{but} \quad P(Y_n = x) = 0.$$
The solution is to approximate
$$P(S_n = x) \quad \text{by} \quad P(x - 0.5 < Y_n \leq x + 0.5).$$

This is known as the continuity correction.

Suppose that $X$ is a Bernoulli random variable with $P(X = 1) = 0.6\ (= p)$, so $E[X] = 0.6$ and $\text{var}(X) = 0.6 \times (1 - 0.6) = 0.24$. Then $S_n = \sum_{i=1}^{n} X_i \sim \text{Bin}(n, 0.6)$.

For $n = 100$, $S_{100} \sim \text{Bin}(100, 0.6)$ can be approximated by $Y \sim N(60, 24)$ $(= N(np, np(1-p)))$; see Figure 7.1.

Figure 7.1: Central limit theorem approximation for the binomial.

Figure 7.2 shows a close-up of the approximation in Figure 7.1 for $x = 54$ to $56$. The areas marked out by the red lines (normal approximation) are approximately equal to the areas of the histogram bars (binomial probabilities).

Figure 7.2: Central limit theorem approximation for the binomial for x=54 to 56.
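The close-up in Figure 7.2 can also be checked numerically. As a sketch (in Python, our own choice; the notes use R), the code below compares the exact binomial probabilities $P(S_{100} = x)$ for $x = 54, 55, 56$ with the continuity-corrected normal probabilities $P(x - 0.5 < Y \leq x + 0.5)$, where $Y \sim N(60, 24)$.

```python
import math
from statistics import NormalDist

n, p = 100, 0.6
Y = NormalDist(n * p, math.sqrt(n * p * (1 - p)))   # Y ~ N(np, np(1-p)) = N(60, 24)


def binom_pmf(x, n, p):
    """Exact P(S_n = x) for S_n ~ Bin(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def normal_cc(x):
    """Continuity-corrected normal approximation to P(S_n = x)."""
    return Y.cdf(x + 0.5) - Y.cdf(x - 0.5)


for x in (54, 55, 56):
    print(f"x = {x}: binomial {binom_pmf(x, n, p):.4f}, normal {normal_cc(x):.4f}")
```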

7.4 Law of Large Numbers

We observed that $\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right)$, and that this variance decreases as $n$ increases.

Since the $X_i$ are independent, $\text{var}(S_n) = \text{var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \text{var}(X_i) = n\sigma^2$, so we have in general (not just approximately) that $\text{var}(\bar{X}) = \text{var}\left(\frac{S_n}{n}\right) = \frac{1}{n^2}\text{var}(S_n) = \frac{\sigma^2}{n}$.

A random variable $Y$ with $E[Y] = \mu$ and $\text{var}(Y) = 0$ is the constant $Y \equiv \mu$, that is, $P(Y = \mu) = 1$. This suggests that as $n \to \infty$, $\bar{X}$ converges in some sense to $\mu$. We can make this convergence rigorous.

Convergence in probability

A sequence of random variables $Y_1, Y_2, \ldots$ is said to converge in probability to a random variable $Y$ if, for any $\epsilon > 0$,
$$P(|Y_n - Y| > \epsilon) \to 0 \quad \text{as } n \to \infty.$$
We write $Y_n \xrightarrow{p} Y$ as $n \to \infty$.

We will often be interested in convergence in probability where YY is a constant.

A useful result for proving convergence in probability to a constant $\mu$ is Chebychev's inequality. Chebychev's inequality is a special case of Markov's inequality (applied to $(X - \mu)^2$), which bounds probabilities in terms of expectations.

Chebychev’s inequality.
Let $X$ be a random variable with $E[X] = \mu$ and $\text{var}(X) = \sigma^2$. Then for any $\epsilon > 0$,
$$P(|X - \mu| > \epsilon) \leq \frac{\sigma^2}{\epsilon^2}.$$

Law of Large Numbers.
Let $X_1, X_2, \ldots$ be i.i.d. according to a random variable $X$ with $E[X] = \mu$ and $\text{var}(X) = \sigma^2 < \infty$. Then
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{p} \mu \quad \text{as } n \to \infty.$$

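Proof. First, note that
$$E[\bar{X}] = E\left[\frac{1}{n}\sum_{i=1}^{n} X_i\right] = \frac{1}{n}\sum_{i=1}^{n} E[X_i] = \frac{1}{n}(n\mu) = \mu.$$
For any $\epsilon > 0$, we have by Chebychev's inequality, applied to $\bar{X}$, that
$$P(|\bar{X} - \mu| > \epsilon) \leq \frac{1}{\epsilon^2}\text{var}(\bar{X}) = \frac{\sigma^2}{n\epsilon^2} \to 0 \quad \text{as } n \to \infty,$$

and the Theorem follows.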
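The proof suggests a numerical check: estimate $P(|\bar{X} - \mu| > \epsilon)$ by simulation for increasing $n$ and compare it with the Chebychev bound $\sigma^2/(n\epsilon^2)$. The Python sketch below does this for exponential random variables with $\lambda = 4$ and $\epsilon = 0.05$; the distribution, $\epsilon$, the number of replications and the seed are all illustrative assumptions, not values from the notes.

```python
import random

rate, eps, reps = 4.0, 0.05, 1_000
mu, sigma2 = 1 / rate, 1 / rate**2       # E[X] = 1/4, var(X) = 1/16
rng = random.Random(7)


def chebychev_bound(n):
    """Chebychev bound on P(|X-bar - mu| > eps), capped at 1."""
    return min(1.0, sigma2 / (n * eps**2))


def estimate_tail(n):
    """Monte Carlo estimate of P(|X-bar - mu| > eps) from reps samples of size n."""
    hits = 0
    for _ in range(reps):
        xbar = sum(rng.expovariate(rate) for _ in range(n)) / n
        hits += abs(xbar - mu) > eps
    return hits / reps


estimates = {}
for n in (10, 100, 1_000):
    estimates[n] = estimate_tail(n)
    print(f"n = {n:>5}: estimate {estimates[n]:.3f}, Chebychev bound {chebychev_bound(n):.3f}")
```

Chebychev's bound is usually conservative, but it is enough to show that the probability tends to zero, which is all the proof needs.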

Central limit theorem for dice

Figure 7.3: Dice picture.

Let $D_1, D_2, \ldots$ denote the outcomes of successive rolls of a fair six-sided die.

Let $S_n = \sum_{i=1}^{n} D_i$ denote the total score from $n$ rolls of the die and let $M_n = \frac{1}{n}S_n$ denote the mean score from $n$ rolls of the die.

  1. What is the approximate distribution of $S_{100}$?
  2. What is the approximate probability that $S_{100}$ lies between $330$ and $380$, inclusive?
  3. How large does $n$ need to be such that $P(|M_n - E[D]| > 0.1) \leq 0.01$?

Attempt Example 7.4.4 and then watch Video 15 for the solutions.

Video 15: Central limit theorem for dice

Solution to Example 7.4.4.
Note that $D_1$ has a discrete uniform distribution with probability mass function
$$P(D_1 = x) = \begin{cases} \frac{1}{6} & x = 1, 2, \ldots, 6, \\ 0 & \text{otherwise}. \end{cases}$$

Then $E[D_1] = \frac{7}{2} = 3.5$ and $\text{var}(D_1) = \frac{35}{12}$.

  1. Since the rolls of the die are independent,

    $$E[S_{100}] = E\left[\sum_{i=1}^{100} D_i\right] = \sum_{i=1}^{100} E[D_i] = 100\,E[D_1] = 350$$

    and

    $$\text{var}(S_{100}) = \text{var}\left(\sum_{i=1}^{100} D_i\right) = \sum_{i=1}^{100} \text{var}(D_i) = 100\,\text{var}(D_1) = \frac{875}{3}.$$

    Thus by the central limit theorem, $S_{100} \approx Y \sim N\left(350, \frac{875}{3}\right)$.

  2. Using the CLT approximation above and the continuity correction,

    $$P(330 \leq S_{100} \leq 380) \approx P(329.5 \leq Y \leq 380.5) = P(Y \leq 380.5) - P(Y \leq 329.5) = 0.9629 - 0.1150 = 0.8479.$$

    If using Normal tables, we have that

    $$P(Y \leq 380.5) = P\left(Z = \frac{Y - 350}{\sqrt{875/3}} \leq \frac{380.5 - 350}{\sqrt{875/3}}\right) = \Phi(1.786)$$

    and

    $$P(Y \leq 329.5) = P\left(Z = \frac{Y - 350}{\sqrt{875/3}} \leq \frac{329.5 - 350}{\sqrt{875/3}}\right) = \Phi(-1.200).$$

  3. Using the Central Limit Theorem, $M_n \approx W_n \sim N\left(\frac{7}{2}, \frac{35}{12n}\right)$.

We know by the law of large numbers that $M_n \xrightarrow{p} \frac{7}{2}$ as $n \to \infty$, but how large does $n$ need to be for there to be a $99\%$ (or greater) chance of $M_n$ being within $0.1$ of $3.5$?

Using the approximation $W_n$, we want
$$P\left(\left|W_n - \frac{7}{2}\right| > 0.1\right) \leq 0.01.$$
Equivalently, with $Z \sim N(0,1)$, we want $n$ such that
$$0.99 \leq P(3.4 \leq W_n \leq 3.6) = P\left(\frac{3.4 - 3.5}{\sqrt{35/(12n)}} \leq Z \leq \frac{3.6 - 3.5}{\sqrt{35/(12n)}}\right) = P(-0.058554\sqrt{n} < Z < 0.058554\sqrt{n}) = P(|Z| < 0.058554\sqrt{n}) = 1 - P(|Z| > 0.058554\sqrt{n}).$$
Consider $P(|Z| > 0.058554\sqrt{n}) = 0.01$.
Note that
$$P(|Z| > c) = \alpha \iff P(Z > c) = \frac{\alpha}{2} \iff P(Z \leq c) = 1 - \frac{\alpha}{2}.$$
We have $\alpha = 0.01$, and using the qnorm function in R, qnorm(0.995) gives $c = 2.5758293$.
Therefore
$$P(|Z| > 0.058554\sqrt{n}) = 0.01 = P(|Z| > 2.5758),$$
or equivalently
$$0.058554\sqrt{n} = 2.5758 \implies \sqrt{n} = 43.99 \implies n = 1935.2.$$

Given that we require $n \geq 1935.2$, the smallest such $n$ is $n = 1936$.
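The three parts of this example can be reproduced with a short Python sketch (the notes use R; here statistics.NormalDist().inv_cdf plays the role of qnorm). As an extra check that is not part of the solution above, it also computes the exact distribution of $S_{100}$ by repeatedly convolving the probability mass function of a single roll.

```python
import math
from fractions import Fraction
from statistics import NormalDist

faces = range(1, 7)
mu = Fraction(sum(faces), 6)                           # E[D_1] = 7/2
var = Fraction(sum(f * f for f in faces), 6) - mu**2   # var(D_1) = 35/12

# Part 1: S_100 is approximately N(100*mu, 100*var) = N(350, 875/3)
n = 100
mean_s, var_s = n * mu, n * var
Y = NormalDist(float(mean_s), math.sqrt(float(var_s)))

# Part 2: P(330 <= S_100 <= 380) with the continuity correction
clt = Y.cdf(380.5) - Y.cdf(329.5)

# Exact check: number of ways to reach each total, adding one roll at a time
ways = {0: 1}
for _ in range(n):
    nxt = {}
    for total, count in ways.items():
        for f in faces:
            nxt[total + f] = nxt.get(total + f, 0) + count
    ways = nxt
exact = Fraction(sum(ways[s] for s in range(330, 381)), 6**n)

# Part 3: smallest n with P(|W_n - 7/2| > 0.1) <= 0.01
eps, alpha = 0.1, 0.01
c = NormalDist().inv_cdf(1 - alpha / 2)          # qnorm(0.995) in R
n_min = math.ceil((c / eps) ** 2 * float(var))   # from eps * sqrt(n) / sqrt(var) >= c

print(mean_s, var_s)
print(f"CLT: {clt:.4f}, exact: {float(exact):.4f}")
print(f"c = {c:.7f}, n = {n_min}")
```

Exact fractions keep $E[D_1]$ and $\text{var}(D_1)$ free of rounding, and Python's arbitrary-precision integers make the exact count over $6^{100}$ outcomes feasible.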


Task: Session 4

Attempt the R Markdown file for Session 4:
Session 4: Convergence and the Central Limit Theorem

Student Exercises

Attempt the exercises below.


Let $X_1, X_2, \ldots, X_{25}$ be independent Poisson random variables, each having mean 1. Use the central limit theorem to approximate
$$P\left(\sum_{i=1}^{25} X_i > 20\right).$$
Solution to Exercise 7.1.
Let $X = \sum_{i=1}^{25} X_i$. Then $E[X] = 25$ and $\text{var}(X) = 25$. By the central limit theorem, $X \approx Y \sim N(25, 5^2)$, so, using the continuity correction,
$$P(X > 20) \approx P(Y > 20.5) = 0.8159.$$
Converting to a standard normal distribution, $Z \sim N(0,1)$,
$$P(Y > 20.5) = P\left(Z = \frac{Y - 25}{5} > \frac{20.5 - 25}{5}\right) = P(Z > -0.9) = P(Z < 0.9) = 0.8159.$$

Note that the final step comes from the symmetry of the normal distribution.

For comparison, since $X \sim \text{Po}(25)$, we have that $P(X > 20) = 0.8145$.
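A quick numerical comparison of the CLT answer with the exact Poisson probability, sketched in Python (our choice of language; the notes use R):

```python
import math
from statistics import NormalDist

lam = 25                                   # sum of 25 independent Po(1) variables is Po(25)
Y = NormalDist(lam, math.sqrt(lam))        # CLT approximation N(25, 5^2)

clt = 1 - Y.cdf(20.5)                      # P(X > 20) = P(X >= 21), continuity correction
exact = 1 - sum(math.exp(-lam) * lam**k / math.factorial(k) for k in range(21))   # 1 - P(X <= 20)

print(f"CLT: {clt:.4f}, exact Poisson: {exact:.4f}")
```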



The lifetime of a Brand X TV (in years) is an exponential random variable with mean 10. By using the central limit theorem, find the approximate probability that the average lifetime of a random sample of 36 TVs is at least 10.5.

Solution to Exercise 7.2.

Let $X_i$ denote the lifetime of the $i$th TV in the sample, $i = 1, 2, \ldots, 36$. Then (from the lecture notes) we know that $E[X_i] = 10$ and $\text{var}(X_i) = 100$.

The sample mean is $\bar{X} = (X_1 + \cdots + X_{36})/36$.

Using the central limit theorem, $X_1 + \cdots + X_{36} \approx N(360, 3600)$, so
$$\bar{X} \approx N\left(\frac{360}{36}, \frac{3600}{36^2}\right) = N(10, 2.778).$$
Thus
$$P(\bar{X} > 10.5) = P\left(\frac{\bar{X} - 10}{\sqrt{2.778}} > \frac{10.5 - 10}{\sqrt{2.778}}\right) \approx P(Z > 0.3),$$

where $Z \sim N(0,1)$. Thus the required probability is approximately $P(Z > 0.3) = 1 - \Phi(0.3) = 0.3821$.
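As a sketch (Python, our own choice), the CLT answer can be set against an exact value. A sum of 36 independent exponential random variables with mean 10 has a Gamma(36, 1/10) distribution, so we use the identity $P(\text{Gamma}(k, \lambda) \geq t) = P(\text{Po}(\lambda t) \leq k - 1)$ for integer $k$; this exact check is our addition and not part of the solution above.

```python
import math
from statistics import NormalDist

n, mean_life = 36, 10.0
rate = 1 / mean_life                          # exponential rate, so var(X_i) = 100
se = mean_life / math.sqrt(n)                 # sd of X-bar: sqrt(100/36) = 10/6

clt = 1 - NormalDist(mean_life, se).cdf(10.5)

# Exact: X-bar >= 10.5 iff X_1 + ... + X_36 >= 378, and the sum is Gamma(36, rate)
m = rate * n * 10.5                           # Poisson mean lambda * t with t = 378
exact = sum(math.exp(-m + k * math.log(m) - math.lgamma(k + 1)) for k in range(n))

print(f"CLT: {clt:.4f}, exact: {exact:.4f}")
```

The gap between the two printed values gives a rough idea of the approximation error for a skewed distribution at $n = 36$.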



Prior to October 2015, in the UK National Lottery gamblers bought a ticket on which they marked six different numbers from $\{1, 2, \ldots, 49\}$. Six balls were drawn uniformly at random without replacement from a set of 49 similarly numbered balls. A ticket won the jackpot if the six numbers marked were the same as the six numbers drawn.

  1. Show that the probability a given ticket won the jackpot is 1/13983816.
  2. In Week 9 of the UK National Lottery 69,846,979 tickets were sold and there were 133 jackpot winners. If all gamblers chose their numbers independently and uniformly at random, use the central limit theorem to determine the approximate distribution of the number of jackpot winners that week. Comment on this in the light of the actual number of jackpot winners.
Solution to Exercise 7.3.
  1. There are $\binom{49}{6} = 13{,}983{,}816$ different ways of choosing 6 distinct numbers from $1, 2, \ldots, 49$, so the probability that a ticket wins the jackpot is $1/13{,}983{,}816$.
  2. Let $X$ be the number of jackpot winners in Week 9 if gamblers chose their numbers independently and uniformly at random. Then
    $$X \sim \text{Bin}\left(69{,}846{,}979, \frac{1}{13{,}983{,}816}\right) = \text{Bin}(n, p), \text{ say}.$$
    Then, to 4 decimal places, $E[X] = np = 4.9948$ and $\text{var}(X) = np(1-p) = 4.9948$.
    Hence, by the central limit theorem,
    $$X \approx N(4.9948, 4.9948).$$
Given the Central Limit Theorem, the probability of at least 133 jackpot winners is, using the continuity correction,
$$P(X \geq 133) \approx P\left(\frac{X - 4.9948}{\sqrt{4.9948}} \geq \frac{132.5 - 4.9948}{\sqrt{4.9948}}\right) = P(Z \geq 57.05),$$

where $Z \sim N(0,1)$. This probability is essentially zero, so there is very strong evidence that the gamblers did not choose their numbers independently and uniformly at random.
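The numbers in this solution can be recomputed exactly with Python's fractions module (a sketch; Python is our choice, not the notes'). Note that the standard normal upper tail at $z \approx 57$ is far smaller than any positive double-precision number, so 1 - NormalDist().cdf(z) simply evaluates to 0.0 in floating point.

```python
import math
from fractions import Fraction
from statistics import NormalDist

tickets = 69_846_979
p = Fraction(1, math.comb(49, 6))        # probability a single ticket wins the jackpot
mean = tickets * p                        # E[X] = np
var = mean * (1 - p)                      # var(X) = np(1 - p)

# Continuity-corrected z-score for P(X >= 133)
z = (132.5 - float(mean)) / math.sqrt(float(var))
tail = 1 - NormalDist().cdf(z)            # underflows to 0.0 in double precision

print(math.comb(49, 6))
print(f"np = {float(mean):.4f}, np(1-p) = {float(var):.4f}, z = {z:.2f}, tail = {tail}")
```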