Chapter 7 Central Limit Theorem and Law of Large Numbers
7.1 Introduction
In this section we show why the Normal distribution, introduced in Section 5.7, is so important in probability and statistics. The Central Limit Theorem states that, under very weak conditions (almost all probability distributions you will see satisfy them), the sum of $n$ i.i.d. random variables, $S_n$, converges in distribution, when appropriately normalised, to a standard Normal distribution as $n \to \infty$. For finite but large $n$ (say $n \geq 50$), we can approximate $S_n$ by a Normal distribution, and this approximation can be used to answer questions concerning $S_n$. In Section 7.2 we present the Central Limit Theorem and apply it to an example using exponential random variables. In Section 7.3 we explore how a continuous distribution (the Normal distribution) can be used to approximate sums of discrete random variables. Finally, in Section 7.4, we present the Law of Large Numbers, which states that the uncertainty in the sample mean of $n$ observations, $S_n/n$, decreases as $n$ increases and that the sample mean converges to the population mean $\mu$. Both the Central Limit Theorem and the Law of Large Numbers will be important moving forward when considering statistical questions.
7.2 Statement of Central Limit Theorem
Before stating the Central Limit Theorem, we introduce some notation.
Convergence in distribution
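A sequence of random variables $Y_1, Y_2, \ldots$ is said to converge in distribution to a random variable $Y$ if, for all $y \in \mathbb{R}$ at which the distribution function of $Y$ is continuous,
$$P(Y_n \leq y) \to P(Y \leq y) \quad \text{as } n \to \infty.$$
We write $Y_n \xrightarrow{D} Y$ as $n \to \infty$.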
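Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables (i.e. a random sample) with finite mean $\mu$ and variance $\sigma^2$. Let $S_n = X_1 + \cdots + X_n$. Then
$$\frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \xrightarrow{D} N(0,1) \quad \text{as } n \to \infty,$$
where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$ is the sample mean of $X_1, X_2, \ldots, X_n$.
Therefore, we have that for large $n$,
$$S_n \approx N(n\mu, n\sigma^2) \quad \text{and} \quad \bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right).$$
Suppose $X_1, X_2, \ldots, X_{100}$ are i.i.d. exponential random variables with parameter $\lambda = 4$.
- Find $P(S_{100} > 30)$.
- Find limits within which $\bar{X}$ will lie with probability $0.95$.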
Since $X_1, X_2, \ldots, X_{100}$ are i.i.d. exponential random variables with parameter $\lambda = 4$, $E[X_i] = \frac{1}{4}$ and $\text{var}(X_i) = \frac{1}{16}$. Hence,
$$E[S_{100}] = 100 \cdot \frac{1}{4} = 25; \qquad \text{var}(S_{100}) = 100 \cdot \frac{1}{16} = \frac{25}{4}.$$
Since $n = 100$ is sufficiently large, $S_{100}$ is approximately normally distributed by the Central Limit Theorem (CLT). Therefore,
$$P(S_{100} > 30) = P\left(\frac{S_{100} - 25}{\sqrt{25/4}} > \frac{30 - 25}{\sqrt{25/4}}\right) \approx P(N(0,1) > 2) = 1 - P(N(0,1) \leq 2) = 0.0228.$$
Given that $S_{100} = \sum_{i=1}^{100} X_i \sim \text{Gamma}(100, 4)$, see Section 5.6.2, we can compute exactly $P(S_{100} > 30) = 0.0279$, which shows that the Central Limit Theorem gives a reasonable approximation.
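The comparison between the CLT approximation and the exact Gamma probability can be checked numerically. The following is a minimal sketch in base R, assuming the rate parameterisation of the Gamma distribution (shape $100$, rate $4$); the variable names are our own.

```r
# Parameters from the example: n i.i.d. Exp(lambda) random variables
n <- 100
lambda <- 4

mean_S <- n / lambda      # E[S_n] = n / lambda
var_S  <- n / lambda^2    # var(S_n) = n / lambda^2

# CLT approximation: treat S_n as N(mean_S, var_S)
clt_approx <- pnorm(30, mean = mean_S, sd = sqrt(var_S), lower.tail = FALSE)

# Exact value: S_n ~ Gamma(shape = n, rate = lambda)
exact <- pgamma(30, shape = n, rate = lambda, lower.tail = FALSE)

print(round(c(clt = clt_approx, exact = exact), 4))
```

The gap between the two values reflects the right skew of the Gamma distribution, which the symmetric Normal approximation cannot capture in the upper tail.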
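Since $X_1, X_2, \ldots, X_{100}$ are i.i.d. exponential random variables with parameter $\lambda = 4$, $E[X_i] = \frac{1}{4}$ and $\text{var}(X_i) = \frac{1}{16}$. Therefore, $E[\bar{X}] = \frac{1}{4}$ and $\text{var}(\bar{X}) = \frac{1/16}{100} = \frac{1}{1600}$. By the Central Limit Theorem, $\bar{X} \approx N\left(\frac{1}{4}, \frac{1}{1600}\right)$, and a Normal random variable lies within $1.96$ standard deviations of its mean with probability $0.95$. Hence, with probability approximately $0.95$, $\bar{X}$ lies within $0.25 \pm 1.96 \times 0.025 = 0.25 \pm 0.049$, that is, between $0.201$ and $0.299$.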
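Limits of this kind can be obtained directly from the Normal quantile function. The sketch below assumes symmetric limits about the mean (other choices of limits with probability $0.95$ exist) and uses `qnorm(0.975)` for the multiplier.

```r
# Sample mean of n i.i.d. Exp(lambda) random variables
n <- 100
lambda <- 4

mean_xbar <- 1 / lambda               # E[X-bar]
sd_xbar   <- (1 / lambda) / sqrt(n)   # sd(X-bar) = sigma / sqrt(n)

# Symmetric limits containing X-bar with approximate probability 0.95
z <- qnorm(0.975)
limits <- mean_xbar + c(-1, 1) * z * sd_xbar

print(round(limits, 3))
```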
7.3 Central limit theorem for discrete random variables
The central limit theorem can be applied to sums of discrete random variables as well as continuous random variables. Let $X_1, X_2, \ldots$ be i.i.d. copies of a discrete random variable $X$ with $E[X] = \mu$ and $\text{var}(X) = \sigma^2$. Further suppose that the support of $X$ is in the non-negative integers $\{0, 1, \ldots\}$. (This covers all the discrete distributions we have seen: binomial, negative binomial, Poisson and discrete uniform.)
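Let $Y_n \sim N(n\mu, n\sigma^2)$. Then the central limit theorem states that for large $n$, $S_n \approx Y_n$. However, there will exist $x \in \{0, 1, \ldots\}$ such that
$$P(S_n = x) > 0 \quad \text{but} \quad P(Y_n = x) = 0,$$
since $Y_n$ is a continuous random variable. Instead, we approximate
$$P(S_n = x) \approx P(x - 0.5 < Y_n \leq x + 0.5) \quad \text{and} \quad P(S_n \leq x) \approx P(Y_n \leq x + 0.5).$$
This is known as the continuity correction.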
Suppose that $X$ is a Bernoulli random variable with $P(X = 1) = 0.6$ $(= p)$, so $E[X] = 0.6$ and $\text{var}(X) = 0.6 \times (1 - 0.6) = 0.24$. Then
$$S_n = \sum_{i=1}^{n} X_i \sim \text{Bin}(n, 0.6).$$
For $n = 100$, $S_{100} \sim \text{Bin}(100, 0.6)$ can be approximated by $Y \sim N(60, 24)$ $(= N(np, np(1-p)))$, see Figure 7.1.

Figure 7.1: Central limit theorem approximation for the binomial.
We can see the approximation in Figure 7.1 in close-up for $x = 54$ to $56$ in Figure 7.2. The areas marked out by the red lines (normal approximation) are approximately equal to the areas of the bars in the histogram (binomial probabilities).

Figure 7.2: Central limit theorem approximation for the binomial for x=54 to 56.
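The agreement between bar areas and Normal areas can also be checked numerically. The following sketch compares the binomial probabilities for $x = 54, 55, 56$ with the Normal areas over $(x - 0.5, x + 0.5]$, and then shows the effect of the continuity correction on a cumulative probability; the choice of $P(S_{100} \leq 56)$ is ours, for illustration.

```r
n <- 100
p <- 0.6
mu    <- n * p                    # mean of the approximating Normal
sigma <- sqrt(n * p * (1 - p))    # standard deviation of the approximating Normal

# Point probabilities: binomial pmf against Normal area over (x - 0.5, x + 0.5]
x <- 54:56
exact  <- dbinom(x, size = n, prob = p)
approx <- pnorm(x + 0.5, mu, sigma) - pnorm(x - 0.5, mu, sigma)
print(round(rbind(x = x, exact = exact, approx = approx), 4))

# Cumulative probability P(S_100 <= 56), with and without the correction
exact_cdf  <- pbinom(56, size = n, prob = p)
with_cc    <- pnorm(56.5, mu, sigma)
without_cc <- pnorm(56, mu, sigma)
print(round(c(exact = exact_cdf, with_cc = with_cc, without_cc = without_cc), 4))
```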
7.4 Law of Large Numbers
We observed that
$$\bar{X} \approx N\left(\mu, \frac{\sigma^2}{n}\right),$$
and the variance decreases as $n$ increases.
Given that
$$\text{var}(S_n) = \text{var}\left(\sum_{i=1}^{n} X_i\right) = \sum_{i=1}^{n} \text{var}(X_i) = n\sigma^2,$$
we have in general that
$$\text{var}(\bar{X}) = \text{var}\left(\frac{S_n}{n}\right) = \frac{1}{n^2}\text{var}(S_n) = \frac{\sigma^2}{n}.$$
A random variable $Y$ which has $E[Y] = \mu$ and $\text{var}(Y) = 0$ is the constant $Y \equiv \mu$, that is, $P(Y = \mu) = 1$. This suggests that as $n \to \infty$, $\bar{X}$ converges in some sense to $\mu$. We can make this convergence rigorous.
Convergence in probability
A sequence of random variables $Y_1, Y_2, \ldots$ is said to converge in probability to a random variable $Y$ if, for any $\epsilon > 0$,
$$P(|Y_n - Y| > \epsilon) \to 0 \quad \text{as } n \to \infty.$$
We write $Y_n \xrightarrow{p} Y$ as $n \to \infty$.
We will often be interested in convergence in probability where $Y$ is a constant.
A useful result for proving convergence in probability to a constant $\mu$ is Chebychev’s inequality. Chebychev’s inequality is a special case of the Markov inequality, which is helpful in bounding probabilities in terms of expectations.
Chebychev’s inequality.
Let $X$ be a random variable with $E[X] = \mu$ and $\text{var}(X) = \sigma^2$. Then for any $\epsilon > 0$,
$$P(|X - \mu| > \epsilon) \leq \frac{\sigma^2}{\epsilon^2}.$$
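To get a feel for how conservative the bound can be, the sketch below compares it with the exact probability for a single exponential random variable. The choice of $X \sim \text{Exp}(4)$ and the values of $\epsilon$ are our own, for illustration only.

```r
# Illustrative choice: X ~ Exp(lambda), so mu = 1/lambda and sigma^2 = 1/lambda^2
lambda <- 4
mu     <- 1 / lambda
sigma2 <- 1 / lambda^2

eps <- c(0.3, 0.5, 1)

# Chebychev bound on P(|X - mu| > eps)
bound <- sigma2 / eps^2

# Exact probability: upper tail plus lower tail
# (pexp returns 0 for negative arguments, so the lower tail vanishes when eps > mu)
exact <- pexp(mu + eps, rate = lambda, lower.tail = FALSE) +
  pexp(mu - eps, rate = lambda)

print(round(data.frame(eps = eps, exact = exact, bound = bound), 4))
```

The bound holds for every distribution with the given mean and variance, which is why it is typically far from tight for any particular one.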
Law of Large Numbers.
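Let $X_1, X_2, \ldots$ be i.i.d. according to a random variable $X$ with $E[X] = \mu$ and $\text{var}(X) = \sigma^2$. Then
$$\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i \xrightarrow{p} \mu \quad \text{as } n \to \infty.$$
To see this, note that $E[\bar{X}] = \mu$ and $\text{var}(\bar{X}) = \frac{\sigma^2}{n}$. Hence, by Chebychev’s inequality, for any $\epsilon > 0$,
$$P(|\bar{X} - \mu| > \epsilon) \leq \frac{\sigma^2}{n\epsilon^2} \to 0 \quad \text{as } n \to \infty,$$
and the Theorem follows.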
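The Law of Large Numbers can be illustrated by simulation. The sketch below tracks the running mean of simulated exponential random variables alongside the Chebychev bound $\sigma^2/(n\epsilon^2)$; the distribution ($\text{Exp}(4)$), the seed and $\epsilon = 0.01$ are arbitrary choices of ours.

```r
set.seed(1)   # arbitrary seed, for reproducibility only

lambda <- 4                   # illustrative choice: X_i ~ Exp(4), so mu = 0.25
x <- rexp(1e5, rate = lambda)

# Running sample mean after 1, 2, ..., 100000 observations
running_mean <- cumsum(x) / seq_along(x)

# Chebychev bound on P(|X-bar - mu| > eps), capped at 1
eps <- 0.01
n_values <- c(10, 100, 1000, 10000, 100000)
cheb_bound <- pmin(1, (1 / lambda^2) / (n_values * eps^2))

print(data.frame(n = n_values,
                 running_mean = running_mean[n_values],
                 cheb_bound = cheb_bound))
```

As $n$ grows the running mean settles near $\mu = 0.25$, and the bound on the probability of a deviation larger than $\epsilon$ shrinks towards zero.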
Central limit theorem for dice

Figure 7.3: Dice picture.
Let $D_1, D_2, \ldots$ denote the outcomes of successive rolls of a fair six-sided die.
Let $S_n = \sum_{i=1}^{n} D_i$ denote the total score from $n$ rolls of the die and let $M_n = \frac{1}{n} S_n$ denote the mean score from $n$ rolls of the die.
- What is the approximate distribution of $S_{100}$?
- What is the approximate probability that $S_{100}$ lies between $330$ and $380$, inclusive?
- How large does $n$ need to be such that $P(|M_n - E[D]| > 0.1) \leq 0.01$?
Attempt Example 7.4.4 and then watch Video 15 for the solutions.
Video 15: Central limit theorem for dice
Solution to Example 7.4.4.
Each roll takes the values $1, 2, \ldots, 6$ with probability $\frac{1}{6}$ each. Then $E[D_1] = \frac{7}{2} = 3.5$ and $\text{var}(D_1) = \frac{35}{12}$.
Since the rolls of the die are independent,
$$E[S_{100}] = E\left[\sum_{i=1}^{100} D_i\right] = \sum_{i=1}^{100} E[D_i] = 100 E[D_1] = 350$$
and
$$\text{var}(S_{100}) = \text{var}\left(\sum_{i=1}^{100} D_i\right) = \sum_{i=1}^{100} \text{var}(D_i) = 100 \, \text{var}(D_1) = \frac{875}{3}.$$
Thus by the central limit theorem, $S_{100} \approx Y \sim N\left(350, \frac{875}{3}\right)$.
Using the CLT approximation above and the continuity correction,
$$P(330 \leq S_{100} \leq 380) \approx P(329.5 \leq Y \leq 380.5) = P(Y \leq 380.5) - P(Y \leq 329.5) = 0.9629 - 0.1150 = 0.8479.$$
If using Normal tables, we have that
$$P(Y \leq 380.5) = P\left(Z = \frac{Y - 350}{\sqrt{875/3}} \leq \frac{380.5 - 350}{\sqrt{875/3}}\right) = \Phi(1.786)$$
and
$$P(Y \leq 329.5) = P\left(Z = \frac{Y - 350}{\sqrt{875/3}} \leq \frac{329.5 - 350}{\sqrt{875/3}}\right) = \Phi(-1.200).$$
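The Normal-table calculation can be reproduced with `pnorm`. As a check on the approximation, the sketch below also computes the exact distribution of $S_{100}$ by repeated convolution of the single-roll distribution; this convolution loop is our own addition, not part of the solution above.

```r
faces  <- 1:6
mu     <- mean(faces)                # E[D_1] = 3.5
sigma2 <- mean(faces^2) - mu^2       # var(D_1) = 35/12
n      <- 100

# CLT approximation with continuity correction
sd_S   <- sqrt(n * sigma2)
approx <- pnorm(380.5, n * mu, sd_S) - pnorm(329.5, n * mu, sd_S)

# Exact pmf of S_n by repeated convolution.
# pmf[k + 1] holds P(total = k); start from the empty sum, P(S_0 = 0) = 1.
pmf <- 1
for (i in seq_len(n)) {
  new_pmf <- numeric(length(pmf) + 6)      # totals 0, ..., 6 * i
  for (face in faces) {
    idx <- seq_along(pmf) + face           # shift every total by this face value
    new_pmf[idx] <- new_pmf[idx] + pmf / 6
  }
  pmf <- new_pmf
}
exact <- sum(pmf[(330:380) + 1])

print(round(c(clt = approx, exact = exact), 4))
```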
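Using the Central Limit Theorem, $M_n \approx W_n \sim N\left(\frac{7}{2}, \frac{35}{12n}\right)$.
We know by the Law of Large Numbers that $M_n \xrightarrow{p} \frac{7}{2}$ as $n \to \infty$, but how large does $n$ need to be such that there is a $99\%$ (or greater) chance of $M_n$ being within $0.1$ of $3.5$?
Using the approximation $W_n$, we want
$$P(|W_n - 3.5| > 0.1) = P\left(|Z| > 0.1\sqrt{\frac{12n}{35}}\right) \leq 0.01,$$
where $Z \sim N(0,1)$. Note that $P(|Z| > c) = 0.01$ is equivalent to $P(Z \leq c) = 0.995$, and the `qnorm` function in R, called as `qnorm(0.995)`, gives $c = 2.5758293$. Therefore we require
$$0.1\sqrt{\frac{12n}{35}} \geq 2.5758, \quad \text{that is,} \quad n \geq \frac{35}{12}\left(\frac{2.5758}{0.1}\right)^2 = 1935.2.$$
Given that we require $n \geq 1935.2$, we have that $n = 1936$.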
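The sample size calculation can be carried out in a few lines of R. For comparison, the sketch also computes the sample size that Chebychev’s inequality alone would guarantee; this comparison is our own addition and shows how much sharper the Normal approximation is.

```r
sigma2 <- 35 / 12   # variance of a single roll
eps    <- 0.1       # required accuracy for the mean score
alpha  <- 0.01      # allowed probability of missing by more than eps

# CLT-based requirement: eps * sqrt(n / sigma2) >= c, with c = qnorm(1 - alpha / 2)
c_val <- qnorm(1 - alpha / 2)
n_clt <- ceiling((c_val / eps)^2 * sigma2)

# Chebychev-based requirement: sigma2 / (n * eps^2) <= alpha
n_cheb <- ceiling(sigma2 / (alpha * eps^2))

print(c(c = c_val, n_clt = n_clt, n_chebychev = n_cheb))
```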
Task: Session 4
Attempt the R Markdown file for Session 4:
Session 4: Convergence and the Central Limit Theorem
Student Exercises
Attempt the exercises below.
Let $X_1, X_2, \ldots, X_{25}$ be independent Poisson random variables each having mean 1. Use the central limit theorem to approximate
$$P\left(\sum_{i=1}^{25} X_i > 20\right).$$
Solution to Exercise 7.1.
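Let $X = \sum_{i=1}^{25} X_i$. Then $E[X] = 25$ and $\text{var}(X) = 25$, so by the central limit theorem $X \approx Y \sim N(25, 25)$. Using the continuity correction,
$$P(X > 20) = P(X \geq 21) \approx P(Y > 20.5) = P\left(Z > \frac{20.5 - 25}{5}\right) = P(Z > -0.9) = P(Z \leq 0.9) = 0.8159,$$
where $Z \sim N(0,1)$.
Note that the final step comes from the symmetry of the normal distribution.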
For comparison, since $X = \sum_{i=1}^{25} X_i \sim \text{Po}(25)$, we have that $P(X > 20) = 0.8145$.
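The exact Poisson probability and the Normal approximation can be compared in R. The sketch below evaluates the approximation both with and without the continuity correction, so the effect of the correction is visible; the variable names are our own.

```r
total_mean <- 25   # sum of 25 independent Po(1) variables is Po(25)

# Exact probability P(X > 20) for X ~ Po(25)
exact <- ppois(20, lambda = total_mean, lower.tail = FALSE)

# Normal approximation N(25, 25), with and without continuity correction
with_cc    <- pnorm(20.5, mean = total_mean, sd = sqrt(total_mean), lower.tail = FALSE)
without_cc <- pnorm(20, mean = total_mean, sd = sqrt(total_mean), lower.tail = FALSE)

print(round(c(exact = exact, with_cc = with_cc, without_cc = without_cc), 4))
```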
The lifetime of a Brand X TV (in years) is an exponential random variable with
mean 10. By using the central limit theorem, find the approximate probability
that the average lifetime of a random sample of 36 TVs is at least 10.5.
Solution to Exercise 7.2.
Let $X_i$ denote the lifetime of the $i$th TV in the sample, $i = 1, 2, \ldots, 36$. Then (from the lecture notes) we know that $E[X_i] = 10$, $\text{var}(X_i) = 100$.
The sample mean is $\bar{X} = (X_1 + \cdots + X_{36})/36$.
Using the central limit theorem, $X_1 + \cdots + X_{36} \approx N(360, 3600)$, so
$$P(\bar{X} \geq 10.5) = P(X_1 + \cdots + X_{36} \geq 378) \approx P\left(Z \geq \frac{378 - 360}{\sqrt{3600}}\right) = P(Z \geq 0.3),$$
where $Z \sim N(0,1)$. Thus the required probability is approximately $P(Z > 0.3) = 1 - \Phi(0.3) = 0.3821$.
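As a check, the approximate probability can be compared with the exact one, using the fact that a sum of i.i.d. exponential random variables has a Gamma distribution. This is a minimal sketch assuming the rate parameterisation (shape $36$, rate $1/10$); the exact comparison is our own addition.

```r
n <- 36
mean_life <- 10   # each lifetime is exponential with mean 10 (rate 1/10)

# CLT approximation: X-bar ~ N(10, 100 / 36)
approx <- pnorm(10.5, mean = mean_life, sd = mean_life / sqrt(n), lower.tail = FALSE)

# Exact: X-bar >= 10.5 if and only if the total lifetime is >= 36 * 10.5 = 378,
# and the total is Gamma(shape = 36, rate = 1/10)
exact <- pgamma(n * 10.5, shape = n, rate = 1 / mean_life, lower.tail = FALSE)

print(round(c(clt = approx, exact = exact), 4))
```

With only $36$ observations from a skewed distribution, some discrepancy between the two values is to be expected.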
Prior to October 2015, in the UK National Lottery gamblers bought a ticket on which they marked six different numbers from $\{1, 2, \ldots, 49\}$. Six balls were drawn uniformly at random without replacement from a set of 49 similarly numbered balls. A ticket won the jackpot if the six numbers marked were the same as the six numbers drawn.
- Show that the probability a given ticket won the jackpot is 1/13983816.
- In Week 9 of the UK National Lottery 69,846,979 tickets were sold and there were 133 jackpot winners. If all gamblers chose their numbers independently and uniformly at random, use the central limit theorem to determine the approximate distribution of the number of jackpot winners that week. Comment on this in the light of the actual number of jackpot winners.
Solution to Exercise 7.3.
- There are $\binom{49}{6} = 13{,}983{,}816$ different ways of choosing 6 distinct numbers from $1, 2, \ldots, 49$, so the probability a ticket wins the jackpot is $1/13983816$.
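- Let $X$ be the number of jackpot winners in Week 9 if gamblers chose their numbers independently and uniformly at random. Then
$X \sim \text{Bin}\left(69846979, \frac{1}{13983816}\right) = \text{Bin}(n, p)$, say. Then to 4 decimal places $E[X] = np = 4.9948$ and $\text{var}(X) = np(1-p) = 4.9948$.
Hence, by the central limit theorem, $X \approx N(4.9948, 4.9948)$. Using the continuity correction, the probability of observing 133 or more jackpot winners is approximately
$$P(X \geq 133) \approx P\left(Z \geq \frac{132.5 - 4.9948}{\sqrt{4.9948}}\right) = P(Z \geq 57.05) \approx 0,$$
where $Z \sim N(0,1)$. This probability is small, so there is very strong evidence that the gamblers did not choose their numbers independently and uniformly at random.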
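The quantities in this exercise are easy to compute in R. The sketch below evaluates the number of possible tickets, the mean and variance of the number of winners, and the tail probability of 133 or more winners under both the Normal approximation (with a continuity correction, which is our assumption) and the exact binomial model.

```r
n_combos <- choose(49, 6)   # number of ways to choose 6 numbers from 49
p <- 1 / n_combos           # probability a given ticket wins the jackpot
n <- 69846979               # tickets sold in Week 9

mu     <- n * p             # expected number of jackpot winners
sigma2 <- n * p * (1 - p)   # variance of the number of jackpot winners

# Normal approximation to P(X >= 133), with continuity correction
z <- (132.5 - mu) / sqrt(sigma2)
normal_tail <- pnorm(z, lower.tail = FALSE)

# Exact binomial tail P(X >= 133) = P(X > 132)
exact_tail <- pbinom(132, size = n, prob = p, lower.tail = FALSE)

print(c(n_combos = n_combos, mean = mu, variance = sigma2, z = z))
print(c(normal_tail = normal_tail, exact_tail = exact_tail))
```

Both tail probabilities are vanishingly small, so the conclusion does not depend on the quality of the Normal approximation here.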