Chapter 7 Central Limit Theorem and Law of Large Numbers
7.1 Introduction
In this Section we will show why the Normal distribution, introduced in Section 5.7, is so important in probability and statistics. The central limit theorem states that under very weak conditions (almost all probability distributions you will see will satisfy them) the sum of $n$ i.i.d. random variables, $S_n$, will converge, appropriately normalised, to a standard Normal distribution as $n \to \infty$. For finite but large $n$ ($\geq 50$), we can approximate $S_n$ by a Normal distribution, and this Normal approximation can be used to answer questions concerning $S_n$. In Section 7.2 we present the Central Limit Theorem and apply it to an example using exponential random variables. In Section 7.3 we explore how a continuous distribution (the Normal distribution) can be used to approximate sums of discrete distributions. Finally, in Section 7.4, we present the Law of Large Numbers, which states that the uncertainty in the sample mean of $n$ observations, $S_n/n$, decreases as $n$ increases and that $S_n/n$ converges to the population mean $\mu$. Both the Central Limit Theorem and the Law of Large Numbers will be important moving forward when considering statistical questions.
7.2 Statement of Central Limit Theorem
Before stating the Central Limit Theorem, we introduce some notation.
A sequence of random variables $Y_1, Y_2, \ldots$ is said to converge in distribution to a random variable $Y$ if $P(Y_n \leq y) \to P(Y \leq y)$ as $n \to \infty$ at every point $y$ at which $P(Y \leq y)$ is continuous. We write $Y_n \xrightarrow{D} Y$ as $n \to \infty$.
Central Limit Theorem.

Let $X_1, X_2, \ldots, X_n$ be independent and identically distributed random variables (i.e. a random sample) with finite mean $\mu$ and variance $\sigma^2$. Let $S_n = X_1 + \cdots + X_n$. Then
$$\frac{S_n - n\mu}{\sigma \sqrt{n}} = \frac{\sqrt{n}\,(\bar{X} - \mu)}{\sigma} \xrightarrow{D} N(0,1) \quad \text{as } n \to \infty,$$
where $\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i$ is the sample mean of $X_1, X_2, \ldots, X_n$.
Therefore, we have that for large $n$, $S_n \approx N(n\mu, n\sigma^2)$ and, equivalently, $\bar{X} \approx N(\mu, \sigma^2/n)$.

Suppose $X_1, X_2, \ldots, X_{100}$ are i.i.d. exponential random variables with parameter $\lambda = 4$.
- Find $P(S_{100} > 30)$.
- Find limits within which $\bar{X}$ will lie with probability 0.95.
Since $X_1, X_2, \ldots, X_{100}$ are i.i.d. exponential random variables with parameter $\lambda = 4$, $E[X_i] = 1/4$ and $\mathrm{var}(X_i) = 1/16$. Hence,
$$E[S_{100}] = 100 \cdot \tfrac{1}{4} = 25; \qquad \mathrm{var}(S_{100}) = 100 \cdot \tfrac{1}{16} = \tfrac{25}{4}.$$
Since n=100 is sufficiently big, S100 is approximately normally distributed by the central limit theorem (CLT). Therefore,
$$P(S_{100} > 30) = P\left(\frac{S_{100} - 25}{\sqrt{25/4}} > \frac{30 - 25}{\sqrt{25/4}}\right) \approx P(N(0,1) > 2) = 1 - P(N(0,1) \leq 2) = 0.0228.$$
Given that $S_{100} = \sum_{i=1}^{100} X_i \sim \mathrm{Gamma}(100, 4)$, we can compute exactly $P(S_{100} > 30) = 0.0279$, which shows that the central limit theorem gives a reasonable approximation.
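These two probabilities can be checked in R; a minimal sketch using `pnorm` for the CLT approximation and `pgamma` for the exact value:

```r
# CLT approximation versus the exact Gamma(100, 4) probability
1 - pnorm(30, mean = 25, sd = sqrt(25/4))   # CLT approximation, about 0.0228
1 - pgamma(30, shape = 100, rate = 4)       # exact probability, about 0.0279
```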
Since $X_1, X_2, \ldots, X_{100}$ are i.i.d. exponential random variables with parameter $\lambda = 4$, $E[X_i] = 1/4$ and $\mathrm{var}(X_i) = 1/16$. Therefore, $E[\bar{X}] = 1/4$ and $\mathrm{var}(\bar{X}) = \frac{1/16}{100} = \frac{1}{1600}$.
Since $n = 100$, $\bar{X}$ will be approximately normally distributed by the CLT, hence
$$0.95 = P(a < \bar{X} < b) = P\left(\frac{a - 1/4}{\sqrt{1/1600}} < \frac{\bar{X} - 1/4}{\sqrt{1/1600}} < \frac{b - 1/4}{\sqrt{1/1600}}\right) \approx P\left(\frac{a - 1/4}{\sqrt{1/1600}} < N(0,1) < \frac{b - 1/4}{\sqrt{1/1600}}\right).$$
There are infinitely many choices for $a$ and $b$, but a natural choice is $P(\bar{X} < a) = P(\bar{X} > b) = 0.025$. That is, we choose $a$ and $b$ such that there is equal chance that $\bar{X}$ is less than $a$ or greater than $b$. Thus if, for $0 < q < 1$, $z_q$ satisfies $P(Z < z_q) = q$, we have that
$$\frac{a - 1/4}{\sqrt{1/1600}} = z_{0.025} = -1.96, \qquad \frac{b - 1/4}{\sqrt{1/1600}} = z_{0.975} = 1.96.$$
Hence,
$$a = 0.25 - 1.96 \cdot \tfrac{1}{40} = 0.201, \qquad b = 0.25 + 1.96 \cdot \tfrac{1}{40} = 0.299.$$
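The same limits can be computed in R; a short sketch using `qnorm`:

```r
# 95% probability interval for the sample mean of 100 Exp(4) random variables
mu <- 1/4                          # E[X_i]
se <- sqrt((1/16) / 100)           # standard deviation of Xbar, equal to 1/40
mu + qnorm(c(0.025, 0.975)) * se   # gives approximately 0.201 and 0.299
```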
7.3 Central limit theorem for discrete random variables
The central limit theorem can be applied to sums of discrete random variables as well as continuous random variables. Let $X_1, X_2, \ldots$ be i.i.d. copies of a discrete random variable $X$ with $E[X] = \mu$ and $\mathrm{var}(X) = \sigma^2$. Further suppose that the support of $X$ is in the non-negative integers $\{0, 1, \ldots\}$. (This covers all the discrete distributions we have seen: binomial, negative binomial, Poisson and discrete uniform.)
Let $Y_n \sim N(n\mu, n\sigma^2)$. Then the central limit theorem states that for large $n$, $S_n \approx Y_n$. However, there will exist $x \in \{0, 1, \ldots\}$ such that $P(S_n = x) > 0$ whilst $P(Y_n = x) = 0$, since $Y_n$ is a continuous random variable. To allow for this, for integer-valued $S_n$ we use the approximations
$$P(S_n = x) \approx P(x - 0.5 \leq Y_n \leq x + 0.5) \quad \text{and} \quad P(S_n \leq x) \approx P(Y_n \leq x + 0.5).$$
This is known as the continuity correction.
Suppose that $X$ is a Bernoulli random variable with $P(X = 1) = 0.6\,(= p)$, so $E[X] = 0.6$ and $\mathrm{var}(X) = 0.6 \times (1 - 0.6) = 0.24$. Then $S_n = \sum_{i=1}^n X_i \sim \mathrm{Bin}(n, 0.6)$.
For n=100, S100∼Bin(100,0.6) can be approximated by Y∼N(60,24)(=N(np,np(1−p))), see Figure 7.1.

Figure 7.1: Central limit theorem approximation for the binomial.
We can see the approximation in Figure 7.1 in close-up for x=54 to 56 in Figure 7.2.

Figure 7.2: Central limit theorem approximation for the binomial for x=54 to 56.
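The comparison shown in Figures 7.1 and 7.2 can be reproduced numerically; the sketch below (the plotting commands used for the figures are not given here) lists the exact Bin(100, 0.6) probabilities alongside the continuity-corrected N(60, 24) approximation for x = 54 to 56.

```r
# Exact binomial pmf versus continuity-corrected normal approximation
x <- 54:56
exact  <- dbinom(x, size = 100, prob = 0.6)
approx <- pnorm(x + 0.5, mean = 60, sd = sqrt(24)) -
  pnorm(x - 0.5, mean = 60, sd = sqrt(24))
round(cbind(x, exact, approx), 4)
```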
7.4 Law of Large Numbers
We observed that $\bar{X} \approx N(\mu, \sigma^2/n)$, and the variance decreases as $n$ increases.
Given that $\mathrm{var}(S_n) = \mathrm{var}\left(\sum_{i=1}^n X_i\right) = \sum_{i=1}^n \mathrm{var}(X_i) = n\sigma^2$, we have in general that
$$\mathrm{var}(\bar{X}) = \mathrm{var}\left(\frac{S_n}{n}\right) = \frac{1}{n^2} \mathrm{var}(S_n) = \frac{\sigma^2}{n}.$$
A random variable $Y$ which has $E[Y] = \mu$ and $\mathrm{var}(Y) = 0$ is the constant $Y \equiv \mu$, that is, $P(Y = \mu) = 1$. This suggests that as $n \to \infty$, $\bar{X}$ converges in some sense to $\mu$. We can make this convergence rigorous.
A sequence of random variables $Y_1, Y_2, \ldots$ is said to converge in probability to a random variable $Y$ if, for any $\epsilon > 0$, $P(|Y_n - Y| > \epsilon) \to 0$ as $n \to \infty$. We write $Y_n \xrightarrow{p} Y$ as $n \to \infty$.
We will often be interested in convergence in probability where Y is a constant.
A useful result for proving convergence in probability to a constant μ is Chebychev’s inequality. Chebychev’s inequality is a special case of the Markov inequality which is helpful in bounding probabilities in terms of expectations.
Chebychev’s inequality.
Let X be a random variable with E[X]=μ and var(X)=σ2. Then for any ϵ>0,
$$P(|X - \mu| > \epsilon) \leq \frac{\sigma^2}{\epsilon^2}.$$
Law of Large Numbers.
Let X1,X2,… be i.i.d. according to a random variable X with E[X]=μ and var(X)=σ2. Then
$$\bar{X} = \frac{1}{n} \sum_{i=1}^n X_i \xrightarrow{p} \mu \quad \text{as } n \to \infty.$$
Proof. By Chebychev's inequality applied to $\bar{X}$, which has $E[\bar{X}] = \mu$ and $\mathrm{var}(\bar{X}) = \sigma^2/n$, for any $\epsilon > 0$,
$$P(|\bar{X} - \mu| > \epsilon) \leq \frac{\sigma^2}{n\epsilon^2} \to 0 \quad \text{as } n \to \infty,$$
and the Theorem follows.
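The Law of Large Numbers can also be seen empirically; a minimal simulation sketch in R, reusing the Exp(4) example from Section 7.2 (the seed and sample sizes are arbitrary illustrative choices):

```r
# Sample means of n Exp(4) random variables settle towards mu = 0.25 as n grows
set.seed(1)
for (n in c(10, 100, 1000, 10000)) {
  cat("n =", n, " sample mean =", mean(rexp(n, rate = 4)), "\n")
}
```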

Figure 7.3: Dice picture.
Exercise 1: Central limit theorem for dice
Let D1,D2,… denote the outcomes of successive rolls of a fair six-sided dice.
Let $S_n = \sum_{i=1}^n D_i$ denote the total score from $n$ rolls of the dice and let $M_n = \frac{1}{n} S_n$ denote the mean score from $n$ rolls of the dice.
- What is the approximate distribution of S100?
- What is the approximate probability that S100 lies between 330 and 380, inclusive?
- How large does n need to be such that P(|Mn−E[D]|>0.1)≤0.01?
Attempt Exercise 1 and then watch Video 15 for the solutions.
Video 15: Central limit theorem for dice
Alternatively the solutions are available:
Solution to Exercise 1
Each $D_i$ is uniformly distributed on $\{1, 2, \ldots, 6\}$. Then $E[D_1] = \frac{7}{2} = 3.5$ and $\mathrm{Var}(D_1) = \frac{35}{12}$.
Since the rolls of the dice are independent,
$$E[S_{100}] = E\left[\sum_{i=1}^{100} D_i\right] = \sum_{i=1}^{100} E[D_i] = 100\, E[D_1] = 350$$
and
$$\mathrm{var}(S_{100}) = \mathrm{var}\left(\sum_{i=1}^{100} D_i\right) = \sum_{i=1}^{100} \mathrm{var}(D_i) = 100\, \mathrm{var}(D_1) = \frac{875}{3}.$$
Thus by the central limit theorem, $S_{100} \approx Y \sim N\left(350, \frac{875}{3}\right)$.
Using the CLT approximation above, and the continuity correction
$$P(330 \leq S_{100} \leq 380) \approx P(329.5 \leq Y \leq 380.5) = P(Y \leq 380.5) - P(Y \leq 329.5) = 0.9629 - 0.1150 = 0.8479.$$
If using Normal tables, we have that
$$P(Y \leq 380.5) = P\left(Z = \frac{Y - 350}{\sqrt{875/3}} \leq \frac{380.5 - 350}{\sqrt{875/3}}\right) = \Phi(1.786)$$
and
$$P(Y \leq 329.5) = P\left(Z = \frac{Y - 350}{\sqrt{875/3}} \leq \frac{329.5 - 350}{\sqrt{875/3}}\right) = \Phi(-1.200).$$
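Alternatively, the probability in part (b) can be obtained directly in R without Normal tables:

```r
# Continuity-corrected CLT approximation for P(330 <= S_100 <= 380)
mu <- 350; sdev <- sqrt(875/3)
pnorm(380.5, mean = mu, sd = sdev) - pnorm(329.5, mean = mu, sd = sdev)  # about 0.8479
```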
Using the Central Limit Theorem, $M_n \approx W_n \sim N\left(\frac{7}{2}, \frac{35}{12n}\right)$.
We know by the law of large numbers that $M_n \xrightarrow{p} \frac{7}{2}$ as $n \to \infty$, but how large does $n$ need to be such that there is a 99% (or greater) chance of $M_n$ being within 0.1 of 3.5?
Using the approximation Wn, we want:
$$P\left(\left|W_n - \tfrac{7}{2}\right| > 0.1\right) \leq 0.01.$$
Now
$$P\left(\left|W_n - \tfrac{7}{2}\right| > 0.1\right) = 1 - P(3.4 \leq W_n \leq 3.6) = 1 - P\left(\frac{3.4 - 3.5}{\sqrt{35/(12n)}} \leq Z \leq \frac{3.6 - 3.5}{\sqrt{35/(12n)}}\right) = 1 - P\left(-0.058554\sqrt{n} \leq Z \leq 0.058554\sqrt{n}\right) = P\left(|Z| > 0.058554\sqrt{n}\right).$$
Consider P(|Z|>0.058554√n)=0.01.
Note that
$$P(|Z| > c) = \alpha \iff P(Z > c) = \frac{\alpha}{2} \iff P(Z \leq c) = 1 - \frac{\alpha}{2}.$$
We have $\alpha = 0.01$, and using the `qnorm` function in R, `qnorm(0.995)` gives $c = 2.5758293$.
Therefore $P(|Z| > 0.058554\sqrt{n}) = 0.01 = P(|Z| > 2.5758)$, or equivalently
$$0.058554\sqrt{n} = 2.5758 \;\Rightarrow\; \sqrt{n} = 43.99 \;\Rightarrow\; n = 1935.2.$$
Given that we require n≥1935.2, we have that n=1936.
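As a quick check, the sample-size calculation can be reproduced in R:

```r
# Smallest n with P(|M_n - 3.5| > 0.1) <= 0.01 under the CLT approximation
crit <- qnorm(0.995)                     # 2.5758
ceiling((crit * sqrt(35/12) / 0.1)^2)    # gives 1936
```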
Task: Lab 4
Attempt the R Markdown file for Lab 4:
Lab 4: Convergence and the Central Limit Theorem
Student Exercises
Attempt the exercises below.
Question 1.
Let $X_1, X_2, \ldots, X_{25}$ be independent Poisson random variables, each having mean 1. Use the central limit theorem to approximate $P\left(\sum_{i=1}^{25} X_i > 20\right)$.

Solution to Question 1.

Each $X_i$ has $E[X_i] = 1$ and $\mathrm{var}(X_i) = 1$, so $\sum_{i=1}^{25} X_i$ has mean 25 and variance 25. By the central limit theorem, with the continuity correction,
$$P\left(\sum_{i=1}^{25} X_i > 20\right) \approx P(N(25, 25) > 20.5) = P\left(Z > \frac{20.5 - 25}{5}\right) = P(Z > -0.9) = P(Z \leq 0.9) = 0.8159.$$
Note that the final step comes from the symmetry of the normal distribution.
For comparison, since $\sum_{i=1}^{25} X_i \sim \mathrm{Po}(25)$, we have that $P\left(\sum_{i=1}^{25} X_i > 20\right) =$ `r 1 - round(ppois(20, 25), 4)`.
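Both values can be computed in R; a short sketch (the continuity-corrected approximation is used, as in Section 7.3):

```r
# CLT approximation (with continuity correction) versus the exact Poisson value
1 - pnorm(20.5, mean = 25, sd = 5)   # CLT approximation, about 0.816
1 - ppois(20, lambda = 25)           # exact, since the sum of 25 Poisson(1) is Poisson(25)
```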
Question 2.
The lifetime of a Brand X TV (in years) is an exponential random variable with mean 10. By using the central limit theorem, find the approximate probability that the average lifetime of a random sample of 36 TVs is at least 10.5.
Solution to Question 2.
Let $X_i$ denote the lifetime of the $i$th TV in the sample, $i = 1, 2, \ldots, 36$. Then (from the lecture notes) we know that $E[X_i] = 10$ and $\mathrm{var}(X_i) = 100$.
The sample mean is $\bar{X} = (X_1 + \cdots + X_{36})/36$.
Using the central limit theorem, $X_1 + \cdots + X_{36} \approx N(360, 3600)$, so
$$P(\bar{X} \geq 10.5) = P(X_1 + \cdots + X_{36} \geq 378) = P\left(\frac{X_1 + \cdots + X_{36} - 360}{60} \geq \frac{378 - 360}{60}\right) \approx P(Z \geq 0.3),$$
where $Z \sim N(0,1)$. Thus the required probability is approximately $P(Z > 0.3) = 1 - \Phi(0.3) = 0.3821$.
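As a check, the same approximation can be computed in R:

```r
# CLT approximation for P(Xbar >= 10.5) with 36 exponential lifetimes of mean 10
1 - pnorm(10.5, mean = 10, sd = sqrt(100/36))   # about 0.3821
```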
Question 3.
Prior to October 2015, in the UK National Lottery gamblers bought a ticket on which they marked six different numbers from $\{1, 2, \ldots, 49\}$. Six balls were drawn uniformly at random without replacement from a set of 49 similarly numbered balls. A ticket won the jackpot if the six numbers marked were the same as the six numbers drawn.
- Show that the probability a given ticket won the jackpot is 1/13983816.
- In Week 9 of the UK National Lottery 69,846,979 tickets were sold and there were 133 jackpot winners. If all gamblers chose their numbers independently and uniformly at random, use the central limit theorem to determine the approximate distribution of the number of jackpot winners that week. Comment on this in the light of the actual number of jackpot winners.
Solution to Question 3.
- There are $\binom{49}{6} = 13{,}983{,}816$ different ways of choosing 6 distinct numbers from $\{1, 2, \ldots, 49\}$, so the probability a given ticket wins the jackpot is $1/13{,}983{,}816$.
- Let $X$ be the number of jackpot winners in Week 9 if gamblers chose their numbers independently and uniformly at random. Then
$X \sim \mathrm{Bin}(69{,}846{,}979,\ 1/13{,}983{,}816) = \mathrm{Bin}(n, p)$, say. Then to 4 decimal places, $E[X] = np = 4.9927$ and $\mathrm{var}(X) = np(1-p) = 4.9927$.
Hence, by the central limit theorem, $X \approx N(4.9927, 4.9927)$. Therefore
$$P(X \geq 133) \approx P\left(Z \geq \frac{133 - 4.9927}{\sqrt{4.9927}}\right) \approx P(Z \geq 57.3),$$
where $Z \sim N(0,1)$. This probability is small, so there is very strong evidence that the gamblers did not choose their numbers independently and uniformly at random.
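As a rough check in R of how extreme 133 winners is under the independence assumption (using the ticket-sales figure given in the question):

```r
# Standardised distance of the observed 133 winners from the expected count
n <- 69846979; p <- 1/13983816
mu <- n * p; v <- n * p * (1 - p)
(133 - mu) / sqrt(v)                         # roughly 57 standard deviations above the mean
pbinom(132, n, p, lower.tail = FALSE)        # exact binomial upper-tail probability, effectively zero
```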