Chapter 5 Random Variables

5.1 Overview

In this chapter we introduce the concept of a random variable (Section 5.2). Random variables assign numerical values to outcomes from a sample space and can be discrete (counts), continuous (measurements on the real line) or mixed. Key summaries of a random variable are its expectation (mean) and variance, concepts we have already seen for summarising data and which we formalise for random variables in Section 5.3. We then introduce important classes of random variables (probability distributions), both discrete and continuous. These include the Bernoulli, Binomial, Geometric and Negative Binomial distributions (Section 5.4), the Poisson distribution (Section 5.5), the Exponential, Gamma, Chi squared and Beta distributions (Section 5.6) and the Normal (Gaussian) distribution (Section 5.7).

5.2 Random variables

Random variable.

A random variable (r.v.) $X$ is a mapping from $\Omega$ to $\mathbb{R}$, that is

$$X : \Omega \to \mathbb{R}.$$

For example,

  • Let X be the number of heads observed when tossing a fair coin three times.

  • Let T be the length of time you wait to be served by a bank teller.

Note: Random variables can be either discrete (i.e. take a finite or countable number of values), continuous, or mixed.

An example of a mixed random variable is $R$, the amount of rain (ml) on a given day: there is a positive probability that $R = 0$ (no rain at all), while positive rainfall amounts vary continuously.

Cumulative distribution function.

The cumulative distribution function (c.d.f.) of a random variable $X$ is

$$F_X(x) = P(X \le x) = P(\{\omega \in \Omega : X(\omega) \le x\}).$$

Properties of the c.d.f. include

  • $P(X > x) = 1 - F_X(x)$.

  • $P(x_1 < X \le x_2) = F_X(x_2) - F_X(x_1)$.

Note the c.d.f. is defined for all random variables regardless of whether they are discrete, continuous or mixed.

Probability mass function.

If $X$ is a discrete random variable, then we can define a function $p_X(x)$, called the probability mass function (p.m.f.), such that
$$p_X(x_i) = P(X = x_i) = P(\{\omega : X(\omega) = x_i\}).$$

Coin toss.

Let X be the number of heads observed when tossing a fair coin three times. What is the p.m.f. of X?

$$p_X(x) = \begin{cases} 1/8, & \text{if } x = 0,\\ 3/8, & \text{if } x = 1,\\ 3/8, & \text{if } x = 2,\\ 1/8, & \text{if } x = 3,\\ 0, & \text{otherwise.} \end{cases}$$
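Since $X$ here has a Binomial distribution, Bin(3, 0.5) (introduced in Section 5.4.2), this p.m.f. can be checked numerically in R (a quick check, not part of the original example):

```r
# p.m.f. of the number of heads in three tosses of a fair coin: X ~ Bin(3, 0.5)
x <- 0:3
p <- dbinom(x, size = 3, prob = 0.5)
rbind(x, p)   # probabilities 1/8, 3/8, 3/8, 1/8
```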

Probability density function.

Let $X$ be a continuous random variable. If there exists some non-negative function $f_X$ on $\mathbb{R}$ such that for any interval $I$,
$$P(X \in I) = \int_I f_X(u)\,du,$$

the function $f_X$ is called the probability density function (p.d.f.) of $X$.

Note that if $F_X(x)$ is the c.d.f. of a continuous random variable $X$, then the p.d.f. of $X$ is given by $f_X(x) = \frac{dF_X(x)}{dx}$.

Note that
$$F_X(x) = P(X \le x) = \begin{cases} \sum_{x_i \le x} p_X(x_i), & \text{if } X \text{ is discrete},\\ \int_{-\infty}^{x} f_X(u)\,du, & \text{if } X \text{ is continuous.} \end{cases}$$

5.3 Expectation

In this section we formally define the expectation (mean), variance, median and mode of a random variable. Note the similarities with the definitions of the measures of location (mean, median and mode) and the variance used to summarise data in Section 2.

Expectation.

The expectation of a random variable X is defined by
$$E[X] = \begin{cases} \sum_{x_i} x_i\, p_X(x_i), & \text{if } X \text{ is discrete},\\ \int_{-\infty}^{\infty} x\, f_X(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$

Note that $E[X]$ only exists if $E[|X|] < \infty$ and that $E[X]$ is a measure of the centre of the distribution, that is, the centre of mass. We can also define expectations of functions of random variables.


If $Y = g(X)$ then the expectation of $Y$ is given by
$$E[Y] = E[g(X)] = \begin{cases} \sum_{x_i} g(x_i)\, p_X(x_i), & \text{if } X \text{ is discrete},\\ \int_{-\infty}^{\infty} g(x)\, f_X(x)\,dx, & \text{if } X \text{ is continuous.} \end{cases}$$

For constants $c$, $c_i$ and $d$, the following are properties of the expectation:

  • $E[c] = c$;
  • $E[c\,g(X) + d] = c\,E[g(X)] + d$;
  • $E\left[\sum_{i=1}^n c_i g_i(X_i)\right] = \sum_{i=1}^n c_i E[g_i(X_i)]$;
  • A special case of the above results is $c_1 = \cdots = c_n = 1$ with $g_i(\cdot)$ the identity transform, $g_i(X_i) = X_i$. Then $E\left[\sum_{i=1}^n X_i\right] = \sum_{i=1}^n E[X_i]$.

Variance.

The variance of $X$ is

$$\mathrm{Var}(X) = E\left[(X - E[X])^2\right].$$

The standard deviation of $X$ is $\sqrt{\mathrm{Var}(X)}$.

For constants $c$, $c_i$ and $d$, the following are properties of the variance:

  • $\mathrm{Var}(X) = E[X^2] - (E[X])^2$;

  • $\mathrm{Var}(X) \ge 0$;

  • $\mathrm{Var}(cX + d) = c^2\,\mathrm{Var}(X)$;

  • If $X_1, \ldots, X_n$ are independent, then

    $$\mathrm{Var}\left(\sum_{i=1}^n c_i X_i\right) = \sum_{i=1}^n c_i^2\,\mathrm{Var}(X_i).$$

Median.

The median of $X$ is defined as the value $x_0$ such that $F_X(x_0) = 0.5$.

For a discrete random variable it is unlikely that there exists $x_0$ such that $F_X(x_0) = 0.5$. Therefore for discrete random variables the median is defined to be the smallest $x_0$ such that $F_X(x_0) \ge 0.5$.

Mode.

The mode of $X$ is the point at which $f_X(x)$ is maximised, i.e. the mode is $x_0$ if and only if $f_X(x_0) \ge f_X(x)$ for all $x$.

Continuous distribution.

Suppose that the random variable X has probability density function:
$$f_X(x) = \begin{cases} kx^3 & 1 \le x \le 2,\\ 0 & \text{otherwise.} \end{cases}$$

Figure 5.1: Plot of $f_X(x)$.

  1. Show that k=4/15;
  2. Find $P\left(\frac{5}{4} \le X \le \frac{7}{4}\right)$.
  3. Compute the standard deviation of X.
    Remember: Standard deviation is the square root of the variance.
  4. Find the median of X.

Attempt Example 5.3.6 and then watch Video 10 for the solutions.

Video 10: Continuous random variable

Solution to Example 5.3.6.
  1. Remember $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$ and therefore
    $$1 = \int_{-\infty}^{\infty} f_X(x)\,dx = \int_1^2 kx^3\,dx = k\left[\frac{x^4}{4}\right]_1^2 = k\left(\frac{2^4}{4} - \frac{1^4}{4}\right) = k \times \frac{15}{4}.$$
    Thus, $k = \frac{4}{15}$.
  2. It follows from the above that the c.d.f. of $X$ is
    $$F_X(x) = \begin{cases} 0 & \text{for } x < 1,\\ \int_1^x \frac{4}{15} y^3\,dy = \frac{x^4 - 1}{15} & \text{for } 1 \le x \le 2,\\ 1 & \text{for } x > 2. \end{cases}$$
    Thus
    $$P\left(\tfrac{5}{4} \le X \le \tfrac{7}{4}\right) = \frac{(7/4)^4 - 1}{15} - \frac{(5/4)^4 - 1}{15} = \frac{1}{15}\left[\left(\tfrac{7}{4}\right)^4 - \left(\tfrac{5}{4}\right)^4\right] = \frac{37}{80}\ (= 0.4625).$$
  3. Remember that the standard deviation of $X$ is the square root of the variance. Therefore
    $$sd(X) = \sqrt{var(X)} = \sqrt{E[X^2] - E[X]^2}.$$
    For any $n = 1, 2, \ldots$,
    $$E[X^n] = \int_{-\infty}^{\infty} x^n f_X(x)\,dx = \int_1^2 x^n \frac{4}{15} x^3\,dx = \frac{4}{15}\left[\frac{x^{n+4}}{n+4}\right]_1^2 = \frac{4(2^{n+4} - 1)}{15(n+4)}.$$
    Therefore
    $$E[X] = \frac{4(32 - 1)}{15 \times 5} = \frac{124}{75} = 1.6533, \qquad E[X^2] = \frac{4(64 - 1)}{15 \times 6} = \frac{14}{5} = 2.8.$$
    Thus
    $$sd(X) = \sqrt{2.8 - 1.6533^2} = 0.2579.$$
  4. The median of $X$, $m$, satisfies
    $$0.5 = P(X \le m) = \frac{m^4 - 1}{15} \quad \Rightarrow \quad 7.5 = m^4 - 1 \quad \Rightarrow \quad 8.5 = m^4.$$
    Therefore
    $$m = (8.5)^{1/4} = 1.7075.$$
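As a quick numerical check of the solution above (not part of the original notes), the normalising constant, probability, standard deviation and median can all be verified in R:

```r
# Numerical checks for Example 5.3.6, using f(x) = (4/15) x^3 on [1, 2]
f <- function(x) (4 / 15) * x^3
integrate(f, 1, 2)$value                       # equals 1, confirming k = 4/15
integrate(f, 5 / 4, 7 / 4)$value               # P(5/4 <= X <= 7/4) = 0.4625
EX  <- integrate(function(x) x   * f(x), 1, 2)$value
EX2 <- integrate(function(x) x^2 * f(x), 1, 2)$value
sqrt(EX2 - EX^2)                               # standard deviation, approx 0.258
8.5^(1 / 4)                                    # median, approx 1.7075
```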


5.4 Bernoulli distribution and its extension

In this section, we start with the Bernoulli random variable, which is the simplest non-trivial probability distribution taking two possible values (0 or 1). In itself the Bernoulli random variable might not seem particularly exciting, but it forms a key building block in probability and statistics. We consider probability distributions which arise as extensions of the Bernoulli random variable such as the Binomial distribution (sum of n Bernoulli random variables), the Geometric distribution (number of Bernoulli random variables until we get a 1) and the Negative Binomial distribution (number of Bernoulli random variables until we get our nth 1).

5.4.1 Bernoulli distribution

Bernoulli trial.

A Bernoulli trial is a simple random experiment with two outcomes: success (1) or failure (0). The success probability is $p$, so the failure probability is $1 - p\ (= q)$. A Bernoulli random variable $X$ describes this:
$$X = \begin{cases} 1 & \text{success, with probability } p,\\ 0 & \text{failure, with probability } q = 1 - p. \end{cases}$$
The Bernoulli distribution has probability mass function:
$$p_X(x) = p^x (1 - p)^{1 - x}, \qquad x = 0, 1.$$

[If $x = 1$, $p_X(1) = p^1 q^0 = p$ and if $x = 0$, $p_X(0) = p^0 q^1 = q$.]

We have that
$$E[X] = [p \times 1] + [q \times 0] = p$$
and
$$E[X^2] = [p \times 1^2] + [q \times 0^2] = p.$$
Therefore
$$var(X) = E[X^2] - E[X]^2 = p - p^2 = p(1 - p) = pq.$$
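A small simulation sketch (not in the original notes) that checks these moments empirically; the value p = 0.3 is an arbitrary choice for illustration:

```r
# Simulate Bernoulli(p) trials and compare sample moments with p and p(1 - p)
set.seed(1)
p <- 0.3                                # assumed success probability for illustration
x <- rbinom(1e5, size = 1, prob = p)    # a Bernoulli trial is Bin(1, p)
mean(x)                                 # close to p = 0.3
var(x)                                  # close to p * (1 - p) = 0.21
```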

5.4.2 Binomial Distribution

Independent and identically distributed.
Two discrete random variables $X$ and $Y$ are said to be independent and identically distributed (i.i.d.) if for all $x, y \in \mathbb{R}$,
$$P(X = x, Y = y) = P(X = x) \times P(Y = y) \qquad \text{(independence)}$$
and for all $x \in \mathbb{R}$,
$$P(Y = x) = P(X = x)$$

(identically distributed, i.e. they have the same p.m.f.).


Binomial distribution.

Consider n independent Bernoulli trials, each with success probability p. Let X be the total number of successes. Then X has a Binomial distribution, written
$$X \sim \text{Bin}(n, p), \quad \text{or } X \sim B(n, p),$$
and for $k = 0, 1, \ldots, n$,
$$p_X(k) = P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k}.$$

To see this: consider any particular sequence of $k$ successes and $n - k$ failures. Each such sequence has probability $p^k (1 - p)^{n - k}$, since the $n$ trials are independent. There are $\binom{n}{k}$ ways of choosing the positions of the $k$ successes out of the $n$ trials.

Note: 1. $n$ and $p$ are called the parameters of the Binomial distribution.
2. The number of trials $n$ is fixed.
3. There are only two possible outcomes: ‘success’ with probability $p$ and ‘failure’ with probability $q = 1 - p$.
4. The probability of success $p$ in each independent trial is constant.

Binomial distribution: Expectation and variance.

Let $X \sim \text{Bin}(n, p)$, then
$$E[X] = np \qquad \text{and} \qquad var(X) = np(1 - p).$$
We can write
$$X = X_1 + X_2 + \cdots + X_n, \qquad (5.1)$$

where $X_1, X_2, \ldots, X_n$ are independent Bernoulli random variables each with success probability $p$.

Therefore, using properties of expectations,
$$E[X] = E[X_1 + X_2 + \cdots + X_n] = E[X_1] + E[X_2] + \cdots + E[X_n] = p + p + \cdots + p = np.$$
Given that the $X_i$'s in (5.1) are independent, we also have that
$$var(X) = var(X_1 + X_2 + \cdots + X_n) = var(X_1) + var(X_2) + \cdots + var(X_n) = p(1 - p) + p(1 - p) + \cdots + p(1 - p) = np(1 - p).$$


The cumulative distribution function is
$$F_X(x) = P(X \le x) = \sum_{k=0}^{[x]} \binom{n}{k} p^k (1 - p)^{n - k},$$

where [x] is the greatest integer not greater than x.

An R Shiny app is provided to explore the Binomial distribution.

R Shiny app: Binomial distribution


Twenty multiple choice questions, each with 5 options. Suppose that you guess at random, independently for each question. Then if $X$ is the number of right answers,
$$X \sim \text{Bin}(20, 0.2).$$
Then
$$P(X = 3) = \binom{20}{3} (0.2)^3 (0.8)^{17} = 0.2054,$$
and
$$P(X \le 3) = \sum_{k=0}^{3} \binom{20}{k} (0.2)^k (0.8)^{20 - k} = 0.4114.$$
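These probabilities can be reproduced directly in R (a quick check, not part of the original notes):

```r
# Multiple-choice guessing: X ~ Bin(20, 0.2)
dbinom(3, size = 20, prob = 0.2)   # P(X = 3)  = 0.2054
pbinom(3, size = 20, prob = 0.2)   # P(X <= 3) = 0.4114
```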

5.4.3 Geometric Distribution

Geometric distribution.

Consider a sequence of independent Bernoulli trials each with success probability $p$. Let $Y$ denote the number of trials needed for the first success to appear. Then $Y$ has a Geometric distribution with parameter $p$, written $Y \sim \text{Geom}(p)$, and
$$p_Y(k) = (1 - p)^{k - 1} p, \qquad k = 1, 2, 3, \ldots.$$

To see this: if the $k$th trial is the first success then the first $k - 1$ trials must all have been failures. The probability of this is $(1 - p)^{k - 1} p$.

Note that
$$\sum_{k=1}^{\infty} p_Y(k) = \sum_{k=1}^{\infty} (1 - p)^{k - 1} p = p \sum_{i=0}^{\infty} (1 - p)^i = \frac{p}{1 - (1 - p)} = 1,$$

so a success eventually occurs with probability 1.

Geometric distribution: Expectation and variance.

Let $Y \sim \text{Geom}(p)$, then
$$E[Y] = \frac{1}{p} \qquad \text{and} \qquad var(Y) = \frac{1 - p}{p^2}.$$
First step: let $q = 1 - p$ and write
$$E[Y] = \sum_{k=1}^{\infty} k\, P(Y = k) = \sum_{k=1}^{\infty} k (1 - p)^{k - 1} p = p \sum_{k=1}^{\infty} k q^{k - 1}.$$
Note that $k q^{k-1}$ is the derivative of $q^k$ with respect to $q$. Hence,
$$E[Y] = p \sum_{k=1}^{\infty} \frac{d}{dq}\left\{q^k\right\}.$$
We can interchange the order of summation and differentiation (we won't go into the technical requirements):
$$E[Y] = p\, \frac{d}{dq}\left(\sum_{k=1}^{\infty} q^k\right) = p\, \frac{d}{dq}\left(\frac{q}{1 - q}\right),$$

since $\sum_{k=1}^{\infty} x^k = x/(1 - x)$ if $|x| < 1$.

Therefore
$$E[Y] = p\, \frac{(1)(1 - q) - q(-1)}{(1 - q)^2} = \frac{p}{p^2} = \frac{1}{p}.$$
By a similar method, we obtain
$$E[Y(Y - 1)] = \frac{2(1 - p)}{p^2}.$$
Since $E[Y(Y - 1)] = E[Y^2 - Y] = E[Y^2] - E[Y]$, we have that
$$var(Y) = E[Y(Y - 1)] + E[Y] - E[Y]^2 = \frac{1 - p}{p^2}.$$
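A simulation check (not in the original notes). Note that R's rgeom counts the number of failures before the first success, so 1 is added to match the definition of $Y$ used here; the value p = 0.25 is an arbitrary choice for illustration:

```r
# Simulated check of E[Y] = 1/p and var(Y) = (1 - p)/p^2 for Y ~ Geom(p)
set.seed(1)
p <- 0.25                        # assumed success probability for illustration
y <- rgeom(1e5, prob = p) + 1    # rgeom counts failures, so add 1 to get the trial count
mean(y)                          # close to 1/p = 4
var(y)                           # close to (1 - p)/p^2 = 12
```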

5.4.4 Negative binomial Distribution

Negative binomial distribution.

Consider a sequence of independent Bernoulli trials, each with success probability $p$. If $W$ is the number of trials needed until $r$ successes have occurred, then $W$ has a Negative Binomial distribution, $W \sim \text{NegBin}(r, p)$, with probability mass function
$$p_W(k) = \binom{k - 1}{r - 1} p^r (1 - p)^{k - r}, \qquad k = r, r + 1, \ldots$$

To see this: the $k$th trial must be a success, and it must be the $r$th success. Therefore there must be $r - 1$ successes in the first $k - 1$ trials, the locations of which can be chosen in $\binom{k - 1}{r - 1}$ ways.

Negative binomial distribution: Expectation and variance.

Let $W \sim \text{NegBin}(r, p)$, then
$$E[W] = \frac{r}{p} \qquad \text{and} \qquad var(W) = \frac{r(1 - p)}{p^2}.$$
Note that we can write
$$W = Y_1 + Y_2 + \cdots + Y_r,$$

where $Y_1$ is the number of trials until the first success and, for $i = 2, 3, \ldots, r$, $Y_i$ is the number of trials after the $(i - 1)$st success until the $i$th success.

We observe that $Y_1, Y_2, \ldots, Y_r$ are independent $\text{Geom}(p)$ random variables, so
$$E[Y_1] = \frac{1}{p} \qquad \text{and} \qquad var(Y_1) = \frac{1 - p}{p^2},$$
whence
$$E[W] = E[Y_1 + Y_2 + \cdots + Y_r] = E[Y_1] + E[Y_2] + \cdots + E[Y_r] = \frac{r}{p},$$
and
$$var(W) = var(Y_1 + Y_2 + \cdots + Y_r) = var(Y_1) + var(Y_2) + \cdots + var(Y_r) = \frac{r(1 - p)}{p^2}.$$


The Negative Binomial distribution $W \sim \text{NegBin}(r, p)$ is the sum of $r$ independent Geometric, $\text{Geom}(p)$, random variables, in the same way that the Binomial distribution $X \sim \text{Bin}(n, p)$ is the sum of $n$ independent Bernoulli random variables with success probability $p$.

An R Shiny app is provided to explore the Negative Binomial distribution.

R Shiny app: Negative Binomial distribution

Example 5.4.10 draws together the different Bernoulli-based distributions and demonstrates how they are used to answer different questions of interest.

Crazy golf.


Figure 5.2: Crazy golf picture.

A child plays a round of crazy golf. The round of golf consists of 9 holes. The number of shots the child takes at each hole is geometrically distributed with success probability 0.25.

  1. Calculate the probability that the child gets a ‘hole in one’ on the first hole. (A ‘hole in one’ means the child only takes one shot on that hole.)
  2. Calculate the probability that the child takes more than five shots on the first hole.
  3. Calculate the probability that the child gets three ‘holes in one’ during their round.
  4. Calculate the mean and variance for the total number of shots the child takes.
  5. Calculate the probability that the child takes 36 shots in completing their round.

Attempt Example 5.4.10 (Crazy golf) and then watch Video 11 for the solutions.

Video 11: Crazy Golf Example

Solution to Example 5.4.10.

Let $X_i$ denote the number of shots taken on hole $i$. Then $X_i \sim \text{Geom}(0.25)$.

  1. A ‘hole in one’ on the first hole is the event $\{X_1 = 1\}$. Therefore
    $$P(\text{Hole in one}) = P(X_1 = 1) = 0.25.$$
  2. More than five shots on the first hole is the event $\{X_1 > 5\}$. Therefore
    $$P(X_1 > 5) = 0.75^5 = 0.2373.$$
  3. This is a Binomial question since there are $n = 9$ holes and on each hole there is probability $p = P(X_1 = 1) = 0.25$ of obtaining a hole in one. Let $Y \sim \text{Bin}(9, 0.25)$ denote the number of holes in one in a round, then
    $$P(Y = 3) = \binom{9}{3} (0.25)^3 (0.75)^6 = 0.2336.$$
  4. The total number of shots taken is
    $$Z = X_1 + X_2 + \cdots + X_9 \sim \text{NegBin}(9, 0.25).$$
    Thus the mean number of shots taken is $E[Z] = \frac{9}{0.25} = 36$ and the variance of the number of shots is $var(Z) = \frac{9(1 - 0.25)}{0.25^2} = 108$.
  5. The probability that the child takes exactly 36 (the mean number of) shots is
    $$P(Z = 36) = \binom{36 - 1}{9 - 1} (0.25)^9 (0.75)^{27} = 0.0380.$$
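The answers above can be verified in R (a quick check, not part of the original solution; dgeom, pgeom and dnbinom count failures rather than trials, hence the shifts below):

```r
# Crazy golf checks: Geom(0.25) shots per hole, 9 holes in a round
dgeom(0, prob = 0.25)                       # P(hole in one) = P(0 failures) = 0.25
pgeom(4, prob = 0.25, lower.tail = FALSE)   # P(more than 5 shots) = 0.75^5 = 0.2373
dbinom(3, size = 9, prob = 0.25)            # P(exactly three holes in one) = 0.2336
dnbinom(36 - 9, size = 9, prob = 0.25)      # P(exactly 36 shots in total) = 0.0380
```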

5.5 Poisson distribution

The Poisson distribution is often used to model ‘random’ events - e.g. hits on a website; traffic accidents; customers joining a queue etc.

Suppose that events occur at rate $\lambda > 0$ per unit time. Divide the time interval $[0, 1)$ into $n$ small equal parts of length $1/n$:

$$[0, 1) = \left[0, \tfrac{1}{n}\right) \cup \left[\tfrac{1}{n}, \tfrac{2}{n}\right) \cup \cdots \cup \left[\tfrac{i}{n}, \tfrac{i + 1}{n}\right) \cup \cdots \cup \left[\tfrac{n - 1}{n}, 1\right).$$

Assume that each interval can contain either zero or one event, independently of the other intervals, and that

$$P\left[\text{1 event in } \left[\tfrac{i}{n}, \tfrac{i + 1}{n}\right)\right] = \frac{\lambda}{n}.$$

Figure 5.3: Four events (red crosses) in 50 sub-intervals of [0,1].

Let $X$ be the number of events in the time interval $[0, 1)$. Then
$$X \sim \text{Bin}(n, \lambda/n)$$
and letting $n \to \infty$ (the number of intervals grows but the chance of observing an event in a given interval decreases), we have that
$$\begin{aligned} P(X = k) &= \binom{n}{k}\left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n - k}\\ &= \frac{n(n-1)\cdots(n-k+1)}{k!} \times \frac{\lambda^k}{n^k} \times \left(1 - \frac{\lambda}{n}\right)^{n - k}\\ &= \frac{n(n-1)\cdots(n-k+1)}{n^k} \times \frac{\lambda^k}{k!} \times \frac{(1 - \lambda/n)^n}{(1 - \lambda/n)^k}\\ &= 1\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{k-1}{n}\right) \times \frac{\lambda^k}{k!} \times \frac{(1 - \lambda/n)^n}{(1 - \lambda/n)^k}\\ &\to 1 \times \frac{\lambda^k}{k!} \times \exp(-\lambda) \quad \text{as } n \to \infty, \text{ for fixed } k. \end{aligned}$$
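A small numerical illustration of this limit (not in the original notes), with the values $\lambda = 4$ and $k = 2$ chosen purely for illustration:

```r
# Bin(n, lambda/n) probabilities approach the Poisson(lambda) limit as n grows
lambda <- 4; k <- 2                     # assumed values for illustration
for (n in c(10, 50, 1000)) {
  cat("n =", n, ":", dbinom(k, size = n, prob = lambda / n), "\n")
}
dpois(k, lambda)                        # limiting value lambda^k exp(-lambda) / k!
```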

Poisson distribution

Let $X$ be a discrete random variable with parameter $\lambda > 0$ and p.m.f.

$$P(X = x) = \frac{\lambda^x}{x!} \exp(-\lambda) \qquad (x = 0, 1, \ldots).$$

Then $X$ is said to follow a Poisson distribution with parameter $\lambda$, denoted $X \sim \text{Po}(\lambda)$.

Poisson distribution: Expectation and variance.

Let $X \sim \text{Po}(\lambda)$, then

$$E[X] = \lambda \qquad \text{and} \qquad var(X) = \lambda.$$
By definition of expectation,
$$E[X] = \sum_{x=0}^{\infty} x\, P(X = x) = \sum_{x=1}^{\infty} x\, P(X = x),$$

since $0 \times P(X = 0) = 0$.

Now

$$E[X] = \sum_{x=1}^{\infty} x \times \frac{\lambda^x}{x!} \exp(-\lambda) = \exp(-\lambda) \sum_{x=1}^{\infty} \frac{x \lambda^x}{x!} = \lambda \exp(-\lambda) \sum_{x=1}^{\infty} \frac{\lambda^{x - 1}}{(x - 1)!}.$$

Using a change of variable $k = x - 1$,

$$E[X] = \lambda \exp(-\lambda) \sum_{k=0}^{\infty} \frac{\lambda^k}{k!} = \lambda \exp(-\lambda) \exp(\lambda) = \lambda.$$
Similarly, we can show that
$$E[X(X - 1)] = \sum_{x=0}^{\infty} x(x - 1)\left(\frac{\lambda^x}{x!} \exp(-\lambda)\right) = \lambda^2.$$

Therefore, as noted in Lemma 5.4.7 (5.2), we have that $E[X^2] = E[X(X - 1)] + E[X]$, giving

$$var(X) = E[X(X - 1)] + E[X] - E[X]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.$$

5.6 Exponential distribution and its extensions

In this section we start with the Exponential random variable, an important continuous distribution taking positive values (on the range $[0, \infty)$). The Exponential distribution is the continuous analogue of the Geometric distribution. The sum of Exponential random variables leads to the Erlang distribution, which is a special case of the Gamma distribution. Another special case of the Gamma distribution is the $\chi^2$ (Chi squared) distribution, which is important in statistics. Finally, we consider the Beta distribution, a continuous distribution taking values in the range $(0, 1)$ which can be constructed from Gamma random variables via a transformation. (See Section 14 for details on transformations.)

5.6.1 Exponential distribution

Let $X$ denote the total number of hits on a website in time $t$. Let $\lambda$ be the rate of hits per unit time, so that $\lambda t$ is the rate of hits per time $t$.

A suitable model for $X$, as we have observed in Section 5.5, is $\text{Po}(\lambda t)$.

Let $T$ denote the time, from a fixed point, until the first hit. Note that $T$ is continuous ($0 < T < \infty$) whilst the number of hits $X$ is discrete. Then $T > t$ if and only if $X = 0$. Hence,
$$P(T > t) = P(X = 0) = \exp(-\lambda t)$$
and so,
$$P(T \le t) = 1 - \exp(-\lambda t) \qquad (t > 0).$$
Therefore the cumulative distribution function of $T$ is
$$F_T(t) = \begin{cases} 0 & t < 0,\\ 1 - \exp(-\lambda t) & t \ge 0. \end{cases}$$
Differentiating $F_T(t)$ with respect to $t$ gives
$$f_T(t) = \begin{cases} 0 & t < 0,\\ \lambda \exp(-\lambda t) & t \ge 0. \end{cases}$$

Exponential distribution

A random variable $T$ is said to have an Exponential distribution with parameter $\lambda > 0$, written $T \sim \text{Exp}(\lambda)$, if its c.d.f. is given by
$$F_T(t) = \begin{cases} 1 - e^{-\lambda t} & t > 0,\\ 0 & t \le 0, \end{cases}$$
and its p.d.f. is
$$f_T(t) = \frac{d}{dt} F_T(t) = \begin{cases} \lambda e^{-\lambda t} & t > 0,\\ 0 & t \le 0. \end{cases}$$

Exponential distribution: Expectation and variance.

Let $T \sim \text{Exp}(\lambda)$, then
$$E[T] = \frac{1}{\lambda} \qquad \text{and} \qquad var(T) = \frac{1}{\lambda^2}.$$

The expectation of $T$ is
$$E[T] = \int_{-\infty}^{\infty} t\, f_T(t)\,dt = \int_0^{\infty} t\, \lambda e^{-\lambda t}\,dt.$$
Using integration by parts, we have that
$$E[T] = \left[-t e^{-\lambda t}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda t}\,dt = (0 - 0) + \left[-\frac{1}{\lambda} e^{-\lambda t}\right]_0^{\infty} = \frac{1}{\lambda}.$$
Similarly, by using integration by parts twice,
$$E[T^2] = \int_0^{\infty} t^2 \lambda e^{-\lambda t}\,dt = \frac{2}{\lambda^2}.$$
Therefore the variance of $T$ is
$$var(T) = E[T^2] - E[T]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.$$
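A quick numerical check of these moments (not part of the original notes); the rate $\lambda = 0.5$ is an arbitrary choice for illustration:

```r
# Check E[T] = 1/lambda and var(T) = 1/lambda^2 for T ~ Exp(lambda)
lambda <- 0.5                                            # assumed rate for illustration
ET  <- integrate(function(t) t   * dexp(t, rate = lambda), 0, Inf)$value
ET2 <- integrate(function(t) t^2 * dexp(t, rate = lambda), 0, Inf)$value
ET                                                       # 1 / lambda = 2
ET2 - ET^2                                               # 1 / lambda^2 = 4
```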


5.6.2 Gamma distribution

Suppose that we want to know $W$, the time until the $m$th ($m = 1, 2, \ldots$) hit on a website. Then $W = T_1 + T_2 + \cdots + T_m$, where $T_i$ is the time from the $(i - 1)$st hit on the website until the $i$th hit.


Figure 5.4: Illustration with $m = 3$, $W = T_1 + T_2 + T_3$.

Note that $T_1, T_2, \ldots$ are independent and identically distributed (i.i.d.) according to $T \sim \text{Exp}(\lambda)$. That is, $W$ is the sum of $m$ Exponential random variables with parameter $\lambda$. Then $W$ follows a Gamma distribution, $W \sim \text{Gamma}(m, \lambda)$.

Gamma distribution

A random variable $X$ is said to have a Gamma distribution with parameters $\alpha, \beta > 0$, written $X \sim \text{Gamma}(\alpha, \beta)$, if its p.d.f. is given by
$$f_X(x) = \begin{cases} \frac{\beta^{\alpha}}{\Gamma(\alpha)} x^{\alpha - 1} \exp(-\beta x) & x > 0,\\ 0 & x \le 0, \end{cases}$$

where $\Gamma(\alpha) = \int_0^{\infty} y^{\alpha - 1} \exp(-y)\,dy$.

Note that if $\alpha$ is an integer, $\Gamma(\alpha) = (\alpha - 1)!$. Also $\Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}$ and, for $\alpha > 1$,
$$\Gamma(\alpha) = (\alpha - 1)\,\Gamma(\alpha - 1).$$

By definition, for $\alpha = 1$, $X \sim \text{Exp}(\beta)$ and, for $\alpha \in \mathbb{N}$, the Gamma distribution is given by the sum of $\alpha$ Exponential random variables. The special case where $\alpha$ is an integer is sometimes referred to as the Erlang distribution. However, the Gamma distribution is defined for positive, real-valued $\alpha$.

The $\alpha$ parameter is known as the shape parameter and determines the shape of the Gamma distribution. In particular, the shape varies depending on whether $\alpha < 1$, $\alpha = 1$ or $\alpha > 1$.

  • $\alpha < 1$: the modal value of $X$ is at 0 and $f(x) \to \infty$ as $x \downarrow 0$ ($x$ tends to 0 from above).
  • $\alpha = 1$: the Exponential distribution. The modal value of $X$ is at 0 and $f(0) = \beta$.
  • $\alpha > 1$: $f(0) = 0$ and the modal value of $X$ is at $\frac{\alpha - 1}{\beta}$.

The $\beta$ parameter is known as the scale parameter. It does not affect the shape of the Gamma distribution but has the property that if $U \sim \text{Gamma}(\alpha, 1)$, then $X$ has the same distribution as $U/\beta$. This can be written as
$$X \overset{D}{=} \frac{U}{\beta}, \qquad U \sim \text{Gamma}(\alpha, 1).$$

Equality in distribution

Two random variables $X$ and $Y$ are said to be equal in distribution, denoted $X \overset{D}{=} Y$, if for all $x \in \mathbb{R}$,
$$P(X \le x) = P(Y \le x).$$

That is, $X$ and $Y$ have the same c.d.f., or equivalently, $X$ and $Y$ have the same p.d.f. (p.m.f.) if $X$ and $Y$ are continuous (discrete).
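A quick numerical illustration of the scale property above (not from the original notes); in R's pgamma the rate argument plays the role of $\beta$ in the parameterisation used here, and the values of $\alpha$, $\beta$ and $x$ below are arbitrary choices:

```r
# Scale property: if U ~ Gamma(alpha, 1) then X = U / beta ~ Gamma(alpha, beta)
alpha <- 2.5; beta <- 3; x <- 1.2             # assumed values for illustration
pgamma(x, shape = alpha, rate = beta)         # P(X <= x)
pgamma(beta * x, shape = alpha, rate = 1)     # P(U <= beta * x) -- the same value
```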

An R Shiny app is provided to explore the Gamma distribution.

R Shiny app: Gamma distribution

Gamma distribution: Expectation and variance.

If $X \sim \text{Gamma}(\alpha, \beta)$ then
$$E[X] = \frac{\alpha}{\beta}, \qquad var(X) = \frac{\alpha}{\beta^2}.$$

The proof is straightforward if $\alpha = m \in \mathbb{N}$, since then $X = T_1 + T_2 + \cdots + T_m$, where the $T_i$ are i.i.d. according to $T \sim \text{Exp}(\beta)$. (Compare with the proof of Lemma 5.4.9 for the mean and variance of the Negative Binomial distribution.)

We omit the general proof for $\alpha \in \mathbb{R}^+$, which can be proved by integration by parts.

We have noted that the Gamma distribution arises as the sum of Exponential distributions. More generally, if $X_1 \sim \text{Gamma}(\alpha_1, \beta)$ and $X_2 \sim \text{Gamma}(\alpha_2, \beta)$ are independent Gamma random variables with a common scale parameter $\beta > 0$, then
$$X_1 + X_2 \sim \text{Gamma}(\alpha_1 + \alpha_2, \beta).$$

5.6.3 Chi squared distribution

The chi squared ($\chi^2$) distribution is a special case of the Gamma distribution which plays an important role in statistics. For $k \in \mathbb{N}$, if
$$X \sim \text{Gamma}\left(\frac{k}{2}, \frac{1}{2}\right)$$
then $X$ is said to follow a chi squared distribution with $k$ degrees of freedom. Note that $X$ has probability density function
$$f_X(x) = \begin{cases} \frac{x^{k/2 - 1} \exp(-x/2)}{2^{k/2}\, \Gamma(k/2)} & x > 0,\\ 0 & x \le 0, \end{cases}$$

with $E[X] = k$ and $var(X) = 2k$.

5.6.4 Beta distribution

Suppose that $X$ and $Y$ are independent random variables such that $X \sim \text{Gamma}(\alpha, \gamma)$ and $Y \sim \text{Gamma}(\beta, \gamma)$ for some $\alpha, \beta, \gamma > 0$. Note that both $X$ and $Y$ have the same scale parameter $\gamma$. Let
$$Z = \frac{X}{X + Y},$$

the proportion of the sum of $X$ and $Y$ accounted for by $X$. Then $Z$ takes values in the range $[0, 1]$ and follows a Beta distribution with parameters $\alpha$ and $\beta$.

Beta distribution

A random variable $Z$ is said to have a Beta distribution with parameters $\alpha, \beta > 0$, written $Z \sim \text{Beta}(\alpha, \beta)$, if its p.d.f. is given by
$$f_Z(z) = \begin{cases} \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\,\Gamma(\beta)} z^{\alpha - 1} (1 - z)^{\beta - 1} & 0 < z < 1,\\ 0 & \text{otherwise.} \end{cases}$$
Note that if $Z \sim \text{Beta}(\alpha, \beta)$, then
$$E[Z] = \frac{\alpha}{\alpha + \beta} \qquad \text{and} \qquad var(Z) = \frac{\alpha\beta}{(\alpha + \beta)^2 (\alpha + \beta + 1)}.$$
In the special case where $\alpha = \beta = 1$, $f_Z(z) = 1$ ($0 < z < 1$) and $Z$ is uniformly distributed on $[0, 1]$, denoted $Z \sim U(0, 1)$. That is,
$$\text{Beta}(1, 1) \overset{D}{=} U(0, 1).$$

An R Shiny app is provided to explore the Beta distribution.

R Shiny app: Beta distribution

Example 5.6.7 (Catching a bus) draws together the different Exponential-based distributions and demonstrates how they are used to answer different questions of interest.

Catching a bus.


Figure 5.5: Bus picture.

Suppose that the time (in minutes) between buses arriving at a bus stop follows an Exponential distribution, $Y \sim \text{Exp}(0.5)$. Given you arrive at the bus stop just as one bus departs:

  1. Calculate the probability that you have to wait more than 2 minutes for the bus.
  2. Calculate the probability that you have to wait more than 5 minutes for the bus given that you wait more than 3 minutes.
  3. Given that the next two buses are full, what is the probability you have to wait more than 6 minutes for a bus (the third bus to arrive)?
  4. What is the probability that the time until the third bus arrives is more than double the time until the second bus arrives?

Attempt Example 5.6.7 (Catching a bus) and then watch Video 12 for the solutions.

Video 12: Catching a bus

Solution to Example 5.6.7.
For an Exponential random variable, $X \sim \text{Exp}(\beta)$, we have that for any $x > 0$,
$$P(X > x) = 1 - P(X \le x) = 1 - \{1 - \exp(-\beta x)\} = \exp(-\beta x).$$
  1. Since $Y \sim \text{Exp}(0.5)$,
    $$P(Y > 2) = \exp(-0.5(2)) = \exp(-1) = 0.3679.$$
  2. Note that $\{Y > 5\}$ implies $\{Y > 3\}$. Therefore
    $$P(Y > 5 \mid Y > 3) = \frac{P(Y > 5)}{P(Y > 3)} = \frac{\exp(-0.5(5))}{\exp(-0.5(3))} = \exp(-1) = 0.3679.$$
    Therefore
    $$P(Y > 5 \mid Y > 3) = P(Y > 2).$$
    This property is known as the memoryless property of the exponential distribution: for any $s, t > 0$,
    $$P(Y > s + t \mid Y > s) = P(Y > t).$$
  3. The time, $W$, until the third bus arrives is $W \sim \text{Gamma}(3, 0.5)$. Therefore
    $$f_W(w) = \frac{0.5^3}{(3 - 1)!}\, w^{3 - 1} \exp(-0.5 w) \qquad (w > 0),$$
    and
    $$F_W(w) = 1 - \exp(-0.5 w)\left[1 + 0.5 w + \frac{(0.5 w)^2}{2}\right].$$
    Hence,
    $$P(W > 6) = 1 - F_W(6) = \exp(-0.5(6))\left[1 + 0.5(6) + \frac{(0.5 \times 6)^2}{2}\right] = 0.4232.$$
  4. This question involves the Beta distribution. Let $Z$ denote the time until the second bus arrives and let $T$ denote the time between the second and third bus arriving. Then we want $P(Z + T > 2Z)$. Rearranging $Z + T > 2Z$, we have that this is equivalent to
    $$\frac{1}{2} > \frac{Z}{Z + T},$$
    where
    $$\frac{Z}{Z + T} = U \sim \text{Beta}(2, 1)$$
    with $U$ having p.d.f.
    $$f_U(u) = \frac{(2 + 1 - 1)!}{(2 - 1)!\,(1 - 1)!}\, u^{2 - 1} (1 - u)^{1 - 1} = 2u \qquad (0 < u < 1).$$
    Hence
    $$P(Z + T > 2Z) = P(U < 0.5) = \int_0^{0.5} 2u\,du = \left[u^2\right]_0^{0.5} = 0.25.$$
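The four answers can be checked in R (a quick verification, not part of the original solution; pexp, pgamma and pbeta match the rate/shape parameterisations used above):

```r
# Catching a bus: numerical checks
pexp(2, rate = 0.5, lower.tail = FALSE)                # P(Y > 2) = 0.3679
pexp(5, rate = 0.5, lower.tail = FALSE) /
  pexp(3, rate = 0.5, lower.tail = FALSE)              # P(Y > 5 | Y > 3) = 0.3679
pgamma(6, shape = 3, rate = 0.5, lower.tail = FALSE)   # P(W > 6) = 0.4232
pbeta(0.5, shape1 = 2, shape2 = 1)                     # P(U < 0.5) = 0.25
```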

5.7 Normal (Gaussian) Distribution

Normal (Gaussian) distribution.

$X$ is said to have a normal distribution, $X \sim N(\mu, \sigma^2)$, if it has p.d.f.
$$f_X(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}[x - \mu]^2\right), \qquad x \in \mathbb{R},$$

where $\mu \in \mathbb{R}$ and $\sigma > 0$.

The parameters $\mu$ and $\sigma$ of the normal distribution specify the mean and standard deviation, with
$$E[X] = \int_{-\infty}^{\infty} x\, f_X(x)\,dx = \mu$$
and
$$E[X^2] = \int_{-\infty}^{\infty} x^2 f_X(x)\,dx = \sigma^2 + \mu^2,$$
giving
$$var(X) = E[X^2] - E[X]^2 = \sigma^2.$$

The normal distribution is symmetric about its mean $\mu$ with the p.d.f. decreasing as $[x - \mu]^2$ increases. Therefore the median and mode of the normal distribution are also equal to $\mu$. See Figure 5.6 for the p.d.f. and c.d.f. of $N(0, 1)$.


The c.d.f. of the normal distribution $X \sim N(\mu, \sigma^2)$ is
$$F_X(x) = \int_{-\infty}^{x} f(y)\,dy = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{1}{2\sigma^2}[y - \mu]^2\right) dy,$$
and has no analytical solution (i.e. we cannot solve the integral in closed form).

How do we proceed with the Normal distribution if we cannot compute its c.d.f.?

The simplest solution is to use a statistical package such as R to compute probabilities (c.d.f.) for the Normal distribution. This can be done using the pnorm function. However, it is helpful to gain an understanding of the Normal distribution and how to compute probabilities (c.d.f.) for the Normal distribution using the good old-fashioned method of Normal distribution tables.

The starting point is to define the standard normal distribution, $Z \sim N(0, 1)$. We can then show that for any $X \sim N(\mu, \sigma^2)$ and $a, b \in \mathbb{R}$, $P(a < X < b)$ can be rewritten as
$$P(a < X < b) = P(c < Z < d) = P(Z < d) - P(Z < c),$$
where $c$ and $d$ are functions of $(a, \mu, \sigma)$ and $(b, \mu, \sigma)$, respectively. It is thus sufficient to know the c.d.f. of $Z$. Note that when $Z$ is used to denote a normal distribution it will always be reserved for the standard normal distribution.

Traditionally, probabilities for $Z$ are obtained from Normal tables: tabulated values of $P(Z < z)$ for various values of $z$. Typically, $P(Z < z)$ for $z = 0.00, 0.01, \ldots, 3.99$ are reported, with the observation $P(Z < -z) = 1 - P(Z < z)$ used to obtain probabilities for negative values.

A normal table will usually look similar to the table below.

z     0        1        2        3        4
0.0   0.5000   0.5040   0.5080   0.5120   0.5160
0.1   0.5398   0.5438   0.5478   0.5517   0.5557
...   ...      ...      ...      ...      ...
1.2   0.8849   0.8869   0.8888   0.8907   0.8925

The first column, labelled z, increments in units of 0.1. Columns 2 to 11 are headed 0 through to 9. To find P(Z<z)=P(Z<r.st) where z=r.st and r,s,t are integers between 0 and 9, inclusive, we look down the z column to the row r.s and then look along the row to the column headed t. The entry in row “r.s” and column “t” is P(Z<r.st). For example, P(Z<1.22)=0.8888.

Standard Normal distribution.

If $\mu = 0$ and $\sigma = 1$ then $Z \sim N(0, 1)$ has a standard normal distribution with p.d.f.
$$\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \qquad x \in \mathbb{R},$$
and c.d.f.
$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-y^2/2}\,dy.$$
Note the notation $\phi(\cdot)$ and $\Phi(\cdot)$, which are commonly used for the p.d.f. and c.d.f. of $Z$.


Figure 5.6: Standard normal, $Z \sim N(0, 1)$, p.d.f. and c.d.f.

Transformation of a Normal random variable.

If $X \sim N(\mu, \sigma^2)$ and $Y = aX + b$ then $Y \sim N(a\mu + b, a^2\sigma^2)$.

An immediate Corollary of Lemma 5.7.4 is that if $X \sim N(\mu, \sigma^2)$, then $\frac{X - \mu}{\sigma} = Z \sim N(0, 1)$. This corresponds to setting $a = 1/\sigma$ and $b = -\mu/\sigma$ in Lemma 5.7.4.

Hence, for any $x \in \mathbb{R}$,
$$P(X \le x) = P\left(\frac{X - \mu}{\sigma} \le \frac{x - \mu}{\sigma}\right) = P\left(Z \le \frac{x - \mu}{\sigma}\right) = \Phi\left(\frac{x - \mu}{\sigma}\right).$$
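For example (a quick check, not in the original notes), standardising inside R's pnorm gives the same answer as supplying the mean and standard deviation directly; the values $\mu = 250$, $\sigma = 5$ and $x = 255$ are arbitrary choices:

```r
# P(X <= x) for X ~ N(mu, sigma^2), directly and via standardisation
mu <- 250; sigma <- 5; x <- 255         # assumed values for illustration
pnorm(x, mean = mu, sd = sigma)         # direct evaluation
pnorm((x - mu) / sigma)                 # Phi((x - mu) / sigma), the same value (0.8413)
```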

Percentage points

The inverse problem of finding, for a given $q$ ($0 < q < 1$), the value of $z$ such that $P(Z < z) = q$ is often tabulated for important choices of $q$. The function qnorm in R performs this calculation for general $X \sim N(\mu, \sigma^2)$.

Sums of Normal random variables

Suppose that $X_1, X_2, \ldots, X_n$ are independent normal random variables with $X_i \sim N(\mu_i, \sigma_i^2)$. Then for $a_1, a_2, \ldots, a_n \in \mathbb{R}$,
$$Y = \sum_{i=1}^{n} a_i X_i \sim N\left(\sum_{i=1}^{n} a_i \mu_i, \sum_{i=1}^{n} a_i^2 \sigma_i^2\right).$$


Lemonade dispenser

Suppose that the amount of lemonade dispensed by a machine into a cup is normally distributed with mean 250ml and standard deviation 5ml. Suppose that the capacity of the cups used for the lemonade is normally distributed with mean 260ml and standard deviation 4ml.

  1. What is the probability that the lemonade overflows the cup?
  2. What is the probability that the total lemonade in 8 cups exceeds 1970ml?

Attempt Example 5.7.7 (Lemonade dispenser) and then watch Video 13 for the solutions.

Video 13: Lemonade dispenser

Solution to Example 5.7.7
  1. Let $L$ and $C$ denote the amount of lemonade dispensed and the capacity of a cup (in ml), respectively. Then $L \sim N(250, 5^2)$ and $C \sim N(260, 4^2)$, and we want:
    $$P(C < L) = P(C - L < 0).$$
    Note that $C - L$ follows a normal distribution (use Definition 5.7.6, Sums of Normal random variables, with $n = 2$, $X_1 = C$, $X_2 = L$, $a_1 = 1$ and $a_2 = -1$), with
    $$C - L \sim N(260 - 250, 25 + 16) = N(10, 41).$$
    Therefore, if $Y \sim N(10, 41)$,
    $$P(C < L) = P(Y < 0) = P\left(\frac{Y - 10}{\sqrt{41}} < \frac{0 - 10}{\sqrt{41}}\right) = \Phi(-1.56) = 0.0594.$$

Note that the answer is given by pnorm(-1.56) and, for the answer rounded to 4 decimal places, round(pnorm(-1.56), 4).

  2. Let $L_i \sim N(250, 5^2)$ denote the amount of lemonade dispensed into the $i$th cup. Then the total amount of lemonade dispensed into 8 cups is $S = L_1 + L_2 + \cdots + L_8 \sim N(2000, 200)$. Therefore
    $$P(S > 1970) = P\left(\frac{S - 2000}{\sqrt{200}} > \frac{1970 - 2000}{\sqrt{200}}\right) = P(Z > -2.12) = 1 - \Phi(-2.12) = 0.9830.$$
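Both parts can also be checked in R without rounding the standardised values (a quick check, not part of the original solution):

```r
# Lemonade dispenser checks
pnorm(0, mean = 10, sd = sqrt(41))          # P(C - L < 0), approx 0.059 (0.0594 above rounds z to -1.56)
pnorm(1970, mean = 2000, sd = sqrt(200),
      lower.tail = FALSE)                   # P(S > 1970), approx 0.983
```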


Student Exercises

Attempt the exercises below.


Let $X$ be a continuous random variable with p.d.f.
$$f_X(x) = \begin{cases} kx^3 & \text{if } 0 < x < 1,\\ ke^{1 - x} & \text{if } x \ge 1,\\ 0 & \text{otherwise.} \end{cases}$$

  1. Evaluate k and find the (cumulative) distribution function of X.
  2. Calculate $P(0.5 < X < 2)$ and $P(X > 2 \mid X > 1)$.
Solution to Exercise 5.1.
  1. Since $f$ is a p.d.f., $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Thus,
    $$\int_0^1 kx^3\,dx + \int_1^{\infty} ke^{1 - x}\,dx = 1 \quad \Rightarrow \quad k\left(\frac{1}{4} + 1\right) = 1 \quad \Rightarrow \quad k = \frac{4}{5}.$$
    Recall that $F_X(x) = \int_{-\infty}^{x} f(t)\,dt$. Thus, $F_X(x) = 0$ if $x \le 0$. If $0 < x < 1$, then
    $$F_X(x) = \int_0^x \frac{4}{5} t^3\,dt = \frac{x^4}{5}$$
    and, if $x > 1$, then
    $$F_X(x) = \int_0^1 \frac{4}{5} t^3\,dt + \int_1^x \frac{4}{5} e^{1 - t}\,dt = \frac{1}{5} + \frac{4}{5}\left(1 - e^{1 - x}\right) = 1 - \frac{4}{5} e^{1 - x}.$$
    Thus,
    $$F_X(x) = \begin{cases} 0 & \text{if } x \le 0,\\ \frac{x^4}{5} & \text{if } 0 < x \le 1,\\ 1 - \frac{4}{5} e^{1 - x} & \text{if } x > 1. \end{cases}$$
    Below are plots of $f_X(x)$ and $F_X(x)$ on the range $(0, 5)$.
  2. $$P(1/2 < X < 2) = \int_{1/2}^{2} f_X(x)\,dx = \int_{1/2}^{1} \frac{4}{5} x^3\,dx + \int_1^2 \frac{4}{5} e^{1 - x}\,dx = \left[\frac{x^4}{5}\right]_{1/2}^{1} + \left[-\frac{4}{5} e^{1 - x}\right]_1^2 = \frac{3}{16} + \frac{4}{5}\left(1 - e^{-1}\right) = 0.6932.$$
    Since
    $$P(X > 2) = \int_2^{\infty} \frac{4}{5} e^{1 - x}\,dx = \left[-\frac{4}{5} e^{1 - x}\right]_2^{\infty} = \frac{4}{5} e^{-1}$$
    and
    $$P(X > 1) = \int_1^{\infty} \frac{4}{5} e^{1 - x}\,dx = \left[-\frac{4}{5} e^{1 - x}\right]_1^{\infty} = \frac{4}{5},$$
    we have
    $$P(X > 2 \mid X > 1) = \frac{P(X > 2, X > 1)}{P(X > 1)} = \frac{P(X > 2)}{P(X > 1)} = e^{-1} = 0.3679.$$
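A quick numerical check of Exercise 5.1 (not part of the original solution), using $k = 4/5$:

```r
# Exercise 5.1 checks
f <- function(x) ifelse(x > 0 & x < 1, 0.8 * x^3,
                        ifelse(x >= 1, 0.8 * exp(1 - x), 0))
integrate(f, 0, Inf)$value                                 # equals 1, confirming k = 4/5
integrate(f, 0.5, 2)$value                                 # P(0.5 < X < 2) = 0.6932
integrate(f, 2, Inf)$value / integrate(f, 1, Inf)$value    # P(X > 2 | X > 1) = 0.3679
```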



The time that a butterfly lives after emerging from its chrysalis is a random variable $T$, and the probability that it survives for more than $t$ days is equal to $36/(6 + t)^2$ for all $t > 0$.

  1. What is the probability that it will die within six days of emerging?
  2. What is the probability that it will live for between seven and fourteen days?
  3. If it has lived for seven days, what is the probability that it will live at least seven more days?
  4. If a large number of butterflies emerge on the same day, after how many days would you expect only 5% to be alive?
  5. Find the pdf of T.
  6. Calculate the mean life of a butterfly after emerging from its chrysalis.
Solution to Exercise 5.2.
  1. $P(T \le 6) = 1 - P(T > 6) = 1 - \frac{36}{12^2} = \frac{3}{4}$.
  2. $P(7 < T \le 14) = P(T > 7) - P(T > 14) = \frac{36}{13^2} - \frac{36}{20^2} = \frac{2079}{16900} = 0.1230$.
  3. $P(T > 14 \mid T > 7) = \frac{P(T > 14, T > 7)}{P(T > 7)} = \frac{P(T > 14)}{P(T > 7)} = \left(\frac{13}{20}\right)^2 = 0.4225$.
  4. Let $d$ be the number of days after which only 5% of the butterflies are expected to be alive. Then $P(T > d) = 1/20$. Thus,
    $$\frac{36}{(6 + d)^2} = \frac{1}{20} \quad \Rightarrow \quad (6 + d)^2 = 20 \times 36 \quad \Rightarrow \quad 6 + d = 12\sqrt{5} \quad \Rightarrow \quad d = 12\sqrt{5} - 6 = 20.83.$$
  5. Let $f_T$ and $F_T$ be the p.d.f. and distribution function of $T$, respectively. Then, for $t > 0$,
    $$F_T(t) = P(T \le t) = 1 - P(T > t) = 1 - \frac{36}{(6 + t)^2},$$
    so
    $$f_T(t) = \frac{d}{dt} F_T(t) = \frac{72}{(6 + t)^3}.$$
    Clearly, $f_T(t) = 0$ for $t \le 0$.
  6. $$E[T] = \int_{-\infty}^{\infty} t\, f_T(t)\,dt = \int_0^{\infty} \frac{72\,t}{(6 + t)^3}\,dt.$$
    Substituting $x = 6 + t$ gives
    $$E[T] = \int_6^{\infty} \frac{72(x - 6)}{x^3}\,dx = \left[-\frac{72}{x} + \frac{3 \times 72}{x^2}\right]_6^{\infty} = 6.$$
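The survival-function calculations above can be checked numerically in R (not part of the original solution):

```r
# Butterfly lifetimes: S(t) = P(T > t) = 36 / (6 + t)^2
S <- function(t) 36 / (6 + t)^2
1 - S(6)                                                   # part 1: 0.75
S(7) - S(14)                                               # part 2: 0.1230
S(14) / S(7)                                               # part 3: 0.4225
12 * sqrt(5) - 6                                           # part 4: 20.83 days
integrate(function(t) 72 * t / (6 + t)^3, 0, Inf)$value    # part 6: mean lifetime = 6
```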



A type of chocolate bar contains, with probability 0.1, a prize voucher. Whether or not a bar contains a voucher is independent of other bars. A hungry student buys 8 chocolate bars. Let X denote the number of vouchers that she finds.

  1. What sort of distribution does X have?
  2. How likely is it that the student finds no vouchers?
  3. How likely is it that she finds at least two vouchers?
  4. What is the most likely number of vouchers that she finds?

A second student keeps buying chocolate bars until he finds a voucher. Let Y denote the number of bars he buys.

  5. What is the probability mass function of $Y$?
  6. How likely is it that the student buys more than 5 bars?
  7. What is $E[Y]$?
  8. If each bar costs 35p, what is the expected cost to the student?

A third student keeps buying chocolate bars until they find 4 vouchers. In doing so, they buy a total of $W$ bars.

  9. What is the distribution of $W$?
  10. What is the probability that this student buys exactly 10 bars?
Solutions to Exercise 5.3.
  1. $X \sim \text{Bin}(8, 0.1)$.
  2. $P(X = 0) = (0.9)^8 = 0.4305$.
  3. $P(X \ge 2) = 1 - P(X = 0) - P(X = 1) = 1 - (0.9)^8 - 8(0.9)^7(0.1) = 0.1869$.
  4. $X = 0$, since the probability mass function has only one maximum, and $0.4305 = P(X = 0) > P(X = 1) = 0.3826$.
  5. $Y \sim \text{Geom}(0.1)$, so $P(Y = k) = (0.9)^{k - 1}(0.1)$, $k = 1, 2, 3, \ldots$.
  6. $P(Y > 5) = 0.9^5$ (the probability of starting with 5 failures), so $P(Y > 5) = 0.5905$.
  7. For a $\text{Geom}(p)$ distribution the mean is $1/p$, so $E[Y] = 1/(0.1) = 10$.
  8. The cost (in pence) is $35Y$, so $E[35Y] = 35 E[Y] = 350$p, or £3.50.
  9. $W$ is Negative Binomial with parameters 4 and 0.1: $W \sim \text{NegBin}(4, 0.1)$.
  10. $P(W = 10) = \binom{9}{3} (0.1)^4 (0.9)^6 = 0.004464$.
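A quick check of these answers in R (not part of the original solution; pgeom and dnbinom count failures rather than trials, hence the shifts below):

```r
# Chocolate bar vouchers
dbinom(0, size = 8, prob = 0.1)                      # P(X = 0)  = 0.4305
pbinom(1, size = 8, prob = 0.1, lower.tail = FALSE)  # P(X >= 2) = 0.1869
pgeom(4, prob = 0.1, lower.tail = FALSE)             # P(Y > 5) = 0.9^5 = 0.5905
dnbinom(10 - 4, size = 4, prob = 0.1)                # P(W = 10) = 0.004464
```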



A factory produces nuts and bolts on two independent machines. The external diameter of the bolts is normally distributed with mean 0.5 cm and the internal diameter of the nuts is normally distributed with mean 0.52 cm. The two machines have the same variance, which is determined by the rate of production. The nuts and bolts are produced at a rate which corresponds to a standard deviation of $\sigma = 0.01$ cm, and a third machine fits each nut on to the corresponding bolt as they are produced, provided the diameter of the nut is strictly greater than that of the bolt; otherwise it rejects both.

  1. Find the probability that a typical pair of nut and bolt is rejected.
  2. If successive pairs of nut and bolt are produced independently, find the probability that in 20 pairs of nut and bolt at least 1 pair is rejected.
  3. The management wishes to reduce the probability that a typical pair of nut and bolt is rejected to 0.01. What is the largest value of σ to achieve this?
Solutions to Exercise 5.4.
  1. Let $X$ denote the internal diameter of a typical nut and $Y$ denote the external diameter of a typical bolt. Then $X \sim N(0.52, 0.01^2)$ and $Y \sim N(0.50, 0.01^2)$.
    A pair of nut and bolt is rejected if $W = X - Y \le 0$. Since $X$ and $Y$ are independent, $W \sim N(0.52 - 0.50, 0.01^2 + 0.01^2) = N(0.02, 0.0002)$. Thus the probability that a pair is rejected is
    $$P(W \le 0) = P\left(\frac{W - 0.02}{\sqrt{0.0002}} \le \frac{0 - 0.02}{\sqrt{0.0002}}\right) = P(Z \le -\sqrt{2}),$$
    where $Z = (W - 0.02)/\sqrt{0.0002} \sim N(0, 1)$.

Hence the required probability is $\Phi(-\sqrt{2}) = 0.0786$, using pnorm(-sqrt(2)).

  2. The probability that no pair is rejected is $(1 - 0.0786)^{20}$, since successive pairs are produced independently, so the probability that at least one pair is rejected is $1 - (1 - 0.0786)^{20} = 0.8055$.
  3. Arguing as in part 1, the probability that a pair is rejected is $\Phi(-0.02/(\sqrt{2}\,\sigma))$. We want $c$ such that $\Phi(c) = 0.01$, which gives $c = -2.3263$ (this is given by qnorm(0.01)). Therefore we require
    $$-\frac{0.02}{\sqrt{2}\,\sigma} \le -2.3263 \quad \Rightarrow \quad \sigma \le \frac{0.02}{2.3263 \times \sqrt{2}} = 0.0061.$$
    Hence the largest value of $\sigma$ is approximately 0.0061 cm.
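A final R check of Exercise 5.4 (not part of the original solution):

```r
# Nuts and bolts: rejection probability and the required sigma
p_reject <- pnorm(0, mean = 0.02, sd = sqrt(2) * 0.01)   # part 1: approx 0.0786
p_reject
1 - (1 - p_reject)^20                                    # part 2: approx 0.806
0.02 / (qnorm(0.99) * sqrt(2))                           # part 3: largest sigma, approx 0.0061
```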