Chapter 4 Probability

4.1 Overview

In this chapter we introduce probability as a measure associated with a random experiment. After a short motivation for probability (Section 4.2), we introduce the sample space (Section 4.3), the set of possible outcomes of a random experiment, and events (Section 4.4), which are collections of outcomes that may occur. This enables us in Section 4.5 to define probability as a finite measure which uses the scale 0 (impossible) to 1 (certain) to quantify the likelihood of an event. We then introduce conditional probability (Section 4.6), the probability of one event given (conditional upon) another event (or events) having occurred, and present two key results: the Theorem of Total Probability and Bayes' formula. Conditional probability leads naturally to the dependence between two (or more) events and to the notion of independence, where the probability of an event occurring is not affected by whether or not another event has occurred; we extend this to more than two events in Section 4.7.

4.2 Motivation

There are many situations where we have uncertainty and want to quantify that uncertainty.

  1. Manchester United will win the Premier League this season.
  2. The Labour Party will win the next general election.
  3. The £ will rise against the $ today.
  4. A coin is tossed repeatedly: a head will eventually turn up.
  5. In 5 throws of a dart, I will hit the Bull’s eye once.
  6. If I play the lottery every week I will win a prize next year.

Statements 1-3 involve subjective probabilities, whereas statements 4-6 involve objective (statistical or physical) probabilities.

The general idea is:

  1. A conceptual random experiment $\mathcal{E}$.
  2. List all possible outcomes $\Omega$ of the experiment $\mathcal{E}$, without knowing which outcome occurs, has occurred or will occur.
  3. Assign to each possible outcome $\omega$ a real number, which is the probability of that outcome.

4.3 Sample Space

We begin by defining a set.

Set.
A set is a collection of objects. The notation for a set is to list every object, separated by commas, and to surround this list by curly braces { and }. The objects in a set are referred to as the elements of the set.

There are no restrictions on what constitutes an object in a set. A set can have a finite or infinite number of elements. The order in which the elements are listed is irrelevant. Two sets are equal if and only if they have the same elements.

Sample space.
The sample space $\Omega$ for a random experiment $\mathcal{E}$ is the set of all possible outcomes of the random experiment.

Rolling a die.

The sample space for the roll of a die is

$$\Omega = \{1, 2, 3, 4, 5, 6\},$$

that is, the set of the six possible outcomes.


Dart in a target.

Throw a dart into a circular target of radius 1:

$$\Omega = \{(x, y) : x^2 + y^2 \le 1\},$$

that is, the set of pairs of real numbers that are at most a distance 1 from the origin $(0, 0)$.

Figure 4.1: Example: $(x, y) = (-0.25, 0.15)$.

Note that these examples illustrate how $\Omega$ may be discrete (the die) or continuous (the dart).

4.4 Events

Event.
An event relating to an experiment is a subset of $\Omega$.

Toss two coins.

The sample space for the toss of two fair coins is

$$\Omega = \{HH, HT, TH, TT\}.$$

Let $A$ be the event that at least one head occurs. Then

$$A = \{HH, HT, TH\}.$$

Note that the outcomes $HT$ (head on coin 1 and tail on coin 2) and $TH$ (tail on coin 1 and head on coin 2) are distinct.
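The sample space and the event $A$ can also be built explicitly. A minimal Python sketch, representing each outcome as a two-character string such as `"HT"` (a representation chosen here purely for illustration):

```python
from itertools import product

# Sample space for tossing two coins: every ordered pair of 'H'/'T',
# joined into strings such as "HT" (head on coin 1, tail on coin 2).
omega = {first + second for first, second in product("HT", repeat=2)}

# Event A (at least one head) is the subset of omega containing an 'H'.
A = {outcome for outcome in omega if "H" in outcome}

print(sorted(omega))  # ['HH', 'HT', 'TH', 'TT']
print(sorted(A))      # ['HH', 'HT', 'TH']
print(A <= omega)     # True: an event is a subset of the sample space
```

Because `product` generates ordered pairs, `"HT"` and `"TH"` appear as separate outcomes.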


Volcano eruption.

The sample space for the time, in years, until a volcano next erupts is $\Omega = \{t : t > 0\} = (0, \infty)$, that is, all positive real numbers. Let $L$ be the event that the volcano erupts within the next 10 years. Then $L = \{t : 0 < t \le 10\} = (0, 10]$.

We summarise key set notation, involving sets $E$ and $F$, below:

  1. We use $\omega \in E$ to denote that $\omega$ is an element of the set $E$. Likewise, $\omega \notin E$ denotes that $\omega$ is not an element of $E$;
  2. The notation $E \subseteq F$ means that if $\omega \in E$, then $\omega \in F$. In this case, we say $E$ is a subset of $F$;
  3. $E^c$: the complement of $E$, sometimes written $\bar{E}$.
    $E^c$ consists of all points in $\Omega$ that are not in $E$. Thus, $E^c$ occurs if and only if $E$ does not occur, see Figure 4.2.

Figure 4.2: Complement example.

  4. The intersection of two sets $E$ and $F$, denoted $E \cap F$, is the set of all elements that belong to both $E$ and $F$, see Figure 4.3.

Figure 4.3: Intersection example.

  5. If $E \cap F = \emptyset$, then $E$ and $F$ cannot both occur, i.e. $E$ and $F$ are disjoint (or mutually exclusive) sets, see Figure 4.4.

Figure 4.4: Disjoint (exclusive) example.

  6. The union of two sets $E$ and $F$, denoted $E \cup F$, is the set of all elements that belong to $E$ or to $F$ (or to both), see Figure 4.5.

Figure 4.5: Union example.

  7. The set $\{\}$ with no elements in it is called the empty set and is denoted $\emptyset$.
    Note: $\Omega^c = \emptyset$ and $\emptyset^c = \Omega$.
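Python's built-in `set` type supports each of these operations directly, which gives a convenient way to experiment with them. A minimal sketch using the outcomes of a six-sided die; the events `E` (an even number) and `F` (a number greater than 3) are chosen purely for illustration:

```python
omega = {1, 2, 3, 4, 5, 6}  # outcomes of a six-sided die
E = {2, 4, 6}               # illustrative event: an even number
F = {4, 5, 6}               # illustrative event: a number greater than 3

print(4 in E, 3 not in E)        # membership: True True
print({2, 4} <= E)               # subset: True
print(sorted(omega - E))         # complement E^c relative to omega: [1, 3, 5]
print(sorted(E & F))             # intersection: [4, 6]
print(sorted(E | F))             # union: [2, 4, 5, 6]
print(E & (omega - E) == set())  # E and E^c are disjoint: True
print(omega - omega == set())    # complement of omega is the empty set: True
```

The complement is taken explicitly as `omega - E`, since a Python set has no notion of the sample space it belongs to.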

A summary of set notation, using the outcomes of a six-sided die, is presented in Video 6.

Video 6: Set notation

4.5 Probability

There are different possible interpretations of the meaning of a probability:

  • Classical interpretation. Assuming that all outcomes of an experiment are equally likely, the probability of an event $A$ is $P(A) = n(A)/n(\Omega)$, where $n(A)$ is the number of outcomes in $A$ and $n(\Omega)$ is the number of outcomes in $\Omega$ (the total number of possible outcomes).

  • Frequency interpretation. The probability of an event is the relative frequency with which the event occurs when the experiment is repeated a large number of times under similar circumstances.

  • Subjective interpretation. The probability of an event is an individual’s perception as to the likelihood of an event’s occurrence.
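The frequency interpretation can be illustrated by simulation. A minimal sketch, assuming a fair die simulated with Python's `random` module; `relative_frequency` is a hypothetical helper that returns the proportion of odd outcomes in `n_rolls` rolls:

```python
import random

def relative_frequency(n_rolls: int, seed: int = 1) -> float:
    """Roll a simulated fair die n_rolls times; return the proportion of odd outcomes."""
    rng = random.Random(seed)  # fixed seed so the results are reproducible
    odd = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) % 2 == 1)
    return odd / n_rolls

for n in (10, 100, 1_000, 100_000):
    print(n, relative_frequency(n))
```

As `n` grows, the printed proportions should settle close to 1/2, the classical probability of an odd number; the exact values depend on the seed.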

Probability.

A probability (measure) is a real-valued set function $P$ defined on the events (subsets) of a sample space $\Omega$ satisfying the following three axioms (see Kolmogorov, 1933):

  1. $P(E) \ge 0$ for any event $E$;
  2. $P(\Omega) = 1$;
  3. If $E_1, E_2, \ldots, E_n$ are disjoint events (i.e. $E_i \cap E_j = \emptyset$ for all $i \neq j$), then $$P\left(\bigcup_{i=1}^n E_i\right) = \sum_{i=1}^n P(E_i).$$

If $\Omega$ is infinite, then 3. can be extended to:
3'. If $E_1, E_2, \ldots$ is any infinite sequence of disjoint events (i.e. $E_i \cap E_j = \emptyset$ for all $i \neq j$), then $$P\left(\bigcup_{i=1}^\infty E_i\right) = \sum_{i=1}^\infty P(E_i).$$

Note that all of the other standard properties of probability (measures) that we use are derived from these three axioms.


Using only the axioms above, prove:

  • $0 \le P(E) \le 1$ for any event $E$;

  • $P(E^c) = 1 - P(E)$, where $E^c$ is the complement of $E$;

  • $P(\emptyset) = 0$;

  • $P(A \cup B) = P(A) + P(B) - P(A \cap B)$.

A summary of probability, along with proofs of the results in Example 4.5.2, is provided in Video 7.

Video 7: Probability

Solution to Example 4.5.2.
Since $\Omega = E \cup E^c$ for any event $E$, we have by axiom 2 that
$$1 = P(\Omega) = P(E \cup E^c).$$
By axiom 3, since $E$ and $E^c$ are disjoint events, we have that
$$1 = P(E \cup E^c) = P(E) + P(E^c),$$

which rearranges to give $P(E^c) = 1 - P(E)$.

Special cases
If $E = \emptyset$, then $E^c = \Omega$, giving $1 = P(\emptyset) + 1$, and it follows that $P(\emptyset) = 0$.
Since $P(E^c) \ge 0$ by axiom 1, we have that $P(E) = 1 - P(E^c) \le 1$, and hence, using axiom 1 again, $0 \le P(E) \le 1$.

To study $P(A \cup B)$, we note that $A \cup B$ is the union of the disjoint events $A \cap B^c$, $A \cap B$ and $A^c \cap B$. Therefore, using axiom 3,
$$P(A \cup B) = P(A \cap B^c) + P(A \cap B) + P(A^c \cap B).$$
Similarly, we have that
$$P(A) = P(A \cap B^c) + P(A \cap B)$$
and
$$P(B) = P(A \cap B) + P(A^c \cap B).$$
Since $P(A^c \cap B) = P(B) - P(A \cap B)$, we have that
$$P(A \cup B) = P(A \cap B^c) + P(A \cap B) + P(A^c \cap B) = P(A) + \{P(B) - P(A \cap B)\} = P(A) + P(B) - P(A \cap B).$$
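For a small finite sample space, the four results can also be checked by brute force. The sketch below assumes a fair die (so that $P(E) = n(E)/n(\Omega)$) and checks every one of the $2^6 = 64$ events; it is a numerical sanity check for one particular probability measure, not a substitute for the proof above. The helper names `P` and `all_events` are illustrative.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))  # fair six-sided die

def P(event):
    """Equally likely outcomes: P(E) = n(E) / n(Omega), as an exact fraction."""
    return Fraction(len(event), len(omega))

def all_events(sample_space):
    """Return every subset of sample_space (its power set) as frozensets."""
    items = sorted(sample_space)
    subsets = chain.from_iterable(combinations(items, r) for r in range(len(items) + 1))
    return [frozenset(s) for s in subsets]

events = all_events(omega)

assert P(frozenset()) == 0                          # P(empty set) = 0
for E in events:
    assert 0 <= P(E) <= 1                           # 0 <= P(E) <= 1
    assert P(omega - E) == 1 - P(E)                 # complement rule
for A in events:
    for B in events:
        assert P(A | B) == P(A) + P(B) - P(A & B)   # inclusion-exclusion
print(len(events), "events checked")  # 64 events checked
```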


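In many cases, $\Omega$ consists of $N \; (= n(\Omega))$ equally likely elements, i.e. $\Omega = \{\omega_1, \omega_2, \ldots, \omega_N\}$, with $P(\omega_i) = 1/N$.

Then, for any event $E$ (i.e. subset of $\Omega$), applying axiom 3 to the $n(E)$ disjoint single outcomes that make up $E$ gives $$P(E) = \frac{n(E)}{n(\Omega)} = \frac{n(E)}{N},$$ coinciding with the Classical interpretation of probability.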


1. Throw a die. $\Omega = \{1, 2, 3, 4, 5, 6\}$ and $P(\{1\}) = P(\{2\}) = \cdots = P(\{6\}) = 1/6$. The probability of throwing an odd number is $$P(\text{Odd}) = P(\{1, 3, 5\}) = \frac{3}{6} = \frac{1}{2}.$$
2. Draw a card at random from a standard pack of 52. $\Omega$ is the set of the 52 cards (A, 2, 3, …, K in each of the four suits), and $P(\{\omega\}) = 1/52$ for all $\omega \in \Omega$.
If $E = \{\text{Black}\}$ and $F = \{\text{King}\}$, then there are 26 black cards, $n(E) = 26$, 4 kings, $n(F) = 4$, and 2 black kings (K♠ and K♣), $n(E \cap F) = 2$, so $$P(E \cup F) = P(E) + P(F) - P(E \cap F) = \frac{26}{52} + \frac{4}{52} - \frac{2}{52} = \frac{7}{13}.$$

4.6 Conditional probability

Conditional Probability.

The conditional probability of an event E given an event F is

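$$P(E \mid F) = \frac{P(E \cap F)}{P(F)}, \quad \text{provided } P(F) > 0.$$

Note that if $P(F) > 0$, then $P(E \cap F) = P(E \mid F) P(F)$.

Moreover, since $E \cap F = F \cap E$, we have that $P(E \mid F) P(F) = P(E \cap F) = P(F \cap E) = P(F \mid E) P(E)$ (provided $P(E) > 0$ and $P(F) > 0$). In other words, to compute the probability of both events E and F occurring, we can either: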

  • Consider first whether E occurs, P(E), and then whether F occurs given that E has occurred, P(F|E),
  • Or consider first whether F occurs, P(F), and then whether E occurs given that F has occurred, P(E|F).

Rolling a die.

Consider the experiment of rolling a fair 6-sided die. What is the probability of observing a 2 if the outcome is even?

Let T be the event of observing a 2 and let E be the event that the outcome is even. Since $T \cap E = T = \{2\}$,

$$P(T \mid E) = \frac{P(T \cap E)}{P(E)} = \frac{1/6}{1/2} = \frac{1}{3}.$$
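The same counting approach gives conditional probabilities on a finite sample space with equally likely outcomes. A minimal sketch; `P` and `P_given` are hypothetical helper names:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # fair six-sided die

def P(event):
    """Classical probability of an event (a subset of omega)."""
    return Fraction(len(event & omega), len(omega))

def P_given(event, condition):
    """Conditional probability P(event | condition) = P(event & condition) / P(condition)."""
    if P(condition) == 0:
        raise ValueError("P(condition) must be positive")
    return P(event & condition) / P(condition)

T = {2}        # observe a 2
E = {2, 4, 6}  # outcome is even
print(P_given(T, E))  # 1/3
print(P_given(E, T))  # 1: a 2 is always even
```

The two printed values differ, illustrating that $P(T \mid E)$ and $P(E \mid T)$ are generally not equal.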

Independence.

Two events E and F are independent if
$$P(E \cap F) = P(E) P(F).$$


If P(F)>0, two events, E and F, are independent if and only if P(E|F)=P(E).

$$P(E \mid F) = P(E) \iff \frac{P(E \cap F)}{P(F)} = P(E) \iff P(E \cap F) = P(E) P(F) \iff E \text{ and } F \text{ are independent}.$$


Observations on Independence

  • If E and F are NOT independent, then $P(E \cap F) \neq P(E) P(F)$.
  • E and F being independent is NOT the same as E and F being disjoint.

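Independence: $P(E \cap F) = P(E) P(F)$.
Disjoint (exclusive): $P(E \cap F) = P(\emptyset) = 0$.
In particular, if $P(E) > 0$ and $P(F) > 0$, then disjoint events $E$ and $F$ cannot be independent, since $P(E) P(F) > 0 = P(E \cap F)$.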

Rolling a die.

Consider the experiment of rolling a fair 6-sided die.
Let E = {2, 4, 6} be the event that an even number is rolled and F = {3, 6} the event that a multiple of 3 is rolled.

Are E and F independent?

$E \cap F = \{6\}$, so $P(E \cap F) = 1/6$, and
$$P(E) \times P(F) = \frac{3}{6} \times \frac{2}{6} = \frac{1}{6} = P(E \cap F).$$

Therefore E and F are independent.
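A minimal sketch checking the product rule exactly with fractions; `is_independent` is a hypothetical helper. The pair `E`, `F` is the one above; `G = {4, 5, 6}` and the odd outcomes are added purely for contrast, the latter showing that disjoint events with positive probability are not independent:

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}  # fair six-sided die

def P(event):
    """Classical probability of an event (a subset of omega)."""
    return Fraction(len(event & omega), len(omega))

def is_independent(A, B):
    """True if P(A & B) == P(A) * P(B), checked exactly."""
    return P(A & B) == P(A) * P(B)

E = {2, 4, 6}    # even number
F = {3, 6}       # multiple of 3
G = {4, 5, 6}    # illustrative: number greater than 3
odd = {1, 3, 5}  # illustrative: disjoint from E

print(is_independent(E, F))    # True:  1/6 == 1/2 * 1/3
print(is_independent(E, G))    # False: 1/3 != 1/2 * 1/2
print(is_independent(E, odd))  # False: 0 != 1/2 * 1/2
```

Exact fractions avoid the false negatives that floating-point equality tests can produce.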

Partition.

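A partition of a sample space $\Omega$ is a collection of events $E_1, E_2, \ldots, E_n$ in $\Omega$ such that:

  1. $E_i \cap E_j = \emptyset$ for all $i \neq j$ (disjoint sets);
  2. $E_1 \cup E_2 \cup \cdots \cup E_n = \bigcup_{i=1}^n E_i = \Omega$.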

We can also set $n = \infty$ in Definition 4.6.6, so that the partition consists of infinitely many events.

Figure 4.6 presents an example of a partition of Ω using six events E1,E2,,E6.

Figure 4.6: Example of a partition of a sample space using six events.

For an event $F \subseteq \Omega$,
$$F = (F \cap E_1) \cup (F \cap E_2) \cup \cdots \cup (F \cap E_n).$$

This is illustrated in Figure 4.7 using the partition given in Figure 4.6.

Figure 4.7: The event F expressed in terms of the union of events.

Theorem of Total Probability.

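Let $E_1, E_2, \ldots, E_n$ be a partition of $\Omega$ (i.e. $E_i \cap E_j = \emptyset$ for all $i \neq j$ and $\bigcup_{i=1}^n E_i = \Omega$) with $P(E_i) > 0$ for all $i$, and let $F \subseteq \Omega$ be any event. Then,
$$P(F) = \sum_{i=1}^n P(F \mid E_i) P(E_i).$$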

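Since the $E_i$'s form a partition:

  1. $F = F \cap \Omega = \bigcup_{i=1}^n (F \cap E_i)$;
  2. For each $i \neq j$, $(F \cap E_i) \cap (F \cap E_j) = \emptyset$.
Therefore, by axiom 3,
$$P(F) = \sum_{i=1}^n P(F \cap E_i). \tag{4.1}$$
By the definition of conditional probability, for each $i$,
$$P(F \cap E_i) = P(F \mid E_i) P(E_i). \tag{4.2}$$

Substituting (4.2) into (4.1) completes the proof.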

Tin can factory.

Suppose that a factory uses three different machines to produce tin cans. Machine I produces 50% of all cans, machine II produces 30% of all cans and machine III produces the rest of the cans. It is known that 4% of cans produced on machine I are defective, 2% of the cans produced on machine II are defective and 5% of the cans produced on machine III are defective. If a can is selected at random, what is the probability that it is defective?

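Let $M_i$ be the event that the can is produced by machine $i$, $i = 1, 2, 3$, and let $D$ be the event that the can is defective. From the question, we know
$$P(M_1) = 0.5, \quad P(D \mid M_1) = 0.04, \quad P(M_2) = 0.3, \quad P(D \mid M_2) = 0.02, \quad P(M_3) = 0.2, \quad P(D \mid M_3) = 0.05.$$
Therefore, by the Theorem of Total Probability,
$$P(D) = \sum_{i=1}^3 P(D \mid M_i) P(M_i) = (0.04 \times 0.5) + (0.02 \times 0.3) + (0.05 \times 0.2) = 0.036.$$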
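The Theorem of Total Probability is a weighted sum over the partition. A minimal sketch for this example, using exact fractions; the dictionary keys and the helper name `total_probability` are illustrative choices:

```python
from fractions import Fraction

# P(M_i): share of cans made by each machine (these events form a partition)
prior = {"I": Fraction(1, 2), "II": Fraction(3, 10), "III": Fraction(1, 5)}
# P(D | M_i): defect rate for each machine
defect_rate = {"I": Fraction(4, 100), "II": Fraction(2, 100), "III": Fraction(5, 100)}

def total_probability(conditional, partition_probs):
    """P(F) = sum_i P(F | E_i) P(E_i), assuming the E_i form a partition."""
    if sum(partition_probs.values()) != 1:
        raise ValueError("partition probabilities must sum to 1")
    return sum(conditional[k] * partition_probs[k] for k in partition_probs)

p_defective = total_probability(defect_rate, prior)
print(p_defective, float(p_defective))  # 9/250 0.036
```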


Job interview problem.

A manager interviews 4 candidates for a job. The manager MUST decide whether to offer the job or reject the candidate immediately after each interview. Suppose that the candidates are ranked 1, 2, 3, 4 (1 best) and are interviewed in random order.

The manager interviews and rejects the first candidate. They then offer the job to the first candidate that is better than the rejected candidate. If all are worse then they offer the job to the last candidate.

What is the probability that the job is offered to the best candidate?

Attempt Example 4.6.9 (Job interview problem) and then watch Video 8 for the solution.

Video 8: Job Interview

Solution to Example 4.6.9 (Job interview problem).

Let F be the event that the best candidate is offered the job.

For $k = 1, 2, 3, 4$, let $E_k$ be the event that candidate $k$ (the $k$th best candidate) is interviewed first. Note that the $E_k$'s form a partition of the sample space and, since the candidates are interviewed in random order,
$$P(E_1) = P(E_2) = P(E_3) = P(E_4) = \frac{1}{4}.$$

We have that:
1. $P(F \mid E_1) = 0$. If the 1st ranked candidate is interviewed first, they will be rejected and cannot be offered the job.
2. $P(F \mid E_2) = 1$. If the 2nd ranked candidate is interviewed first, then all candidates will be rejected until the best (1st ranked) candidate is interviewed and offered the job.
3. $P(F \mid E_3) = 1/2$. If the 3rd ranked candidate is interviewed first, then whichever of the 1st and 2nd ranked candidates is interviewed first will be offered the job. Each of these possibilities is equally likely.
4. $P(F \mid E_4) = 1/3$. If the 4th ranked (worst) candidate is interviewed first, then the second candidate interviewed is always better and is offered the job, so the 1st ranked candidate is offered the job only if they are interviewed second, which has probability 1/3.

By the Theorem of Total Probability,
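$$P(F) = \sum_{i=1}^4 P(F \mid E_i) P(E_i) = P(F \mid E_1) P(E_1) + P(F \mid E_2) P(E_2) + P(F \mid E_3) P(E_3) + P(F \mid E_4) P(E_4) = 0 \times \frac{1}{4} + 1 \times \frac{1}{4} + \frac{1}{2} \times \frac{1}{4} + \frac{1}{3} \times \frac{1}{4} = \frac{11}{24}.$$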
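With only $4! = 24$ equally likely interview orders, the answer can be confirmed by enumerating them all. A minimal sketch; `offered_to` is a hypothetical helper that applies the manager's strategy to one order (rank 1 is the best candidate):

```python
from fractions import Fraction
from itertools import permutations

def offered_to(order):
    """Return the rank of the candidate offered the job for one interview order."""
    benchmark = order[0]            # first candidate is always rejected
    for rank in order[1:]:
        if rank < benchmark:        # smaller rank number = better candidate
            return rank
    return order[-1]                # nobody better: offer to the last candidate

orders = list(permutations([1, 2, 3, 4]))  # 24 equally likely orders
p_best = Fraction(sum(offered_to(o) == 1 for o in orders), len(orders))
print(p_best)  # 11/24
```

Enumeration counts 0, 6, 3 and 2 favourable orders when candidate 1, 2, 3 and 4 respectively is interviewed first, matching the conditional probabilities above.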


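Bayes' Formula.

Let $E_1, E_2, \ldots, E_n$ be a partition of $\Omega$, i.e. $E_i \cap E_j = \emptyset$ for all $i \neq j$ and $\bigcup_{i=1}^n E_i = \Omega$, such that $P(E_i) > 0$ for all $i = 1, \ldots, n$, and let $F \subseteq \Omega$ be any event such that $P(F) > 0$. Then
$$P(E_k \mid F) = \frac{P(F \mid E_k) P(E_k)}{\sum_{i=1}^n P(F \mid E_i) P(E_i)}.$$
If $P(F) > 0$ and $P(E_k) > 0$, then by definition
$$P(E_k \mid F) = \frac{P(E_k \cap F)}{P(F)} = \frac{P(F \mid E_k) P(E_k)}{P(F)}.$$
Since $E_1, E_2, \ldots, E_n$ is a partition of $\Omega$ such that $P(E_i) > 0$ for all $i$, by the Theorem of Total Probability we can rewrite $P(F)$ and obtain
$$P(E_k \mid F) = \frac{P(F \mid E_k) P(E_k)}{\sum_{i=1}^n P(F \mid E_i) P(E_i)}.$$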

Tin can factory (continued).

Consider Example 4.6.8. Suppose now that we randomly select a can and find that it is defective.

What is the probability that it was produced by machine I?
$$P(M_1 \mid D) = \frac{P(D \mid M_1) P(M_1)}{P(D)} = \frac{0.04 \times 0.5}{0.036} = \frac{5}{9} \approx 0.5556.$$
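Bayes' formula reuses the terms of the total-probability sum: each term divided by their total is a posterior probability. A minimal sketch computing $P(M_i \mid D)$ for all three machines with exact fractions; `posterior` is a hypothetical helper name:

```python
from fractions import Fraction

prior = {"I": Fraction(1, 2), "II": Fraction(3, 10), "III": Fraction(1, 5)}            # P(M_i)
likelihood = {"I": Fraction(4, 100), "II": Fraction(2, 100), "III": Fraction(5, 100)}  # P(D | M_i)

def posterior(likelihood, prior):
    """Bayes' formula: P(E_k | F) = P(F | E_k) P(E_k) / sum_i P(F | E_i) P(E_i)."""
    joint = {k: likelihood[k] * prior[k] for k in prior}
    evidence = sum(joint.values())  # P(F) by the Theorem of Total Probability
    return {k: joint[k] / evidence for k in joint}

post = posterior(likelihood, prior)
print(post["I"], round(float(post["I"]), 4))  # 5/9 0.5556
print(sum(post.values()))                     # 1
```

The posteriors for machines II and III come out as $1/6$ and $5/18$, and the three values sum to 1 as they must.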

Guilty?

At a certain stage of a jury trial a jury member gauges that the probability that the defendant is guilty is 7/10.

The prosecution then produces evidence that fibres of the victim’s clothing were found on the defendant.

If the probability of such fibres being found is 1 if the defendant is guilty and 1/4 if the defendant is not guilty, what now should be the jury member’s probability that the defendant is guilty?

Attempt Example 4.6.12 (Guilty?) and then watch Video 9 for the solution.

Video 9: Guilty?

Solution to Example 4.6.12 (Guilty?)

Let G be the event that the defendant is guilty and let F be the event that fibres of the victim's clothing are found on the defendant.

We want P(G|F), the probability that the defendant is guilty given that the fibres are found.

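We have that $P(G) = 0.7$, $P(G^c) = 1 - P(G) = 0.3$, $P(F \mid G) = 1$ and $P(F \mid G^c) = 0.25$.

Therefore, by Bayes' formula,
$$P(G \mid F) = \frac{P(F \cap G)}{P(F)} = \frac{P(F \mid G) P(G)}{P(F \mid G) P(G) + P(F \mid G^c) P(G^c)} = \frac{1 \times 0.7}{1 \times 0.7 + 0.25 \times 0.3} = 0.9032.$$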
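The same two-hypothesis calculation with exact fractions shows that the updated probability is exactly $28/31$. A minimal sketch; `bayes_two_hypotheses` is a hypothetical helper name:

```python
from fractions import Fraction

def bayes_two_hypotheses(prior, p_evidence_if_true, p_evidence_if_false):
    """P(G | F) when the partition is {G, G^c}."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))

p_guilty = bayes_two_hypotheses(Fraction(7, 10), Fraction(1), Fraction(1, 4))
print(p_guilty, round(float(p_guilty), 4))  # 28/31 0.9032
```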


4.7 Mutual Independence

We can extend the concept of independence from two events to N events.

Mutual independence.

Events $E_1, E_2, \ldots, E_N$ are (mutually) independent if for any subset $\{i_1, i_2, \ldots, i_n\} \subseteq \{1, \ldots, N\}$,
$$P\left(\bigcap_{j=1}^n E_{i_j}\right) = \prod_{j=1}^n P(E_{i_j}).$$


Note, in particular, that two events $E$ and $F$ are independent if
$$P(E \cap F) = P(E) P(F).$$

Aircraft safety.

An aircraft safety system contains $n$ independent components. The aircraft can fly provided at least one of the components is working. The probability that the $i$th component works is $p_i$. Then
$$\begin{aligned}
P(\text{Aircraft can fly}) &= 1 - P(\text{Aircraft cannot fly}) \\
&= 1 - P(\text{All components fail}) \\
&= 1 - \prod_{i=1}^n P(\text{Component } i \text{ fails}) \\
&= 1 - \prod_{i=1}^n (1 - p_i).
\end{aligned}$$
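The final formula is a one-liner in code. A minimal sketch; `prob_can_fly` is a hypothetical name, and the component reliabilities in the usage line are made-up values for illustration, not taken from the text:

```python
from math import prod

def prob_can_fly(p):
    """P(at least one of n independent components works) = 1 - prod(1 - p_i)."""
    if any(not 0 <= pi <= 1 for pi in p):
        raise ValueError("each p_i must be a probability")
    return 1 - prod(1 - pi for pi in p)

# Illustrative reliabilities for three components (not from the text).
print(prob_can_fly([0.9, 0.8, 0.95]))
```

For these values the result is $1 - 0.1 \times 0.2 \times 0.05 = 0.999$, up to floating-point rounding. The independence assumption is what allows the product of failure probabilities.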


Task: Session 3

Attempt the R Markdown file for Session 3:
Session 3: Probability in R

Student Exercises

Attempt the exercises below.


A card is drawn from a standard pack of 52. Let B be the event ‘the card is black’, NA the event ‘the card is not an Ace’ and H the event ‘the card is a Heart’. Calculate the following probabilities:

  1. $P(B \mid NA)$;
  2. $P(NA \mid B^c)$;
  3. $P(B^c \cap H \mid NA)$;
  4. $P(NA \cup B \mid H^c)$;
  5. $P(NA \cup H \mid NA \cap B)$.
Solution to Exercise 4.1.
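  1. $P(B \mid NA) = \frac{P(B \cap NA)}{P(NA)} = \frac{24/52}{48/52} = \frac{1}{2}$;
  2. $P(NA \mid B^c) = \frac{P(NA \cap B^c)}{P(B^c)} = \frac{24/52}{26/52} = \frac{12}{13}$;
  3. $P(B^c \cap H \mid NA) = \frac{P(B^c \cap H \cap NA)}{P(NA)} = \frac{12/52}{48/52} = \frac{1}{4}$;
  4. $P(NA \cup B \mid H^c) = \frac{P([NA \cup B] \cap H^c)}{P(H^c)} = \frac{38/52}{39/52} = \frac{38}{39}$;
  5. Since $[NA \cup H] \cap [NA \cap B] = NA \cap B$, we have that
    $P(NA \cup H \mid NA \cap B) = \frac{P([NA \cup H] \cap [NA \cap B])}{P(NA \cap B)} = \frac{P(NA \cap B)}{P(NA \cap B)} \left(= \frac{24/52}{24/52}\right) = 1$.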
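Each answer can be checked by enumerating the pack. A minimal sketch, reading item 4 as $P(NA \cup B \mid H^c)$ and item 5 as $P(NA \cup H \mid NA \cap B)$, which is consistent with the worked solutions; the suit names and the helpers `P` and `cond` are illustrative:

```python
from fractions import Fraction
from itertools import product

ranks = ["A"] + [str(n) for n in range(2, 11)] + ["J", "Q", "K"]
suits = ["clubs", "diamonds", "hearts", "spades"]
deck = set(product(ranks, suits))  # 52 equally likely cards (rank, suit)

def P(event):
    return Fraction(len(event), len(deck))

def cond(event, given):
    """P(event | given) = P(event & given) / P(given)."""
    return P(event & given) / P(given)

B = {c for c in deck if c[1] in ("clubs", "spades")}  # black
NA = {c for c in deck if c[0] != "A"}                 # not an Ace
H = {c for c in deck if c[1] == "hearts"}             # heart
Bc, Hc = deck - B, deck - H                           # complements

answers = [
    cond(B, NA),           # 1.
    cond(NA, Bc),          # 2.
    cond(Bc & H, NA),      # 3.
    cond(NA | B, Hc),      # 4.
    cond(NA | H, NA & B),  # 5.
]
print([str(a) for a in answers])  # ['1/2', '12/13', '1/4', '38/39', '1']
```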



A diagnostic test has a probability 0.95 of giving a positive result when applied to a person suffering from a certain disease, and a probability 0.10 of giving a (false) positive when applied to a non-sufferer. It is estimated that 0.5% of the population are sufferers. Suppose that the test is applied to a person chosen at random from the population. Find the probabilities of the following events.

  1. the test result will be positive.
  2. the person is a sufferer, given a positive result.
  3. the person is a non-sufferer, given a negative result.
  4. the person is misclassified.
Solution to Exercise 4.2.

Let T denote 'test positive' and S denote 'person is a sufferer'. Then the question tells us that $P(T \mid S) = 0.95$, $P(T \mid S^c) = 0.1$ and $P(S) = 0.005$.

  1. $P(T) = P(T \mid S) P(S) + P(T \mid S^c) P(S^c) = (0.95)(0.005) + (0.1)(0.995) = 0.10425$.
  2. $P(S \mid T) = \frac{P(T \mid S) P(S)}{P(T)} = \frac{(0.95)(0.005)}{0.10425} = 0.04556$.
  3. $P(S^c \mid T^c) = \frac{P(T^c \mid S^c) P(S^c)}{P(T^c)} = \frac{[1 - P(T \mid S^c)] P(S^c)}{1 - P(T)} = \frac{(0.9)(0.995)}{0.89575} = 0.9997$.
  4. Misclassification means either the person is a sufferer and the test is negative, or the person is not a sufferer and the test is positive. These are disjoint events, so we require
    $$P(S \cap T^c) + P(S^c \cap T) = P(T^c \mid S) P(S) + P(T \mid S^c) P(S^c) = [1 - P(T \mid S)] P(S) + P(T \mid S^c) P(S^c) = (0.05)(0.005) + (0.1)(0.995) = 0.09975.$$
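All four answers follow from three inputs, so they are easy to recompute, for example to see how $P(S \mid T)$ changes with the prevalence. A minimal sketch; the function name `diagnostic_summary` and its parameter names are illustrative:

```python
def diagnostic_summary(prevalence, sensitivity, false_positive_rate):
    """Return P(T), P(S|T), P(S^c|T^c) and P(misclassified) for a diagnostic test."""
    p_S, p_Sc = prevalence, 1 - prevalence
    p_T = sensitivity * p_S + false_positive_rate * p_Sc          # total probability
    p_S_given_T = sensitivity * p_S / p_T                         # Bayes' formula
    p_Sc_given_Tc = (1 - false_positive_rate) * p_Sc / (1 - p_T)  # Bayes' formula
    p_misclassified = (1 - sensitivity) * p_S + false_positive_rate * p_Sc
    return p_T, p_S_given_T, p_Sc_given_Tc, p_misclassified

results = diagnostic_summary(prevalence=0.005, sensitivity=0.95, false_positive_rate=0.10)
for label, value in zip(["P(T)", "P(S|T)", "P(S^c|T^c)", "P(misclassified)"], results):
    print(f"{label}: {value:.5f}")
```

With a prevalence of 0.5%, only about 4.6% of positive results come from sufferers, because false positives from the much larger non-sufferer group dominate.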



In areas such as market research and medical research, it is often hard to get people to answer embarrassing questions. One way around this is the following. Suppose that N people are interviewed, where N is even. Each person is given a card, chosen at random from N cards, containing a single question. Half of the cards contain the embarrassing question, to which the answer is either 'Yes' or 'No'. The other half of the cards contain the question 'Is your birthday between January and June inclusive?'

Suppose that of the N people interviewed, R answer ‘Yes’ to the question that they received. Let Y be the event that a person gives a ‘Yes’ answer, E the event that they received a card asking the embarrassing question. Assuming that half the population have birthdays between January and June inclusive, write down:

  1. $P(Y)$;
  2. $P(E)$;
  3. $P(Y \mid E^c)$.

Hence calculate the proportion of people who answered 'Yes' to the embarrassing question.

Hint: Try writing down an expression for P(Y) using the Theorem of Total Probability.

Comment on your answer.

Solution to Exercise 4.3.

We have that:

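  1. $P(Y) = R/N$;
  2. $P(E) = 1/2$;
  3. $P(Y \mid E^c) = 1/2$.
We want $P(Y \mid E)$. By the Theorem of Total Probability,
$$P(Y) = P(Y \mid E) P(E) + P(Y \mid E^c) P(E^c),$$
that is, $R/N = \frac{1}{2} P(Y \mid E) + \frac{1}{4}$, which gives
$$P(Y \mid E) = \frac{2R}{N} - \frac{1}{2}.$$
Comment: This only makes sense if $0 \le P(Y \mid E) \le 1$, that is,
$$\frac{N}{4} \le R \le \frac{3N}{4}.$$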

A little thought should help you see why.
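A simulation can make the scheme concrete. The sketch below assumes a hypothetical true proportion `p_true` of people who would answer 'Yes' to the embarrassing question (a value chosen for illustration, not taken from the text), deals exactly half embarrassing cards as described, and then applies the estimator $2R/N - 1/2$ derived above; `simulate_yes_count` is a hypothetical helper name:

```python
import random

def simulate_yes_count(n_people, p_true, seed=0):
    """Simulate the card scheme and return R, the number of 'Yes' answers.

    p_true is an assumed proportion answering 'Yes' to the embarrassing
    question; birthday answers are 'Yes' with probability 1/2 (January-June).
    """
    if n_people % 2 != 0:
        raise ValueError("N must be even")
    rng = random.Random(seed)
    cards = ["embarrassing"] * (n_people // 2) + ["birthday"] * (n_people // 2)
    rng.shuffle(cards)  # each person receives a card chosen at random
    yes = 0
    for card in cards:
        p_yes = p_true if card == "embarrassing" else 0.5
        yes += rng.random() < p_yes
    return yes

N = 100_000
R = simulate_yes_count(N, p_true=0.2)
estimate = 2 * R / N - 0.5  # estimated P(Y | E)
print(R, estimate)
```

The estimate should land close to the assumed 0.2, even though no individual 'Yes' reveals which question was answered.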