19 Probability

So far, you have learnt to ask a RQ, design a study, and describe and summarise the data. In this chapter, you will learn about probability to describe the random nature of sample statistics. You will learn to:

  • explain probabilities.
  • apply the classical approach to probability in simple situations.
  • apply the relative frequency approach to probability in simple situations.
  • apply the subjective approach to probability in simple situations.
  • identify events that are independent.

19.1 Introduction

This chapter briefly discusses probability. Probability quantifies the chance that a specific, unknown result (an 'event') from some random procedure will happen in the future. Probabilities are similar to proportions, but probabilities concern unknown future events. Before discussing probability, some associated terms need defining.

19.2 Random procedures, sample spaces and simple events

To talk about probability, random procedures must be defined first.

Definition 19.1 (Random procedure) A random procedure is a sequence of well-defined steps that can be repeated, in theory, indefinitely under essentially identical conditions; has well-defined results; and the result of any individual repetition is unpredictable.

Using this definition, rolling a die is a 'random procedure', with outcomes \(1\), \(2\), \(3\), \(4\), \(5\) and \(6\).

A list of all distinct possible results from one instance of a random procedure is the sample space. A simple event is any element of the sample space.

Definition 19.2 (Sample space) The sample space is a list of all possible and distinct results from administering, once, a random procedure whose result is unknown beforehand.

Definition 19.3 (Simple event) A simple event is a single element of the sample space.

Example 19.1 (Sample spaces) Consider rolling a fair, six-sided die (the random procedure). We do not know what face will be uppermost until we roll the die.

However, the sample space for this procedure can be listed: \(1\), \(2\), \(3\), \(4\), \(5\) and \(6\). These are all distinct results (no overlap), and the sample space is discrete.

The event 'rolling a ' is a simple event.

Combinations of the elements in the sample space are usually of interest. These are called compound events.

Definition 19.4 (Compound event) A compound event is any combination of simple events (i.e., of elements in the sample space).

Example 19.2 (Events) Many events can be defined using the sample space in Example 19.1, including:

  • Rolling a : This simple event includes one element of the sample space: .
  • Rolling an even number: This compound event includes three elements of the sample space: \(2\), \(4\) and \(6\).
  • Rolling a number larger than \(2\): This compound event includes four elements of the sample space: \(3\), \(4\), \(5\) and \(6\).

Example 19.3 (Sample spaces and events) Consider the distance you can throw a baseball (the random procedure). We do not know beforehand what distance your next throw will be, but the sample space (the possible throwing distances) comprises all distances greater than \(0\) m (though some of those distances are very unlikely to occur). This sample space is continuous.

Many compound events can be defined using this sample space; for example:

  • throwing more than \(50\) m.
  • throwing between \(10\) and \(40\) m.

Because the sample space is continuous, throwing an exact distance (such as exactly \(10\) m) is technically not possible.

Events of interest are often combinations of other events, often combined using and, or, not. Consider two events called \(A\) and \(B\). Then, '\(A\) and \(B\)' is the event where \(A\) and \(B\) are both true. '\(A\) or \(B\)' is the event where \(A\) is true, \(B\) is true, or both are true. The event 'not \(A\)' comprises all the elements of the sample space that are not in Event \(A\).

Example 19.4 (Complicated events) Consider rolling a fair, six-sided die again (Example 19.1). Suppose we define these two (compound) events:

  • Event \(A\): Roll a number divisible by \(2\).
  • Event \(B\): Roll a number divisible by \(3\).

Event \(A\) comprises the simple events \(2\), \(4\) and \(6\), and event \(B\) comprises the simple events \(3\) and \(6\).

Then, the event '\(A\) and \(B\)' includes all events in \(A\) and also in \(B\); that is, '\(A\) and \(B\)' comprises the single simple event \(6\).

Event '\(A\) or \(B\)' includes the events in \(A\), the events in \(B\), and those in both; that is, '\(A\) or \(B\)' comprises the four simple events \(2\), \(3\), \(4\) and \(6\).

The event 'not \(A\)' comprises the three simple events \(1\), \(3\) and \(5\).
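The event combinations in Example 19.4 can be checked with a short Python sketch. It assumes the sample space is the six faces \(1\) to \(6\) and represents each event as a set; the variable names are illustrative only.

```python
# Sample space for one roll of a six-sided die (assumed faces 1 to 6).
sample_space = {1, 2, 3, 4, 5, 6}

# Compound events from Example 19.4.
event_a = {face for face in sample_space if face % 2 == 0}  # divisible by 2
event_b = {face for face in sample_space if face % 3 == 0}  # divisible by 3

a_and_b = event_a & event_b     # outcomes in both A and B
a_or_b = event_a | event_b      # outcomes in A, in B, or in both
not_a = sample_space - event_a  # outcomes in the sample space but not in A

print(sorted(a_and_b))  # [6]
print(sorted(a_or_b))   # [2, 3, 4, 6]
print(sorted(not_a))    # [1, 3, 5]
```

Set intersection, union and difference correspond directly to 'and', 'or' and 'not'.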

19.3 Probability

Using these definitions, a probability can be defined.

Definition 19.5 (Probability) A probability is a number between \(0\) and \(1\) inclusive (or between \(0\)% and \(100\)% inclusive) that quantifies the likelihood that a certain event will occur.

A probability of \(0\) (or \(0\)%) means the event is 'impossible' (will never occur), and a probability of \(1\) (or \(100\)%) means that the event is certain to happen (will always occur). Most events have a probability between the extremes of \(0\)% and \(100\)%.

Example 19.5 (Probabilities) Consider these cases:

  • The probability of receiving negative rainfall is \(0\); it is impossible.
  • The probability of receiving some rain in Oslo next year is \(1\); it is certain.
  • The probability of receiving rain on a future day in Oslo is between \(0\) and \(1\) inclusive.

The probability of an event occurring can be computed in different ways, including:

  • the classical approach (Sect. 19.4);
  • the relative frequency approach (Sect. 19.5); and
  • the subjective approach (Sect. 19.6).

19.4 Classical approach

What is the probability of rolling a \(4\) on a die? The sample space has six possible outcomes (see Example 19.1) that are equally likely to occur, and the event 'rolling a \(4\)' comprises just one of those. Thus, \[ \text{Prob. of rolling a four} = \frac{\text{The number of results that are a 4}}{\text{The number of possible results}} = \frac{1}{6}. \] This approach to computing probabilities is called the classical approach to probability, and is only appropriate when all outcomes in the sample space are equally likely.

Definition 19.6 (Classical approach to probability) In the classical approach to probability, the probability of an event occurring is the number of elements of the sample space included in the event, divided by the total number of elements in the sample space, when all outcomes are equally likely.

By this definition:
\[ \text{Prob. of an event} = \frac{\text{Number of equally-likely results in the event of interest}}{\text{Total number of equally-likely results}}. \]
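As a minimal sketch of this formula, the Python function below counts the outcomes in an event and divides by the size of the sample space. The function name `classical_probability` is hypothetical, and the result is only meaningful under the assumption that every outcome is equally likely.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """Classical probability, assuming all outcomes are equally likely."""
    outcomes = set(sample_space)
    favourable = set(event) & outcomes  # ignore anything outside the sample space
    return Fraction(len(favourable), len(outcomes))


die = {1, 2, 3, 4, 5, 6}
p_four = classical_probability({4}, die)
p_even = classical_probability({2, 4, 6}, die)

print(p_four, round(float(p_four), 4))  # 1/6 0.1667
print(p_even)                           # 1/2
```

Using `Fraction` keeps the answer exact (\(1/6\)) until a decimal approximation is wanted.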

Example 19.6 (Complicated probabilities) Consider rolling a standard six-sided die. With six equally-likely results (Example 19.1), each with probability \(1/6\) (or, approximately, \(16.67\)%) of occurring, the probability of rolling a is \(1/6\).

We can say that 'the probability of rolling a is \(1/6\)', or 'the probability of rolling a is \(0.1667\)'. The answer can also be expressed as a percentage: 'the probability of rolling a is (approximately) \(16.67\)%'. The answer could also be interpreted as 'the expected proportion of rolls that are a is (approximately) \(0.1667\)'. That is, about \(16.67\)% of a very large number of future rolls are likely to be a .

The chance of rolling a in the future is \(0.1667\), but a roll of the die either will or will not produce a ... and we don't know which will occur.

Example 19.7 (Describing probability) Consider rolling a standard six-sided die.

  • The probability of rolling an even number is \(3 \div 6 = 0.5\).
  • The percentage of rolls expected to be even is \(3 \div 6 \times 100 = 50\)%.
  • The odds of rolling an even number are \(3\div 3 = 1\).

Example 19.8 (Simple probabilities) Consider rolling a standard six-sided die. There are six equally-likely results (Example 19.1) each with probability \(1/6\) (or \(16.67\)%) of occurring. The probability of rolling a or a is \(2/6\) (or \(33.33\)%).

Probabilities describe the likelihood that an event will occur before the result is known. Odds and proportions can be used either before or after the result is known, provided the wording is correct.

For example, proportions describe how often an event has occurred after the result is known, and expected proportions describe the likelihood that an event will occur before the result is known.

The following example may help explain.

Example 19.9 (Probabilities, proportions and odds) Before a fair coin is tossed:

  • The probability of throwing a head in the future is \(1/2 = 0.5\).
  • The expected proportion of heads for many future coin tosses is \(0.5\) (i.e., \(50\)%).
  • The odds of throwing a head in the future are \(1/1 = 1\).

If we have already tossed a coin \(100\) times and found \(47\) heads:

  • The proportion of heads in the sample is \(47/100 = 0.47\).
  • The odds that we threw a head in the sample are \(47/53 = 0.887\).

The 'probability that we just threw a head' makes no sense, because the event has already occurred.
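The arithmetic in Example 19.9 can be reproduced with a few lines of Python. This is only a sketch: the helper `odds_from_probability` is a hypothetical name, and it uses the relationship that odds are the probability of the event divided by the probability of its complement.

```python
from fractions import Fraction


def odds_from_probability(p):
    """Odds = chance the event occurs divided by chance it does not."""
    return p / (1 - p)


# Before tossing: a fair coin has probability 1/2 of showing a head.
p_head = Fraction(1, 2)
print(odds_from_probability(p_head))  # 1

# After tossing: 47 heads were observed in 100 tosses.
heads, tosses = 47, 100
sample_proportion = heads / tosses
sample_odds = heads / (tosses - heads)  # heads compared with non-heads
print(sample_proportion)      # 0.47
print(round(sample_odds, 3))  # 0.887
```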

19.5 Relative frequency approach

What is the probability that a new-born baby will be a boy? The sample space could be listed as: boy and non-boy. The classical approach could be used, since the sample space has two elements: \(1\div2 = 0.5\). However, this approach is appropriate only if boys and non-boys are equally likely to be born. But are they?

In Australia in 2021, \(289\ 603\) live births occurred, with \(148\ 636\) male births, \(140\ 944\) female births, and \(23\) others (or 'not stated'). The proportion of boys born in 2021 is \(148\ 636\div 289\ 603 = 0.513\), or about \(51.3\)%. An estimate of the probability that the next birth will be a boy is about \(0.513\) (or \(51.3\)%). This is the relative frequency approach to calculating probabilities: based on past data.

Using the relative frequency method can only ever produce an approximate probability, as it is based on a limited number of past observations. An actual probability would require an infinite number of observations.
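The idea that more observations give a better approximation can be illustrated by simulation. The sketch below assumes a fair die (so the classical probability of rolling a \(4\) is known to be \(1/6\)) and estimates that probability from simulated rolls; the function name and the seed are arbitrary choices made only for this illustration.

```python
import random


def relative_frequency_of_four(n_rolls, seed=2023):
    """Estimate P(rolling a 4) as the proportion of 4s in n_rolls simulated rolls."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    fours = sum(1 for _ in range(n_rolls) if rng.randint(1, 6) == 4)
    return fours / n_rolls


for n_rolls in (60, 6_000, 600_000):
    estimate = relative_frequency_of_four(n_rolls)
    print(n_rolls, round(estimate, 4))
```

With more rolls, the estimates tend to settle near \(1/6 \approx 0.1667\), but any finite number of rolls still gives only an approximation.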

Definition 19.7 (Relative frequency approach to probability) In the relative frequency approach to probability, the probability of an event is approximately the number of times the outcome of interest has appeared in the past, divided by the number of 'attempts' in the past. This produces an approximate probability.

Example 19.10 (Relative frequency probability) Based on the earlier information, the odds that a new baby will be a boy are approximately \(0.513\div (1 - 0.513) = 1.053\). According to the Australian Bureau of Statistics (ABS):

The sex ratio for all births registered in Australia generally fluctuates around \(105.5\) male births per \(100\) female births.

This is close to the odds of \(1.053\) found above.
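The relative frequency calculations for the 2021 birth data can be sketched in Python as follows. Only the counts quoted above are used; the variable names are illustrative.

```python
# Live births in Australia in 2021, as quoted in the text.
male_births = 148_636
total_births = 289_603

# Relative frequency estimate of the probability that a birth is a boy.
p_boy = male_births / total_births
print(round(p_boy, 3))  # 0.513

# Odds computed from the rounded probability, as done in Example 19.10.
odds_from_rounded = 0.513 / (1 - 0.513)
print(round(odds_from_rounded, 3))  # 1.053

# Odds computed from the unrounded probability differ slightly.
odds_unrounded = p_boy / (1 - p_boy)
print(round(odds_unrounded, 3))  # 1.054
```

The small difference between \(1.053\) and \(1.054\) is a rounding effect only.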

Table 19.1 comes from a study of Iranian children aged 6--18 years (Kelishadi et al. 2017). Find the probability that a randomly chosen student is:

  1. A female student.
  2. A female student who skipped breakfast.
  3. A female student, if we already know the child skipped breakfast.

Answers: 1. \(6640/13\ 486 = 49.2\)%. 2. \(2383/13\ 486 = 17.7\)%. 3. \(2383/4327 = 55.1\)%.

TABLE 19.1: The number of Iranian children aged 6 to 18 who skip and do not skip breakfast
           Skips breakfast   Doesn't skip breakfast   Total
Females    \(2\,383\)        \(4\,257\)               \(6\,640\)
Males      \(1\,944\)        \(4\,902\)               \(6\,846\)
Total      \(4\,327\)        \(9\,159\)               \(13\,486\)
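The three probabilities based on Table 19.1 can be computed from the four cell counts, as in this Python sketch. The dictionary layout and variable names are assumptions made for the illustration; the third probability divides by the number of children who skipped breakfast, not by the overall total.

```python
# Cell counts from Table 19.1: (sex, breakfast habit) -> number of children.
counts = {
    ("Female", "Skips"): 2383,
    ("Female", "Doesn't skip"): 4257,
    ("Male", "Skips"): 1944,
    ("Male", "Doesn't skip"): 4902,
}

total = sum(counts.values())
females = sum(n for (sex, _), n in counts.items() if sex == "Female")
skippers = sum(n for (_, habit), n in counts.items() if habit == "Skips")

p_female = females / total
p_female_and_skips = counts[("Female", "Skips")] / total
# Restrict attention to children known to have skipped breakfast.
p_female_given_skips = counts[("Female", "Skips")] / skippers

print(f"{p_female:.1%}")              # 49.2%
print(f"{p_female_and_skips:.1%}")    # 17.7%
print(f"{p_female_given_skips:.1%}")  # 55.1%
```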

19.6 Subjective approach

Many probabilities cannot be computed using the classical or relative frequency approach; for example, what is the probability that California will experience a Category 1 cyclone next year? In this case, only a subjective probability can be given.

'Subjective' probabilities may be based on personal judgement or experience. They can also be given by considering all the relevant issues that may impact the probability (and may, for example, be based on mathematical models that incorporate information from numerous inputs). Depending on how these other issues are considered and combined, different subjective probabilities may be given.

Weather forecasts are one example: they incorporate data from sea surface temperatures, local topography, air pressures, air temperatures and so on. Different models use different inputs, and may combine these inputs differently to produce different (subjective) forecast probabilities.

Definition 19.8 (Subjective approach to probability) In the subjective approach to probability, various factors are incorporated, perhaps subjectively, to determine the probability of an event occurring.

Example 19.11 (Subjective probability) During El Niño events, eastern Australia typically experiences drier-than-average winters and springs. The Australian Broadcasting Corporation's news website reported that the Australian Bureau of Meteorology predicted a \(50\)% probability of an El Niño event in 2023, while the American National Oceanic and Atmospheric Administration predicted a \(90\)% chance of an El Niño event in 2023.

Despite this, '[both] agencies are looking at the same part of the Pacific Ocean' to make their predictions. However, 'the US and Australia base their probability on different criteria'; these are subjective probabilities.

19.7 Independence

One important concept in probability is independence. Two events are independent if the probability of one event happening is the same, whether or not the other event has happened. For example, if you toss a coin, the probability of getting a head is the same whether you are sitting or standing. That is, the result of a coin toss is independent of your position.

Definition 19.9 (Independence) Two events are independent if the probability of one event is the same, whether or not the other event has happened.

Example 19.12 (Independence) Consider drawing two cards from a fair pack (of \(52\) cards), without returning the first card. For the first card, the sample space contains every card in the pack, and drawing any one card is as likely as drawing any other. Since four cards are Aces, the probability of drawing an Ace on the first draw is \(4/52\) (using the classical approach).

If we drew an Ace for the first card, the probability of drawing an Ace for the second card is \(3/51\) (three Aces remain among the \(51\) remaining cards). Alternatively, if we don't draw an Ace for the first card, the probability of drawing an Ace the second time is \(4/51\) (four Aces remain among the \(51\) remaining cards).

That is, the probability of drawing an Ace for the second card depends on whether or not an Ace was drawn for the first card. The two events 'Drawing an Ace for the first card' and 'Drawing an Ace for the second card' are not independent events.
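The comparison in Example 19.12 can be sketched in Python using exact fractions. The variable names are illustrative; the check simply asks whether the probability of an Ace on the second draw is the same regardless of what happened on the first draw.

```python
from fractions import Fraction

aces, cards = 4, 52

# First draw: classical approach with 52 equally likely cards.
p_first_ace = Fraction(aces, cards)

# Second draw, without returning the first card to the pack.
p_second_ace_if_first_ace = Fraction(aces - 1, cards - 1)  # 3 Aces among 51 cards
p_second_ace_if_first_not = Fraction(aces, cards - 1)      # 4 Aces among 51 cards

# Independent events would have equal probabilities in both situations.
independent = p_second_ace_if_first_ace == p_second_ace_if_first_not

print(p_first_ace)                # 1/13
print(p_second_ace_if_first_ace)  # 1/17
print(p_second_ace_if_first_not)  # 4/51
print(independent)                # False
```

Note that `Fraction` reduces \(4/52\) to \(1/13\) and \(3/51\) to \(1/17\).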

A 'standard' pack of cards has \(52\) cards, organised into four suits: spades, clubs (both black), hearts and diamonds (both red). Each suit has \(13\) denominations: \(2\), \(3\), \(4\), \(5\), \(6\), \(7\), \(8\), \(9\), \(10\), Jack (J), Queen (Q), King (K), Ace (A). The Ace, King, Queen and Jack are called picture cards. (Most packs also contain two jokers, which are not considered part of a standard pack.)

Example 19.13 (Wearing sunglasses) A research study conducted in Brisbane (B. Dexter et al. 2019) recorded the number of people at the foot of the Goodwill Bridge, Southbank, who wore sunglasses between \(11\):\(30\)am and \(12\):\(30\)pm (Table 19.2). Is the wearing of sunglasses independent of the person's gender?

If this were true, the probability of a female wearing sunglasses would be equal to the probability of a male wearing sunglasses. From Table 19.2:

  • The probability of a female wearing sunglasses: \(126\div 366 = 0.344\), and
  • The probability of a male wearing sunglasses: \(123\div 386 = 0.319\).

These probabilities are close, but not exactly equal. In the sample, wearing sunglasses is close to, but not exactly, independent of gender. Since the data come from a single sample, taken on one day at one location, we cannot be sure about any conclusions more generally.

TABLE 19.2: Females and males wearing sunglasses
Wore sunglasses?   Female      Male
No                 \(240\)     \(263\)
Yes                \(126\)     \(123\)
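The two probabilities compared in Example 19.13 can be computed directly from the counts in Table 19.2, as in this Python sketch (the data structure and names are assumptions made for the illustration).

```python
# Counts from Table 19.2: gender -> number wearing / not wearing sunglasses.
sunglasses = {
    "Female": {"No": 240, "Yes": 126},
    "Male": {"No": 263, "Yes": 123},
}

# Proportion wearing sunglasses within each gender.
p_wearing = {
    gender: row["Yes"] / (row["Yes"] + row["No"])
    for gender, row in sunglasses.items()
}

print(round(p_wearing["Female"], 3))  # 0.344
print(round(p_wearing["Male"], 3))    # 0.319
```

If wearing sunglasses were exactly independent of gender in the sample, the two values would be equal.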

19.8 Chapter summary

Three ways to compute probabilities are:

  • the classical approach, which requires all outcomes to be equally likely;
  • the relative frequency approach; and
  • the subjective approach.

Two events are independent if the probability of one event is the same, whether or not the other event has happened.

19.9 Quick review questions

Suppose Event \(A\) is defined as 'Rolling a or a on a fair die'. Also, suppose Event \(B\) is defined as 'Rolling an even number on a die'.

  1. What is the best approach to computing the probability of Event \(A\) occurring?
  2. What is the probability of Event \(A\) occurring?
  3. True or false: Events \(A\) and \(B\) are independent.
  4. What is the probability of '\(A\) and \(B\)' occurring?
  5. What is the probability of '\(A\) or \(B\)' occurring?
  6. What is the probability of 'not \(B\)' occurring?

19.10 Exercises

Selected answers are available in App. E.

Exercise 19.1 Which approach is best used to estimate a probability in these situations?

  1. The probability that your sporting team wins on the weekend.
  2. The probability that a randomly-chosen person writes left-handed.

Exercise 19.2 Which approach is best used to estimate a probability in these situations?

  1. The probability that a King will be chosen from a pack of cards.
  2. The probability that Paris receives more than \(50\) mm of rain next May.

Exercise 19.3 Consider these three events about tossing a fair coin, then answer the questions that follow: Event 1 is 'toss a Head on the first toss'; Event 2 is 'toss a Tail on the first toss'; and Event 3 is 'toss a Head on the second toss'.

  1. Are Event 1 and Event 2 independent events?
  2. Are Event 1 and Event 3 independent events?
  3. Compute the probability of Event 3.

Exercise 19.4 Consider rolling a fair die. Event \(A\) is 'rolling an even number', Event \(B\) is 'rolling an odd number', and Event \(C\) is 'rolling a '.

  1. What events are in '\(A\) and \(B\)'?
  2. Compute the probability of '\(A\) and \(B\)'.
  3. What events are in '\(A\) or \(B\)'?
  4. Compute the probability of '\(A\) or \(B\)'.
  5. What events are in '\(A\) and \(C\)'?
  6. Compute the probability of '\(A\) and \(C\)'.
  7. What events are in 'not \(C\)'?
  8. Compute the probability of 'not \(C\)'.

Exercise 19.5 Suppose I roll a standard six-sided die.

  1. What is the probability that I will roll a number larger than ?

  2. What are the odds of rolling a number smaller than ?

  3. Suppose I toss a coin after rolling the die. Is the result from the coin toss independent of what I rolled on the die?

  4. What is the probability that I roll a number divisible by \(2\) on the die?

  5. What is the probability that I roll a number divisible by \(2\) and divisible by \(3\) on the die?

Exercise 19.6 Suppose you have a well-shuffled, standard pack of \(52\) cards.

  1. What is the probability that you will draw a King?
  2. What are the odds that you will draw a King?
  3. What is the probability that you will draw a picture card (Ace, King, Queen or Jack)?
  4. What are the odds that you will draw a picture card (Ace, King, Queen or Jack)?
  5. Suppose I draw two cards from the pack. Are the events 'Draw a King first' and 'Draw a Queen second' independent events?
  6. Suppose I draw one card from the pack and roll a six-sided die. Are the events 'Draw a Jack from the pack of cards' and 'Roll a on the die' independent events?

Exercise 19.7 Table 19.3 tabulates information about school children in Queensland in 2019 (P. K. Dunn 2023).

  1. What is the probability that a randomly chosen student is a first nations student?
  2. What is the probability that a randomly chosen student is in a government school?
  3. Is the sex of the student approximately independent of whether the student is a first nations student, for students in government schools?
  4. Is the sex of the student approximately independent of whether the student is a first nations student, for students in non-government schools?
  5. Is whether the student is a first nations student approximately independent of the type of school, for female students?
  6. Is whether the student is a first nations student approximately independent of the type of school, for male students?
  7. Based on the above, what can you conclude from the data?
TABLE 19.3: The number of first nations and non-first nations students in various Queensland schools in 2019
                          Number of first nations students   Number of non-first nations students
Government schools
  Females                 \(2\,540\)                         \(21\,219\)
  Males                   \(2\,734\)                         \(22\,574\)
Non-government schools
  Females                 \(391\)                            \(9\,496\)
  Males                   \(362\)                            \(9\,963\)

Exercise 19.8 Are these pairs of events likely to be independent or not independent? Explain.

  1. 'I walk to work tomorrow morning', and 'Rain is expected tomorrow morning'.
  2. 'A person smokes more than \(10\) cigarettes per week' and 'A person gets lung cancer'.
  3. 'It rains today' and 'I hose my garden today'.

Exercise 19.9 In disease testing, two key aspects of the test are:

  • Sensitivity: the probability of a positive test result among those with the disease; and
  • Specificity: the probability of a negative test result among those without the disease.

Both are important for understanding how well a test works in practice.

A certain test has a sensitivity of \(0.99\) and a specificity of \(0.98\). Consider a group of \(1000\) people, \(100\) of whom have the disease and \(900\) who do not have the disease. All the people are given the test.

  1. Suppose the \(100\) people who do have a disease are tested. How many would be expected to return a positive test result?
  2. Suppose the \(900\) people who do not have a disease are tested. How many would be expected to return a positive test result?
  3. In total, how many positive tests would be expected from the \(1000\) people?
  4. Consider those people who return a positive test result. What is the probability that one of these people actually has the disease?

Exercise 19.10 Explain why the following argument is incorrect:

When I toss two coins, there are only three outcomes: a Head and a Head, a Tail and a Tail, or one of each. So the probability of obtaining two Tails must be one-third.

Exercise 19.11 On October 13, the American television programme Nightline interviewed Dr Richard Andrews, director of the California Office of Emergency Services, to discuss various natural disasters that were being predicted. In the interview, Dr Andrews said:

I listen to earth scientists talk about earthquake probabilities a lot and in my mind every probability is \(50\)--\(50\), either it will happen or it won't happen...

Explain why Dr Andrews is incorrect when he says that "every probability is \(50\)--\(50\)". Give an example to show why he must be incorrect. (Based on a report in Chance News 6.12.)
