16 Probability
So far, you have learnt to ask a RQ, identify different ways of obtaining data, design the study, collect the data, describe the data, summarise data graphically and numerically, and understand the decision-making process.
In this chapter, you will learn about probability to describe the random nature of sample statistics. You will learn to:
- explain probabilities.
- apply the classical approach to probability in simple situations.
- apply the relative frequency approach to probability in simple situations.
- apply the subjective approach to probability in simple situations.
- identify events that are independent.
16.1 Introduction
In this chapter, probability is discussed. In short, probability quantifies the chance that something (an 'event') will happen in the future.
More formally, probability is discussed in the context of a procedure whose result is unknown beforehand. A list of all possible results from this procedure is called the sample space. An event is then defined as any combination of these elements of the sample space.
Definition 16.1 (Sample space) The sample space is a list of all possible and distinct results after administering a procedure, whose result is unknown beforehand.
Definition 16.2 (Event) An event is any combination of the elements in the sample space.
Example 16.1 (Sample spaces and events) Consider rolling a fair, six-sided die. We do not know what face will be uppermost until we roll the die.
However, the sample space for this procedure can be listed: 1, 2, 3, 4, 5 and 6. These are all distinct results (no overlap), and the sample space is discrete.
Many events could be defined using this sample space; for example:
- Rolling a : This event includes one element of the sample space: .
- Rolling an even number: This event includes three elements of the sample space: 2, 4 and 6.
- Rolling a number larger than 2: This event includes four elements of the sample space: 3, 4, 5 and 6.
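The ideas of a sample space and an event in Example 16.1 can be sketched with Python sets. This is only an illustration: the variable names are assumptions, and the two events shown ('an even number' and 'a number larger than 2') are chosen because their elements can be listed directly.

```python
# Sample space for one roll of a six-sided die (Example 16.1).
sample_space = {1, 2, 3, 4, 5, 6}

# An event is any combination (subset) of elements of the sample space.
even_number = {face for face in sample_space if face % 2 == 0}
larger_than_two = {face for face in sample_space if face > 2}

# Every event is contained in the sample space.
assert even_number <= sample_space
assert larger_than_two <= sample_space

print(sorted(even_number))      # [2, 4, 6]
print(sorted(larger_than_two))  # [3, 4, 5, 6]
```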
Example 16.2 (Sample spaces and events) Consider the distance which you can throw a cricket ball.
We do not know what distance your throw will be until you throw, but we can describe the sample space for this procedure: the distance could be anywhere between (say) 0 and 200 metres (and of course, some of those distances are fairly unlikely to occur...).
This sample space is continuous.
Many events could be defined using this sample space; for example:
- Throwing more than 50 metres: This event includes elements of the sample space greater than 50m.
- Throwing between 10 and 40 metres: This event includes elements of the sample space between 10m and 40m.
- Throwing less than 20 metres: This event includes elements of the sample space less than 20m.
Because the sample space is continuous, throwing an exact distance (such as exactly 10 metres) is technically not possible.
Once a sample space is defined, a probability can be defined.
Definition 16.3 (Probability) A probability is a number between 0 and 1 inclusive (or between 0% and 100% inclusive) that quantifies the likelihood that a certain event will occur.
A probability of zero (or 0%) means the event is 'impossible' (will never occur). At the other extreme, a probability of one (or 100%) means that the event is 'certain' to happen (will always occur). Most events have a probability between the extremes of 0% and 100%.
Example 16.3 (Probabilities) Consider these cases:
- The probability of receiving negative rainfall is zero: It is impossible.
- The probability of receiving some rain in Buderim in the next decade is one. It is certain.
- The probability of receiving rain on any given day... is somewhere between 0 and 1.
Different ways exist to calculate probabilities of events, including the classical approach, the relative frequency approach and the subjective approach.
16.2 Classical approach
What is the probability of rolling a 4 on a die? The sample space has six possible outcomes (listed in Example 16.1) that are equally likely, and the event ('rolling a 4') comprises just one of those.
Thus, \[\begin{align*} \text{Prob. of rolling a four} &= \frac{\text{The number of results that are a 4}}{\text{The number of possible results}}\\ &= \frac{1}{6}. \end{align*}\]
We can say that 'the probability of rolling a 4 is 1/6', or 'the probability of rolling a 4 is 0.1667'. The answer can also be expressed as a percentage ('the probability of rolling a 4 is 16.7%').
The answer could also be interpreted as 'the expected proportion of rolls that are a 4 is 0.167'. This approach to computing probabilities is called the classical approach to probability.
The chance of rolling a 4 in the future is 0.167, but a roll of the die will either produce a 4, or will not produce a 4... and we don't know which will occur.
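The idea of an 'expected proportion' can be illustrated with a small simulation. The sketch below assumes a fair die, simulated with Python's standard `random` module; the seed and the number of rolls are arbitrary choices made only for this illustration.

```python
import random

random.seed(1)  # arbitrary seed, used only so the run is repeatable

n_rolls = 100_000  # hypothetical number of simulated rolls
rolls = [random.randint(1, 6) for _ in range(n_rolls)]

# Each individual roll either is a four or is not a four...
proportion_of_fours = rolls.count(4) / n_rolls

# ...but over many rolls, the proportion of fours settles near 1/6.
print(f"Observed proportion of fours: {proportion_of_fours:.3f}")
print(f"Classical probability:        {1 / 6:.3f}")
```

The observed proportion will not usually equal 1/6 exactly, but with many rolls it should be close.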
Example 16.4 (Describing probability outcomes) Consider rolling a standard six-sided die again.
- The probability of rolling an even number is \(3 \div 6 = 0.5\).
- The percentage of rolls that are expected to be even numbers is \(3 \div 6 \times 100 = 50\)%.
- The odds of rolling an even number is \(3\div 3 = 1\).
Definition 16.4 (Classical approach to probability) In the classical approach to probability, the probability of an event occurring is the number of elements of the sample space included in the event, divided by the total number of elements in the sample space, when all outcomes are equally likely.
By this definition:
\[ \text{Prob. of an event} = \frac{\text{Number of equally-likely outcomes in the event of interest}}{\text{Total number of equally-likely outcomes}} \]
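As a minimal sketch, the formula above can be written as a short Python function. The function name `classical_probability` is an assumption made for this illustration, and the calculation is only valid when all outcomes in the sample space are equally likely.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """Probability of an event when all outcomes are equally likely."""
    event = set(event)
    sample_space = set(sample_space)
    if not event <= sample_space:
        raise ValueError("The event must only contain elements of the sample space")
    # Number of outcomes in the event, divided by the total number of outcomes.
    return Fraction(len(event), len(sample_space))


die = {1, 2, 3, 4, 5, 6}
p_four = classical_probability({4}, die)
p_even = classical_probability({2, 4, 6}, die)

print(p_four)  # 1/6
print(p_even)  # 1/2
```

Using `Fraction` keeps the answers exact; `float(p_four)` gives the decimal form if needed.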
Example 16.5 (Simple events) What is the probability of rolling a 2 on a die? What are the odds of rolling a 2 on a die?
Since the six possible outcomes in the sample space are equally likely:
\[ \text{Prob. of rolling a two} = \frac{\text{One outcome is a 2}}{\text{Six equally-likely outcomes}}. \] So the probability is \(\frac{1}{6} = 0.1667\), or about 16.7%. Also, since the six possible outcomes are equally likely:
\[ \text{Odds of rolling a two} = \frac{\text{One outcome is a two}}{\text{Five of the possible outcomes are not a two}}. \] So the odds of rolling a two are \(\frac{1}{5} = 0.2\).
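The odds calculation used in these examples can be sketched in the same way: count the outcomes in the event, and divide by the number of outcomes not in the event. The function name `classical_odds` is an assumption for this illustration, and equally likely outcomes are again assumed.

```python
from fractions import Fraction


def classical_odds(event, sample_space):
    """Odds: outcomes in the event divided by outcomes not in the event."""
    sample_space = set(sample_space)
    in_event = len(set(event) & sample_space)
    not_in_event = len(sample_space) - in_event
    # Raises ZeroDivisionError if the event covers the whole sample space.
    return Fraction(in_event, not_in_event)


die = {1, 2, 3, 4, 5, 6}
odds_two = classical_odds({2}, die)
odds_even = classical_odds({2, 4, 6}, die)

print(odds_two)   # 1/5
print(odds_even)  # 1
```

Equivalently, if the probability of an event is \(p\), the odds are \(p \div (1 - p)\); for rolling a two, \((1/6) \div (5/6) = 1/5\).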
Example 16.6 (More complicated events) Consider rolling a standard six-sided die.
There are six equally likely outcomes (Example 16.1) each with probability \(1/6\) (or 16.7%) of occurring. The probability of rolling a or a is \(2/6\) (or 33.3%).
Probabilities describe the likelihood that an event will occur before the outcome is known.
Odds and proportions can be used either before or after the outcome is known, provided the wording is correct. For example:
- Proportions describe how often an event has occurred after the outcome is known.
- Expected proportions describe the likelihood that an event will occur before the outcome is known.
The following example may also help.
Example 16.7 (Probabilities, proportions and odds) Before a fair coin is tossed:
- The probability of throwing a head is \(1/2 = 0.5\).
- The expected proportion of heads in many coin tosses is 0.5.
- The odds of throwing a head is \(1/1 = 1\).
If we have already tossed a coin 100 times and found 47 heads:
- The proportion of heads is \(47/100 = 0.47\).
- The odds that we threw a head is \(47/53 = 0.887\).
It makes no sense to talk about the 'probability that we just threw a head', because the event has already occurred.
16.3 Relative frequency approach
What is the probability that a new baby will be a boy? The sample space could be listed as: boy and non-boy. The classical approach could be used, since the sample space has two elements: \(1\div2 = 0.5\). This approach is fine if boys and non-boys are equally likely to be born. But are they?
In Australia in 2015, 305 377 live births occurred, with 157 088 male births and 148 289 non-male births. Then, the proportion of boys born in 2015 is
\[ \frac{157\,088}{305\,377} = 0.514, \] or about 51.4%. An estimate of the probability that the next birth will be a boy is about 0.514 (or 51.4%). This is the relative frequency approach to calculating probabilities: probabilities are estimated from past proportions.
The probability that the next birth will be a boy is approximately 0.514, but the next birth will either be a boy, or will be a non-boy... and we don't know which will occur.
Definition 16.5 (Relative frequency approach to probability) In the relative frequency approach to probability, the probability of an event is (approximately) the number of times the outcome of interest has appeared in the past, divided by the number of 'attempts' in the past.
Example 16.8 (Relative frequency probability) Based on this information, the odds that a new baby will be a boy is approximately \(0.514\div (1-0.514) = 1.058\).
According to the ABS:
The sex ratio for all births registered in Australia generally fluctuates around 105.5 male births per 100 female births.
This is close to the odds of 1.058 found above.
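The relative frequency calculations for the 2015 births can be reproduced with a few lines of Python. This is a sketch using only the counts quoted above; the variable names are assumptions.

```python
# Live births in Australia in 2015 (counts quoted in the text).
male_births = 157_088
non_male_births = 148_289
total_births = male_births + non_male_births  # 305 377

# Relative frequency estimate of the probability that a birth is a boy.
p_boy = male_births / total_births

# Odds: births that are boys, divided by births that are not boys.
# This is the same as p_boy / (1 - p_boy).
odds_boy = male_births / non_male_births

print(f"Estimated probability of a boy: {p_boy:.3f}")
print(f"Estimated odds of a boy:        {odds_boy:.3f}")
```

Working from the unrounded counts gives odds of about 1.059, slightly different from the 1.058 obtained using the rounded probability 0.514. Either value is close to the ratio of 105.5 male births per 100 female births.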
The data in Table 16.1 concern students enrolling in a library introductory session in O-Week. (SSE is the School of Science and Engineering; SHSS is the School of Health and Sport Science.)
Find the probability that a randomly chosen student will be:
- An SSE student.
- An SHSS student aged Over 30.
- Over 30, if we already know the student is from SHSS.
(Answer is here^{361}.)
| | 30 and under | Over 30 | Total |
|---|---|---|---|
| SHSS | 56 | 40 | 96 |
| SSE | 68 | 91 | 159 |
| Total | 124 | 131 | 255 |
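One way to check your answers is to compute the three probabilities directly from the counts in Table 16.1. The sketch below is only one possible approach; the dictionary layout and variable names are assumptions. Notice that the third probability uses only the SHSS row, because we already know the student is from SHSS.

```python
from fractions import Fraction

# Counts from Table 16.1: rows are schools, columns are age groups.
counts = {
    "SHSS": {"30 and under": 56, "Over 30": 40},
    "SSE": {"30 and under": 68, "Over 30": 91},
}

total = sum(sum(row.values()) for row in counts.values())  # 255 students
shss_total = sum(counts["SHSS"].values())                  # 96 students
sse_total = sum(counts["SSE"].values())                    # 159 students

p_sse = Fraction(sse_total, total)
p_shss_and_over_30 = Fraction(counts["SHSS"]["Over 30"], total)
p_over_30_given_shss = Fraction(counts["SHSS"]["Over 30"], shss_total)

for label, p in [
    ("SSE student", p_sse),
    ("SHSS student aged Over 30", p_shss_and_over_30),
    ("Over 30, given the student is from SHSS", p_over_30_given_shss),
]:
    print(f"{label}: {float(p):.3f}")
```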
16.4 Subjective approach
Many probabilities cannot be computed using the classical or relative frequency approach; for example:
What is the probability that Queensland will experience a Category 1 cyclone next year?
In this case, only a subjective probability can be given.
'Subjective' does not necessarily mean 'made up'; it means the probability is estimated by considering all the relevant issues that may impact the probability (and may, for example, be based on mathematical models that incorporate information from numerous inputs). Depending on how these other issues are considered and combined, different individuals may give different subjective probabilities.
Weather forecasts are one example: weather forecasts incorporate data from sea surface temperatures, topography, air pressures, air temperatures and so on. Different models use different inputs, and may then combine these inputs differently to produce different (subjective) forecast probabilities.
Definition 16.6 (Subjective approach to probability) In the subjective approach to probability, various factors are incorporated, perhaps subjectively, to determine the probability of an event occurring.
Example 16.9 (Subjective probability) Many farmers, based on many years of experience, can give a subjective probability of the chance of receiving rainfall in the coming month.
Which approach is best used to estimate a probability in these situations?
- The probability that the Reserve Bank will drop interest rates next month.
- The probability that a randomly-chosen person writes left-handed.
- The probability that a King will be randomly chosen from a pack of cards.
- The probability that Buderim receives more than 100mm of rain next May.
(Answer is here^{362}.)
16.5 Independence
One important concept in probability is independence. Two events are independent if the probability of one event happening is the same, whether or not the other event has happened.
For example, if you toss a coin, the probability of getting a head is the same whether you are sitting or standing. That is, the result of a coin toss is independent of your position.
Definition 16.7 (Independence) Two events are independent if the probability of one event is the same, whether or not the other event has happened.
Example 16.10 (Independence) Consider drawing two cards from a fair pack (of 52 cards), without returning the first card.
For the first card, the sample space lists every card in the pack, and drawing any one card is as likely as drawing any other. Since four cards are Aces, the probability of drawing an Ace on the first draw is 4/52 (using the classical approach to probability).
If we drew an Ace for the first card, the probability of drawing an Ace for the second card is 3/51 (three Aces remain among the 51 remaining cards).
Alternatively, if we don't draw an Ace for the first card, the probability of drawing an Ace for the second card is 4/51 (four Aces remain among the 51 remaining cards).
In summary, the probability of drawing an Ace for the second card depends on whether or not an Ace was drawn for the first card. The two events 'Drawing an Ace for the first card' and 'Drawing an Ace for the second card' are not independent events.
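These two probabilities can be checked by listing every equally likely pair of cards (first card, second card) and applying the classical approach within each case. The sketch below is one way to do this; the card labels and variable names are assumptions made for the illustration.

```python
from fractions import Fraction
from itertools import permutations

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = [(rank, suit) for rank in ranks for suit in suits]  # 52 cards

# Every ordered pair of different cards: drawing without returning the first card.
draws = list(permutations(deck, 2))  # 52 x 51 = 2652 equally likely pairs


def is_ace(card):
    return card[0] == "A"


first_ace = [pair for pair in draws if is_ace(pair[0])]
first_not_ace = [pair for pair in draws if not is_ace(pair[0])]

p_given_first_ace = Fraction(
    sum(is_ace(second) for _, second in first_ace), len(first_ace)
)
p_given_first_not_ace = Fraction(
    sum(is_ace(second) for _, second in first_not_ace), len(first_not_ace)
)

print(p_given_first_ace)      # 1/17, which is 3/51 in lowest terms
print(p_given_first_not_ace)  # 4/51
```

Because the two probabilities differ, the probability of an Ace on the second card depends on what happened on the first card: the events are not independent.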
16.6 Summary
At least three ways exist to compute simple probabilities: the classical approach, which requires all outcomes to be equally likely; the relative frequency approach; and the subjective approach. Two events are independent if the probability of one event is the same, whether or not the other event has happened.
16.7 Quick review questions
- Suppose Event A is defined as 'I will roll a or a on a fair die'.
- What is the best way to compute the probability of Event A occurring?
- What is the probability of Event A occurring?
- Suppose Event B is defined as 'A randomly-chosen university student will like pizza'.
What is the best way to compute the probability of Event B occurring?
- True or false: Event A and Event B are independent.
- Consider these three events, then answer the questions that follow:
- Event 1 is 'The first card I pick from a standard 52-card pack will be an Ace';
- Event 2 is 'The second card I pick from a standard 52-card pack will be an Ace, if I do not return the first card'; and
- Event 3 is 'The second card I pick from a standard 52-card pack is an Ace, if I do return the first card to a random location'.
True or false: Event 1 and Event 2 are independent.
True or false: Event 1 and Event 3 are independent.
The probability of Event 2 is:
16.8 Exercises
Selected answers are available in Sect. D.16.
Exercise 16.1 Suppose you have a well-shuffled, standard pack of 52 cards.
- What is the probability that you will draw a King?
- What are the odds that you will draw a King?
- What is the probability that you will draw a picture card (Ace, King, Queen or Jack)?
- What are the odds that you will draw a picture card (Ace, King, Queen or Jack)?
- Suppose I draw two cards from the pack. Are the events 'Draw a King first' and 'Draw a Queen second' independent events?
- Suppose I draw one card from the pack and roll a six-sided die. Are the events 'Draw a Jack from the pack of cards' and 'Roll a on the die' independent events?
Exercise 16.2 On October 13, the American television programme Nightline interviewed Dr Richard Andrews, director of the California Office of Emergency Services. They discussed various natural disasters that were being predicted as a result of an El Niño. In the interview, Dr Andrews said:
... we have to take these forecasts very seriously [...] I listen to earth scientists talk about earthquake probabilities a lot and in my mind every probability is 50-50, either it will happen or it won't happen...
Explain why Dr Andrews is incorrect when he says that "every probability is 50-50". Give an example to show why he must be incorrect. (Based on a report in Chance News 6.12.)
Exercise 16.3 The data in Table 16.2 were obtained from an investigation into aviation deaths of private pilots in Australia.^{363}
- What is the probability that a randomly chosen death in 1997 was of a pilot 50 or older?
- What proportion of deaths from 1997 to 1999 were of pilots aged under 30?
- What other information may be useful in studying the effect of age on pilot deaths?
| | 1997 | 1998 | 1999 |
|---|---|---|---|
| Under 30 | 2 | 1 | 3 |
| 30 to 49 | 5 | 12 | 5 |
| 50 or over | 9 | 11 | 9 |
Exercise 16.4 Are these pairs of two events likely to be independent or not independent? Explain.
- 'Whether or not I walk to work tomorrow morning', and 'Whether or not rain is expected tomorrow morning'.
- 'Whether or not a person smokes more than 10 cigarettes per week on average' and 'Whether or not a person gets lung cancer'.
- 'Whether or not it rains today' and 'Whether or not my rubbish is collected today'.
Exercise 16.5 In disease testing, two key aspects of the test are:
- Sensitivity: the probability of getting a positive test result among people who do have the disease; and
- Specificity: the probability of getting a negative test result among people who do not have the disease.
Both are important for understanding how well a test works in practice. Consider a test with a sensitivity of 0.99 and a specificity of 0.98.
- Suppose 100 people who do have a disease are tested. How many would be expected to return a positive test result?
- Suppose 100 people who do not have a disease are tested. How many would be expected to return a positive test result?
Exercise 16.6 Consider the following argument:
When I toss two coins, there are only three outcomes: a Head and a Head, a Tail and a Tail, or one of each. So the probability of obtaining two Tails must be one-third.
The reasoning is incorrect. Explain why.
Exercise 16.7 Since my wife and I have been married, I have been called to jury service three times. The latest notice reads:
Your name has been selected at random from the electoral roll...
In the same length of time, my wife has never been called to jury service.
Do you think the selection process really is 'at random'? Explain.