# Chapter 10 Probabilistic thinking

*Probability*— of our textbook, which we also examine in the next section.

**The reading below is required,**Whitlock and Schluter (2020) is not.

Motivating scenarios: We want to understand how seemingly important patterns can arise by chance, how to think about chance events, how to think quantitatively about chance events, and gain an appreciation for a foundational concern of statistics and how the filed addresses this concern.

**Learning goals: By the end of this chapter you should be able to**

- Provide a working definition of probability.

- Explain why we need to think about probability when interpreting data.

- Consider a proportion as an estimate as random outcome of a population parameter.

- Recognize that “Random” does not mean “No clue”.

- Understand conditional probabilities and non-independence.

**listen to 5 minutes (4:10 - 9:10) of the**

*No coincidence, No story*

**Episode of This American Life, embedded below**.

## 10.1 Why do we care about probability?

So, our (and many) statistics textbook(s) has a chapter on probability, before it gets in to much statistics. Why is that, and why am I keeping this chapter here? For me there are three important and inter-related reasons that we should think about probability before delving too deep into statistics.

**Many important biological processes are influenced by chance.**In my own field of genetics, we know that mutation, recombination and which grandparental allele you inherit are all chance events. If you get a transmissible disease after modest exposure is a chance event. Even apparently highly coordinated and predictable gene expression patterns can arise from many chance events. It is therefore critical to get an intuition for how chance (random) events happen. It is fun to philosophically consider if these things are really chance, or if they were all predestined, or if we could make perfect predictions with more information etc. etc. etc. While this is fun, it is not particularly useful – whether or not something is predestined, we can only use the best information we have to make probabilistic statements.**We don’t want to tell biological stories about coincidences.**A key lesson from probability theory is that some outcome will happen. So, for every coincidence in the podcast above, there are tens of thousands of cases where something uninteresting happened. For example, no one called This American Life to tell them that somehow their grandmother was not in a random picture their friend sent them (*No coincidence, no story*). Similarly, we don’t want to tell everyone that covid restores lactose tolerance just because someone in our extended social network lost his lactose intolerance after recovering from covid. Much of statistics is simply a mathematical contraption, built of probability theory, to help us judge if our observations are significant or if they are a mere coincidence.**Understanding probability helps us interpret statistical results**Confidence intervals, the sampling distribution, p-values, and a host of other statistical concepts are notoriously confusing. Some of this is because they are strange and confusing topics, but a large contributor is that people come across these terms without a foundation in probability. In addition to helping us understand statistical concepts, probabilistic thinking can help us understand many seemingly magical or concerning observations in the scientific literature. For example, later in this class, we will see that attempts to replicate exciting experimental results often fail, and that the effect of a treatment tends to get smaller the longer we study it. Amazingly, understanding a little about probability and a little about how scientists work can explain many of these observations.

Luckily, none of his requires serious expertise in probability theory – while I recommend taking a bunch of probability theory courses (because it’s soooo fun), a quick primer, like we’re working through here is quite powerful!

**Perhaps the most important take home message from all of this is that random does not mean equal probability, and does not mean we have no clue. In fact, we can often make very good predictions about the world by considering how probability works. **

## 10.2 Probability Concepts

### 10.2.1 Sample space

When we think about probability, we first lay out all potential outcomes (formally known as **sample space**).

The most straightforward and common example of sample space is the outcome of a coin flip. Here the states are

- Coin lands heads side up
- Coin lands tails side up

However, probabilistic events can have more than two potential outcomes, and need not be 50:50 affairs.

Consider a single ball dropping in Figure 10.1, the **sample space** is:

- The ball can fall through the orange bin (A).

- The ball can fall through the green bin (B).

- The ball can fall through the blue bin (C).

*This view of probability sets up two simple truths*

- After laying out sample space, some outcome is going to happen.

- Therefore, adding up (or integrating) the probabilities of all outcomes in sample space will sum to one.

Point (1) helps us interpret coincidences – something had to happen (e.g. the person in the background of a photo had to be someone…) and we pay more attention when that outcome has special meaning (e.g. that someone was my grandmother).

### 10.2.2 Probabilities and proportions

#### Probabilities

A **probability describes a parameter for a population.** So in our example if we watch the balls drop forever and counted the proportion we would have the parameter. Or assuming that balls fall with equal probability across the line (they do) we can estimate the probabilities by the proportion of space each bar takes up in Figure 10.1. We see that A takes up half of sample space, that B takes up 1/6 or sample space, and that Ctakes up about 2/6 of sample space. So the probability of a given event P(event):

- P(A) = 3/6.

- P(B) = 1/6.

- P(C) = 2/6.

#### Proportions

**The proportion** of each outcome is the number of times we see that outcome divided by the number of times we seen any outcome – that is, the **proportion describes an estimate from a sample**. Let’s look at a snapshot of a single point in time from Figure 10.1:

We count 15 balls - 10 orange (A), 2 green (B), and 3 blue (C ).

Here, the proportion of:

- A = 10/15 = 2/3 = \(0.66\overline{66}\).

- B = 2/15 = \(0.13\overline{33}\).

- C = 3/15 = 1/5 = \(0.20\).

Note that these proportions (which are estimates from a sample) differ from the probabilities (the true population parameters) because of sampling error.

### 10.2.3 Exclusive vs non-exclusive events

A flipped coin will land heads up or tails up – it cannot land on both sides at once. Outcomes that cannot occur at the same time are **mutually exclusive**.

A coin we find on the ground could be heads up or heads down (**mutually exclusive**), and could be a penny or nickel or dime or quarter etc (also **mutually exclusive**).

But a coin can be both heads up and a quarter. These are nonexclusive events. **Non-exclusive** events are cases were the occurrence of one outcome does not exclude the occurrence of another.

#### Exclusive events

Keeping with our con flipping idea, let’s use the app from seeing theory, below, to explore the outcomes of one or more exclusive events.

- Change the underlying probabilities away from 50:50 (so the coin toss is unfair).

- Then toss the coin once.
- Then toss the coin again, and repeat this four times.

- Finally toss it one hundred times.

**Be sure to note the probabilities you chose, the outcomes of the first five tosses, and the proportion of heads and tails over all coin flips.**

#### Non-Exclusive events

In Figure 10.1, the balls falling through A, B, and C were mutually exclusive – a ball could not fall through both A and B, so all options where mutually exclusive.

This need not be true. The outcomes could be non-exclusive (e.g. if the bars were arranged such that a ball could fall on A and then B).

Figure 10.3 shows a case when falling through A and B are not exclusive.

### 10.2.4 Conditional probabilities and (non-)independence

Conditional probabilitiesallow us to account for information we have about our system of interest. For example, we might expect the probability that it will rain tomorrow (in general) to be smaller than the probability it will rain tomorrow given that it is cloudy today. This latter probability is a conditional probability, since it accounts for relevant information that we possess.Mathematically, computing a conditional probability amounts to shrinking our sample space to a particular event. So in our rain example, instead of looking at how often it rains on any day in general, we “pretend” that our sample space consists of only those days for which the previous day was cloudy. We then determine how many of those days were rainy.

In mathematical notation, | symbol denotes conditional probability. So, for example, \(P(A|C)=0\) means that if we know we have outcome \(C\) there is no chance that we also had outcome \(A\) (i.e. \(A\) and \(C\) are mutually exclusive.

For some terrible reason

`|`

means *or* to R

`|`

means “conditional on” in stats.

Stay safe.

#### Independent events

Events are **independent** if the the occurrence of one event provides no information about the other. Figure 10.4 displays a situation identical to 10.4 - but also shows conditional probabilities. In this example, the probability of A is the same whether or not the ball goes through B (and vice versa).

$### Non-independence {-}

Events are **non-independent** if their conditional probabilities differ from their unconditional probabilities. We’ve already seen an example of non-independence. If events *a* and *b* are mutually exclusive, \(P(A|B) = P(B|A) = 0\), while \(P(A|B) \neq P(A)\) and \(P(B|A) \neq P(B)\).

**Independence vs non-independence is very important for both basic biology and its applications.** For example, say we’re all getting a vaccine and some people get sick with some other disease. If these where independent – i.e. the proportion of sick people was roughly similar in the vaccinated and the unvaccinated group, we would probably feel ok about mass vaccination. On the other hand, if we saw an excess of sickness in the vaccine group, we would feel less good about it. That said, non-independence does not imply a causal link (e.g. the vaccinated people might socialize more, exposing themselves to the other disease, so we would want to see if we could disentangle correlation and causation before joining the local anti-vax group).

Figure 10.5 shows an example of non-independence:

The probability of A, P(A) = \(\frac{1}{3}\).

The probability of A conditional on B, P(A|B) = \(\frac{1}{4}\).

The probability of B, P(B) = \(\frac{2}{3}\).

The probability of B conditional on A, P(B|A) = \(\frac{1}{2}\).

## 10.3 Probability Rules

I highly recommend a formal probability course would go through a bunch of ways to calculate probabilities of alternative outcomes (probability theory was my favorite class). But you don’t need all that here. We simply want to get an intuition for probability, which we can achieve by familiarizing ourselves with some straightforward rules.

Internalizing the rough rules below will get you pretty far:

**Add**probabilities to find the probability of one outcome**OR**another. (be sure to subtract off double counting).**Multiply**probabilities to find the probability of one outcome**AND**another. (be sure to consider conditional probabilities if outcomes are non-independent.)

OK – Let’s work through these rules.

*a*and

*b*. These are generic events that can stand in for any random thing, so \(P(a)\) is the probability of some event

*a*. So if

*a*was red head, p(

*a*) is the probability that a person is a red head. I distinguish this from A, B, and C, which refer to balls dropping through colored bars in our visualizations.

### 10.3.1 Add probabilities for **OR** statements

To find the probability or this or that we sum all the ways we can be this or that, being careful to avoid double-counting.

\[\begin{equation} \begin{split} P(a\text{ or }b) &= P(a) + P(b) - P(ab)\\ \end{split} \tag{10.1} \end{equation}\]

where *P(ab)* is the probability of outcomes *a* and *b*.

**Special case of exclusive outcomes**

**If outcomes are mutually exclusive, the probability of outcome of both of them,** *p(ab)* **is zero, so the probability of** *a* **or** *b* **is**

\[\begin{equation} \begin{split}P(a\text{ or }b | P(ab)=0) &= P(a) + P(b) - P(ab)\\ &= P(a) + P(b) - 0\\ &= P(a) + P(b) \end{split} \end{equation}\]

Returning to the example in 10.1 [presented again below]

In which

- P(A) = 3/6.

- P(B) = 1/6.

- P(C) = 2/6.

**The probability the ball goes through A OR B is**

3/6 **+** 1/6 = 4/6 = 2/3.

We often use this rule to find the probability of a range of outcomes for discrete variables (e.g. the probability that the sum of two dice rolls is between six and eight = P(6) + P(7) + P(8)).

**General case**

**More generally, the probability of outcome a or b is P(a) + P(b) - P(a b)** [Equation (10.1)]. This subtraction (which is irrelevant for exclusive events, because in that case \(P(ab) = 0\)) avoids double counting.

**Why do we subtract P(a and b) from P(a) + P(b) to find P(a or b)?** Figure 10.6 shows that this subtraction prevents double-counting – the sum of falling through A or B is \(0.50 + 0.75 = 1.25\). Since probabilities cannot exceed one, we know this is foolish – but we must subtract to avoid double counting even when the sum of probabilities do not exceed one.

Following equation (10.1) and estimating that about 35% of balls falls through A and B, we find that

P(A or B) = P(A) + P(B) - P(AB)

P(A or B) = 0.50 + 0.75 - 0.35

P(A or B) = 0.90.

For a simple and concrete example, the probability a random student played soccer or ran track in high school, is the probability they played soccer *plus* the probability they ran track *minus* the probability they played soccer and ran track.

#### The probability of not

My favorite thing about probabilities is that they have to sum (or integrate) to one. This helps us be sure we laid out sample space right, and allows us to check our math. It also provides us with nice math tricks.

Because probabilities sum to one, the **probability of “not a” equals 1 - the probability of a**. This simple trick often makes hard problems a lot easier.

### 10.3.2 Multiply probabilities for **AND** statements

The probability of two types of events in an outcome e.g. the probability of *a* and *b* equals the probability of *a* times the probability of *b* conditional on *a*, which will also equal the probability of *b* times the probability of *a* conditional on *b*.

\[\begin{equation} \begin{split} P(ab) &= P(a) \times P(b|a)\\ & = P(b) \times P(a|b) \end{split} \tag{10.2} \end{equation}\]

#### Special cases of AND

**If two outcomes are mutually exclusive**, the probability of one conditional on the other is zero (\(P(b|a) = 0\)). So the probability of one given the other is zero (\(P(a) \times P(b|a) = P(a) \times 0 = 0\)).**If two outcomes are independent**, the probability of one conditional on the other is simply its overall probability (\(P(b|a) = P(b)\)). So the probability of one given the other is the product of their overall probabilities zero (\(P(a) \times P(b|a) = P(a) \times P(b)\)).

The Punnett square is a classic example of independent probabilities in genetics as the sperm genotype usually does not impact meiosis in eggs (but see my paper (Brandvain and Coop 2015) about exceptions and the consequences for fun). If we cross two heterozygotes, each transmits their recessive allele with probability \(1/2\), so the probability of being homozygous for the recessive allele when both parents are heterozygous is \(\frac{1}{2}\times \frac{1}{2} = \frac{1}{4}\).

#### The general cases of AND

The outcome of meiosis does not depend on who you mate with, but in much of the world context changes probabilities. When outcomes are nonexclusive, we find the probability of both *a* and *b* by multiplying the probability of a times the probability of *b* conditional on observing *a*, \(P(ab) =P(a) \times P(b|a)\) [Eq. (10.2)].

**CONDITIONAL PROBABILITY VISUAL:**

Let’s work through a visual example!

In the figure above

P(A) = 1/3,

P(B) = 2/3,

P(A|B) = 1/4, and

P(B|A) = 1/2.

Using Equation (10.2), we find the probability of A and B:

P(AB) = P(A) \(\times\) P(B|A) = P(A) = 1/3 \(\times\) 1/2 = 1/6.

Which is the same as

P(AB) = P(B) \(\times\) P(A| B) = 2/3 \(\times\) 1/4 = 2/12= 1/6.

We can see this visually in Figure 10.8:

**CONDITIONAL PROBABILITY REAL WORLD EXAMPLE:**

You are way less likely to get the flu if you get the flu vaccine.

So, what is the probability that a random person got a flu vaccine and got the flu?

First, we lay out some rough probabilities:

Every year, a little more than half of the population gets a flu vaccine, let us say P(No Vaccine) = 0.58.

About 30% of people who do not get the vaccine get the flu, P(Flu|No Vaccine) = 0.30.

The probability of getting the flu diminishes by about one-fourth among the vaccinated, P(Flu|Vaccine) = 0.075.

But we seem stuck, what is \(P(\text{Vaccine})\)? We can find this as \(1 - P(\text{no vaccine}) = 1 - 0.58 = 0.42\).

Now we find P(Flu and Vaccine) as P(Vaccine) x P(Flu | Vaccine) = 0.42 x 0.075 = 0.0315.

### 10.3.3 The law of total probability

Let’s say we wanted to know the probability that someone caught the flu. How do we find this?

We learned in Chapter 10 that the probability of all outcomes in sample space has to sum to one. Likewise, the probability of all ways to get some outcome have to sum to the probability of getting that outcome:

\[\begin{equation} P(b) = \sum P(a_i) \times P(b|a_i) \tag{10.3} \end{equation}\]

The \(\sum\) sign notes that we are going over all possible outcomes of \(a\), and the subscript, \(_i\), indexes these potential outcomes of \(a\). In plain language, **we find probability of some outcome** by

- Writing down all the ways we can get it,

- Finding the probability of each of these ways,

- Adding them all up.

So for our flu example, the probability of having the flu is the probability of being vaccinated and having the flu, plus the probability of not being vaccinated and catching the flu (NOTE: We don’t subtract anything off because they are mutually exclusive.). We can find each following the general multiplication principle (10.2):

\[\begin{equation} \begin{split} P(\text{Flu}) &= P(\text{Vaccine}) \times P(\text{Flu}|\text{Vaccine})\\ &\text{ }+ P(\text{No Vaccine}) \times P(\text{Flu}|\text{No Vaccine}) \\ &= 0.42 \times 0.075 + 0.58 \times 0.30\\ &= 0.0315 + 0.174\\ &= 0.2055 \end{split} \end{equation}\]

### 10.3.4 Probability trees

It can be hard to keep track of all of this. Probability trees are a tool we use to help. To make a probability tree, we

- Write down all potential outcomes for
*a*, and use lines to show their probabilities.

- We then write down all potential outcomes for
*b*separately for each potential a, and connect each a to these outcomes with a line denoting the conditional probabilities.

- Keep going for events
*c*,*d*, etc…

- Multiply each value on path to get the probability of that path (The general multiplication rule, Equation (10.2)).

- Add up all paths that lead to the outcome you are interested in. (NOTE: Again, reach path is exclusive, so we don’t subtract anything.)

Here is a probability tree I drew for the flu example. Reassuringly the probability of getting the flu matches our math above.

### 10.3.5 Flipping conditional probabilities

The probability that a vaccinated person gets the flu, P(Flu|Vaccine) is 0.075.

But what is the probability that someone who has the flu got vaccinated, P(Vaccine|Flu)?

**The way to think about this is to find the number of people who are vaccinated AND get the flu, and divide by the number of people who get the flu.** Turning these into probabilities, rather than counts), and applying our probability rules, provides us with a mathy way to do this. That is, The probability that someone who has the flu got vaccinated equals the probability that someone got the flu and was vaccinated, divided by the probability that someone has the flu.

\[\begin{equation} \begin{split} P(\text{Vaccine|Flu}) &= P(\text{Vaccine and Flu}) / P(\text{Flu}) \\ P(\text{Vaccine|Flu}) &= \tfrac{P(\text{Flu|Vaccine}) \times P(\text{Vaccine})}{P(\text{Vaccine}) \times P(\text{Flu}|\text{Vaccine}) + P(\text{No Vaccine}) \times P(\text{Flu}|\text{No Vaccine})}\\ P(\text{Vaccine|Flu}) &= 0.0315 / 0.2055 \\ P(\text{Vaccine|Flu}) &=0.1533 \end{split} \end{equation}\]

So, while the vaccinated make up 42% of the population, they only make up 15% of people who got the flu.

The numerator on the right hand side, \(P(\text{Vaccine and Flu})\), comes from the general multiplication rule (Eq. (10.2)).

The denominator on the right hand side, \(P(\text{Flu})\), comes from the law of total probability (Eq. (10.3)).

This is an application of **Bayes’ theorem**. Bayes’ theorem is a math trick which allows us to flip conditional probabilities as follows

\[\begin{equation} \begin{split} P(A|B) &= \frac{P(A\text{ and }B)}{P(B)}\\ P(A|B) &= \frac{P(B|A) \times P(A)}{P(B)} \end{split} \tag{10.4} \end{equation}\]

## 10.4 Probability rules and math: Quiz and Summary

Go through all “Topics” in the `learnR`

tutorial, below. Nearly identical will be homework on canvas.

## Probabilistic thinking: Critical definitions

**Sample space:** All the potential outcomes of a random trial.

**Probability:** The proportion of events with a given outcome if the random trial was repeated many times,

**Mutually exclusive:** If one outcome excludes the others, they are mutually exclusive.

**Conditional probability:** The probability of one outcome, if we know that some other outcome occurred.

**Independent:** When one outcome provides no information about another, they are independent.

**Non-Independent:**When knowing one outcome changes the probability of another, they are non-independent.

### References

*Evolution*69 (4): 1004–14. https://doi.org/https://doi.org/10.1111/evo.12621.

*The Analysis of Biological Data*. Third Edition.