2.4 Probability spaces

In the previous sections we defined outcomes, events, and random variables, the main mathematical objects associated with a random phenomenon. But we haven’t actually computed any probabilities yet. As we saw in Section 1.3, there are some basic logical consistency requirements that probabilities must satisfy. These requirements are formalized in the following.

Definition 2.5 A probability space is a triple \((\Omega, \mathcal{F}, \textrm{P})\) where

  • \(\Omega\) is a sample space of outcomes
  • \(\mathcal{F}\) is a collection of events of interest17 \(A\subseteq\Omega\)
  • \(\textrm{P}\) is a probability measure which assigns a probability \(\textrm{P}(A)\) to events \(A\in\mathcal{F}\). A probability measure satisfies the following three axioms
    • \(\textrm{P}(\Omega)=1\)
    • For all events \(A\in\mathcal{F}\), \(0\le \textrm{P}(A)\le 1\)
    • (Countable additivity.) If events \(A_1, A_2, \ldots\in\mathcal{F}\) are disjoint (a.k.a. mutually exclusive) — that is \(A_i\cap A_j = \emptyset\) for all \(i\neq j\) — then \[\begin{equation*} \textrm{P}\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty \textrm{P}\left(A_i\right) \end{equation*}\]

The requirement \(0\le \textrm{P}(A)\le 1\) makes sense in light of the relative frequency interpretation: an event \(A\) cannot occur on more than 100% of repetitions or less than 0% of repetitions of the random phenomenon.

The requirement that \(\textrm{P}(\Omega)=1\) just ensures that the sample space accounts for all of the possible outcomes. If outcome \(\omega\) is observed, then event \(A\) occurs if \(\omega\in A\). If \(\textrm{P}(\Omega)<1\) then it would be possible to observe outcomes \(\omega\notin \Omega\); but this violates the requirement that \(\Omega\) is the set of all possible outcomes. Basically, \(\textrm{P}(\Omega)=1\) says that on any repetition of the random phenomenon, “something has to happen”. If \(\Omega\) is a countable set, countable additivity and \(\textrm{P}(\Omega)=1\) imply that the probabilities of all the outcomes must add up to 1. For example, in Example 1.1, \(\textrm{P}(\Omega)=1\), together with countable additivity, is what requires the probability that a team other than those four teams wins to be 26%.

Countable additivity is best understood through a diagram with areas representing probabilities, as in the figure below which represents two events (yellow / and blue \). On the left, there is no “overlap” between areas so the total area is the sum of the two pieces; this depicts countable additivity for two disjoint events. On the right, there is overlap between the two areas, so simply adding the two areas “double counts” the intersection (green \(\times\)) and does not result in the correct total area. Countable additivity applies to any number of events, as long as there is no “overlap”.

Figure 2.4: Illustration of countable additivity for two events. The events in the picture on the left are disjoint, but not on the right.

In Example 1.1, the events \(A\)=“the Astros win the 2019 World Series” and \(D\)=“the Dodgers win the 2019 World Series” are disjoint, \(A\cap D = \emptyset\): in a single World Series, both teams cannot win. Therefore, the probability of \(A\cup D\), the event that either the Astros or the Dodgers win, must be the sum of the two individual probabilities, 46%.

The three axioms of a probability measure are minimal logical consistency requirements that must be satisfied by any probability model. There are also many physical aspects of the random phenomenon or assumptions (e.g. “fairness”, independence, conditional relationships) that must be considered when determining a reasonable probability measure for a particular situation. Often, \(\textrm{P}\) is defined implicitly through modeling assumptions, and probabilities of events follow from the axioms and related properties.

Many other properties follow from the axioms. The main “meat” of the axioms is countable additivity. Thus, the key to many proofs of probability properties is to write relevant events in terms of disjoint events.

Theorem 2.1 (Properties of a probability measure.) Complement rule18. For any event \(A\), \(\textrm{P}(A^c) = 1 - \textrm{P}(A)\). In particular, since \(\Omega^c=\emptyset\), \(\textrm{P}(\emptyset)=0\).

Subset rule19. If \(A \subseteq B\) then \(\textrm{P}(A) \le \textrm{P}(B)\).

General addition rule for two events20. If \(A\) and \(B\) are any two events \[\begin{align*} \textrm{P}(A\cup B) = \textrm{P}(A) + \textrm{P}(B) - \textrm{P}(A \cap B) \end{align*}\]

Law of total probability. If \(B_1,\ldots, B_k\) are disjoint with \(B_1\cup \cdots \cup B_k=\Omega\), then \[\begin{align*} \textrm{P}(A) & = \sum_{i=1}^k \textrm{P}(A \cap B_i) \end{align*}\]

The key to the proofs is to represent relevant events in terms of disjoint events and use countable additivity (and the other axioms).

Note: \(A\cup B\) is inclusive so we do want to count the possibility of both, \(A\cap B\). The problem with simply adding \(\textrm{P}(A)\) and \(\textrm{P}(B)\) is that their sum double counts \(A \cap B\). We do want to count the outcomes that satisfy both \(A\) and \(B\), but we only want to count them once. Subtracting \(\textrm{P}(A \cap B)\) in the general addition rule for two events corrects for the double counting.

For example, consider the picture on the right in Figure 2.4. Suppose each rectangular cell represents a distinct outcome; there are 16 outcomes in total. Assume the outcomes are equally likely, each with probability \(1/16\). Let \(A\) represent the yellow / event which has probability \(4/16\) and let \(B\) represent the blue \ event which has probability 4/16. Then \(\textrm{P}(A\cup B) = 6/16\), since there are 6 outcomes which satisfy either event \(A\) or \(B\) (or both). However, simply adding \(\textrm{P}(A)+\textrm{P}(B)\) yields \(8/16\) because the two outcomes that satisfy the green event \(A\cap B\) are counted both in \(\textrm{P}(A)\) and \(\textrm{P}(B)\). So to correct for this double counting, we subtract out \(\textrm{P}(A\cap B)\): \[ \textrm{P}(A)+\textrm{P}(B)-\textrm{P}(A\cap B) = 4/16 + 4/16 -2/16 = 6/16 = \textrm{P}(A\cup B) \]

Warning: The general addition rule for more than two events is more complicated21; see the inclusion-exclusion principle.

In the law of total probability22 the events \(B_1, \ldots, B_k\), which represent “cases”, form a partition of the sample space; each outcome \(\omega\in\Omega\) lies in exactly one of the \(B_i\). The law of total probability says that we can compute the “overall” probability \(\textrm{P}(A)\) by summing the probability of \(A\) in each “case” \(\textrm{P}(A\cap B_i)\).
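
The properties in Theorem 2.1 can be sanity-checked on a small example. The sketch below assumes a hypothetical four-outcome sample space with made-up outcome probabilities; the names `outcome_prob`, `prob`, and the particular events and partition are illustrative choices, not taken from the text. It is a numerical check on one example, not a proof.

```python
from fractions import Fraction

# Hypothetical sample space with assumed (illustrative) outcome probabilities.
outcome_prob = {
    "a": Fraction(1, 10),
    "b": Fraction(2, 10),
    "c": Fraction(3, 10),
    "d": Fraction(4, 10),
}
omega = frozenset(outcome_prob)


def prob(event):
    """P(event): add up the probabilities of the outcomes in the event."""
    return sum((outcome_prob[w] for w in event), Fraction(0))


A = frozenset({"a", "b", "c"})
B = frozenset({"b", "c", "d"})

# Complement rule: P(A^c) = 1 - P(A).
complement_ok = prob(omega - A) == 1 - prob(A)

# Subset rule: {"b"} is a subset of A, so its probability cannot exceed P(A).
subset_ok = prob(frozenset({"b"})) <= prob(A)

# General addition rule: P(A or B) = P(A) + P(B) - P(A and B).
addition_ok = prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Law of total probability with the partition {a, b}, {c}, {d}.
partition = [frozenset({"a", "b"}), frozenset({"c"}), frozenset({"d"})]
total_ok = prob(A) == sum((prob(A & part) for part in partition), Fraction(0))

print(complement_ok, subset_ok, addition_ok, total_ok)
```

Exact fractions are used instead of floats so that the equalities can be tested with `==` without rounding error.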

Exercise 2.1 Consider a Cal Poly student who frequently has blurry, bloodshot eyes, generally exhibits slow reaction time, always seems to have the munchies, and disappears at 4:20 each day. Which of the following events, \(A\) or \(B\), has a higher probability? (Assume the two probabilities are not equal.)

  • \(A\): The student has a GPA above 3.0.
  • \(B\): The student has a GPA above 3.0 and smokes marijuana regularly.

Warning! Your psychological judgment of probabilities is often inconsistent with the mathematical logic of probabilities.

Example 2.21 (Don’t do what Donny Don’t does.) At various points in his homework, Donny Don’t writes the following. Explain to Donny, both mathematically and intuitively, why each of the following symbols is nonsense. Below, \(A\) and \(B\) represent events, \(X\) and \(Y\) represent random variables.

  1. \(\textrm{P}(A = 0.5)\)
  2. \(\textrm{P}(A + B)\)
  3. \(\textrm{P}(A) \cup \textrm{P}(B)\)
  4. \(\textrm{P}(X)\)
  5. \(\textrm{P}(X = A)\)
  6. \(\textrm{P}(X \cap Y)\)
Solution to Example 2.21
  1. \(A\) is a set and 0.5 is a number; it doesn’t make mathematical sense to equate them. Suppose that \(A\) is the event that it rains tomorrow. It doesn’t make sense to say (as \(A=0.5\) does) “it rains tomorrow equals 0.5”. If we want to say “the probability that it rains tomorrow equals 0.5” we should write \(\textrm{P}(A) = 0.5\).
  2. \(A\) and \(B\) are sets; it doesn’t make mathematical sense to add them. Suppose that \(B\) is the event that tomorrow’s high temperature is above 80 degrees F. It doesn’t make sense to say (as \(A+B\) does) “the sum of (it rains tomorrow) and (tomorrow’s high temperature is above 80 degrees F)”. If we want “(it rains tomorrow) OR (tomorrow’s high temperature is above 80 degrees F)”, then we need \(A\cup B\). Union is an operation on sets; addition is an operation on numbers.
  3. \(\textrm{P}(A)\) and \(\textrm{P}(B)\) are numbers; union is an operation on sets, so it doesn’t make mathematical sense to take a union of numbers. Since \(\textrm{P}(A)\) and \(\textrm{P}(B)\) are numbers, mathematically we can add them. But keep in mind that \(\textrm{P}(A)+\textrm{P}(B)\) is not necessarily a probability, for example if \(\textrm{P}(A)=0.5\) and \(\textrm{P}(B)=0.6\). If we want “the probability that (it rains tomorrow) OR (tomorrow’s high temperature is above 80 degrees F)” then the correct symbol is \(\textrm{P}(A\cup B)\), which would be equal to \(\textrm{P}(A)+\textrm{P}(B)\) only if \(A\) and \(B\) were disjoint (which in the example would mean it’s not possible to have a rainy day with a high temperature above 80 degrees F).
  4. \(X\) is a random variable, and probabilities are assigned to events. If \(X\) is tomorrow’s high temperature in degrees F then \(\textrm{P}(X)\) reads “the probability that tomorrow’s high temperature in degrees F”, which is missing any qualifying information that could define an event. We could write \(\textrm{P}(X>80)\) to represent “the probability that (tomorrow’s high temperature is greater than 80 degrees F)”.
  5. \(X\) is a random variable (a function) and \(A\) is an event (a set), and it doesn’t make sense to equate these two different mathematical objects. It doesn’t make sense to say (as \(X=A\) does) “tomorrow’s high temperature in degrees F equals the event that it rains tomorrow”.
  6. \(X\) and \(Y\) are RVs (functions) and intersection is an operation on sets. If \(Y\) is the amount of rainfall in inches tomorrow then \(X \cap Y\) is attempting to say “tomorrow’s high temperature in degrees F and the amount of rainfall in inches tomorrow”, but this is still missing qualifying information to define a valid event for which a probability can be assigned. We could say \(\textrm{P}(\{X > 80\} \cap \{Y < 2\})\) to represent “the probability that (tomorrow’s high temperature is greater than 80 degrees F) AND (the amount of rainfall tomorrow is less than 2 inches)”.

2.4.1 Probability measures in a dice rolling example

A probability measure is a set function: \(\textrm{P}:\mathcal{F}\to[0, 1]\) takes as an input an event (set) \(A\in\mathcal{F}\) and returns as an output a number \(\textrm{P}(A)\in[0,1]\). Sometimes \(\textrm{P}(A)\) is defined explicitly for an event \(A\) via a formula.

As an example, consider a single roll of a four-sided die. The sample space consists of the four possible outcomes \(\Omega = \{1, 2, 3, 4\}\). Let’s first assume that the die is fair, so all four outcomes are equally likely, each with probability23 1/4. Given that the probability of each outcome24 is 1/4, countable additivity implies

\[ \textrm{P}(A) = \frac{\text{number of elements in $A$}}{4}, \qquad{\text{$\textrm{P}$ assumes a fair four-sided die}} \]

The following table lists all the possible events, and their probability under the assumption of equally likely outcomes.

Table 2.2: All possible events associated with a single roll of a four-sided die, and their probabilities assuming the die is fair.
| Event | Description | Probability of event assuming equally likely outcomes |
|---|---|---|
| \(\emptyset\) | Roll nothing (not possible) | 0 |
| \(\{1\}\) | Roll a 1 | 1/4 |
| \(\{2\}\) | Roll a 2 | 1/4 |
| \(\{3\}\) | Roll a 3 | 1/4 |
| \(\{4\}\) | Roll a 4 | 1/4 |
| \(\{1, 2\}\) | Roll a 1 or a 2 | 2/4 |
| \(\{1, 3\}\) | Roll a 1 or a 3 | 2/4 |
| \(\{1, 4\}\) | Roll a 1 or a 4 | 2/4 |
| \(\{2, 3\}\) | Roll a 2 or a 3 | 2/4 |
| \(\{2, 4\}\) | Roll a 2 or a 4 | 2/4 |
| \(\{3, 4\}\) | Roll a 3 or a 4 | 2/4 |
| \(\{1, 2, 3\}\) | Roll a 1, 2, or 3 (a.k.a. do not roll a 4) | 3/4 |
| \(\{1, 2, 4\}\) | Roll a 1, 2, or 4 (a.k.a. do not roll a 3) | 3/4 |
| \(\{1, 3, 4\}\) | Roll a 1, 3, or 4 (a.k.a. do not roll a 2) | 3/4 |
| \(\{2, 3, 4\}\) | Roll a 2, 3, or 4 (a.k.a. do not roll a 1) | 3/4 |
| \(\{1, 2, 3, 4\}\) | Roll something | 1 |
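
The fair-die table can be regenerated by enumerating every subset of the sample space and applying the counting formula. This is a minimal sketch; the helper names `all_events`, `prob_fair`, and `fair_table` are illustrative, not part of the text.

```python
from fractions import Fraction
from itertools import combinations

omega = (1, 2, 3, 4)  # sample space for one roll of a four-sided die


def all_events(sample_space):
    """Return every subset of the sample space, smallest subsets first."""
    return [
        frozenset(subset)
        for size in range(len(sample_space) + 1)
        for subset in combinations(sample_space, size)
    ]


def prob_fair(event):
    """Fair die: P(A) = (number of elements in A) / 4."""
    return Fraction(len(event), len(omega))


# One entry per event, mirroring the rows of the table.
fair_table = {event: prob_fair(event) for event in all_events(omega)}

for event, p in fair_table.items():
    print(sorted(event), p)
```

`Fraction` reduces to lowest terms, so the entries shown as 2/4 in the table print as 1/2.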

The above assignment satisfies all the axioms and so it represents a valid probability measure. But assuming that the outcomes are equally likely requires much more than the basic logical consistency requirements of the axioms. There are many other possible probability measures, like in the following.

Example 2.22 Now consider a single roll of a four-sided die, but suppose the die is weighted so that the outcomes are no longer equally likely (but each outcome is still possible). Suppose that the probability of event \(\{2, 3\}\) is 0.5, of event \(\{3, 4\}\) is 0.7, and of event \(\{1, 2, 3\}\) is 0.6. Complete a table, like the one for the equally likely outcome scenario above, listing the probability of each event for this particular weighted die. In what particular way is the die weighted? That is, what is the probability of each of the four possible outcomes?

Solution to Example 2.22

Since the probability of not rolling a 4 is 0.6, the probability of rolling a 4 must be 0.4. Since \(\{3, 4\} = \{3\} \cup \{4\}\), a union of disjoint sets, the probability of rolling a 3 must be \(0.7 - 0.4 = 0.3\). Similarly, \(\{2, 3\} = \{2\} \cup \{3\}\), so the probability of rolling a 2 must be \(0.5 - 0.3 = 0.2\), and since the four outcome probabilities must add up to 1, the probability of rolling a 1 must be 0.1. The table below lists the probabilities of the possible events for this particular weighted die.

Table 2.3: All possible events associated with a single roll of a four-sided die, and their probabilities assuming the die is weighted: roll a 1 with probability 1/10, 2 with probability 2/10, 3 with probability 3/10, 4 with probability 4/10.
| Event | Description | Probability of event assuming a particular weighted die |
|---|---|---|
| \(\emptyset\) | Roll nothing (not possible) | 0 |
| \(\{1\}\) | Roll a 1 | 0.1 |
| \(\{2\}\) | Roll a 2 | 0.2 |
| \(\{3\}\) | Roll a 3 | 0.3 |
| \(\{4\}\) | Roll a 4 | 0.4 |
| \(\{1, 2\}\) | Roll a 1 or a 2 | 0.3 |
| \(\{1, 3\}\) | Roll a 1 or a 3 | 0.4 |
| \(\{1, 4\}\) | Roll a 1 or a 4 | 0.5 |
| \(\{2, 3\}\) | Roll a 2 or a 3 | 0.5 |
| \(\{2, 4\}\) | Roll a 2 or a 4 | 0.6 |
| \(\{3, 4\}\) | Roll a 3 or a 4 | 0.7 |
| \(\{1, 2, 3\}\) | Roll a 1, 2, or 3 (a.k.a. do not roll a 4) | 0.6 |
| \(\{1, 2, 4\}\) | Roll a 1, 2, or 4 (a.k.a. do not roll a 3) | 0.7 |
| \(\{1, 3, 4\}\) | Roll a 1, 3, or 4 (a.k.a. do not roll a 2) | 0.8 |
| \(\{2, 3, 4\}\) | Roll a 2, 3, or 4 (a.k.a. do not roll a 1) | 0.9 |
| \(\{1, 2, 3, 4\}\) | Roll something | 1 |
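
The reasoning in the solution to Example 2.22 can be retraced step by step: start from the three given event probabilities, peel off one outcome at a time using the complement rule and additivity, then rebuild the whole table by summing outcome probabilities. The variable and function names below are illustrative, and this is only one possible order in which to recover the weights.

```python
from fractions import Fraction
from itertools import combinations

# Event probabilities given in Example 2.22.
p_23 = Fraction(5, 10)   # P({2, 3})
p_34 = Fraction(7, 10)   # P({3, 4})
p_123 = Fraction(6, 10)  # P({1, 2, 3})

# Work back to the individual outcomes.
p4 = 1 - p_123      # {4} is the complement of {1, 2, 3}
p3 = p_34 - p4      # {3, 4} is the disjoint union of {3} and {4}
p2 = p_23 - p3      # {2, 3} is the disjoint union of {2} and {3}
p1 = p_123 - p_23   # {1, 2, 3} is the disjoint union of {1} and {2, 3}

weights = {1: p1, 2: p2, 3: p3, 4: p4}


def prob_weighted(event):
    """Probability of an event: sum of the weights of its outcomes."""
    return sum((weights[w] for w in event), Fraction(0))


# Probability of every one of the 16 events for the weighted die.
weighted_table = {
    frozenset(subset): prob_weighted(subset)
    for size in range(5)
    for subset in combinations(sorted(weights), size)
}

for event, p in weighted_table.items():
    print(sorted(event), float(p))
```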

The symbol \(\textrm{P}\) is more than just shorthand for the word “probability”. \(\textrm{P}\) denotes the underlying probability measure, which represents all the assumptions about the probability model. Changing assumptions results in a change of the probability measure. We often consider several probability measures for the same sample space and collection of events; these several measures represent different sets of assumptions and different probability models.

In the dice example above, suppose \(\textrm{P}\) represents the probability measure corresponding to the assumption of a fair die (equally likely outcomes). With this measure \(\textrm{P}(A) = 2/4\) for \(A = \{1, 2\}\). Now let \(\textrm{Q}\) represent the probability measure corresponding to the assumption of the weighted die; then \(\textrm{Q}(A) = 0.3\). The outcomes and events are the same in both scenarios, because both scenarios involve a four-sided die. What is different is the probability measure that assigns probabilities to the events. One scenario assumes the die is fair while the other assumes the die has a particular weighting, resulting in two different probability measures.

Both probability measures in the dice example could be written as explicit set functions: for an event \(A\)

\[\begin{align*} \textrm{P}(A) & = \frac{\text{number of elements in $A$}}{4}, & & {\text{$\textrm{P}$ assumes a fair four-sided die}} \\ \textrm{Q}(A) & = \frac{\text{sum of elements in $A$}}{10}, & & {\text{$\textrm{Q}$ assumes a particular weighted four-sided die}} \end{align*}\]
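
The two set functions translate directly into code, which makes it concrete that \(\textrm{P}\) and \(\textrm{Q}\) take the same events as input but return different numbers. The following is a sketch; the helper `satisfies_axioms` is an illustrative name, and it relies on the fact that on a finite sample space, additivity over disjoint pairs of events is enough to give additivity over any disjoint collection.

```python
from fractions import Fraction
from itertools import combinations


def P(event):
    """Fair die: P(A) = (number of elements in A) / 4."""
    return Fraction(len(event), 4)


def Q(event):
    """Weighted die of Example 2.22: Q(A) = (sum of elements in A) / 10."""
    return Fraction(sum(event), 10)


omega = frozenset({1, 2, 3, 4})
events = [
    frozenset(subset)
    for size in range(5)
    for subset in combinations(sorted(omega), size)
]


def satisfies_axioms(measure):
    """Check the three axioms over all 16 events of this finite space."""
    total_is_one = measure(omega) == 1
    in_unit_interval = all(0 <= measure(a) <= 1 for a in events)
    additive = all(
        measure(a | b) == measure(a) + measure(b)
        for a in events
        for b in events
        if not (a & b)  # only pairs of disjoint events
    )
    return total_is_one and in_unit_interval and additive


A = frozenset({1, 2})
print(P(A), Q(A))  # same event, two different probability measures
print(satisfies_axioms(P), satisfies_axioms(Q))
```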

We provide the above descriptions to illustrate that a probability measure operates on sets. However, in many situations there does not exist a simple closed form expression for the set function defining the probability measure which maps events to probabilities.

Perhaps the concept of multiple potential probability measures is easier to understand in a subjective probability situation. For example, each model that is used to forecast the 2019 NFL season corresponds to a probability measure which assigns probabilities to events like “the Eagles win the 2019 Super Bowl”. Different sets of assumptions and models can assign different probabilities for the same events.

  • A single probability measure corresponds to a particular set of assumptions about the random phenomenon.
  • There can be many probability measures defined on a single sample space, each one corresponding to a different probabilistic model.
  • Probabilities of events can change if the probability measure changes.

2.4.2 Uniform probability measures

Sometimes \(\textrm{P}(A)\) is defined explicitly for an event \(A\) via a formula. For example, in the case of a finite sample space with equally likely outcomes,

\[ \textrm{P}(A) = \frac{|A|}{|\Omega|} = \frac{\text{number of outcomes in $A$}}{\text{number of outcomes in $\Omega$}} \qquad{\text{when outcomes are equally likely}} \]

Example 2.23 Flip a coin 4 times, and record the result of each trial in sequence. For example, HTTH means heads on the first and last trials and tails on the second and third. The sample space consists of 16 possible outcomes. One choice of probability measure \(\textrm{P}\) corresponds to assuming the 16 possible outcomes are equally likely. (This assumes that the coin is fair and the flips are independent.)

  1. Specify the probability of each individual outcome, e.g. \(\{HHTH\}\).
  2. Find \(\textrm{P}(A)\), where \(A\) is the event that exactly 3 of the flips land on heads.
  3. Find \(\textrm{P}(B)\), where \(B\) is the event that exactly 4 of the flips land on heads.
  4. Find and interpret \(\textrm{P}(A \cup B)\).
  5. Find \(\textrm{P}(E_1)\), the event that the first flip results in heads.
  6. Find \(\textrm{P}(E_2)\), the event that the second flip results in heads.
  7. Find and interpret \(\textrm{P}(E_1 \cup E_2)\).
  8. We assumed the 16 outcomes are equally likely. Do the axioms require this assumption?

Solution to Example 2.23

  1. The sample space is composed of 16 outcomes which are assumed to be equally likely, so the probability of each outcome is 1/16.
  2. \(A = \{HHHT, HHTH, HTHH, THHH\}\) is the event that exactly 3 of the flips land on heads. Since \(A\) consists of 4 distinct equally likely outcomes, \(\textrm{P}(A) = 4/16\).
  3. \(B = \{HHHH\}\), an event consisting of a single outcome, so \(\textrm{P}(B) = 1/16\).
  4. Directly, \(A \cup B = \{HHHT, HHTH, HTHH, THHH, HHHH\}\), so \(\textrm{P}(A\cup B) = 5/16\). Also \(A\) and \(B\) are disjoint, so \(\textrm{P}(A \cup B) = \textrm{P}(A) + \textrm{P}(B) = 4/16 + 1/16 = 5/16\).
  5. Intuitively this is 1/2, but sample space outcomes consist of a sequence of four coin flips, so we should define the proper event. \[ E_1 = \{HHHH, HHHT, HHTH, HTHH, HHTT, HTHT, HTTH, HTTT\} \] So \(\textrm{P}(E_1) = 8/16 = 1/2\).
  6. Similar to the previous part. \[ E_2 = \{HHHH, HHHT, HHTH, THHH, HHTT, THHT, THTH, THTT\} \] So \(\textrm{P}(E_2) = 8/16 = 1/2\).
  7. \(E_1 \cup E_2\) is the event that at least one of the first two flips is heads: \[\begin{align*} E_1 \cup E_2 & = \{HHHH, HHHT, HHTH, HTHH, THHH, HHTT, \\ & \quad HTHT, HTTH, THHT, THTH, HTTT, THTT\} \end{align*}\] So \(\textrm{P}(E_1 \cup E_2) = 12/16\). Note that \(E_1\) and \(E_2\) are not disjoint, so we cannot just add their probabilities. But we can use the general addition rule for two events. \[ E_1 \cap E_2 = \{HHHH, HHHT, HHTH, HHTT\} \] So \(\textrm{P}(E_1 \cup E_2) = \textrm{P}(E_1) + \textrm{P}(E_2) - \textrm{P}(E_1 \cap E_2) = 8/16 + 8/16 - 4/16 = 12/16\).
  8. No, the axioms do not require equally likely outcomes. If, for example, the coin were biased in favor of landing on Heads, we would want a different probability measure.
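
The counting in Example 2.23 can be reproduced by listing the 16 outcomes and building each event as a set of outcomes. This is a minimal sketch under the equally-likely assumption stated in the example; the function name `prob` is illustrative.

```python
from fractions import Fraction
from itertools import product

# Sample space: all 16 sequences of four flips, e.g. "HTTH".
omega = ["".join(flips) for flips in product("HT", repeat=4)]


def prob(event):
    """Equally likely outcomes: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))


A = {w for w in omega if w.count("H") == 3}  # exactly 3 heads
B = {w for w in omega if w.count("H") == 4}  # exactly 4 heads
E1 = {w for w in omega if w[0] == "H"}       # first flip is heads
E2 = {w for w in omega if w[1] == "H"}       # second flip is heads

# A and B are disjoint, so P(A or B) equals P(A) + P(B).
print(prob(A), prob(B), prob(A | B))

# E1 and E2 overlap, so the general addition rule is needed.
print(prob(E1 | E2), prob(E1) + prob(E2) - prob(E1 & E2))
```

A biased coin would use the same `omega` and the same events; only `prob` would change, which is the point of part 8.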

Probabilities are always defined for events (sets) but to shorten notation, it is common to write \(\textrm{P}(X=3)\) instead of \(\textrm{P}(\{X=3\})\), and \(\textrm{P}(X = 4, Y = 3)\) instead of \(\textrm{P}(\{X = 4\}\cap \{Y = 3\})\). But keep in mind that an expression like “\(X=3\)” really represents an event \(\{X=3\}\), an expression which itself represents \(\{\omega\in\Omega: X(\omega) = 3\}\), a subset of \(\Omega\).

Probabilities involving multiple events, such as \(\textrm{P}(A \cap B)\) or \(\textrm{P}(X=4, Y=3)\), are often called joint probabilities.

Example 2.24 Roll a four-sided die twice, and record the result of each roll in sequence. For example, the outcome \((3, 1)\) represents a 3 on the first roll and a 1 on the second; this is not the same outcome as \((1, 3)\). One choice of probability measure corresponds to assuming that the die is fair and that the 16 possible outcomes are equally likely. Let \(X\) be the sum of the two dice, and let \(Y\) be the larger of the two rolls (or the common value if both rolls are the same).

  1. Find the probability of each individual outcome, e.g., \(\{(3, 1)\}\).
  2. Find \(\textrm{P}(A)\) where \(A\) is the event that the sum of the dice is 4.
  3. Find \(\textrm{P}(B)\) where \(B\) is the event that the sum of the dice is at most 3.
  4. Find \(\textrm{P}(\{Y = 3\})\) a.k.a. \(\textrm{P}(Y=3)\).
  5. Find \(\textrm{P}(\{Y \le 3\})\) a.k.a. \(\textrm{P}( Y\le 3)\).
  6. Find \(\textrm{P}(\{X = 4\}\cap \{Y = 3\})\) a.k.a. \(\textrm{P}(X=4, Y=3)\) a.k.a. \(\textrm{P}(\{(X,Y) = (4,3)\})\).
  7. Are the values of \(X\) equally likely? The values of \(Y\)? The values of \((X, Y)\)?

Solution to Example 2.24

  1. Since 16 distinct, equally likely outcomes comprise the total probability of 1, the probability of each individual outcome must be 1/16.
  2. The event that the sum of the dice is 4 is \(A = \{(1, 3), (2, 2), (3, 1)\}\) which can be written as a union of three disjoint sets \(A = \{(1, 3)\} \cup \{(2, 2)\} \cup \{(3, 1)\}\). Therefore, \(\textrm{P}(A) = \textrm{P}(\{(1, 3)\}) + \textrm{P}(\{(2, 2)\}) + \textrm{P}(\{(3, 1)\}) = 1/16 + 1/16 + 1/16 = 3/16 = 0.1875\).
  3. The event that the sum of the dice is at most 3 is \(B=\{(1, 1), (1, 2), (2, 1)\}\), and so similarly to the previous part, \(\textrm{P}(B) = 3/16=0.1875\).
  4. Remember that \(\{Y=3\}\) represents the event that the larger of the two rolls is 3, \(\{Y=3\}=\{(1, 3), (2, 3), (3, 3), (3, 1), (3, 2)\}\), an event which can be written as the union of 5 disjoint events each having probability 1/16. Then \(\textrm{P}(\{Y = 3\}) = 5/16=0.3125\).
  5. \(\{Y \le 3\}=\{(1, 1), (1, 2), (1, 3), (2, 1), (2, 2), (2, 3), (3, 1), (3, 2), (3, 3)\}\) so similar to the previous parts \(\textrm{P}(Y \le 3) = 9/16=0.5625\).
  6. \(\{X = 4\}\cap \{Y = 3\} = \{(3, 1), (1, 3)\}\) so \(\textrm{P}(\{X = 4\}\cap \{Y = 3\})= 2/16 = 0.125\).
  7. The values of \(X\) are not equally likely; for example, \(\textrm{P}(\{X=4\}) = 3/16\) but \(\textrm{P}(\{X=2\})=\textrm{P}(\{(1, 1)\}) = 1/16\). The values of \(Y\) are also not equally likely; for example, \(\textrm{P}(\{Y=3\})=5/16\) but \(\textrm{P}(\{Y=1\})= \textrm{P}(\{(1, 1)\}) = 1/16\). And the values of \((X, Y)\) pairs are also not equally likely; for example, \(\textrm{P}(\{(X, Y) = (4, 3)\}) = 2/16\) but \(\textrm{P}(\{(X,Y) = (2, 1)\}) = 1/16\). So equally likely outcomes in the sample space do not necessarily imply that everything of interest is equally likely as well.
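
The calculations in Example 2.24 can be sketched in code by treating \(X\) and \(Y\) as functions on the sample space and treating an expression like \(\{Y = 3\}\) as the set of outcomes where the condition holds. The helper names `event_where` and `distribution` are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space: 16 ordered pairs (first roll, second roll).
omega = list(product(range(1, 5), repeat=2))


def X(w):
    """Sum of the two rolls."""
    return w[0] + w[1]


def Y(w):
    """Larger of the two rolls (or the common value if they are equal)."""
    return max(w)


def prob(event):
    """Equally likely outcomes: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))


def event_where(condition):
    """The event {w in Omega : condition(w)}, a subset of the sample space."""
    return {w for w in omega if condition(w)}


p_X_is_4 = prob(event_where(lambda w: X(w) == 4))
p_X_at_most_3 = prob(event_where(lambda w: X(w) <= 3))
p_Y_is_3 = prob(event_where(lambda w: Y(w) == 3))
p_Y_at_most_3 = prob(event_where(lambda w: Y(w) <= 3))
p_joint = prob(event_where(lambda w: X(w) == 4 and Y(w) == 3))


def distribution(f):
    """Map each possible value of f to the probability that f takes it."""
    counts = Counter(f(w) for w in omega)
    return {value: Fraction(n, len(omega)) for value, n in sorted(counts.items())}


dist_X = distribution(X)
dist_Y = distribution(Y)
dist_XY = distribution(lambda w: (X(w), Y(w)))

print(dist_X)
print(dist_Y)
print(dist_XY)
```

Printing the three distributions shows at a glance that none of them is uniform, even though the 16 underlying outcomes are.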

Example 2.25 (Matching problem) Recall the “matching problem”. A set of \(n\) cards labeled \(1, 2, \ldots, n\) are placed in \(n\) boxes labeled \(1, 2, \ldots, n\), with exactly one card in each box. Consider the case \(n=4\). Let \(Y\) be the number of cards (out of 4) which match the number of the box in which they are placed.

We can consider as the sample space the possible ways in which the cards can be distributed to the four boxes. For example, 3214 represents that card 3 is returned (wrongly) to the first box, card 2 is returned (correctly) to the second box, etc. So the sample space consists of the following 24 outcomes25, which we will assume are equally likely.


1234   1243   1324   1342   1423   1432   2134   2143
2314   2341   2413   2431   3124   3142   3214   3241
3412   3421   4123   4132   4213   4231   4312   4321
                                               

  1. What are the possible values of \(Y\)? Find \(\textrm{P}(Y=y)\) for each possible value of \(y\).
  2. Let \(B\) be the event that at least one card is placed in the correct box. Find \(\textrm{P}(B)\).
  3. Let \(B_1\) be the event that card 1 is placed in box 1, and define \(B_2, B_3, B_4\) similarly. Represent the event \(B\) in terms of \(B_1, B_2, B_3, B_4\).
  4. Find \(\textrm{P}(B_1)\). Also find \(\textrm{P}(B_2)\), \(\textrm{P}(B_3)\), \(\textrm{P}(B_4)\).
  5. Find \(\textrm{P}(B_1\cap B_2 \cap B_3 \cap B_4)\).
  6. Is it possible to compute \(\textrm{P}(B_1 \cup B_2 \cup B_3 \cup B_4)\) based on just the five probabilities from the previous two parts?

Solution to Example 2.25

The following table evaluates \(Y\) for each of the possible outcomes.


Y(1234)=4   Y(1243)=2   Y(1324)=2   Y(1342)=1   Y(1423)=1   Y(1432)=2   Y(2134)=2   Y(2143)=0
Y(2314)=1   Y(2341)=0   Y(2413)=0   Y(2431)=1   Y(3124)=1   Y(3142)=0   Y(3214)=2   Y(3241)=1
Y(3412)=0   Y(3421)=0   Y(4123)=0   Y(4132)=1   Y(4213)=1   Y(4231)=2   Y(4312)=0   Y(4321)=0
                                               

  1. The possible values of \(Y\) are 0, 1, 2, 4. \(Y\) cannot be 3, since if 3 cards match, then the fourth card must necessarily match too. \(\textrm{P}(Y=0)=9/24\), \(\textrm{P}(Y=1)=8/24\), \(\textrm{P}(Y=2)=6/24\), \(\textrm{P}(Y=4)=1/24\).
  2. \(\textrm{P}(B) = \textrm{P}(Y\ge 1) = 1 - \textrm{P}(Y=0) = 15/24 = 0.625\).
  3. \(B = B_1\cup B_2\cup B_3\cup B_4\).
  4. Intuitively, \(\textrm{P}(B_1)=1/4\) since card 1 is equally likely to be placed in any of the 4 boxes. In terms of the sample space outcomes, \(B_1 =\{1234, 1243, 1324, 1342, 1423, 1432\}\), so \(\textrm{P}(B_1)=6/24=1/4\). Also \(\textrm{P}(B_2)=\textrm{P}(B_3)=\textrm{P}(B_4)=6/24\).
  5. \(\textrm{P}(B_1\cap B_2 \cap B_3 \cap B_4) = \textrm{P}(\{1234\}) = 1/24\).
  6. Nope. The events are not disjoint, so you can’t just add their probabilities. Also note that \(\textrm{P}(B_1\cup B_2 \cup B_3\cup B_4)\neq \textrm{P}(B_1)+\textrm{P}(B_2)+\textrm{P}(B_3)+\textrm{P}(B_4)-\textrm{P}(B_1\cap B_2 \cap B_3 \cap B_4)\). As we mentioned previously, the general addition rule is complicated for more than two events.
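
The matching problem for \(n=4\) is small enough to enumerate. The sketch below rebuilds the distribution of \(Y\), computes \(\textrm{P}(B)\) directly as a union, and compares it with both the full inclusion-exclusion sum and the incorrect shortcut from part 6. Variable names are illustrative; each outcome is stored as a tuple whose position \(k\) holds the card placed in box \(k\).

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations, permutations

n = 4
omega = list(permutations(range(1, n + 1)))  # 24 equally likely placements


def Y(outcome):
    """Number of cards whose label matches the box they are placed in."""
    return sum(card == box for box, card in enumerate(outcome, start=1))


def prob(event):
    """Equally likely outcomes: P(A) = |A| / |Omega|."""
    return Fraction(len(event), len(omega))


counts = Counter(Y(w) for w in omega)
dist_Y = {y: Fraction(c, len(omega)) for y, c in sorted(counts.items())}

# B[i] is the event that card i is placed in box i.
B = {i: {w for w in omega if w[i - 1] == i} for i in range(1, n + 1)}
p_union = prob(set().union(*B.values()))

# Inclusion-exclusion: alternate signs over intersections of 1, 2, 3, 4 events.
inclusion_exclusion = Fraction(0)
for k in range(1, n + 1):
    for idx in combinations(range(1, n + 1), k):
        intersection = set.intersection(*(B[i] for i in idx))
        inclusion_exclusion += (-1) ** (k + 1) * prob(intersection)

# The shortcut from part 6: singles minus only the four-way intersection.
all_four = set.intersection(*B.values())
naive = sum(prob(B[i]) for i in B) - prob(all_four)

print(dist_Y)
print(p_union, inclusion_exclusion, naive)
```

The shortcut fails because it ignores the two-way and three-way intersections, which inclusion-exclusion accounts for.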

For finite sample spaces with equally likely outcomes, computing the probability of an event reduces to counting the number of outcomes that satisfy the event. The continuous analog of equally likely outcomes is a uniform probability measure. When the sample space is uncountable, size is measured continuously (length, area, volume) rather than discretely (counting).

\[ \textrm{P}(A) = \frac{|A|}{|\Omega|} = \frac{\text{size of } A}{\text{size of } \Omega} \qquad \text{for a uniform probability measure $\textrm{P}$} \]

Example 2.26 Suppose the spinner in Figure 2.1 is spun twice. Let the sample space be \(\Omega = [0,1]\times [0,1]\) and let \(\textrm{P}\) be the uniform probability measure. Find the probability of each of the following events. (Hint: recall that these events are depicted in Figure 2.2)

  1. \(A\), the event that the first spin is larger than the second.
  2. \(B\), the event that the smaller of the two spins (or common value if a tie) is less than 0.5.
  3. \(C\), the event that the sum of the two spins is less than 1.5.
  4. \(D\), the event that the first spin is less than 0.4.

Solution to Example 2.26

  1. \(\textrm{P}(A) = 0.5\). The shaded triangle makes up half the area of the square. This should make sense because the first spin should be equally likely to be the larger of the two spins.
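  2. \(\textrm{P}(B) = 0.75\). The complement \(B^c\), the event that both spins are at least 0.5, is a square with area \((0.5)(0.5) = 0.25\).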
  3. \(\textrm{P}(C)=0.875\). The unshaded triangle, representing \(C^c\), has area \((1/2)(0.5)(0.5) = 0.125\).
  4. \(\textrm{P}(D) = 0.4.\)
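
The areas in Example 2.26 can be checked approximately by simulation: draw many points uniformly from the unit square and record the fraction that land in each event. This is a rough sketch using Python's standard `random` module; the seed and the number of simulated pairs are arbitrary assumed choices, and the estimates are only approximations to the exact areas.

```python
import random

random.seed(12345)  # arbitrary seed so the run is repeatable
n_pairs = 200_000   # number of simulated pairs of spins (an assumed choice)

counts = {"A": 0, "B": 0, "C": 0, "D": 0}
for _ in range(n_pairs):
    u1, u2 = random.random(), random.random()  # a uniform point in the square
    counts["A"] += u1 > u2            # first spin larger than the second
    counts["B"] += min(u1, u2) < 0.5  # smaller of the two spins below 0.5
    counts["C"] += u1 + u2 < 1.5      # sum of the two spins below 1.5
    counts["D"] += u1 < 0.4           # first spin below 0.4

# Relative frequencies approximate the areas of the four regions.
estimates = {name: count / n_pairs for name, count in counts.items()}
print(estimates)
```

With this many repetitions the relative frequencies should land close to the exact answers 0.5, 0.75, 0.875, and 0.4.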

2.4.3 Non-uniform probability measures

For countable sample spaces, the probability measure is often defined by specifying the probability of each individual outcome. Then the probability of an event is obtained by summing the probabilities of the outcomes which comprise the event. Such was the case in Example 2.22; we could have specified the probability measure by providing the probability of each face (1 with probability 0.1, …, 4 with probability 0.4) and then obtained probabilities of all the events in Table 2.3 by adding the appropriate outcome probabilities.

For uncountable sample spaces, integration typically plays the analogous role that summation plays for countable sample spaces.

Example 2.27 Consider the sample space \(\Omega=[0,\infty)\) with a probability measure26 defined by \[ \textrm{P}(A) = \int_A e^{-u}\, du, \qquad A \subseteq [0, \infty). \]
  1. Verify that \(\textrm{P}(\Omega)=1\).
  2. Compute \(\textrm{P}(A)\) for \(A=[0, 1]\).
  3. Without integrating again, compute \(\textrm{P}(B)\) for \(B=(1, \infty)\).
  4. Compute \(\textrm{P}(C)\) for \(C=[0, 1] \cup (2, 4)\).
Solution to Example 2.27
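
One way to work the four parts is sketched below. It uses the antiderivative \(-e^{-u}\) of \(e^{-u}\) for closed-form interval probabilities, the complement rule for part 3, and additivity over disjoint intervals for part 4; a midpoint Riemann sum serves as an independent numerical cross-check. The function names are illustrative. Single points have probability 0 under this measure, so open and closed endpoints give the same answers.

```python
import math


def prob_interval(a, b):
    """Integral of e^{-u} from a to b, using the antiderivative -e^{-u}."""
    return math.exp(-a) - math.exp(-b)


def midpoint_integral(a, b, steps=100_000):
    """Midpoint Riemann sum of e^{-u} over [a, b], as a numerical check."""
    width = (b - a) / steps
    return sum(math.exp(-(a + (i + 0.5) * width)) for i in range(steps)) * width


p_omega = prob_interval(0, math.inf)             # part 1: e^0 - 0 = 1
p_A = prob_interval(0, 1)                        # part 2: 1 - e^{-1}
p_B = 1 - p_A                                    # part 3: B is the complement of A
p_C = prob_interval(0, 1) + prob_interval(2, 4)  # part 4: disjoint pieces add

print(p_omega, p_A, p_B, p_C)
print(midpoint_integral(0, 1), midpoint_integral(2, 4))
```

In closed form, \(\textrm{P}(A) = 1 - e^{-1}\approx 0.632\), \(\textrm{P}(B) = e^{-1}\approx 0.368\), and \(\textrm{P}(C) = (1 - e^{-1}) + (e^{-2} - e^{-4}) \approx 0.749\).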

  17. Technically, \(\mathcal{F}\) is a \(\sigma\)-field of subsets of \(\Omega\): \(\mathcal{F}\) contains \(\Omega\) and is closed under countably many elementary set operations (complements, unions, intersections). While this level of technical detail is not needed, we prefer to refer to a probability space as a triple to emphasize that probabilities are assigned directly to events rather than just outcomes.

  18. Proof. Since \(\Omega = A \cup A^c\) and \(A\) and \(A^c\) are disjoint, the axioms imply that \(1=\textrm{P}(\Omega) = \textrm{P}(A \cup A^c) = \textrm{P}(A) + \textrm{P}(A^c)\).

  19. Proof. If \(A \subseteq B\) then \(B = A \cup (B \cap A^c)\). Since \(A\) and \((B \cap A^c)\) are disjoint, \(\textrm{P}(B) = \textrm{P}(A) + \textrm{P}(B \cap A^c) \ge \textrm{P}(A)\).

  20. The proof is easiest to see by considering a picture like the one in Figure 2.4.

  21. For three events, \(\textrm{P}(A\cup B\cup C) = \textrm{P}(A) + \textrm{P}(B) + \textrm{P}(C) - \textrm{P}(A\cap B) - \textrm{P}(A \cap C) - \textrm{P}(B \cap C) + \textrm{P}(A \cap B \cap C)\).

  22. We will see a different expression of the law of total probability, involving conditional probabilities, in Section 3.1.4.

  23. That the probability of each outcome must be 1/4 when there are four equally likely outcomes follows from the axioms, by writing \(\{1, 2, 3, 4\} = \{1\}\cup\{2\}\cup \{3\}\cup \{4\}\), a union of disjoint sets, and applying countable additivity and \(\textrm{P}(\Omega)=1\).

  24. A probability measure is always defined on sets. When we say loosely “the probability of an outcome \(\omega\)” we really mean the probability of the event consisting of the single outcome \(\{\omega\}\). In this example \(\textrm{P}(\{\omega\})=1/4\) for \(\omega\in\{1, 2, 3, 4\}\).

  25. There are 4 cards that could potentially go in box 1, then 3 cards that could potentially go in box 2, 2 in box 3, and 1 left for box 4. This results in \(4\times3\times2\times1=4! = 24\) possible outcomes.

  26. This defines the Exponential(1) distribution; see Section ??.