## 2.1 Sample space of outcomes

- A phenomenon is **random** if there are multiple potential *outcomes*, and there is *uncertainty* about which outcome will occur.
- An outcome can be virtually anything: a sequence of coin flips, a shuffle of a deck of cards, the high temperature tomorrow in your city, the path of a hurricane, a sample of 1000 registered voters, player locations over time in an NBA game (as recorded by the SportsVU system), and on and on. In particular, an outcome does *not* have to be a number.
- Associated with any random phenomenon is the collection of all possible outcomes, called the *sample space*.

**Definition 2.1** The **sample space**, denoted \(\Omega\) (the uppercase Greek letter “omega”), is the set of all possible outcomes of a random phenomenon. An **outcome**, denoted \(\omega\) (the lowercase Greek letter “omega”), is an element of the sample space (\(\omega\in\Omega\)).

- Mathematically, the sample space \(\Omega\) is a *set* containing all possible outcomes, while an individual outcome \(\omega\) is a *point* or *element* in \(\Omega\); that is, \(\omega\in\Omega\).
- The simplest random phenomena have just two distinct outcomes, in which case the sample space is just a set with two elements, e.g. \(\Omega=\{0, 1\}\). For example, the sample space for a single coin flip could be \(\Omega = \{H, T\}\).
- A random phenomenon is modeled by a *single* sample space, upon which all objects (events, random variables, stochastic processes) are defined. (More notes about this point are scattered throughout the text.)
- Whenever possible, a sample space outcome should be defined to provide the maximum amount of information about the random phenomenon.
- While a random phenomenon always has a corresponding sample space, we will see that in many situations the sample space of outcomes is at best only vaguely specified and cannot be feasibly enumerated.

**Example 2.1** Roll a four-sided die^{8} twice, and record the result of each roll in sequence as an ordered pair. For example, the outcome \((3, 1)\) represents a 3 on the first roll and a 1 on the second; this is not the same outcome as \((1, 3)\).

- Identify the sample space.
- We might be interested in the sum of the two dice. Explain why it is still advantageous to define the sample space as in the previous part, rather than as \(\Omega=\{2, \ldots, 8\}\).

*Solution* to Example 2.1

- The sample space consists of 16 possible ordered pairs of rolls \[\begin{align*} \Omega & = \{(1, 1), (1, 2), (1, 3), (1, 4), (2, 1), (2, 2), (2, 3), (2, 4),\\ & \quad (3, 1), (3, 2), (3, 3), (3, 4), (4, 1), (4, 2), (4, 3), (4, 4)\} \end{align*}\]
- Yes, we might be interested in the sum of the two dice. But we might also be interested in other things, like the larger of the two rolls, or if at least one 3 was rolled, or the individual rolls themselves. Knowing just the sum of the dice does not provide as much information about the outcome of the random phenomenon as the sequence of individual rolls does.

**Note:** In the previous example, there was a single sample space whose outcomes represented the result of the pair of rolls. In particular, there was not a separate sample space for each of the individual rolls.
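The point of Example 2.1 can be checked with a short enumeration. The following is a minimal Python sketch, not part of the original example; the names `faces`, `omega`, `sums`, `larger`, and `at_least_one_3` are illustrative choices. It builds the single sample space of ordered pairs and then computes several quantities of interest as functions of the outcome.

```python
from itertools import product

# Sample space for two rolls of a four-sided die: all ordered pairs (first, second).
faces = [1, 2, 3, 4]
omega = list(product(faces, repeat=2))

# Each quantity of interest is a function of the outcome.
sums = {w: w[0] + w[1] for w in omega}          # sum of the two rolls
larger = {w: max(w) for w in omega}             # larger of the two rolls
at_least_one_3 = {w: 3 in w for w in omega}     # was at least one 3 rolled?

print(len(omega))                    # 16
print(sums[(3, 1)], sums[(1, 3)])    # 4 4
print(sorted(set(sums.values())))    # [2, 3, 4, 5, 6, 7, 8]
```

The outcomes \((3, 1)\) and \((1, 3)\) are distinct elements of `omega` but have the same sum, so the sum alone cannot recover which outcome occurred, while the ordered pair determines the sum, the larger roll, and everything else.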

**Example 2.2** Consider the outcome of a sequence of 4 flips of a coin. For example, HTHT means heads on the first and third flips and tails on the second and fourth; this is not the same outcome as HHTT or THTH.

- Identify the sample space. (Hint: there are 16 possible outcomes.)
- We might be interested in the number of heads flipped. Explain why it is still advantageous to define the sample space as in the previous part, rather than as \(\Omega=\{0, 1, 2, 3, 4\}\).

*Solution* to Example 2.2

- The sample space is the following set composed of 16 distinct outcomes \[\begin{align*} \Omega & = \{HHHH, HHHT, HHTH, HTHH, THHH, HHTT, HTHT, HTTH,\\ & \qquad THHT, THTH, TTHH, HTTT, THTT, TTHT, TTTH, TTTT\} \end{align*}\]
- Yes, we might be interested in the number of heads flipped, but we might also be interested in other things, such as whether the first flip was heads, the length of the longest streak of heads in a row, or the proportion of flips following H that resulted in H. Knowing just the number of heads flipped does not provide as much information about the outcome of the random phenomenon as the sequence of individual flips does.

**Note:** In the previous example, there was a single sample space whose outcomes were sequences of coin flips. In particular, there was not a separate sample space for each of the individual flips.
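As a rough illustration of Example 2.2, the Python sketch below enumerates the 16 flip sequences and evaluates two of the quantities mentioned in the solution. The helper names `number_of_heads` and `longest_heads_streak` are made up for this sketch.

```python
from itertools import product

# Sample space for 4 coin flips: every length-4 string over {H, T}.
omega = ["".join(flips) for flips in product("HT", repeat=4)]

def number_of_heads(w):
    """Count the H's in an outcome such as 'HTHT'."""
    return w.count("H")

def longest_heads_streak(w):
    """Length of the longest run of consecutive H's (0 if there are none)."""
    return max(len(run) for run in w.split("T"))

print(len(omega))                                             # 16
print(number_of_heads("HTHT"), longest_heads_streak("HTHT"))  # 2 1
print(number_of_heads("HHTT"), longest_heads_streak("HHTT"))  # 2 2
```

HTHT and HHTT both have two heads but different longest streaks, which is information that the reduced space \(\{0, 1, 2, 3, 4\}\) would discard.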

**Example 2.3 (Matching problem)** So-called “matching problems” concern the following generic scenario. A set of \(n\) cards labeled \(1, 2, \ldots, n\) is placed in \(n\) boxes labeled \(1, 2, \ldots, n\), with exactly one card in each box. Typical questions of interest involve whether the number of a card matches the number of the box in which it is placed. (More colorful descriptions include returning babies at random to mothers or placing rocks at random back on a museum shelf.) Identify a reasonable sample space when \(n=4\).

*Solution* to Example 2.3

We can consider each outcome to be a particular placement of cards in the boxes. For example, the outcome 3214 (or \((3, 2, 1, 4)\)) represents that card 3 is placed in box 1, card 2 in box 2, card 1 in box 3, and card 4 in box 4. (We say that an outcome is a *permutation* (or reordering) of the numbers 1, 2, 3, 4.) So the sample space consists of the following 24 outcomes^{9}.

\[\begin{align*} \Omega & = \{1234, 1243, 1324, 1342, 1423, 1432, \\ & \qquad 2134, 2143, 2314, 2341, 2413, 2431, \\ & \qquad 3124, 3142, 3214, 3241, 3412, 3421, \\ & \qquad 4123, 4132, 4213, 4231, 4312, 4321\} \end{align*}\]

Recording outcomes in this way provides more information than if we had chosen the sample space to correspond to, for example, the number of cards that match the number of the box in which they are placed.
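The matching-problem sample space for \(n=4\) can also be generated programmatically. This is a small Python sketch under the convention used in the solution (position \(i\) in the outcome holds the card placed in box \(i\)); `number_of_matches` and `match_counts` are names chosen for this illustration.

```python
from collections import Counter
from itertools import permutations

# Sample space: all placements of cards 1..4 into boxes 1..4.
# Position i (1-indexed) of an outcome is the card placed in box i.
omega = list(permutations([1, 2, 3, 4]))

def number_of_matches(w):
    """Count boxes whose card number equals the box number."""
    return sum(1 for box, card in enumerate(w, start=1) if box == card)

print(len(omega))                        # 24
print(number_of_matches((3, 2, 1, 4)))   # 2

# How many outcomes yield each possible number of matches?
match_counts = Counter(number_of_matches(w) for w in omega)
print(sorted(match_counts.items()))      # [(0, 9), (1, 8), (2, 6), (4, 1)]
```

The number of matches is computed from each placement, so the coarser information is always recoverable from the finer outcome, but not the other way around.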

**Example 2.4 (Collector problem)** So-called “collector problems” concern the following generic scenario. A set of \(n\) cards labeled \(1, 2, \ldots, n\) is placed in a box (or shuffled in a deck). Cards are selected one at a time, *with replacement*, indefinitely. Selection with replacement means that a card which has been selected is returned to the box and the cards are reshuffled before the next draw is made. Identify an appropriate sample space. (“Collector problems” typically concern the number of draws needed to observe each of the \(n\) cards. Examples include rolling a fair six-sided die until each of the six faces is rolled at least once, or collecting prizes until a complete set is obtained.)

*Solution* to Example 2.4

We can let \(\Omega=\{1, \ldots, n\}^\infty\). That is, each outcome is an infinite sequence (since draws continue indefinitely), with each element in the sequence in \(\{1, \ldots, n\}\). For example, with \(n=6\), the outcome \((3, 2, 6, 3, 4, 1, 2, 4, \ldots)\) means card 3 is the first selection, card 2 the second, card 6 the third, card 3 the fourth, etc. Recording outcomes in this way, we could investigate the number of draws needed to obtain each of the \(n\) cards at least \(r\) times, for any \(r=1,2, \ldots\). (Note that for \(n\ge 2\), \(\Omega\) is an uncountable set.)
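An infinite sequence cannot be stored in full, but it can be represented lazily. The Python sketch below is one possible way to do this, assuming equally likely draws (which the example itself does not require); the functions `draws` and `draws_until_complete` and the seed value are illustrative choices, not part of the original text.

```python
import random
from itertools import islice

def draws(n, rng):
    """Lazily yield an endless sequence of draws, with replacement, from {1, ..., n}.

    Assumption for this sketch: each card is equally likely on every draw.
    """
    while True:
        yield rng.randint(1, n)

def draws_until_complete(sequence, n):
    """Number of draws needed until every card 1..n has appeared at least once."""
    seen = set()
    for count, card in enumerate(sequence, start=1):
        seen.add(card)
        if len(seen) == n:
            return count

rng = random.Random(0)  # arbitrary seed, only for reproducibility

# Peek at the first 8 coordinates of one (conceptually infinite) outcome.
print(list(islice(draws(6, rng), 8)))

# A quantity of interest computed from a fresh outcome.
print(draws_until_complete(draws(6, rng), 6))

# The same function applied to the prefix shown in the text, followed by a 5.
print(draws_until_complete(iter([3, 2, 6, 3, 4, 1, 2, 4, 5]), 6))  # 9
```

Only as much of the sequence as a question requires is ever generated, which mirrors how an outcome in \(\{1, \ldots, n\}^\infty\) is used in practice.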

**Example 2.5** Many statistical applications involve *random sampling*. For example, polling organizations often select random samples of Americans. Typically the selection involves random digit dialing: a sample of, say, 1000 phone numbers is randomly selected from some large bank (hundreds of millions) of phone numbers. The bank of phone numbers is the *population*. An outcome consists of the 1000 phone numbers selected; this is the *sample*.

As an extraordinarily unrealistic, oversimplified, but concrete example, suppose the bank only contains 5 numbers, labeled {1, 2, 3, 4, 5}, from which 3 are selected. Describe an appropriate sample space. Note: the order in which the numbers are selected does not matter; we only care which numbers are selected.

*Solution* to Example 2.5

An outcome consists of a *subset* of size 3 from the set \(\{1, 2, 3, 4, 5\}\), representing the list of the 3 phone numbers selected. There are 10 distinct outcomes:

\[\begin{align*} \Omega & = \{\{1, 2, 3\}, \{1, 2, 4\}, \{1, 2, 5\}, \{1, 3, 4\}, \{1, 3, 5\},\\ & \qquad \{1, 4, 5\}, \{2, 3, 4\}, \{2, 3, 5\}, \{2, 4, 5\}, \{3, 4, 5\}\} \end{align*}\]

In a more realistic setting, the population would consist of hundreds of millions of phone numbers, and the sample space would be composed of all possible subsets (samples) of 1000 phone numbers. Even if the order in which the numbers are selected is irrelevant, the sample space is enormous and could never be feasibly enumerated as in this oversimplified example. But the idea is the same: an outcome consists of a *subset* of numbers, and the sample space is a collection of possible subsets. (That is, the sample space is a set of sets.)
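To make the contrast concrete, the Python sketch below enumerates the toy sample space and then merely counts the realistic one without enumerating it. The population size of 300,000,000 is a hypothetical stand-in for "hundreds of millions"; the variable names are illustrative.

```python
from itertools import combinations
from math import comb, log10

# Toy version: all subsets of size 3 from a bank of 5 numbers (order ignored).
population = [1, 2, 3, 4, 5]
omega = [frozenset(s) for s in combinations(population, 3)]

print(len(omega))   # 10
print(comb(5, 3))   # 10, the same count obtained without enumeration

# Realistic version: count, but do not list, the possible samples of size 1000.
# 300_000_000 is an assumed population size, used only for illustration.
realistic_count = comb(300_000_000, 1000)
print(log10(realistic_count) > 5000)   # True: the count has over 5000 digits
```

Using `frozenset` reflects that each outcome is itself a set, so that the sample space is a set of sets.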

**Example 2.6** Many games involve rolling a die or spinning a spinner like the one in Figure 2.1, which returns values between 0 and 1. A spinner can be thought of as a continuous analog of a die roll. The values in the picture are rounded to two decimal places, but consider an idealized model where the spinner is infinitely precise so that any real number between 0 and 1 is a possible outcome. The sample space corresponding to a single spin is then \(\Omega = [0, 1]\), a continuous interval of real numbers. Now suppose the spinner is spun twice; what is an appropriate sample space?

*Solution* to Example 2.6

An outcome is now a pair of values \((\omega_1, \omega_2)\) corresponding to the results of the first and second spins. The sample space is \(\Omega = [0,1]\times [0,1]\), where \([0,1]\times [0,1]\), also denoted \([0,1]^2\), is the Cartesian product \([0,1]\times [0,1] = \{(\omega_1, \omega_2): \omega_1 \in [0, 1], \omega_2 \in [0, 1]\}\).

In the above (idealized) example, outcomes were measured on a continuous scale; any real number between 0 and 1 was a possible outcome of a single spin (of the idealized, infinitely precise spinner). Even in situations where outcomes are inherently discrete, it is often more convenient to model them as continuous. For example, if an outcome represents the annual salary in dollars of a randomly selected U.S. household, it would be more convenient to model the sample space as the continuous interval \([0, \infty)\) rather than \(\{0, 1, 2, \ldots\}\) or \(\{0, 0.01, 0.02, \ldots\}\). (We could also try \([0, m]\) where \(m\) is some large dollar amount providing an upper bound on the maximum possible salary. But we would need to be sure that \(m\) is large enough so that all possible outcomes are in the sample space \([0, m]\). Without knowing this bound in advance, it is convenient to just choose the unbounded interval \([0, \infty)\).)
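A continuous sample space such as \([0,1]^2\) cannot be listed, but individual outcomes can be generated. The following Python sketch produces one outcome of the two-spin experiment, assuming each spin is equally likely to land anywhere on the dial (an assumption of this sketch, since no probabilities have been specified yet). The function name `spin_twice` and the seed are illustrative.

```python
import random

def spin_twice(rng):
    """Return one outcome (w1, w2) of two spins, as a point in the unit square.

    random() returns floats in [0.0, 1.0), so this is a finite-precision
    stand-in for the idealized, infinitely precise spinner.
    """
    return (rng.random(), rng.random())

rng = random.Random(0)  # arbitrary seed, only for reproducibility
outcome = spin_twice(rng)
print(outcome)

# Membership in the sample space [0, 1] x [0, 1]
print(all(0 <= w <= 1 for w in outcome))   # True
```

Note that the outcome is still a single ordered pair drawn from one sample space, not two separate results from two separate sample spaces.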

**Example 2.7** Select a U.S. high school student and record the student’s SAT Math and Reading scores. What is an appropriate sample space?

*Solution* to Example 2.7

An outcome will be an ordered pair representing the (Math, Reading) scores. Technically, possible scores are 200 through 800 in increments of 10. So we could consider the sample space to be \(\{200, 210, 220, \ldots, 790, 800\}\times \{200, 210, 220, \ldots, 790, 800\}\).

However, we could also model scores on a continuous scale, taking any value in the interval from 200 to 800. In this case, the sample space would be \(\Omega = [200, 800] \times [200, 800]\). Such a specification is often more convenient mathematically.
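The two specifications can be compared side by side. This Python sketch enumerates the discrete sample space and writes the continuous one as a membership test; `scores`, `omega_discrete`, and `in_continuous_space` are names chosen for this illustration.

```python
from itertools import product

# Discrete model: each score is 200, 210, ..., 800.
scores = range(200, 801, 10)
omega_discrete = list(product(scores, repeat=2))   # (Math, Reading) pairs

print(len(scores))           # 61
print(len(omega_discrete))   # 3721

def in_continuous_space(w):
    """Membership test for the continuous model [200, 800] x [200, 800]."""
    math_score, reading_score = w
    return 200 <= math_score <= 800 and 200 <= reading_score <= 800

# Every discrete outcome is also a point of the continuous sample space.
print(all(in_continuous_space(w) for w in omega_discrete))   # True
print(in_continuous_space((655.5, 712.25)))                  # True
```

The continuous space contains every discrete outcome plus pairs such as \((655.5, 712.25)\) that can never occur as actual scores; that is the price paid for the mathematical convenience.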

^{8} Why four-sided? Simply to make the number of possibilities a little more manageable (e.g., for in-class simulation activities). Rolling a four-sided die twice yields 16 possible pairs, while rolling a six-sided die twice yields 36 possible pairs.

^{9} There are 4 cards that could potentially go in box 1, then 3 cards that could potentially go in box 2, 2 to box 3, and 1 left for box 4. This results in \(4\times3\times2\times1=4! = 24\) possible outcomes. We will see more counting rules in Chapter **??**.