Chapter 2 Probability Spaces

This chapter, the first of two, introduces our formalism for simple statistical spaces, which formalize the simplest kind of chance regularity—that of the random experiment. At the end of these two lectures, we will have a complete theory of chance regularity stemming from random experiments.9

Having just described the forest in such happy terms, let me apologize now: you are about to be bludgeoned by trees. Do not lose sight of the forest!

2.1 Introductory Remarks

2.1.1 The Random Experiment

We aim to build a formalism for the ideal type of chance mechanism: the random experiment. In words, a random experiment is a chance mechanism with three properties:

  1. all possible outcomes are known a priori;
  2. the outcome of any given trial is not known a priori but there exists a perceptible regularity of occurrence associated with the outcomes; and
  3. the trials can be repeated under identical conditions.

A simple statistical space—which, again, is just a formalization of the abstract ideal of a random experiment—comprises two elements:

  1. A probability space that enumerates:
    1. The possible outcomes of the experiment;
    2. The events of interest; and
    3. The probabilities assigned to each of those events.
  2. A simple sampling space that enumerates the relationships among the observations.

Now, most of us work primarily with observational data, which often fail to live up to the three standards of a random experiment. It is rare for observations to be identical under repeated trials; what does a “repeated trial” mean for those of us interested in why civil wars start? Or voting—you’re not supposed to vote twice, right? So, perhaps our simple statistical space is inadequate to discuss anything realistic?

To this, I offer two responses.

  1. First, experimental data is increasingly popular in the discipline; and
  2. Second, even if most data fail to live up to the lofty standards of a random experiment, it remains that most analyses implicitly operate with the question “what would the ideal experiment for this phenomenon be?” in the background. At the very least, good analyses operate with that question in the background.

2.1.2 Goals of the Lecture

Today, we want a formalism for what a probability space is. We will therefore formalize the first two conditions of a random experiment, which requires three components: the outcomes set, the events of interest, and the probability function. As you can imagine, this will first require a mathematical digression on sets and functions. So, here comes the…

2.2 Mathematical Digression on Sets and Functions

2.2.1 Sets

The field of statistics depends on the theory of probability, and the theory of probability itself depends on the theory of sets. Set theory, in turn, is taken as primitive for all mathematical analysis but requires careful answers to important philosophical questions that go well beyond our scope here.10 Our main goal for now is to gather all the set theory needed to define probability functions in an acceptable way.

So what is a set? Speaking very loosely, a set is a collection of distinct objects,11 which we call elements. For example, consider

\[ S = \left\{\clubsuit, \diamondsuit, \heartsuit, \spadesuit\right\}.\]

Here \(S\) is a set, and \(\clubsuit\), \(\diamondsuit\), \(\heartsuit\), and \(\spadesuit\) are its elements. We denote this inclusion by, say, \(\clubsuit \in S\), which we read “\(\clubsuit\) is an element of \(S\)”, or usually just “\(\clubsuit\) is in \(S\)”. If we want to say that some entity is not an element of \(S\), we do so with, for example, \(\maltese \not\in S\). Note that the order of the elements does not matter for the definition of a set.

The elements of a set may themselves be sets. In such cases, I will try to call them collections and will try to use calligraphic letters (e.g. \(\mathcal{A}\)) instead of standard letters (e.g. \(A\)). For example, let’s break down the suits of cards by color: \[\begin{align*} S_B &= \{\clubsuit, \spadesuit\}, \\ S_R &= \{\diamondsuit, \heartsuit\}, \end{align*}\] where \(S_B\) captures the black suits and \(S_R\) captures the red suits.12 Then we might form the collection \[\begin{align*} \mathcal{S} &= \{S_B, S_R\}, \\ &= \{\{\clubsuit, \spadesuit\}, \{\diamondsuit, \heartsuit\}\}, \end{align*}\]

so that \(S_B \in \mathcal{S}\) and \(S_R \in \mathcal{S}\) but \(\clubsuit \not\in \mathcal{S}\).

As a general rule, we will denote sets with capital letters like \(S\) or \(A\) and elements with lower-case letters like \(s\) or \(a\). Thus, \(s \in S\) and \(a \in A\) are both very common things to see, but \(A \in a\) is not.

You can imagine that it is often inconvenient and sometimes impossible to introduce a set by enumerating all of its elements like we just did above. We often define sets based on some property, such as with

\[ [0,1] = \left\{x : 0 \leq x \leq 1 \right\}. \] This is read “\([0,1]\) is the set of all \(x\) such that \(0\) is less than or equal to \(x\), which is less than or equal to \(1\).” The square brackets indicate that these are weak inequalities; for strict inequalities, we use parentheses: \[ (0,1) = \left\{x : 0 < x < 1 \right\}. \]

2.2.1.1 Some Special Sets

In particular, there are some sets of numbers that are relevant for our purposes. We have:

  1. The natural numbers, \(\mathbb{N} = \{1,2,3,\ldots\}\);
  2. The integers, \(\mathbb{Z} = \{\ldots, -2, -1, 0, 1, 2, \ldots\}\);
  3. The rational numbers, \[ \mathbb{Q} = \left\{x : \exists~p,q \in \mathbb{Z},~q \neq 0,~\text{such that}~ x = \frac{p}{q}\right\},\] where \(\exists\) is read “there exists.” Thus, the definition above reads “\(\mathbb{Q}\) is the set of all numbers \(x\) such that there exist integers \(p\) and \(q\), with \(q \neq 0\), where \(x = \frac{p}{q}\).” Not so bad, right?

  4. The real numbers, denoted \(\mathbb{R}\). These are more difficult to define.13 They include the rational numbers and the limits of sequences of rational numbers. To give the classic example, \(\sqrt{2}\) is the limit of a sequence of rational numbers, namely \[1, 1.4, 1.41, 1.414, 1.4142, 1.41421, 1.414213, \ldots ,\] but it is straightforward to show that \(\sqrt{2}\) is not a rational number.14 Basically, the reals are any finite number you can think of and many that you cannot. It is tempting to write \[\mathbb{R} = \left\{x : -\infty < x < \infty\right\},\] and you see that sometimes, but it rubs me the wrong way. We also might focus on the non-negative reals, \[ \mathbb{R}_+ = \{x \in \mathbb{R} : x \geq 0\}, \] and likewise for the strictly positive reals \(\mathbb{R}_{++}\) when this inequality holds strictly.

One of the most important sets is the one with no elements: the empty set. We denote it \(\emptyset\).

2.2.1.2 Basic Operations

There are several relevant operations that work on pairs of sets, say \(A\) and \(B\). For starters, we say \(A\) is a subset of \(B\)—written \(A \subset B\)—if \(a \in A\) implies that \(a \in B\), too.15 For example, let \[\begin{align*} A &= \{1,2,3\}, \\ B &= \{1,2,3,4\}. \end{align*}\] Here \(A \subset B\) but \(B \not\subset A\). Two sets are equal if each is a subset of the other; for example, letting \[\begin{align*} C &= \{1,2,3,4,5\}, \\ D &= \{5,4,3,2,1\}, \end{align*}\]

we have \(C \subset D\) (because every element of \(C\) is also an element of \(D\)) and \(D \subset C\) (because every element of \(D\) is also an element of \(C\)) and thus that \(C = D\).16 If we want to emphasize that a subset is not equal to the set containing it, we might write \(A \subsetneq B\), read “\(A\) is a subset of \(B\), but is not equal to \(B\),” or more compactly “\(A\) is a proper subset of \(B\).”

For a given set \(A\), the set of all subsets is called the power set and is denoted \(2^A\). For example, if \(A = \{1,2,3\}\), then \[2^A = \{\emptyset, \{1\}, \{2\}, \{3\}, \{1,2\},\{1,3\},\{2,3\}, \{1,2,3\}\}.\] Note the inclusion of the empty set; it is a subset of all nonempty sets. Note also that, since \(A \subset A\), it is included in the power set, too. If you count the number of elements in \(A\) and in \(2^A\), you’ll understand the notation \(2^A\).
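If you want to see this concretely, here is a minimal Python sketch (the helper name `power_set` is mine, not a standard library function) that enumerates the power set of a small finite set:

```python
from itertools import combinations

def power_set(s):
    """Return the power set of a finite set s, as a set of frozensets."""
    elems = list(s)
    return {frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)}

A = {1, 2, 3}
subsets = power_set(A)
print(len(subsets))             # 8, i.e. 2 ** |A|
print(frozenset() in subsets)   # True: the empty set is always included
print(frozenset(A) in subsets)  # True: A is a subset of itself
```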

We now define the union of two sets: \[\begin{align*} A \cup B &= \{x : x \in A~\text{or}~x \in B\}, \end{align*}\] and the intersection of two sets: \[\begin{align*} A \cap B &= \{x : x \in A~\text{and}~x \in B\}. \end{align*}\] For example, let \[\begin{align*} A &= \{\text{Mingus}, \text{Gillespie}, \text{Ellington}\}, \\ B &= \{\text{Monk}, \text{Evans}, \text{Ellington}\}. \end{align*}\] Then \[\begin{align*} A \cup B &= \{\text{Mingus}, \text{Gillespie}, \text{Ellington}, \text{Monk}, \text{Evans}\}, \\ A \cap B &= \{\text{Ellington}\}. \end{align*}\] Two sets \(A\) and \(B\) are disjoint or mutually exclusive if \(A \cap B = \emptyset\). For example, let \[\begin{align*} E &= \{x \in \mathbb{Z}: x~\text{is an even number}\}, \\ O &= \{x \in \mathbb{Z}: x~\text{is an odd number}\}. \end{align*}\]

Clearly, \(E \cap O = \emptyset\). Here it is also true that \(E \cup O = \mathbb{Z}\). If disjoint subsets of a set equal that set when joined together via a union, we refer to them as a partition. Here \(E\) and \(O\) form a partition of \(\mathbb{Z}\).
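Here is a quick Python illustration of these operations, using built-in `set`s and a finite stand-in for the even/odd partition (the variable names just mirror the examples above):

```python
A = {"Mingus", "Gillespie", "Ellington"}
B = {"Monk", "Evans", "Ellington"}

print(A | B)    # union: all five musicians
print(A & B)    # intersection: {'Ellington'}

# A finite stand-in for the even/odd partition of the integers:
Z = set(range(-5, 6))
E = {x for x in Z if x % 2 == 0}
O = {x for x in Z if x % 2 != 0}
print(E & O == set())   # True: E and O are disjoint
print(E | O == Z)       # True: their union recovers Z, so they partition it
```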

We can extend unions and intersections to allow for more than two sets. Let \(\mathcal{A}\) be a collection with elements \(A_i, i = 1,2,\ldots\), that is, \[\mathcal{A} = \{A_1,A_2,\ldots\}.\] Here the dots indicate that the sets continue on without end. Then we have \[\begin{align*} \bigcup \mathcal{A} &= \bigcup_{i=1}^\infty A_i = \{a : a \in A_i~\text{for some}~A_i \in \mathcal{A}\}, \\ \bigcap \mathcal{A} &= \bigcap_{i=1}^\infty A_i = \{a : a \in A_i~\text{for all}~A_i \in \mathcal{A}\}. \end{align*}\]

Other operations work with respect to some reference set or universe of discourse. To build up this idea, define \[X \setminus A = \{x \in X : x \not\in A\}.\] Letting \(X\) serve as the reference set (which is often understood from the context), we might write \(\overline{A} = X \setminus A\), which we read as the complement of \(A\).

From here, we move on to de Morgan’s laws. Let \(I = \{1,2,3,\ldots\}\) serve as our index set. Now we have \[\begin{align*} \overline{\bigcup_{i \in I} A_i} &= \bigcap_{i \in I} \overline{A}_i, \\ \overline{\bigcap_{i \in I} A_i} &= \bigcup_{i \in I} \overline{A}_i. \end{align*}\]

So, the complement of the union of a collection is just the intersection of the individual complements, and the complement of the intersection of a collection is just the union of the individual complements.
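Here is a brute-force check of both laws on a small finite universe (a Python sketch; the reference set `X` and the collection are arbitrary choices of mine):

```python
from functools import reduce

X = set(range(10))                               # reference set
collection = [{0, 1, 2}, {2, 3, 4}, {4, 5, 6}]   # a finite collection of subsets

def complement(A):
    return X - A

union = reduce(set.union, collection)
intersection = reduce(set.intersection, collection)

# Complement of the union equals the intersection of the complements:
print(complement(union) == reduce(set.intersection, [complement(A) for A in collection]))
# Complement of the intersection equals the union of the complements:
print(complement(intersection) == reduce(set.union, [complement(A) for A in collection]))
# Both print True.
```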

2.2.1.3 Cartesian Products

Given two sets \(A\) and \(B\), the Cartesian product \(A \times B\) is just \[A \times B = \{(a,b) : a \in A~\text{and}~b \in B\}.\] So, it is the set of all ordered pairs. More generally, given a collection of sets \(\mathcal{A}\), we have \[\prod \mathcal{A} = A_1 \times A_2 \times A_3 \times \cdots.\] We will also use the symbol \(\prod\) to denote multiplication of numbers, but you will be able to tell which usage is intended from the context.
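In Python, `itertools.product` computes exactly this set of ordered pairs (a small sketch with arbitrary example sets):

```python
from itertools import product

A = {"H", "T"}
B = {1, 2, 3}

AxB = set(product(A, B))   # all ordered pairs (a, b) with a in A and b in B
print(len(AxB))            # 6, i.e. |A| * |B|
print(("H", 2) in AxB)     # True
```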

2.2.1.4 Cardinality and the Infinities

We often are concerned with the size of a set. The most intuitive way to work here is to think about the number of elements. We refer to this as a set’s cardinality. For a set \(A\), we denote its cardinality by \(|A|\). For example, the cardinality of \(S = \{A,B,C\}\) is \(|S| = 3\). We call sets with a single element singletons. For example, \(X = \{a\}\) is a singleton, as \(|X| = 1\). Note that \(a\) and \(\{a\}\) are not the same thing, strictly speaking, as \(a \in X\) but \(\{a\} \subset X\) (indeed, \(\{a\} = X\), since \(X \subset \{a\}\), too).

The most straightforward kind of set is one like the example immediately above: there is a finite number of elements, and so we say the set is finite. Otherwise, it is infinite.

Now consider the natural numbers, \(\mathbb{N} = \{1,2,3,\ldots\}\). This is not a finite set, as there is no largest natural number. Its cardinality is therefore infinite, and so \(\mathbb{N}\) is an infinite set. In particular, we will say that \(\mathbb{N}\) is countably infinite. Indeed, we will say that any set \(S\) is countably infinite if we can assign each of its elements a unique natural number. For example, the odd numbers, the even numbers, and the integers are all countably infinite—you can assign each of their elements a unique natural number.

But not all infinite sets are countable; some are larger. Letting \(\aleph_0\) denote the cardinality of \(\mathbb{N}\), the cardinality of \(\mathbb{R}\) is strictly greater than \(\aleph_0\). To get an intuition as to why this is, consider two real numbers, \(x,y \in \mathbb{R}\). Without any loss of generality, suppose that we were trying to count the reals and that we assigned \(x\) an index of 1 and \(y\) an index of 2. Could we safely proceed on to other numbers knowing that \(x\) and \(y\) are addressed? No; we’ve skipped over infinitely many real numbers. There’s no way to order them so that we can assign each a unique natural number.17 The real numbers \(\mathbb{R}\) are what we call uncountably infinite and thus are an uncountable set.

Note that \(A= [0,1]\) and \(B=[0,2]\) are both uncountably infinite; indeed, they have the same cardinality. Yet wouldn’t most of us say that \(A\) is smaller than \(B\)? This just goes to show that cardinality is a somewhat limited way to talk about size. It is, however, the most useful one for our purposes. We could use other approaches—like those you see in measure theory—to come up with intuitive ways to say that \(B\) is longer than \(A\), or, in higher dimensions, that \(B\) is bigger than \(A\).

2.2.2 Functions

You are probably accustomed to statements of the form \(f(x) = x^2\). We will need a somewhat more general way to think about functions. I will start off somewhat loosely. We are often concerned with attaching elements of one set to elements of another set. Sometimes, you see this referred to as a “marriage” between the sets. So, loosely speaking, we can say that a function, call it \(f\), is a relation between two sets, call them \(A\) and \(B\), that satisfies the restriction that for each \(a \in A\), there is a single element \(b \in B\) such that \((a,b) \in f\). We call the sets \(A\) and \(B\) the domain and co-domain, respectively.

What is a relation? A relation between sets \(A\) and \(B\) is any subset of their Cartesian product, \(A \times B\). A function is a special kind of relation that ensures that each element of \(A\) is paired with exactly one element of \(B\).18

When we introduce a function, we usually do so like this: “define the function \(f: A \rightarrow B\).” This means that \(A\) is the domain and \(B\) is the co-domain. You can then define \(f(x) = x^2\) from there if you would so like.

We sometimes concern ourselves with some other relevant sets defined by \(f\). For example, the image of \(C \subset A\) given \(f\) is \(f(C) = \{f(c) \in B : c \in C\}\). Thus, if we define \(f: \mathbb{R} \rightarrow \mathbb{R}\) where \(f(x) = x^2\), it turns out that \(f(\mathbb{R}) = \mathbb{R}_+\), since all squares are non-negative. The pre-image of \(D \subset B\) is the same in reverse: \(f^{-1}(D) = \{a \in A : f(a) \in D\}\).
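For finite sets, we can represent a function as a Python dictionary and compute images and pre-images directly (a sketch; the helper names `image` and `preimage` are mine):

```python
# f(x) = x ** 2 on a small finite domain, stored as a dictionary
A = {-2, -1, 0, 1, 2}
f = {a: a ** 2 for a in A}

def image(f, C):
    """The set of values f takes on the subset C of the domain."""
    return {f[c] for c in C}

def preimage(f, D):
    """The domain points that f maps into the subset D of the co-domain."""
    return {a for a, b in f.items() if b in D}

print(image(f, A))           # {0, 1, 4}
print(preimage(f, {1, 4}))   # {-2, -1, 1, 2}
```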

A function that assigns each distinct element of the domain to a distinct element of the co-domain is called injective or one-to-one. That is, \(f\) is injective if \(x,y \in A\) with \(x \neq y\) implies that \(f(x) \neq f(y)\). A function that uses every element of the co-domain is surjective or onto. That is, \(f\) is surjective if \(f(A) = B\). A function that is both injective and surjective is called bijective.
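Continuing with the dictionary representation of a finite function, here are brute-force checks of these properties (again a sketch; `is_injective` and `is_surjective` are illustrative names of mine):

```python
def is_injective(f):
    """Distinct domain points map to distinct values."""
    values = list(f.values())
    return len(values) == len(set(values))

def is_surjective(f, codomain):
    """Every element of the co-domain is the value of some domain point."""
    return set(f.values()) == set(codomain)

f = {-1: 1, 0: 0, 1: 1}              # x -> x ** 2 on {-1, 0, 1}
print(is_injective(f))               # False: -1 and 1 share the value 1
print(is_surjective(f, {0, 1}))      # True: the image is all of {0, 1}
print(is_surjective(f, {0, 1, 2}))   # False: nothing maps to 2
```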

2.3 Probability Spaces

Remember that a random experiment is a chance mechanism that satisfies:

  1. all possible outcomes are known a priori;
  2. in any particular trial the outcome is not known a priori but there exists a discernible regularity of occurrence associated with the outcomes; and
  3. it can be repeated under identical conditions.

We will formalize the first two of these requirements today. On the subject of formalization, let’s call a random experiment \(\mathscr{E}\).

2.3.1 Condition One: The Outcomes Set

We will collect all of the possible distinct outcomes of an experiment into an outcomes set, which we will usually denote \(S\). For example, if the experiment is to toss a coin twice and note the outcome, then \[S = \{(HH), (HT), (TH), (TT)\}.\]

Outcomes sets can be countably infinite. If the experiment is to toss a coin until the first heads comes up, then \[S = \{(H), (TH), (TTH), (TTTH), (TTTTH), \ldots\}.\] This is countably infinite, as each of these outcomes can be assigned a natural number, but (in theory) you could toss an arbitrary number of tails prior to the first heads.

Outcomes sets can be uncountably infinite, too. If the experiment is to turn on a lightbulb and keep it on until it burns out, then \[S = \mathbb{R}_{+},\] since it cannot be turned on for “negative” time.

So, the first requirement of a random experiment is that each of the distinct possible outcomes be known in advance. We then collect these into the outcomes set \(S\).
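For finite experiments like the two-toss example, we can enumerate the outcomes set directly (a Python sketch; the countably and uncountably infinite examples above obviously cannot be listed this way):

```python
from itertools import product

def coin_outcomes(n):
    """All possible sequences of heads and tails from n tosses."""
    return set(product("HT", repeat=n))

S = coin_outcomes(2)
print(sorted(S))   # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]
print(len(S))      # 4, i.e. 2 ** 2
```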

2.3.2 Condition Two: The Events of Interest

It could well be that we don’t care about the elementary outcomes per se. For example, consider the experiment of tossing two coins and counting the number of heads. As noted before, we have \[S = \{(HH), (HT), (TH), (TT)\}.\]

Now consider the event of \(n\) heads being tossed, where \(n \in \{0,1,2\}\). I will label these \(E_n\). Each will be given a subset of the outcomes set: \[\begin{align*} E_0 &= \{(TT)\}, \\ E_1 &= \{(HT), (TH)\}, \\ E_2 &= \{(HH)\}. \end{align*}\] Or maybe we care about whether the coins match: \[\begin{align*} E_{\text{match}} &= \{(HH), (TT)\}. \end{align*}\] Or maybe we care about whether at least one head is tossed: \[\begin{align*} E_{\geq 1} &= \{(HH), (HT), (TH)\}. \end{align*}\]
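We can write these events down explicitly as subsets (a Python sketch continuing the two-toss example; the variable names mirror the notation above):

```python
S = {("H", "H"), ("H", "T"), ("T", "H"), ("T", "T")}

E0 = {("T", "T")}                    # zero heads
E1 = {("H", "T"), ("T", "H")}        # exactly one head
E2 = {("H", "H")}                    # two heads
E_match = {("H", "H"), ("T", "T")}   # the two tosses match
E_at_least_one = S - E0              # at least one head (the complement of E0)

# Every one of these events is a subset of the outcomes set:
print(all(E <= S for E in [E0, E1, E2, E_match, E_at_least_one]))   # True
```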

So, an event is just a subset of the outcomes set. When we discussed the power set, we noted that \(\emptyset\) and \(S\) are both subsets of \(S\). We refer to \(\emptyset\) as the impossible event and \(S\) as the sure event, though these are just empty nomenclature right now.

To be explicit about things, if I say \(s \in S\), I mean that \(s\) is being thought of as an elementary outcome. Conversely, if I say \(\{s\} \subset S\), I’m talking about it as an event. Events are not outcomes! Outcomes are not events! Events are combinations of outcomes!

2.3.2.1 Rôle of the Events of Interest

We now must make more precise just what we mean by events of interest. At the very least, we now know that events are just subsets of the outcomes set \(S\).

To keep the drama to a minimum, let me reveal what you might have guessed by now: the events of interest will serve as the domain of a function that assigns probabilities. So, we may wish to know the probability of, say, the event of getting at least one heads in two tosses of a coin. Implicit in this, however, is the probability of not getting at least one heads in two tosses. That is, if we wish for an event \(E\) to be included in the domain of our probability function, we must include \(\overline{E}\) as well.

Similarly, let’s think about two separate events—say, getting zero heads and getting two heads in two tosses of a coin. Call these \(E_0\) and \(E_2\) as before. Along with the respective probabilities of each, we will also want the probability of their union—the probability that \(E_0\) happens or that \(E_2\) happens.

In other words, once we have defined the events of interest, we have implicitly defined secondary events of interest: the complements and the unions of those events. This will play a key role in our ability to specify the events of interest proper.

2.3.2.2 The Power Set and Associated Difficulties

It is tempting to build the most general theory of probability that we can.19 And hey, if each event is a subset of the outcomes set, then why not just define probability functions with a domain of \(2^S\)—the power set of the outcomes set, which contains all possible subsets of \(S\) and thus all possible events of interest?

For example, suppose that we are interested in the outcomes of tossing a coin twice.20 So, that means the outcomes set is \[S = \{(HH), (HT), (TH), (TT)\}.\] Very good. So, let’s think about what this means for the power set, shall we?

  1. So, we know that \(\emptyset\) and \(S\) are elements, so that’s two elements;
  2. We also need all the singleton events: \(\{(HH)\}\), \(\{(HT)\}\), \(\{(TH)\}\), \(\{(TT)\}\), which brings our running total up to six elements;
  3. And, we need all possible pairs of these: \(\{(HH), (HT)\}\), \(\{(HH), (TH)\}\), \(\{(HH), (TT)\}\), \(\{(HT), (TH)\}\), \(\{(HT), (TT)\}\), and \(\{(TH),(TT)\}\), so that we’re up to 12 elements; and
  4. All the possible triples: \(\{(HH),(HT),(TH)\}\), \(\{(HH),(HT),(TT)\}\), \(\{(HH),(TH),(TT)\}\), and \(\{(HT),(TH),(TT)\}\), so that we’re at 16 elements, which happens to be \(2^4\).

That isn’t so bad—with a little work, we could probably assign reasonable probabilities to sixteen things, right? Two responses:

  1. For starters, even with simple problems like the one discussed above, things get unwieldy quite quickly. If we toss the coin three times instead of twice, we end up with a power set with 256 elements. Indeed, the number of elements grows very quickly in the number of tosses. To see why, note first that there are \(2^n\) possible outcomes for \(n\) tosses of a coin—so that \(|S| = 2^n\). And then, the number of elements in the power set is \(2^{|S|}\), so that the number of elements in the power set for \(n\) tosses is \(2^{2^n}\).21 Four tosses? 65,536 elements of the power set. Five tosses? Over four and a quarter billion elements. Working at a speed of one event per second around the clock, enumerating every possible event would take more than a century for five tosses and hundreds of billions of years for six (see the short calculation sketched below). Now think to yourself: there are a little under 200 countries in the world, each tossing a civil war coin. There are tens of millions of voters, each tossing a Democrat-Republican-Other die—oh my word that means the base number is three instead of two. Even with finite numbers—reasonable ones that we can understand!—the power set gets too big too fast.
  2. …and now the bad news! It turns out that, if \(S\) is countable, then the power set is at least possible to define probabilities on, even if it’s not practical. But, if \(S\) is uncountably infinite, then it may be that \(2^S\)—which still exists—cannot serve as the domain of a probability function, whether we’re willing to put in the years or not.

So, we have one practical reason and one purely mathematical reason to not use \(2^S\) in the general case.
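To make the first point concrete, here is the arithmetic behind those claims (a minimal Python sketch, assuming one enumerated event per second around the clock):

```python
SECONDS_PER_YEAR = 60 * 60 * 24 * 365

for n in range(1, 7):          # n coin tosses
    outcomes = 2 ** n          # |S| = 2 ** n
    events = 2 ** outcomes     # |power set| = 2 ** |S|
    years = events / SECONDS_PER_YEAR
    print(f"{n} tosses: {events:,} events, about {years:,.1f} years at one per second")
```

For five tosses this reports roughly 136 years; for six, several hundred billion years.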

2.3.2.3 The Workaround

Because of these problems, we need to find a way to wrangle the events of interest into something more workable. It turns out that we can focus our attention on collections of subsets of \(S\) that satisfy certain properties. We will refer to collections that satisfy these properties as algebras or, if they satisfy stronger conditions, as \(\sigma\)-algebras.

A nonempty collection \(\mathfrak{F}\) of subsets of a set \(S\) is an algebra of sets if it is closed22 under finite unions and complementation—that is, if \(E_1,E_2 \in \mathfrak{F}\), then it must be that \(E_1 \cup E_2 \in \mathfrak{F}\) and \(\overline{E}_1 \in \mathfrak{F}\). It is a \(\sigma\)-algebra if it is closed under countable unions, so that if \(E_1,E_2,\ldots \in \mathfrak{F}\), then \(E_1 \cup E_2 \cup \cdots \in \mathfrak{F}\). Clearly all \(\sigma\)-algebras are algebras, but not all algebras are \(\sigma\)-algebras.

So, as an example: suppose we’re tossing a coin three times, so that \[S = \{(HHH),(HHT),(HTT),(HTH),(TTT),(TTH),(THT),(THH)\}.\] But, suppose that we’re interested only in \(A_1 = \{(HHH)\}\) and \(A_2 = \{(TTT)\}\), the two events where three of a kind are tossed. We might define the relevant algebra as: \[\mathfrak{F} = \{\emptyset,S,A_1,A_2,(A_1 \cup A_2),\overline{A}_1,\overline{A}_2,(\overline{A}_1 \cap \overline{A}_2)\}.\] I’ll ask you to verify that this is an algebra in your problem set.
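If you would like a computational sanity check before you write that proof, here is a brute-force sketch in Python (it verifies closure by exhaustion, which is not a substitute for the problem-set argument):

```python
from itertools import product

S = set(product("HT", repeat=3))   # the eight outcomes of three tosses
A1 = {("H", "H", "H")}
A2 = {("T", "T", "T")}

# The candidate algebra, written exactly as in the text:
F = [frozenset(E) for E in
     [set(), S, A1, A2, A1 | A2, S - A1, S - A2, (S - A1) & (S - A2)]]

closed_under_union = all(frozenset(E1 | E2) in F for E1 in F for E2 in F)
closed_under_complement = all(frozenset(S - E) in F for E in F)
print(closed_under_union, closed_under_complement)   # True True
```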

Note that \(2^S\) is itself a \(\sigma\)-algebra for any set \(S\), and for countable \(S\) it is a perfectly workable one. However, if \(S\) is uncountable, the \(\sigma\)-algebras we actually work with are proper subsets of the power set—that is, there are subsets of \(S\) that are not in the relevant \(\sigma\)-algebra. It turns out that these exclusions provide the necessary mathematical structure for us to be able to define a probability function.23

2.3.3 The Probability Function

So, let’s define that now. Given an outcomes set \(S\) and a \(\sigma\)-algebra \(\mathfrak{F}\), a function \(P: \mathfrak{F} \rightarrow [0,1]\) is a probability function if it satisfies the following three axioms:

  1. \(P(S) = 1\) for any outcomes set \(S\);
  2. \(P(A) \geq 0\) for any \(A \in \mathfrak{F}\); and
  3. For a countable sequence of mutually exclusive events \(A_1,A_2,\ldots\), where \(A_i \cap A_j = \emptyset\) for all \(i \neq j\), we have \[P\left(\bigcup_{i=1}^\infty A_i\right) = \sum_{i=1}^\infty P(A_i).\] These are called Kolmogorov’s Axioms in honor of Andrey Kolmogorov, who first axiomatized probability this way. These are not the only axioms used—in particular, some probability theories use only finite unions—but they are easily the most common.
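For a small finite example, we can check all three axioms by brute force. The Python sketch below assumes two tosses of a fair coin, takes the full power set as the \(\sigma\)-algebra, assigns each event \(E\) the probability \(|E|/|S|\), and verifies \(P(S) = 1\), non-negativity, and additivity over disjoint pairs of events (on a finite space, this pairwise check captures the substance of the third axiom):

```python
from itertools import combinations, product
from fractions import Fraction

S = set(product("HT", repeat=2))   # outcomes of two tosses

def power_set(s):
    elems = list(s)
    return [frozenset(c)
            for r in range(len(elems) + 1)
            for c in combinations(elems, r)]

events = power_set(S)
P = {E: Fraction(len(E), len(S)) for E in events}   # equally likely outcomes

print(P[frozenset(S)] == 1)             # axiom 1: P(S) = 1
print(all(P[E] >= 0 for E in events))   # axiom 2: non-negativity
print(all(P[A | B] == P[A] + P[B]       # axiom 3 (finite form): additivity
          for A in events               # over disjoint events
          for B in events
          if not (A & B)))
```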



  9. After having built such a beautiful apparatus, we will quickly realize its faults; in particular, it is of no use for discussing data. Great. We will therefore concern ourselves with morphing the simple statistical space into something that can address data: the simple statistical model. From there, we will be able to create extensions to talk about all kinds of data. Be patient!

  10. Consider, for example, the mind-bending Russell’s Paradox, in which we try to form the set of all sets that do not include themselves.

  11. We have just begged the question—what the hell is a collection?! Seriously, you don’t want to go there.

  12. Why does LaTeX show these as white? Weird.

  13. Indeed, one of the main early tasks in an analysis class is to construct the real numbers using axioms. We will avoid that; you’re welcome.

  14. I will prove this by contradiction. Suppose \(\sqrt{2}\) were rational. This would mean that there exist some integers \(p\) and \(q\) such that \(\sqrt{2} = \frac{p}{q}\). We do not affect any results by assuming that \(p\) and \(q\) have no common factors, as we can just cancel them out from the numerator and denominator. Squaring both sides, we have \(2 = \frac{p^2}{q^2}\), implying \(2q^2 = p^2\). This means \(p^2\) is even, which holds only when \(p\) is even, so we can write \(p = 2k\) for some integer \(k\). Then \(2q^2 = p^2 = 4k^2\), so \(q^2 = 2k^2\). This means \(q^2\), and thus \(q\), must also be even. But this means \(p\) and \(q\) have a common factor—namely, 2. We have therefore contradicted a premise, so it must be that \(\sqrt{2}\) is irrational.

  15. Implies? What is implies? By this, we mean that if we know that the first statement is true, then we also know that the second statement is true. The sentence has no bite for situations where the first statement is false.

  16. The order of the elements in a set does not matter.

  17. The ordering part here is what matters, rather than the “skipping infinitely many of them” part. Note that the rational numbers \(\mathbb{Q}\) are countable, even though you “skip infinitely many of them” to get from one to another. Without loss of generality, let me just work with the positive rationals. Take any positive rational and call it \(\frac{a}{b}\), where \(a\) and \(b\) have no common factors. Then we can assign each of them the unique integer \(2^a 3^b\).

  18. Functions are by far the most common kind of relation you will work with. However, in game theory we will discuss preference relations that satisfy certain properties, and we will also generalize functions into correspondences that can assign multiple elements of the co-domain to any element of the domain, so that \(y \in f(x)\) instead of \(y = f(x)\). But let’s not get ahead of ourselves.

  19. Remember: simplicity is good, as it means that we can explain lots of things with few assumptions.

  20. By now, I am as tired as you are of flipping two hypothetical coins, and I wonder to myself: just what sin did I commit for Dante to send me to the coin-flipping circle of the Inferno?

  21. You know that numbers are getting too big too fast when people have to develop a special notation to handle them.

  22. We say a collection is closed under some operator if applying the operator to elements of the collection always yields another element of the collection. So, for example, since the sum of two integers is itself an integer, the integers are closed under addition.

  23. Actually, these are what you need to ensure that you can define a measure. Probability theory is just an application of measure theory.