4 Probability

Probability is a fundamental concept in econometrics and economics. It is used to quantify the likelihood of an event occurring, and allows us to make predictions and decisions based on uncertain future outcomes. In this chapter, we will explore the basics of probability theory and how it is applied in the field of economics.

Definition 4.1 Let \(\Omega\) be the set of potential outcomes of a random experiment, i.e., the space of elementary events. Also, let \(\omega_i\) denote a specific point in \(\Omega\), i.e., an elementary event (or atomic event) in the space of possible events.

Example 4.1 Tossing a coin twice (random experiment)

\[\Omega = \{HH, HT, TH, TT\}\]

A random experiment is a real or conceptual experiment for which two things must hold:

  1. It is possible to determine ex-ante the set of potential outcomes.
  2. There exists a way to conduct it (i.e., the experiment is feasible)

A random event is, then, a subset of \(\Omega\). You can interpret it as a claim about the outcome of the experiment which is Boolean (either True or False) and verifiable. For instance,

\[ A = \text{"at least one head"} = \{HH, HT, TH\} \\ B = \text{"no more than one head"} = \{HT, TH, TT\}\]

The following is a list of basic operations we can perform on events, as well as some notable events worth remembering (a short code sketch follows these lists).

  1. \(\Omega =\) certain event
  2. \(\emptyset =\) impossible event
  3. A is an event \(\implies A^c = \overline{A}\) is the complement (or negation) of A. This implies that \(\omega \in \overline{A} \iff \omega \notin A\). In other words, \(\overline{A}\) is True if and only if A is False.
  4. A, B events \(\implies A \cup B\) is True \(\iff\) A is True or B is True (or both are True).
  5. A, B events \(\implies A \cap B\) is True \(\iff\) A and B are both True.
  6. A, B events and \(A \subseteq B \implies \omega \in B, \forall \omega \in A\). In other words, if A is True then B is True.
  7. \(A \cap B = \emptyset \implies\) A and B are incompatible, or disjoint, events.

Distributive properties

  • \(A \cup (B \cap C) = (A \cup B) \cap (A \cup C)\)
  • \(A \cap (B \cup C) = (A\cap B) \cup (A \cap C)\)
  • De Morgan’s Laws: \[ \overline{A \cup B} = \overline{A} \cap \overline{B} \\ \overline{A \cap B} = \overline{A} \cup \overline{B}\]
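These operations map directly onto finite set manipulations. Below is a minimal Python sketch (the variable names `omega`, `A`, and `B` are illustrative, reusing Example 4.1) that checks the complement, union, intersection, and De Morgan's laws by brute force:

```python
# Illustrative sketch: event operations on the two-coin-toss sample space
# from Example 4.1, using Python's built-in set type.
omega = {"HH", "HT", "TH", "TT"}     # sample space
A = {"HH", "HT", "TH"}               # "at least one head"
B = {"HT", "TH", "TT"}               # "no more than one head"

A_bar = omega - A                    # complement of A -> {"TT"}
union = A | B                        # A or B          -> omega here
intersection = A & B                 # A and B         -> {"HT", "TH"}

# De Morgan's laws, checked element by element:
assert omega - (A | B) == (omega - A) & (omega - B)
assert omega - (A & B) == (omega - A) | (omega - B)
print(A_bar, union, intersection)
```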

Algebra of events (\(\Upsilon\))

It is a class of sets (i.e., events in \(\Omega\)) satisfying the following conditions.

  1. \(\Omega \in \Upsilon\)
  2. \(A \in \Upsilon \iff \overline{A} \in \Upsilon\)
  3. \(\{A_i\}_{i=1}^n \in \Upsilon \implies \bigcup_{i=1}^n A_i \in \Upsilon\) (closed under finite union). If, moreover, \(\{A_i\}_{i=1}^\infty \in \Upsilon \implies \bigcup_{i=1}^\infty A_i \in \Upsilon\) (closed under countable union), then \(\Upsilon\) is a \(\sigma\)-algebra (or Borel field).

Note: For a given sample space \(\Omega\), there are many different \(\sigma\)-algebras.

Example 4.2

  • \(\{\emptyset, \Omega\} =\) trivial \(\sigma\)-algebra.
  • The power set \(\mathbb{P}(\Omega)\), which contains all the subsets of \(\Omega\).

Definition 4.2 (Axiomatic (Kolmogorov) definition) Given a measurable space \((\Omega, \Upsilon)\), a probability function, or probability measure, \(P\) with domain \(\Upsilon\) is any function satisfying the following conditions.

  1. \(P(A) \geq 0, \forall A \in \Upsilon\)
  2. \(P(\Omega) = 1\)
  3. (finite additivity) \(\forall \{A_i\}_{i=1}^n \in \Upsilon\) such that \(A_i \cap A_j = \emptyset, i \neq j\). \[ \implies P(\bigcup_{i=1}^n A_i) = \sum_{i=1}^n P(A_i)\].
  4. (countable additivity) Note that if \(\Upsilon\) is a \(\sigma\)-algebra then \(\forall \{A_i\}_{i=1}^\infty \in \Upsilon\) such that \(A_i \cap A_j = \emptyset, i \neq j\). \[ \implies P(\bigcup_{i=1}^\infty A_i) = \sum_{i=1}^\infty P(A_i)\]

The triplet \((\Omega, \Upsilon, P)\) is called a probability space.
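As a sketch of this definition, the snippet below builds a small probability space by hand: \(\Omega\) is the coin-toss space of Example 4.1, \(\Upsilon\) is its power set, and \(P\) is the classical equally-likely measure \(P(A) = |A|/|\Omega|\) (an assumption made here only for illustration). It then checks the Kolmogorov conditions, which on a finite space reduce to non-negativity, \(P(\Omega)=1\), and finite additivity:

```python
from itertools import chain, combinations

# Sketch: the equally-likely measure on the power set of the two-coin-toss
# sample space, checked against the Kolmogorov conditions.
omega = frozenset({"HH", "HT", "TH", "TT"})

def P(event):
    """P(A) = |A| / |Omega| -- the classical measure, assumed for illustration."""
    return len(event) / len(omega)

# Upsilon = power set of omega (a sigma-algebra; finite here, so checking
# finite additivity is enough).
upsilon = [frozenset(s) for s in chain.from_iterable(
    combinations(omega, r) for r in range(len(omega) + 1))]

assert all(P(A) >= 0 for A in upsilon)          # condition 1: non-negativity
assert P(omega) == 1                            # condition 2: P(Omega) = 1
for A in upsilon:                               # condition 3: additivity on
    for B in upsilon:                           # pairwise disjoint events
        if not (A & B):
            assert abs(P(A | B) - (P(A) + P(B))) < 1e-12
print("Kolmogorov conditions hold for this (Omega, Upsilon, P).")
```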

Example 4.3 Suppose \(\Omega =\) all integers \(\geq 0\). Then, \[ \Upsilon_1 = \{\Omega, \emptyset, \{0, 1,..., 100\}, \{101, 102,...\}\} \] is an algebra. But, \[ \Upsilon_2 = \{\Omega, \emptyset, \{0\}, \{1\},...\} \] is neither a \(\sigma\)-algebra, nor an algebra.

4.1 Properties of Probability Functions

  1. \(A \subseteq B \implies P(A) \leq P(B),\) where \(A,B \in \Upsilon\) (Monotonicity)
  2. \(P(A) \in [0, 1], \forall A \in \Upsilon\)
  3. \(P(A \cup B) = P(A) + P(B) - P(A \cap B), \forall A, B \in \Upsilon\), where \(P(A \cap B)\) is the joint probability of A and B
  4. \(P(\overline{A}) = 1 - P(A), \forall A \in \Upsilon\)
  5. \(P(\emptyset) = 0\)
  6. \(P(\overline{A} \cap B) = P(B) - P(A \cap B), \forall A,B \in \Upsilon\)
  7. \(P(\overline{A} \cup \overline{B}) = 1 - P(A \cap B), \forall A,B \in \Upsilon\)
  8. \(P(\overline{A} \cap \overline{B}) = 1 - P(A \cup B), \forall A,B \in \Upsilon\)
  9. \(P(\overline{A} \cup B) = P(\overline{A}) + P(B) - P(\overline{A} \cap B), \forall A,B \in \Upsilon\)
  10. \(\forall A_i, i \geq 1, P(\bigcup_{i=1}^\infty A_i) \leq \sum_{i=1}^\infty P(A_i)\) (Subadditivity, or Boole’s inequality)
  11. \(P(A) = \sum_{i=1}^\infty P(A \cap C_i), \forall \{C_i\}_{i=1}^\infty\) s.t. \(\bigcup_{i=1}^\infty C_i = \Omega\) and \(C_i \cap C_j = \emptyset, \forall i \neq j\) (Infinite partition of \(\Omega\))
  12. \(P(A \cap B) \geq P(A) + P(B) - 1, \forall A, B \in \Upsilon\) (Bonferroni bound)
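Several of these properties are easy to sanity-check on a small finite example. The sketch below uses the uniform measure on a six-sided die (an illustrative choice, not part of the list above) and exact rational arithmetic:

```python
from fractions import Fraction

# Sketch: spot-checking properties 3, 4, 6, 7, 8, and 12 with the uniform
# measure on a six-sided die.
omega = frozenset(range(1, 7))

def P(event):
    return Fraction(len(event), len(omega))

def comp(event):                    # complement with respect to omega
    return omega - event

A, B = frozenset({4, 5, 6}), frozenset({2, 4, 6})

assert P(A | B) == P(A) + P(B) - P(A & B)         # 3: inclusion-exclusion
assert P(comp(A)) == 1 - P(A)                     # 4: complement rule
assert P(comp(A) & B) == P(B) - P(A & B)          # 6
assert P(comp(A) | comp(B)) == 1 - P(A & B)       # 7
assert P(comp(A) & comp(B)) == 1 - P(A | B)       # 8
assert P(A & B) >= P(A) + P(B) - 1                # 12: Bonferroni bound
print("Selected properties verified exactly.")
```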

4.2 Conditional Probabilities

Consider a given probability space \((\Omega, \Upsilon, P)\). Let \(A,B \in \Upsilon\) and \(P(B) > 0\). Then, the conditional probability of \(A\) given \(B\) is

\[ P(A|B) = \frac{P(A \cap B)}{P(B)} \]

This is the probability that the outcome is in A, given that we know it is in B (i.e., that B already occurred).

Example 4.4 Roll a die \(\implies \Omega = \{1, 2,3,4,5,6\}\)

\(A = \{4,5,6\}; B = \{2,4,6\}\)

\(P(A) = \frac{1}{2}; P(B) = \frac{1}{2}\)

\(P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{\frac{2}{6}}{\frac{1}{2}} = \frac{2}{3}\)

Note that \(\Omega\) “changes” if we know that B already occurred \(\implies \Omega' = \{2,4,6\}\)

Note: \(A, B \in \Upsilon\) and \(P(A) > 0, P(B) > 0\)

\(P(A \cap B) = P(A|B) \cdot P(B)\) (probability of compound events). But also,

\(P(A \cap B) = P(B|A) \cdot P(A)\), given that, by definition of conditional probability, \[ P(B|A) = \frac{P(A \cap B)}{P(A)} \]

Example 4.5 A box with 4 balls (2Y + 2R). Randomly draw 2 balls without replacement.

\[A: \text{"first ball is Y"} \\ B: \text{"second ball is Y"} \\ P(A) = \frac{1}{2} \\ P(A \cap B) = P(B|A) \cdot P(A) = \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{6}\]
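Since the box is small, the same numbers can be recovered by brute-force enumeration of all equally likely ordered draws. A sketch (ball labels such as `Y1` are introduced only to make the outcomes distinguishable):

```python
from fractions import Fraction
from itertools import permutations

# Sketch of Example 4.5: enumerate all ordered draws of 2 balls without
# replacement from a box containing Y1, Y2, R1, R2.
box = ["Y1", "Y2", "R1", "R2"]
draws = list(permutations(box, 2))        # 12 equally likely ordered pairs

A = [d for d in draws if d[0].startswith("Y")]                                 # first ball Y
A_and_B = [d for d in draws if d[0].startswith("Y") and d[1].startswith("Y")]  # both Y

print(Fraction(len(A), len(draws)))       # P(A)       = 1/2
print(Fraction(len(A_and_B), len(draws))) # P(A and B) = 1/6
```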

Note:

  • \(A \cap B = \emptyset \implies P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{0}{P(B)} = 0\)
  • \(A \subset B \implies P(A|B) = \frac{P(A)}{P(B)} \leq 1\) (B is necessary for A)
  • \(B \subset A \implies P(A|B) = \frac{P(B)}{P(B)} = 1\) (B is sufficient for A)

4.2.1 Total Probability Theorem

The Total Probability Theorem is a fundamental theorem in probability theory that allows us to calculate the probability of an event by considering all the possible ways in which the event can occur. It states that, given a partition of the sample space into mutually exclusive events, the probability of any event equals the sum, over the elements of the partition, of the probability of each partition element multiplied by the conditional probability of the event given that element. This theorem is useful in a variety of situations, such as when we want to find the probability of an event occurring in a sequence of events or when we want to calculate the probability of an event occurring given some partial information about the event.

Proposition 4.1 Let \(E_1, E_2,...,E_n \in \Upsilon\) be a sequence of events defining a partition of \(\Omega\). That is,

  • \(\bigcup_{k=1}^n E_k = \Omega\)
  • \(E_k \cap E_j = \emptyset, \forall j \neq k\)
  • \(P(E_k) > 0, \forall k=1,...,n\)

Note that \(n\) can also be countably infinite (i.e., diverge to \(+\infty\))

Then, \(\forall A \in \Upsilon, P(A) = \sum_{k=1}^n P(A|E_k) \cdot P(E_k)\) (a weighted average of the conditional probabilities, with weights \(P(E_k)\))

Proof. Let \(A \in \Upsilon\).

\[\begin{align} A = A \cap \Omega &= A \cap (\bigcup_{k=1}^n E_k) \\ &= \bigcup_{k=1}^n (A \cap E_k) \leftarrow \text{(Distributive property on disjoint events)} \end{align}\]

\(\implies\) \[\begin{align} P(A) &= P[\bigcup_{k=1}^n (A \cap E_k)] \\ &= \sum_{k=1}^n P(A \cap E_k) \\ &= \sum_{k=1}^n P(A|E_k) \cdot P(E_k) \leftarrow \text{(by probability of compound events)} \end{align}\]

Example 4.6 You have 2 R balls + 2 Y balls in a box. I draw 2 balls without replacement (with A and B defined as in Example 4.5).

\[\begin{align} P(A) &= \frac{1}{2} \\ P(B) &= P(B|A) \cdot P(A) + P(B| \overline{A}) \cdot P(\overline{A}) \\ &= \frac{1}{3} \cdot \frac{1}{2} + \frac{2}{3} \cdot \frac{1}{2} \\ &= \frac{1}{2} \text{(by the total probability theorem)} \end{align}\]

Note: \(\Omega = A \cup \overline{A} \implies A\) and \(\overline{A}\) form a finite partition of \(\Omega\)
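The computation in Example 4.6 amounts to a two-term weighted average over the partition \(\{A, \overline{A}\}\). A short sketch using exact fractions (the conditional probabilities are those derived in the example):

```python
from fractions import Fraction

# Sketch of Example 4.6: total probability theorem over the partition {A, not A}.
P_A             = Fraction(1, 2)   # first ball is Y
P_B_given_A     = Fraction(1, 3)   # one Y left among the three remaining balls
P_B_given_not_A = Fraction(2, 3)   # two Y left among the three remaining balls

P_B = P_B_given_A * P_A + P_B_given_not_A * (1 - P_A)
print(P_B)                         # 1/2
```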

4.2.2 Bayes’ theorem

Probability plays a central role in econometrics, as it allows us to make inferences about economic phenomena based on data. One important tool for updating probabilities in the context of econometrics is Bayes theorem. Named after Reverend Thomas Bayes, this theorem is a mathematical formula that allows us to revise our beliefs about the probability of an event occurring, given new information. In the field of econometrics, Bayes theorem is often used to update beliefs about economic variables or relationships, based on new data or observations.

For example, an economist may want to estimate the probability that a recession will occur in the next year. They have data on past recessions and know that, historically, there is a 20% chance of a recession occurring in any given year. They also have new data on current economic indicators, such as GDP growth and inflation, which they believe are related to the likelihood of a recession. Using Bayes theorem, the economist can update their belief about the probability of a recession occurring in the next year, taking into account both the historical data and the current economic indicators.

Proposition 4.2 Under the same assumptions of the total probability theorem,

\[ \forall A \in \Upsilon, P(E_j | A) = \frac{P(A|E_j) \cdot P(E_j)}{\sum_{k=1}^n P(A|E_k) \cdot P(E_k)} \]

Proof. By the total probability theorem and the probability of compound events,

\[ P(E_j | A) = \frac{P(E_j \cap A)}{P(A)} = \frac{P(A|E_j) \cdot P(E_j)}{\sum_{k=1}^n P(A|E_k) \cdot P(E_k)} \]

Example 4.7 Suppose you are hired by a firm with three plants. Also, you know the following information

  • Plant 1: 40% of total production

  • Plant 2: 30% of total production

  • Plant 3: 30% of total production

  • Plant 1: 10% of items produced are faulty

  • Plant 2: 5% of items produced are faulty

  • Plant 3: 20% of items produced are faulty

Given a defective item, what is the probability that it came from Plant 1? That is, what is \(P(E_j|D),\) where \(E_j\) denotes the event “the item comes from plant j” and \(D\) denotes the event that it is defective?

Note: \(\bigcup_{i=1}^3 E_i = \Omega =\) the sample space including plants 1, 2, and 3 (the three plants at which the item may have been produced).

\[ P(E_i) > 0, \forall i =1,2,3 \\ P(E_1) = 0.4 \\ P(E_2) = 0.3 \\ P(E_3) = 0.3\] \(\implies P(E_1|D) = \frac{P(D|E_1) \cdot P(E_1)}{\sum_{i=1}^3 P(D|E_i) \cdot P(E_i)} = \frac{0.1 \cdot 0.4}{0.1 \cdot 0.4 + 0.05 \cdot 0.3 + 0.2 \cdot 0.3} \approx 34.78\%\)
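The same computation can be written in a few lines of Python; the dictionary names `priors` and `fault_rates` below are my own labels for the numbers stated in the example:

```python
# Sketch of Example 4.7: Bayes' theorem for the three plants.
priors      = {1: 0.40, 2: 0.30, 3: 0.30}   # P(E_i): production shares
fault_rates = {1: 0.10, 2: 0.05, 3: 0.20}   # P(D | E_i): defect rates

# Total probability theorem: P(D) = sum_i P(D | E_i) * P(E_i)
P_D = sum(fault_rates[i] * priors[i] for i in priors)

# Bayes' theorem: P(E_i | D) = P(D | E_i) * P(E_i) / P(D)
posterior = {i: fault_rates[i] * priors[i] / P_D for i in priors}
print(P_D)            # 0.115
print(posterior[1])   # ~0.3478, i.e. about 34.78%
```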

4.2.3 Statistical (or stochastic) independence

Statistical independence refers to the absence of a relationship between two events or variables. In econometrics, this concept is often used when analyzing data to determine whether a relationship exists between two variables. For example, an economist may want to determine whether there is a relationship between a company’s level of advertising expenditure and its sales. If the economist determines that these variables are statistically independent, it suggests that the level of advertising expenditure does not have an effect on sales. On the other hand, if the economist determines that the variables are not statistically independent, it suggests that there is a relationship between advertising expenditure and sales.

Definition 4.3 For two events \(A,B\), we say that \(A\) and \(B\) are statistically (or stochastically) independent if and only if \(P(A \cap B) = P(A) \cdot P(B)\)

Proposition 4.3 The following three conditions are equivalent for statistical independence.

  • If \(A,B\) are statistically independent events and \(P(A) > 0\), then \(P(B|A) = P(B)\).
  • If \(P(B|A) = P(B)\), then \(A\) and \(B\) are statistically independent.
  • If \(A,B\) are statistically independent events and \(P(B) > 0\), then \(P(A|B) = P(A)\)

In addition, if \(A\) and \(B\) are independent, then \(\overline{A}\) and \(\overline{B}\), \(\overline{A}\) and \(B\), and \(A\) and \(\overline{B}\) are all independent as well.

Example 4.8 You are given a box with 10 balls (4R + 6Y). I randomly draw 2 balls with replacement. Let

\[ A = \text{"first ball is Y"} \\ B = \text{"second ball is Y"}\] It follows,

\[\begin{align} P(A) &= 0.6 \\ P(B) &= P(B|A) \cdot P(A) + P(B|\overline{A}) \cdot P(\overline{A}) \text{(total probability theorem)} \\ &= 0.6 \cdot 0.6 + 0.6 \cdot 0.4 = 0.6 \end{align}\]

Note: \(P(B|A) = P(B| \overline{A}) = 0.6\).

Now, what if I randomly draw 2 balls without replacement?

\[\begin{align} P(A) &= 0.6 \\ P(B) &= \frac{5}{9} \cdot 0.6 + \frac{2}{3} \cdot 0.4 = 0.6 \text{ (total probability theorem)} \end{align}\]

Note: \(P(B|A) = \frac{5}{9}\) and \(P(B|\overline{A}) = \frac{6}{9} = \frac{2}{3}\)

When we sample with replacement, the second draw is independent of the first, since I put the ball back after the first draw. That is, \(A\) and \(B\) are statistically independent in the first case of the example. However, if I don’t replace the ball I drew first, then I affect the probability of the second ball’s color (there is one fewer ball of whatever color I drew first). That is, \(A\) and \(B\) are not statistically independent in the second case of the example.
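A simulation makes the contrast concrete. The sketch below estimates \(P(B)\) and \(P(B|A)\) under both sampling schemes by repeated random draws (the function name `simulate`, the sample size, and the seed are arbitrary choices):

```python
import random

# Sketch of Example 4.8: estimate P(B) and P(B | A) with and without
# replacement from a box of 4 red and 6 yellow balls.
def simulate(with_replacement, n=200_000, seed=0):
    rng = random.Random(seed)
    box = ["R"] * 4 + ["Y"] * 6
    first_y = second_y = both_y = 0
    for _ in range(n):
        draw = rng.choices(box, k=2) if with_replacement else rng.sample(box, 2)
        a, b = draw[0] == "Y", draw[1] == "Y"
        first_y += a
        second_y += b
        both_y += a and b
    return second_y / n, both_y / first_y   # estimates of P(B), P(B | A)

print(simulate(with_replacement=True))    # both close to 0.6 -> independent
print(simulate(with_replacement=False))   # P(B) ~ 0.6 but P(B | A) ~ 5/9
```

With replacement, both estimates sit near 0.6; without replacement, \(P(B)\) is still about 0.6 while the estimate of \(P(B|A)\) drops toward \(5/9\), matching the exact calculation above.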

4.2.4 Random Variables

A random variable is a variable that can take on different values randomly, according to a probability distribution. In econometrics, random variables are often used to model uncertain or unpredictable quantities, such as future sales or stock prices. There are two types of random variables: discrete and continuous. A discrete random variable can take on a finite or countably infinite number of values, such as the number of heads that appear when flipping a coin 10 times. A continuous random variable can take on any value within a given range, such as the weight of a bag of flour. The probability distribution of a random variable specifies the probability of each possible value occurring. In econometrics, random variables are often used to make probabilistic predictions about uncertain quantities and to analyze the relationship between variables.
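As a closing sketch, the snippet below simulates the discrete random variable mentioned above (the number of heads in 10 fair coin flips) and tabulates its empirical distribution; the helper name `num_heads`, the sample size, and the seed are arbitrary:

```python
import random

# Sketch: a discrete random variable -- the number of heads in 10 fair coin
# flips -- and its empirical distribution from repeated simulation.
def num_heads(rng, flips=10):
    return sum(rng.random() < 0.5 for _ in range(flips))

rng = random.Random(42)
samples = [num_heads(rng) for _ in range(100_000)]

# Empirical probability of each possible value 0, 1, ..., 10
distribution = {k: samples.count(k) / len(samples) for k in range(11)}
print(distribution)
```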