1.3 History of probability

1.3.1 Experiment, Sample space and Events

Experiment

An experiment refers to a process or procedure that generates an outcome (result). In general, when an experiment is repeated under the same conditions, the results are consistent, apart from small measurement errors.

However, in the context of probability and statistics, the outcome is often uncertain, and we speak of a random experiment. A random experiment is a specific type of experiment in which the outcome is not predictable with certainty: even if the experiment is performed under the same conditions, the outcome may vary. The purpose of studying such experiments is to analyze the probability of the different outcomes.

More precisely, experiments possessing the following three characteristics are referred to as random experiments:

  1. It is possible to completely list and describe all possible outcomes before conducting the experiment.
  2. The result of the experiment cannot be predicted in advance among the numerous possible outcomes.
  3. The experiment can be repeated under identical conditions.

Sample space

The sample space, represented by \(\Omega\) or \(\mathcal{S}\), is the set of all possible outcomes in a random experiment. Each element in the sample space is called a sample point. The sample space can be divided into two types based on the number of sample points: discrete sample space, where the number of sample points is finite or countable, and continuous sample space, where the number of sample points is uncountable.

Here, countable means that there exists a one-to-one mapping of the set \(A\) onto \(\mathbb{N}\); this notion is studied in advanced calculus.

Events

A subset of a sample space is called an event \(E\), denoted by \(E \subset \Omega\). If \(E,F\) are two events in a sample space \(\Omega\) and \(E \cap F = \varnothing\), then we say that the two events \(E,F\) are disjoint (or mutually exclusive).

Example:

Tossing two fair coins is a random experiment because the outcome of each coin (heads \(H\) or tails \(T\)) is uncertain and not predictable with certainty. The sample space is \(\Omega=\{HH,HT,TH,TT\}\), and \(HT \in \Omega\) is a sample point. The event that one coin shows heads and the other shows tails is \(E=\{HT,TH\}\). Let \(F\) be the event that both coins show the same side, so \(F=\{HH,TT\}\). Since \(E \cap F = \varnothing\), the events \(E,F\) are disjoint.
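As a quick illustration (not part of the original notes), the sample space and the two events above can be written out as Python sets; the names omega, E, and F are only illustrative.

```python
from itertools import product

# A sketch of the two-coin example: the sample space is the set of all
# ordered pairs of coin faces.
omega = set(product("HT", repeat=2))   # {('H','H'), ('H','T'), ('T','H'), ('T','T')}

E = {("H", "T"), ("T", "H")}           # one coin shows heads, the other tails
F = {("H", "H"), ("T", "T")}           # both coins show the same side

print(sorted(omega))                   # all four sample points
print(E.isdisjoint(F))                 # True: E and F are mutually exclusive
```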


1.3.2 Definitions of Probability

There are several definitions of probability, mainly including classical probability, geometric probability, relative frequency probability, and subjective probability. These are different approaches or interpretations of probability, each with its own characteristics and applications.

Classical probability

Suppose the sample space \(\Omega\) is finite. The probability \(P(E)\) of an event \(E\) is defined as \[ P(E)=\frac{N(E)}{N(\Omega)} \] where \(N(\cdot)\) denotes the number of elements (sample points) in a set (event).

Classical probability is based on the assumption that all outcomes in the sample space are equally likely. It is applicable to situations where each outcome is equally likely to occur.

Example:

Tossing two fair coins. The probability that one coin shows heads and the other shows tails is \(P(E)=\frac{N(E)}{N(\Omega)}=\frac{2}{4}=0.5\).
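A minimal sketch of this calculation, assuming the same two-coin sample space as above (the names omega and E are illustrative):

```python
from itertools import product
from fractions import Fraction

# Classical probability P(E) = N(E) / N(Omega) for the two-coin example.
omega = set(product("HT", repeat=2))
E = {("H", "T"), ("T", "H")}           # one head and one tail

P_E = Fraction(len(E), len(omega))     # N(E) / N(Omega) = 2/4
print(P_E, float(P_E))                 # 1/2 0.5
```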

Geometric probability

Geometric probability is used in situations where the sample space is geometrically defined, and the probability is determined by measuring geometric properties.

Example:

If you throw a dart at a square target and want to find the probability of hitting a specific region within the square, the ratio of the area of that region to the total area of the square represents the geometric probability.
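As a concrete (and purely illustrative) version of this example, suppose the target is the unit square and the region of interest is the inscribed disk of radius 0.5; both choices are assumptions made only for this sketch. The geometric probability is then the area ratio \(\pi/4\), and a simulation of uniform dart throws should give a value close to it.

```python
import random

# Hypothetical concrete dart example: the target is the unit square
# [0, 1] x [0, 1], the region of interest is the disk of radius 0.5
# centred at (0.5, 0.5).  The geometric probability is the area ratio
# (pi * 0.5**2) / 1 = pi / 4, roughly 0.785.
def hit_ratio(n_darts=100_000):
    hits = 0
    for _ in range(n_darts):
        x, y = random.random(), random.random()        # a uniform dart throw
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:    # inside the disk?
            hits += 1
    return hits / n_darts

print(hit_ratio())   # should be close to pi / 4, roughly 0.785
```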

Relative frequency probability

Relative frequency probability is based on observed frequencies in the real world. It involves conducting experiments and calculating the probability of an event based on the frequency of its occurrence in those experiments.

Suppose the sample space is \(\Omega\), an experiment is repeatedly performed under exactly the same conditions. Let \(n\) be the number of repetitions of the experiment, and \(n_E\) be the number of times that the event \(E\) occurs. Then the probability \(P(E)\) of the event \(E\), is \[ P(E) = \lim_{n \to \infty} \frac{n_E}{n} \]

Example:

Conducting multiple trials of flipping a coin and determining the probability of getting heads based on the observed frequency.
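A small simulation sketch of this idea, assuming a fair coin so that the observed frequency should settle near 0.5 as the number of flips grows:

```python
import random

# Relative frequency sketch: flip a fair coin n times and report n_E / n,
# the observed frequency of the event "heads".
def relative_frequency_of_heads(n):
    n_E = sum(1 for _ in range(n) if random.random() < 0.5)   # count heads
    return n_E / n

for n in (10, 100, 10_000, 1_000_000):
    print(n, relative_frequency_of_heads(n))   # settles near 0.5 as n grows
```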

The advantage of relative frequency probability is that the sample space can contain infinitely many (even uncountably many) sample points. Also, we do not have to assume that all outcomes in the sample space are equally likely. However, we cannot repeat an experiment infinitely many times in the real world, and the limit may not exist. Most importantly, relative frequency probability cannot be applied to an experiment that cannot be repeated.

Note that relative frequency probability is also known as objective probability.

Subjective probability

Subjective probability assigns to an event \(E\) a number \(P(E) \in [0,1]\) based on an individual’s personal judgment or belief about the likelihood of the event occurring. It is often used in situations where there is uncertainty and probabilities must be assigned subjectively.

Subjective probability can be applied to a case that can’t be repeated, but it’s really “subjective”. Given the same information and under the same conditions, two people can have different subjective probabilities. This is why subjective probability is also called personal probability.

Example:

Predicting the likelihood of rain tomorrow based on personal experience and intuition. Different individuals may assign different probabilities to the same event based on their subjective beliefs.

Remark: classical probability is also called a priori probability, and relative frequency probability is also called a posteriori probability. This is because classical probabilities are determined before any actual experimentation or observation, whereas relative frequency probabilities are estimated from empirical evidence, observations, or data obtained through experimentation.

Each type of probability has its strengths and weaknesses, and the choice of which to use depends on the nature of the problem and the available information.

Axioms of probability

Axiom 1 : (non-negative) For any event \(E \subset \Omega\), \(P(E) \geq 0\).

Axiom 2 : (normed) \(P(\Omega)=1\).

Axiom 3 : (countable additivity) For any sequence of mutually exclusive events \(E_1,E_2,\cdots\) in a sample space \(\Omega\), i.e. \(E_i \cap E_j = \varnothing\) for all \(i \neq j\), we have \(\displaystyle P \left(\bigcup_{i=1}^\infty E_i \right) = \sum_{i=1}^\infty P(E_i)\).

A set function \(P\) that satisfies these three axioms is called a probability, and the pair \((\Omega, P)\) is called a probability space.
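As an informal sanity check (not a proof, since only finite additivity can be verified numerically), the classical probability on the two-coin sample space can be tested against the axioms:

```python
from itertools import product
from fractions import Fraction

# Classical probability on the two-coin sample space, checked against the
# axioms.  Only finite additivity is verified here; Axiom 3 itself requires
# countable additivity.
omega = set(product("HT", repeat=2))

def P(event):
    return Fraction(len(event), len(omega))   # classical probability

E = {("H", "T"), ("T", "H")}
F = {("H", "H"), ("T", "T")}

assert P(E) >= 0 and P(F) >= 0        # Axiom 1: non-negativity
assert P(omega) == 1                  # Axiom 2: P(Omega) = 1
assert E.isdisjoint(F)
assert P(E | F) == P(E) + P(F)        # additivity for the disjoint E, F
```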


Proposition:

  1. \(P(A^c) = 1 - P(A)\)

  2. \(P(A) \leq 1\)

  3. \(P(\varnothing)=0, \quad P(\Omega)=1\)

  4. If \(A \subseteq B\), then \(P(A) \leq P(B)\).

  5. \(P(A \cup B) = P(A) + P(B) - P(A \cap B)\). If \(A,B\) are disjoint, then \(P(A \cup B) = P(A) + P(B)\) (addition rule)

  6. The law of total probability \(P(A)=P(A \cap B)+P(A \cap B^c)\)

  7. Boole’s inequality \(P(A \cup B) \leq P(A) + P(B)\)

  8. Bonferroni’s inequality \(P(A \cap B) \geq P(A)+P(B)-1\)
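
Several of these propositions can be checked numerically on the two-coin sample space; the events A and B below (first coin heads, second coin heads) are chosen only for illustration.

```python
from itertools import product
from fractions import Fraction

# Numerical check of some propositions, using classical probability
# P(E) = N(E) / N(Omega) on the two-coin sample space.
omega = set(product("HT", repeat=2))

def P(event):
    return Fraction(len(event), len(omega))

A = {("H", "H"), ("H", "T")}          # first coin shows heads
B = {("H", "H"), ("T", "H")}          # second coin shows heads

assert P(omega - A) == 1 - P(A)                      # 1. complement rule
assert P(A | B) == P(A) + P(B) - P(A & B)            # 5. inclusion-exclusion
assert P(A) == P(A & B) + P(A & (omega - B))         # 6. law of total probability
assert P(A | B) <= P(A) + P(B)                       # 7. Boole's inequality
assert P(A & B) >= P(A) + P(B) - 1                   # 8. Bonferroni's inequality
```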