Chapter 4 Distributions
Random variables can potentially take many different values, usually with some values or intervals of values more likely than others. We have used simulation to investigate the pattern of variability of random variables. Plots and summary statistics like the ones encountered in the previous chapters summarize distributions of random variables.
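As a minimal sketch of the simulation approach, the following Python code (standard library only) simulates a hypothetical random variable \(X\), the sum of two fair six-sided dice. It then summarizes the pattern of variability with relative frequencies and a few summary statistics. The dice example and the function name `simulate_dice_sums` are illustrative assumptions, not part of the text.

```python
import random
import statistics
from collections import Counter

def simulate_dice_sums(n_reps, seed=None):
    """Simulate n_reps values of X = sum of two fair six-sided dice (hypothetical example)."""
    rng = random.Random(seed)
    # randint(1, 6) includes both endpoints, so each roll is one of 1, ..., 6
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_reps)]

sums = simulate_dice_sums(10000, seed=1)

# The relative frequency of each observed value approximates its probability
counts = Counter(sums)
for value in sorted(counts):
    print(value, counts[value] / len(sums))

# Summary statistics of the simulated values
print("mean:", statistics.mean(sums))
print("sd:", statistics.stdev(sums))
```

With more repetitions, the relative frequencies tend to settle down, which is what makes simulation a useful way to approximate a distribution.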
The (probability) distribution of a random variable specifies the possible values of the random variable and a way of determining corresponding probabilities. We will see several ways to describe distributions, some of which depend on the number and types of the random variables under investigation.
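To make "possible values and corresponding probabilities" concrete, here is a small sketch for a hypothetical random variable \(X\), the number of heads in three flips of a fair coin. It enumerates the \(2^3 = 8\) equally likely outcomes and accumulates exact probabilities with Python's `fractions.Fraction`. The coin-flip example is an assumption chosen for illustration.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

# Hypothetical example: X = number of heads in three flips of a fair coin.
# Enumerate the 2**3 = 8 equally likely outcomes, e.g. ('H', 'T', 'H').
outcomes = list(product("HT", repeat=3))

# Map each possible value of X to its exact probability
dist = defaultdict(Fraction)  # Fraction() is 0, so missing values start at 0
for outcome in outcomes:
    x = outcome.count("H")
    dist[x] += Fraction(1, len(outcomes))

# The distribution: each possible value paired with its probability
for x in sorted(dist):
    print(x, dist[x])
```

The resulting table (values 0, 1, 2, 3 with probabilities 1/8, 3/8, 3/8, 1/8) is one complete specification of the distribution of \(X\).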
Commonly encountered random variables can be classified as discrete or continuous (or a mixture of the two).[98]
- A discrete random variable can take only countably many values, which are isolated points on a number line. These are often counting-type variables. Note that “countably many” includes the countably infinite case, such as \(\{0, 1, 2, \ldots\}\).
- A continuous random variable can take any value within some uncountable interval, such as \([0, 1]\), \([0,\infty)\), or \((-\infty, \infty)\). These are often measurement-type variables.
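A quick way to see the difference is to simulate one variable of each type and count the distinct values. The sketch below is an illustrative assumption: the discrete variable counts successes in 5 independent trials with success probability 0.3, and the continuous variable is drawn uniformly from \([0, 1)\) via `random.Random.random`. Computer floating-point numbers only approximate a continuum, but with this many possible values repeats are practically never observed.

```python
import random

rng = random.Random(2024)
n = 1000

# Discrete (counting type): number of successes in 5 trials,
# each succeeding with probability 0.3; True/False sum as 1/0.
discrete = [sum(rng.random() < 0.3 for _ in range(5)) for _ in range(n)]

# Continuous (measurement type): a value drawn uniformly from [0, 1)
continuous = [rng.random() for _ in range(n)]

# The discrete values repeat often; the continuous values essentially never repeat
print("distinct discrete values:", sorted(set(discrete)))
print("distinct continuous values:", len(set(continuous)), "out of", n)
```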
We will see a few ways of specifying a distribution.
- A well-labeled plot
- A table of possible values and corresponding probabilities for discrete random variables. This could be a two-way table for the joint distribution of two discrete random variables.
- A probability mass function (for discrete random variables) or a probability density function (for continuous random variables), which maps each possible value \(x\) (or each \((x, y)\) pair, etc.) to its probability (discrete case) or density (continuous case).
- A cumulative distribution function (cdf), \(F(x) = P(X \le x)\), which determines all the percentiles of the distribution.
- By name, including values of relevant parameters, e.g., “Exponential(1)”, “Normal(500, 100)”, “Binomial(5, 0.3)”, “BivariateNormal(500, 500, 100, 100, 0.8)”. Some probabilistic situations are so common that the corresponding distributions have special names. Always be sure to specify values of relevant parameters, e.g., “Normal(500, 100) distribution” rather than just “Normal distribution”. Note that different named distributions have different identifying parameters. For example, the parameters 0 and 1 mean something different for the Uniform(0, 1) distribution than for the Normal(0, 1) distribution: for the Uniform(0, 1) distribution they are the endpoints of the interval of possible values, while for the Normal(0, 1) distribution 0 is the mean and 1 is the standard deviation.
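The following sketch writes out a pmf and several cdfs for some of the named distributions mentioned above, using only Python's `math` module. The helper names (`binomial_pmf`, `binomial_cdf`, `exponential_cdf`, `normal_cdf`, `uniform_cdf`) are hypothetical. The code also assumes two parameterization conventions that the list above does not spell out: the Exponential parameter is a rate, and the second Normal parameter is the standard deviation. For Exponential(1) and Normal(0, 1) the choice of convention makes no difference.

```python
import math

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ Binomial(n, p)."""
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x) for X ~ Binomial(n, p): sum the pmf over 0, ..., min(floor(x), n)."""
    return sum(binomial_pmf(k, n, p) for k in range(0, min(math.floor(x), n) + 1))

def exponential_cdf(x, rate=1.0):
    """P(X <= x) for X ~ Exponential(rate); assumes a rate parameterization."""
    return 1 - math.exp(-rate * x) if x >= 0 else 0.0

def normal_cdf(x, mean=0.0, sd=1.0):
    """P(X <= x) for X ~ Normal(mean, sd), computed with the error function."""
    return 0.5 * (1 + math.erf((x - mean) / (sd * math.sqrt(2))))

def uniform_cdf(x, a=0.0, b=1.0):
    """P(X <= x) for X ~ Uniform(a, b), clipped to [0, 1] outside the interval."""
    return min(max((x - a) / (b - a), 0.0), 1.0)

# A table (pmf) for Binomial(5, 0.3): each possible value and its probability
for x in range(6):
    print(x, round(binomial_pmf(x, 5, 0.3), 5))

# cdf values for some named distributions
print(binomial_cdf(2, 5, 0.3))     # P(X <= 2) for Binomial(5, 0.3)
print(exponential_cdf(1.0))        # P(X <= 1) for Exponential(1)
print(normal_cdf(600, 500, 100))   # P(X <= 600) for Normal(500, 100), assuming sd = 100

# Same parameter values, different meanings
print(uniform_cdf(0.5, 0, 1))  # 0 and 1 are endpoints: 0.5
print(normal_cdf(0.5, 0, 1))   # 0 and 1 are mean and sd: approximately 0.6915
```

Summing the pmf to get the Binomial cdf mirrors the relationship between the table and the cdf for a discrete random variable; for the continuous distributions the cdf is written in closed form (or via `math.erf`) instead.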
[98] There is another, unusual type of random variable that has a “singular” distribution, such as the Cantor distribution, but we treat these random variables as not commonly encountered.