1.3 Families of Distributions
Above, we introduced the basic concepts of random variables and probability distributions. Now, let’s explore some common distributions.
1.3.1 Discrete probability distributions
Bernoulli distribution
Definition 1.1 A Bernoulli trial is a random experiment with exactly two possible outcomes: “success” and “failure.” These outcomes are typically denoted as 1 for success and 0 for failure. The probability of success is often denoted by p, and the probability of failure is then given by q=1−p.
The Bernoulli distribution is a discrete probability distribution that describes the outcome of a single Bernoulli trial with probability of success p∈(0,1).
Denote X∼Ber(p) if a random variable X follows the Bernoulli distribution. The probability mass function of the Bernoulli distribution is given by: p_X(x) = \begin{cases} p, & x=1 \\ 1-p, & x=0 \\ 0, & \text{otherwise} \end{cases} = p^x(1-p)^{1-x}, \quad x=0,1, and the range is R_X=\{0,1\}.
Proposition 1.1 \mathbb{E}(X)=p, \quad \mathrm{Var}(X)=p(1-p)
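As a quick numerical check of Proposition 1.1, the following Python sketch computes the mean and variance directly from the probability mass function. The function name bernoulli_pmf and the value p = 0.3 are illustrative choices, not part of the text.

```python
def bernoulli_pmf(x, p):
    """PMF of Ber(p): p^x (1-p)^(1-x) for x in {0, 1}, and 0 otherwise."""
    if x not in (0, 1):
        return 0.0
    return p**x * (1 - p)**(1 - x)

p = 0.3  # illustrative success probability

# Expectation and variance as sums over the range R_X = {0, 1}.
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
var = sum((x - mean)**2 * bernoulli_pmf(x, p) for x in (0, 1))

print(mean, var)  # should agree with p and p(1-p) up to rounding
```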
Binomial distribution
The Binomial distribution describes the number of successes in n independent Bernoulli trials, each with probability of success p∈(0,1).
Denote X∼Bin(n,p) if a random variable X follows the Binomial distribution. The probability mass function is given by: p_X(x) = {n \choose x} p^x(1-p)^{n-x}, \quad x=0,1,\cdots,n, and the range is R_X=\{0,1,\cdots,n\}.
Proposition 1.2
- \mathbb{E}(X)=np, \quad \mathrm{Var}(X)=np(1-p)
- \text{Ber}(p) = \text{Bin}(1,p), that is, a Bernoulli distribution is a Binomial distribution with n=1
- (additive property) Let X \sim \text{Bin}(n_1,p), Y \sim \text{Bin}(n_2,p) and X \perp\!\!\!\perp Y. Then X+Y \sim \text{Bin}(n_1+n_2,p)
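The properties in Proposition 1.2 can be checked numerically. The sketch below is a minimal illustration, assuming small values n_1 = 4, n_2 = 6 and p = 0.25 chosen only for the example. It computes the mean and variance from the mass function and compares the distribution of X+Y, obtained by convolving the two mass functions, with Bin(n_1+n_2, p). The helper names are hypothetical.

```python
from math import comb

def binom_pmf(x, n, p):
    """PMF of Bin(n, p): C(n, x) p^x (1-p)^(n-x) for x = 0..n, else 0."""
    if x < 0 or x > n:
        return 0.0
    return comb(n, x) * p**x * (1 - p)**(n - x)

def convolve(pmf_a, pmf_b):
    """PMF of X + Y for independent X, Y supported on 0, 1, 2, ..."""
    out = [0.0] * (len(pmf_a) + len(pmf_b) - 1)
    for i, a in enumerate(pmf_a):
        for j, b in enumerate(pmf_b):
            out[i + j] += a * b
    return out

n1, n2, p = 4, 6, 0.25  # illustrative values
pmf_x = [binom_pmf(k, n1, p) for k in range(n1 + 1)]
pmf_y = [binom_pmf(k, n2, p) for k in range(n2 + 1)]

# Mean and variance of X computed from its mass function.
mean_x = sum(k * q for k, q in enumerate(pmf_x))
var_x = sum((k - mean_x)**2 * q for k, q in enumerate(pmf_x))

# Additive property: distribution of X + Y versus Bin(n1 + n2, p).
pmf_sum = convolve(pmf_x, pmf_y)
pmf_direct = [binom_pmf(k, n1 + n2, p) for k in range(n1 + n2 + 1)]
max_gap = max(abs(s - d) for s, d in zip(pmf_sum, pmf_direct))

print(mean_x, var_x, max_gap)
```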
Poisson distribution
Before we introduce the Poisson distribution, we first need to define a Poisson process.
Definition:
A Poisson process is a stochastic process that models a sequence of events occurring randomly over time or space. It is characterized by the following properties:
- The probability that exactly 1 event occurs in a given interval of length h is equal to \lambda h + o(h).
- The probability that 2 or more events occur in an interval of length h is equal to o(h).
- For any integers n, j_1, j_2, \cdots, j_n and any set of n non-overlapping intervals, if we define E_i to be the event that exactly j_i of the events under consideration occur in the i-th of these intervals, then events E_1,E_2,\cdots,E_n are independent.
Little o notation: o(h) stands for any function f(h) for which \displaystyle \lim_{h \to 0} \frac{f(h)}{h} = 0.
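Let X be the number of events of a Poisson process with rate \lambda that occur in an interval of unit length. Then X follows the Poisson distribution with parameter \lambda, and its probability mass function is given by:
p_X(k) = \frac{\lambda^{k} e^{-\lambda}}{k!}, \quad k = 0,1,2,\cdots, \quad \lambda \in (0,\infty)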
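One informal way to connect the Poisson process properties to this mass function is to split a unit interval into n subintervals of length h = 1/n. Each subinterval then contains one event with probability approximately \lambda/n, more than one event with negligible probability, and the subintervals are independent, so the total count is approximately Bin(n, \lambda/n). The sketch below, with \lambda = 2 and hypothetical helper names, shows the gap to the Poisson mass function shrinking as n grows; it is a numerical illustration, not a proof.

```python
from math import comb, exp, factorial

def poisson_pmf(k, lam):
    """Poisson mass function: lam^k e^{-lam} / k!."""
    return lam**k * exp(-lam) / factorial(k)

def binom_pmf(k, n, p):
    """Binomial mass function for 0 <= k <= n."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

lam = 2.0  # illustrative rate
gaps = []
for n in (10, 100, 1000, 10000):
    # Largest pointwise difference over k = 0, ..., 10.
    gap = max(abs(binom_pmf(k, n, lam / n) - poisson_pmf(k, lam))
              for k in range(11))
    gaps.append(gap)
    print(n, gap)
```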
Related distributions include the zero-inflated Poisson, zero-truncated Poisson, Conway–Maxwell–Poisson, and Skellam distributions.
1.3.2 Continuous probability distributions
Uniform distribution (discrete and continuous)
Discrete Uniform Distribution
f(k) = \begin{cases} \frac{1}{b-a+1}, & \text{for } a \leq k \leq b, \quad k \in \mathbb{Z} \\ 0, & \text{otherwise} \end{cases} where a \leq b are integers.
Continuous Uniform Distribution
f(x) = \begin{cases} \frac{1}{b-a}, & \text{for } a \leq x \leq b \\ 0, & \text{otherwise} \end{cases} where a < b are real numbers.
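The two uniform densities above can be compared side by side. The following sketch, with illustrative endpoints a = 2 and b = 7 and a simple midpoint rule standing in for exact integration, checks that each one has total mass 1 and mean (a+b)/2. The mean formula is a standard fact that is not stated in the text.

```python
def discrete_uniform_pmf(k, a, b):
    """PMF on the integers a, a+1, ..., b (k is assumed to be an integer)."""
    return 1.0 / (b - a + 1) if a <= k <= b else 0.0

def continuous_uniform_pdf(x, a, b):
    """PDF on the real interval [a, b]."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

def midpoint_integral(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

a, b = 2, 7  # illustrative endpoints

total_mass = sum(discrete_uniform_pmf(k, a, b) for k in range(a - 3, b + 4))
discrete_mean = sum(k * discrete_uniform_pmf(k, a, b) for k in range(a, b + 1))

total_area = midpoint_integral(lambda x: continuous_uniform_pdf(x, a, b), a, b)
continuous_mean = midpoint_integral(
    lambda x: x * continuous_uniform_pdf(x, a, b), a, b)

print(total_mass, discrete_mean, total_area, continuous_mean)
```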
(log) Normal distribution
f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)
where \mu is the mean and \sigma^2 is the variance.
Log-normal Distribution
f(x) = \frac{1}{x\sigma\sqrt{2\pi}} \exp\left(-\frac{(\ln x - \mu)^2}{2\sigma^2}\right), \quad x > 0
where \mu and \sigma^2 are the mean and variance of \ln X.
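The log-normal density is the normal density evaluated at \ln x and divided by x, which is the change of variables from Y = \ln X to X. The sketch below checks this identity at a few points. The parameter values and function names are illustrative assumptions.

```python
from math import exp, log, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Normal density with mean mu and variance sigma^2."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / sqrt(2 * pi * sigma**2)

def lognormal_pdf(x, mu, sigma):
    """Log-normal density; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return exp(-(log(x) - mu)**2 / (2 * sigma**2)) / (x * sigma * sqrt(2 * pi))

mu, sigma = 0.5, 0.8  # illustrative parameters of ln X
xs = [0.1, 1.0, 2.5, 10.0]

# Change of variables: f_X(x) = f_Y(ln x) / x for Y = ln X ~ N(mu, sigma^2).
max_gap = max(abs(lognormal_pdf(x, mu, sigma) - normal_pdf(log(x), mu, sigma) / x)
              for x in xs)
print(max_gap)
```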
(shifted/double) Exponential distribution
f(x) = \begin{cases} \lambda e^{-\lambda x} & \text{for } x \geq 0 \\ 0 & \text{otherwise} \end{cases}
where \lambda is the rate parameter.
Shifted Exponential Distribution
f(x) = \begin{cases} \lambda e^{-\lambda (x-\mu)} & \text{for } x \geq \mu \\ 0 & \text{otherwise} \end{cases}
where \lambda is the rate parameter and \mu is the shift parameter.
Double Exponential Distribution (Laplace Distribution)
f(x) = \frac{\lambda}{2} e^{-\lambda |x-\mu|}
where \lambda is the rate parameter and \mu is the location parameter.
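The three densities in this group are closely related: the shifted exponential with \mu = 0 is the ordinary exponential, and the Laplace density is half of the shifted exponential density reflected symmetrically about \mu. The following sketch checks both relations at a few points, using illustrative parameter values and hypothetical function names.

```python
from math import exp

def exponential_pdf(x, lam):
    """Exponential density with rate lam."""
    return lam * exp(-lam * x) if x >= 0 else 0.0

def shifted_exponential_pdf(x, lam, mu):
    """Exponential density shifted to start at mu."""
    return lam * exp(-lam * (x - mu)) if x >= mu else 0.0

def laplace_pdf(x, lam, mu):
    """Double exponential (Laplace) density centred at mu."""
    return 0.5 * lam * exp(-lam * abs(x - mu))

lam, mu = 1.5, 2.0  # illustrative rate and shift/location
xs = [-1.0, 0.5, 2.0, 3.25, 6.0]

# A shift of zero recovers the ordinary exponential density.
shift_gap = max(abs(shifted_exponential_pdf(x, lam, 0.0) - exponential_pdf(x, lam))
                for x in xs)

# Laplace = half of the shifted exponential, folded around mu.
fold_gap = max(
    abs(laplace_pdf(x, lam, mu)
        - 0.5 * shifted_exponential_pdf(mu + abs(x - mu), lam, mu))
    for x in xs)

print(shift_gap, fold_gap)
```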
Gamma distribution (Erlang distribution)
f(x) = \frac{\lambda^k}{\Gamma(k)} x^{k-1} e^{-\lambda x} where k is the shape parameter and \lambda is the rate parameter.
Erlang Distribution
f(x) = \frac{\lambda^k}{(k-1)!} x^{k-1} e^{-\lambda x}
where k is a positive integer (the number of events) and \lambda is the rate parameter.
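Since \Gamma(k) = (k-1)! for a positive integer k, the Erlang density is the Gamma density restricted to integer shape, and k = 1 gives the exponential density. The sketch below checks both facts numerically. The rate \lambda = 0.7 and the evaluation points are arbitrary illustrative choices.

```python
from math import exp, factorial, gamma

def gamma_pdf(x, k, lam):
    """Gamma density with shape k and rate lam; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return lam**k / gamma(k) * x**(k - 1) * exp(-lam * x)

def erlang_pdf(x, k, lam):
    """Erlang density with positive integer shape k and rate lam."""
    if x <= 0:
        return 0.0
    return lam**k / factorial(k - 1) * x**(k - 1) * exp(-lam * x)

lam = 0.7  # illustrative rate
xs = [0.2, 1.0, 3.5, 9.0]

# Gamma with integer shape coincides with Erlang.
erlang_gap = max(abs(gamma_pdf(x, k, lam) - erlang_pdf(x, k, lam))
                 for k in (1, 2, 5) for x in xs)

# Shape k = 1 reduces to the exponential density lam * exp(-lam * x).
exp_gap = max(abs(gamma_pdf(x, 1, lam) - lam * exp(-lam * x)) for x in xs)

print(erlang_gap, exp_gap)
```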
Beta distribution
f(x) = \frac{1}{B(\alpha, \beta)} x^{\alpha-1} (1-x)^{\beta-1}
where \alpha and \beta are shape parameters and B(\alpha,\beta) is the beta function.
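The text does not give a formula for the beta function. A standard identity is B(\alpha,\beta) = \Gamma(\alpha)\Gamma(\beta)/\Gamma(\alpha+\beta), which the sketch below uses as an assumption. It then checks by a midpoint rule that the density integrates to 1 on (0,1) and has mean \alpha/(\alpha+\beta), for the illustrative values \alpha = 2, \beta = 3.

```python
from math import gamma

def beta_function(alpha, beta):
    """B(alpha, beta) via the gamma-function identity (an assumption here)."""
    return gamma(alpha) * gamma(beta) / gamma(alpha + beta)

def beta_pdf(x, alpha, beta):
    """Beta density on (0, 1); zero elsewhere."""
    if not 0 < x < 1:
        return 0.0
    return x**(alpha - 1) * (1 - x)**(beta - 1) / beta_function(alpha, beta)

def midpoint_integral(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

alpha, beta = 2.0, 3.0  # illustrative shape parameters
area = midpoint_integral(lambda x: beta_pdf(x, alpha, beta), 0.0, 1.0)
mean = midpoint_integral(lambda x: x * beta_pdf(x, alpha, beta), 0.0, 1.0)

print(beta_function(alpha, beta), area, mean)
```

With \alpha = \beta = 1 the density is constant on (0,1), so the Beta family contains the continuous uniform distribution on the unit interval.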
Weibull distribution (Rayleigh distribution)
f(x) = \frac{k}{\lambda} \left(\frac{x}{\lambda}\right)^{k-1} \exp\left(-\left(\frac{x}{\lambda}\right)^k\right)
where k is the shape parameter and \lambda is the scale parameter.
Rayleigh Distribution
f(x) = \frac{x}{\sigma^2} \exp\left(-\frac{x^2}{2\sigma^2}\right)
where \sigma is the scale parameter.
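The Rayleigh distribution is the Weibull distribution with shape k = 2 and scale \lambda = \sigma\sqrt{2}, which follows by substituting these values into the Weibull density. The sketch below confirms this at a few points and also checks that k = 1 gives an exponential density with rate 1/\lambda. Parameter values and function names are illustrative.

```python
from math import exp, sqrt

def weibull_pdf(x, k, lam):
    """Weibull density with shape k and scale lam; zero for x < 0."""
    if x < 0:
        return 0.0
    return (k / lam) * (x / lam)**(k - 1) * exp(-(x / lam)**k)

def rayleigh_pdf(x, sigma):
    """Rayleigh density with scale sigma; zero for x < 0."""
    if x < 0:
        return 0.0
    return x / sigma**2 * exp(-x**2 / (2 * sigma**2))

sigma = 1.3  # illustrative Rayleigh scale
lam = 2.0    # illustrative Weibull scale for the k = 1 check
xs = [0.0, 0.4, 1.3, 2.5, 5.0]

# Weibull(k = 2, scale = sigma * sqrt(2)) equals Rayleigh(sigma).
rayleigh_gap = max(abs(weibull_pdf(x, 2, sigma * sqrt(2)) - rayleigh_pdf(x, sigma))
                   for x in xs)

# Weibull(k = 1, scale = lam) equals the exponential density with rate 1/lam.
exp_gap = max(abs(weibull_pdf(x, 1, lam) - (1 / lam) * exp(-x / lam)) for x in xs)

print(rayleigh_gap, exp_gap)
```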
Pareto distribution
f(x) = \begin{cases} \frac{\alpha x_m^\alpha}{x^{\alpha+1}} & \text{, for } x \geq x_m \\ 0 & \text{, otherwise} \end{cases}
where \alpha is the shape parameter and x_m is the scale parameter.
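Integrating the Pareto density from x_m to x gives the closed-form distribution function F(x) = 1 - (x_m/x)^\alpha for x \geq x_m. This formula is not stated in the text and is used here as an assumption. The sketch below compares it with a midpoint-rule integral of the density, using the illustrative values \alpha = 3 and x_m = 2.

```python
def pareto_pdf(x, alpha, x_m):
    """Pareto density with shape alpha and scale x_m; zero below x_m."""
    return alpha * x_m**alpha / x**(alpha + 1) if x >= x_m else 0.0

def pareto_cdf(x, alpha, x_m):
    """Closed-form CDF, 1 - (x_m / x)^alpha for x >= x_m (assumed formula)."""
    return 1.0 - (x_m / x)**alpha if x >= x_m else 0.0

def midpoint_integral(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

alpha, x_m = 3.0, 2.0  # illustrative shape and scale
upper = 10.0

numeric = midpoint_integral(lambda x: pareto_pdf(x, alpha, x_m), x_m, upper)
closed_form = pareto_cdf(upper, alpha, x_m)

print(numeric, closed_form)
```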
1.3.3 Distributions derived from the Normal distribution
Student’s t-distribution
f(t) = \frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi} \, \Gamma\left(\frac{\nu}{2}\right)} \left(1 + \frac{t^2}{\nu}\right)^{-\frac{\nu+1}{2}}
where t is the value of the test statistic and \nu is the degrees of freedom (df).
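Two special cases help to interpret the t density: \nu = 1 gives the standard Cauchy density 1/(\pi(1+t^2)), and for large \nu the density approaches the standard normal. The sketch below evaluates the formula above through log-gamma functions to avoid overflow. The choice \nu = 200 for "large" and the function names are illustrative assumptions.

```python
from math import exp, lgamma, log, pi, sqrt

def t_pdf(t, nu):
    """Student's t density with nu degrees of freedom."""
    log_const = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_const) * (1 + t * t / nu)**(-(nu + 1) / 2)

def cauchy_pdf(t):
    """Standard Cauchy density."""
    return 1.0 / (pi * (1 + t * t))

def std_normal_pdf(t):
    """Standard normal density."""
    return exp(-t * t / 2) / sqrt(2 * pi)

ts = [-2.0, -1.0, 0.0, 1.0, 2.0]

# nu = 1 reproduces the Cauchy density.
cauchy_gap = max(abs(t_pdf(t, 1) - cauchy_pdf(t)) for t in ts)

# Large nu is close to, but not equal to, the standard normal density.
normal_gap = max(abs(t_pdf(t, 200) - std_normal_pdf(t)) for t in ts)

print(cauchy_gap, normal_gap)
```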
(Doubly) non-central t-distribution
The PDF of the non-central t-distribution is more complex and is generally expressed in terms of hypergeometric functions.
\chi^2 distribution
f(x) = \frac{1}{2^{\nu/2} \Gamma(\nu/2)} x^{\nu/2 - 1} e^{-x/2}
where x is the chi-squared variable, \nu is the degrees of freedom.
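Comparing this density with the Gamma density given earlier shows that the \chi^2 distribution with \nu degrees of freedom is a Gamma distribution with shape k = \nu/2 and rate \lambda = 1/2. The sketch below checks the identity for a few values of \nu. The function names and evaluation points are illustrative.

```python
from math import exp, gamma

def chi2_pdf(x, nu):
    """Chi-squared density with nu degrees of freedom; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return x**(nu / 2 - 1) * exp(-x / 2) / (2**(nu / 2) * gamma(nu / 2))

def gamma_pdf(x, k, lam):
    """Gamma density with shape k and rate lam; zero for x <= 0."""
    if x <= 0:
        return 0.0
    return lam**k / gamma(k) * x**(k - 1) * exp(-lam * x)

xs = [0.3, 1.0, 4.0, 12.0]

# chi^2(nu) coincides with Gamma(shape = nu / 2, rate = 1 / 2).
max_gap = max(abs(chi2_pdf(x, nu) - gamma_pdf(x, nu / 2, 0.5))
              for nu in (1, 2, 5, 10) for x in xs)

print(max_gap)
```

In particular, \nu = 2 gives an exponential density with rate 1/2.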
Non-central \chi^2 distribution
f(x) = \frac{1}{2} \left( \frac{x}{\lambda} \right)^{\frac{\nu}{4} - \frac{1}{2}} e^{-\frac{1}{2}(x+\lambda)} I_{\frac{\nu}{2}-1}(\sqrt{\lambda x})
where x is the non-central chi-squared variable, \nu is the degrees of freedom, \lambda is the non-centrality parameter, and I_a(\cdot) is the modified Bessel function of the first kind of order a.
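The Bessel-function form can be cross-checked against a standard representation that is not stated in the text: the non-central \chi^2 density is a Poisson(\lambda/2)-weighted mixture of central \chi^2 densities with \nu + 2j degrees of freedom. The sketch below is a minimal illustration under that assumption. It evaluates the Bessel function by its truncated power series, where the number of terms is an arbitrary choice suited to moderate arguments.

```python
from math import exp, factorial, gamma, sqrt

def chi2_pdf(x, nu):
    """Central chi-squared density for x > 0."""
    return x**(nu / 2 - 1) * exp(-x / 2) / (2**(nu / 2) * gamma(nu / 2))

def bessel_i(order, y, terms=60):
    """Truncated power series of the modified Bessel function I_order(y)."""
    return sum((y / 2)**(2 * m + order) / (factorial(m) * gamma(m + order + 1))
               for m in range(terms))

def ncx2_pdf_bessel(x, nu, lam):
    """Non-central chi-squared density in the Bessel-function form."""
    return (0.5 * (x / lam)**(nu / 4 - 0.5) * exp(-0.5 * (x + lam))
            * bessel_i(nu / 2 - 1, sqrt(lam * x)))

def ncx2_pdf_mixture(x, nu, lam, terms=60):
    """Same density as a Poisson(lam / 2) mixture of central chi-squared densities."""
    return sum(exp(-lam / 2) * (lam / 2)**j / factorial(j) * chi2_pdf(x, nu + 2 * j)
               for j in range(terms))

lam = 2.0  # illustrative non-centrality parameter
max_gap = max(abs(ncx2_pdf_bessel(x, nu, lam) - ncx2_pdf_mixture(x, nu, lam))
              for nu in (3, 4) for x in (0.5, 3.0, 8.0))

print(max_gap)
```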
F distribution
f(x) = \frac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right) \Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\frac{\nu_1}{2}} x^{\frac{\nu_1}{2} - 1} \left(1 + \frac{\nu_1}{\nu_2} x\right)^{-\frac{\nu_1 + \nu_2}{2}}
where x is the F statistic, \nu_1 is the degrees of freedom for the numerator, \nu_2 is the degrees of freedom for the denominator.
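The F and t densities are linked: if T follows a t-distribution with \nu degrees of freedom, then T^2 follows an F-distribution with (1, \nu) degrees of freedom, so f_F(x; 1, \nu) = f_t(\sqrt{x}; \nu)/\sqrt{x} for x > 0. The sketch below checks this relation numerically, evaluating both densities through log-gamma functions. The function names and test points are illustrative.

```python
from math import exp, lgamma, log, pi, sqrt

def f_pdf(x, nu1, nu2):
    """F density with (nu1, nu2) degrees of freedom; zero for x <= 0."""
    if x <= 0:
        return 0.0
    log_const = (lgamma((nu1 + nu2) / 2) - lgamma(nu1 / 2) - lgamma(nu2 / 2)
                 + (nu1 / 2) * log(nu1 / nu2))
    return (exp(log_const) * x**(nu1 / 2 - 1)
            * (1 + nu1 * x / nu2)**(-(nu1 + nu2) / 2))

def t_pdf(t, nu):
    """Student's t density with nu degrees of freedom."""
    log_const = lgamma((nu + 1) / 2) - lgamma(nu / 2) - 0.5 * log(nu * pi)
    return exp(log_const) * (1 + t * t / nu)**(-(nu + 1) / 2)

# Density of T^2 obtained from the t density by a change of variables.
max_gap = max(abs(f_pdf(x, 1, nu) - t_pdf(sqrt(x), nu) / sqrt(x))
              for nu in (1, 4, 30) for x in (0.1, 1.0, 4.0))

print(max_gap)
```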
(Doubly) non-central F-distribution
Non-central F-distribution
The PDF of the non-central F-distribution, which has a single non-centrality parameter attached to the numerator, is also quite complex and involves hypergeometric functions.
Doubly Non-central F-distribution
The doubly non-central F-distribution has two non-centrality parameters, one for the numerator and one for the denominator, and its PDF is even more intricate.