1.2 Facts about distributions
We will make use of certain parametric distributions; their notation and some basic facts are introduced below.
1.2.1 Normal distribution
The normal distribution with mean \mu and variance \sigma^2 is denoted by \mathcal{N}(\mu,\sigma^2). Its pdf is \phi_\sigma(x-\mu):=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}, x\in\mathbb{R}, and satisfies that \phi_\sigma(x-\mu)=\frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right) (if \sigma=1, the dependence on \sigma is omitted). Its cdf is denoted by \Phi_\sigma(x-\mu). The upper \alpha-quantile of a \mathcal{N}(0,1) is denoted by z_\alpha, so it satisfies that z_\alpha=\Phi^{-1}(1-\alpha).^6 The shortest interval that contains 1-\alpha probability of a X\sim\mathcal{N}(\mu,\sigma^2) is (\mu-z_{\alpha/2}\sigma,\mu+z_{\alpha/2}\sigma), i.e., \mathbb{P}[X\in(\mu\pm z_{\alpha/2}\sigma)]=1-\alpha. Some uncentered moments of X\sim\mathcal{N}(\mu,\sigma^2) are
\begin{align*} \mathbb{E}[X]=\mu,\quad \mathbb{E}[X^2]=\mu^2+\sigma^2,\quad \mathbb{E}[X^3]=\mu^3+3\mu\sigma^2,\quad \mathbb{E}[X^4]=\mu^4+6\mu^2\sigma^2+3\sigma^4. \end{align*}
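These facts are easy to check numerically. The following Python sketch (an illustration added here, not part of the original notes) uses SciPy to verify the quantile–coverage relation and the four moments; the parameter values are arbitrary.

```python
import numpy as np
from scipy.stats import norm

mu, sigma, alpha = 1.5, 2.0, 0.05

# Upper alpha/2-quantile: z_{alpha/2} = Phi^{-1}(1 - alpha/2).
z = norm.ppf(1 - alpha / 2)  # approx. 1.9600 for alpha = 0.05

# Coverage of the shortest 1 - alpha interval (mu - z*sigma, mu + z*sigma).
coverage = norm.cdf(mu + z * sigma, mu, sigma) - norm.cdf(mu - z * sigma, mu, sigma)
print(np.isclose(coverage, 1 - alpha))  # True

# Uncentered moments E[X^k], k = 1, ..., 4, against the closed forms above.
moments = [norm.moment(k, loc=mu, scale=sigma) for k in range(1, 5)]
exact = [mu,
         mu**2 + sigma**2,
         mu**3 + 3 * mu * sigma**2,
         mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4]
print(np.allclose(moments, exact))  # True
```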
Remark. It is interesting to compare the length of (\mu\pm z_{\alpha/2}\sigma) for \alpha=1/t^2 with the one in (1.4), as this gives direct insight into how much larger the Chebyshev confidence interval (1.4) is when X\sim\mathcal{N}(\mu,\sigma^2). The table below gives the length increment factor t/z_{0.5/t^2} of the Chebyshev confidence interval.
| t | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| Guaranteed coverage | 0.75 | 0.8889 | 0.9375 | 0.96 | 0.9722 |
| Increment factor | 1.7386 | 1.8830 | 2.1474 | 2.4346 | 2.7268 |
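Both rows of the table can be reproduced in a few lines of Python (a sketch added for illustration; SciPy's norm.ppf is \Phi^{-1}):

```python
from scipy.stats import norm

# Guaranteed coverage 1 - 1/t^2 (Chebyshev) and increment factor
# t / z_{0.5/t^2}, i.e., the ratio of the Chebyshev half-length t*sigma
# to the exact normal half-length z_{alpha/2}*sigma with alpha = 1/t^2.
for t in range(2, 7):
    coverage = 1 - 1 / t**2
    increment = t / norm.ppf(1 - 0.5 / t**2)
    print(f"t = {t}: coverage = {coverage:.4f}, increment factor = {increment:.4f}")
```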
Balancing the guaranteed coverage against the increment factor, it seems reasonable to define the “3\sigma-rule” for any random variable as: “almost 90% of the values of a random variable X lie in (\mu-3\sigma,\mu+3\sigma), if \mathbb{E}[X]=\mu and \mathbb{V}\mathrm{ar}[X]=\sigma^2<\infty”.
The multivariate normal is represented by \mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma}), where \boldsymbol{\mu} is a p-vector and \boldsymbol{\Sigma} is a p\times p symmetric and positive definite matrix. The pdf of a \mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma}) is \phi_{\boldsymbol{\Sigma}}(\mathbf{x}-\boldsymbol{\mu}):=\frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}}e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})}, \mathbf{x}\in\mathbb{R}^p, and satisfies that \phi_{\boldsymbol{\Sigma}}(\mathbf{x}-\boldsymbol{\mu})=|\boldsymbol{\Sigma}|^{-1/2}\phi\big(\boldsymbol{\Sigma}^{-1/2}(\mathbf{x}-\boldsymbol{\mu})\big) (if \boldsymbol{\Sigma}=\mathbf{I}, the dependence on \boldsymbol{\Sigma} is omitted). The multivariate normal has an appealing linear property that stems from (1.2) and (1.3):
\begin{align} \mathbf{A}\mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma)+\mathbf{b}\stackrel{d}{=}\mathcal{N}_q(\mathbf{A}\boldsymbol\mu+\mathbf{b},\mathbf{A}\boldsymbol\Sigma\mathbf{A}').\tag{1.5} \end{align}
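Property (1.5) can be illustrated by simulation. The sketch below (added here; the particular \mathbf{A}, \mathbf{b}, \boldsymbol\mu, and \boldsymbol\Sigma are arbitrary choices) draws from a trivariate normal, applies an affine map, and compares the empirical mean and covariance of the transformed sample with \mathbf{A}\boldsymbol\mu+\mathbf{b} and \mathbf{A}\boldsymbol\Sigma\mathbf{A}':

```python
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])  # symmetric and positive definite
A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0,  1.0]])    # maps p = 3 to q = 2
b = np.array([3.0, -1.0])

X = rng.multivariate_normal(mu, Sigma, size=1_000_000)
Y = X @ A.T + b  # each row is A x_i + b

print(np.allclose(Y.mean(axis=0), A @ mu + b, atol=0.05))                # True
print(np.allclose(np.cov(Y, rowvar=False), A @ Sigma @ A.T, atol=0.05))  # True
```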
Exercise 1.9 The pdf of a bivariate normal (p=2, see Figure 1.1) can also be expressed as
\begin{align} &\phi(x_1,x_2;\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\rho):=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\tag{1.6}\\ &\,\times\!\exp\left\{\!-\frac{1}{2(1-\rho^2)}\!\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2}+\frac{(x_2-\mu_2)^2}{\sigma_2^2}-\frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}\right]\!\right\}\!,\nonumber \end{align}
where \mu_1,\mu_2\in\mathbb{R}, \sigma_1,\sigma_2>0, and -1<\rho<1. The parametrization uses \boldsymbol{\mu}=(\mu_1,\mu_2)' and \boldsymbol{\Sigma}=(\sigma_1^2,\rho\sigma_1\sigma_2;\rho\sigma_1\sigma_2,\sigma_2^2).^7
- Derive the pdf of X_1: \phi(x_1;\mu_1,\sigma_1^2).
- Derive the pdf of X_1|X_2=x_2: \phi\big(x_1;\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2),(1-\rho^2)\sigma_1^2\big).
- Derive \mathbb{E}[X_1|X_2=x_2] and \mathbb{V}\mathrm{ar}[X_1|X_2=x_2].
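Although the derivations are left as Exercise 1.9, the factorization they rest on, \phi(x_1,x_2)=\phi(x_2;\mu_2,\sigma_2^2)\,\phi\big(x_1;\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2),(1-\rho^2)\sigma_1^2\big), can be checked numerically at a few points. A Python sketch (added for illustration; parameter values are arbitrary):

```python
import numpy as np
from scipy.stats import norm, multivariate_normal

mu1, mu2, s1, s2, rho = 0.5, -1.0, 1.2, 0.8, 0.6
Sigma = [[s1**2, rho * s1 * s2], [rho * s1 * s2, s2**2]]
joint = multivariate_normal([mu1, mu2], Sigma)

for x1, x2 in [(0.0, 0.0), (1.0, -2.0), (-0.5, 0.3)]:
    marginal = norm.pdf(x2, mu2, s2)                # pdf of X2
    cond_mean = mu1 + rho * (s1 / s2) * (x2 - mu2)  # E[X1 | X2 = x2]
    cond_sd = np.sqrt(1 - rho**2) * s1              # sd of X1 | X2 = x2
    conditional = norm.pdf(x1, cond_mean, cond_sd)
    print(np.isclose(joint.pdf([x1, x2]), marginal * conditional))  # True
```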
1.2.2 Other distributions
The lognormal distribution is denoted by \mathcal{LN}(\mu,\sigma^2) and is such that \mathcal{LN}(\mu,\sigma^2)\stackrel{d}{=}\exp(\mathcal{N}(\mu,\sigma^2)). Its pdf is f(x;\mu,\sigma)=\frac{1}{x}\phi_\sigma(\log x-\mu)=\frac{1}{\sqrt{2\pi}\sigma x}e^{-\frac{(\log x-\mu)^2}{2\sigma^2}}, x>0. Note that \mathbb{E}[\mathcal{LN}(\mu,\sigma^2)]=e^{\mu+\frac{\sigma^2}{2}} and \mathbb{V}\mathrm{ar}[\mathcal{LN}(\mu,\sigma^2)]=\big(e^{\sigma^2}-1\big)e^{2\mu+\sigma^2}.
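For a quick check of these two formulas, note that SciPy's lognorm with shape s=\sigma and scale e^{\mu} coincides with \mathcal{LN}(\mu,\sigma^2) (a sketch added for illustration; values arbitrary):

```python
import numpy as np
from scipy.stats import lognorm

mu, sigma = 0.3, 0.7
dist = lognorm(s=sigma, scale=np.exp(mu))  # matches LN(mu, sigma^2)

print(np.isclose(dist.mean(), np.exp(mu + sigma**2 / 2)))                          # True
print(np.isclose(dist.var(), (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)))  # True
```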
The exponential distribution is denoted by \mathrm{Exp}(\lambda) and has pdf f(x;\lambda)=\lambda e^{-\lambda x}, \lambda,x>0.
The gamma distribution is denoted by \Gamma(a,p) and has pdf f(x;a,p)=\frac{a^p}{\Gamma(p)} x^{p-1}e^{-a x}, a,p,x>0, where \Gamma(p)=\int_0^\infty x^{p-1}e^{-x}\,\mathrm{d}x is the gamma function. The parameter a is the rate and p is the shape. It is known that \mathbb{E}[\Gamma(a,p)]=\frac{p}{a} and \mathbb{V}\mathrm{ar}[\Gamma(a,p)]=\frac{p}{a^2}.
The inverse gamma distribution, \mathrm{IG}(a,p)\stackrel{d}{=}\Gamma(a,p)^{-1}, has pdf f(x;a,p)=\frac{a^p}{\Gamma(p)} x^{-p-1}e^{-\frac{a}{x}}, a,p,x>0. It is known that \mathbb{E}[\mathrm{IG}(a,p)]=\frac{a}{p-1} (if p>1) and \mathbb{V}\mathrm{ar}[\mathrm{IG}(a,p)]=\frac{a^2}{(p-1)^2(p-2)} (if p>2).
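SciPy parametrizes both distributions by shape and scale, so the rate a enters as scale = 1/a for the gamma and as scale = a for the inverse gamma. The following sketch (added for illustration; values arbitrary) checks the four moment formulas:

```python
import numpy as np
from scipy.stats import gamma, invgamma

a, p = 2.5, 4.0  # rate a and shape p; p > 2, so both IG moments exist

g = gamma(p, scale=1 / a)  # Gamma(a, p) in the notation above
print(np.isclose(g.mean(), p / a), np.isclose(g.var(), p / a**2))  # True True

ig = invgamma(p, scale=a)  # IG(a, p) in the notation above
print(np.isclose(ig.mean(), a / (p - 1)))                          # True
print(np.isclose(ig.var(), a**2 / ((p - 1)**2 * (p - 2))))         # True
```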
The binomial distribution is denoted by \mathrm{B}(n,p). Recall that \mathbb{E}[\mathrm{B}(n,p)]=np and \mathbb{V}\mathrm{ar}[\mathrm{B}(n,p)]=np(1-p). A \mathrm{B}(1,p) is a Bernoulli distribution, denoted by \mathrm{Ber}(p).
The beta distribution is denoted by \beta(a,b) and its pdf is f(x;a,b)=\frac{1}{\beta(a,b)}x^{a-1}(1-x)^{b-1}, 0<x<1, where \beta(a,b)=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} is the beta function. When a=b=1, the uniform distribution \mathcal{U}(0,1) arises.
The Poisson distribution is denoted by \mathrm{Pois}(\lambda) and has probability mass function \mathbb{P}[X=x]=\frac{\lambda^x e^{-\lambda}}{x!}, x=0,1,2,\ldots Recall that \mathbb{E}[\mathrm{Pois}(\lambda)]=\mathbb{V}\mathrm{ar}[\mathrm{Pois}(\lambda)]=\lambda.
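The remaining identities admit the same kind of numerical check (a sketch added for illustration; values arbitrary), including the \beta(1,1)=\mathcal{U}(0,1) special case:

```python
import numpy as np
from scipy.stats import binom, beta, poisson, uniform

n, p, lam = 10, 0.3, 4.5

b = binom(n, p)
print(np.isclose(b.mean(), n * p), np.isclose(b.var(), n * p * (1 - p)))  # True True

pois = poisson(lam)
print(np.isclose(pois.mean(), lam), np.isclose(pois.var(), lam))          # True True

# beta(1, 1) has constant pdf on (0, 1), i.e., it is the U(0, 1).
x = np.linspace(0.01, 0.99, 5)
print(np.allclose(beta.pdf(x, 1, 1), uniform.pdf(x)))                     # True
```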