1.2 Facts about distributions

We will make use of certain parametric distributions, for which the following notation and facts are introduced.

1.2.1 Normal distribution

The normal distribution with mean \(\mu\) and variance \(\sigma^2\) is denoted by \(\mathcal{N}(\mu,\sigma^2).\) Its pdf is \(\phi_\sigma(x-\mu):=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}},\) \(x\in\mathbb{R},\) and satisfies \(\phi_\sigma(x-\mu)=\frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right)\) (if \(\sigma=1,\) the dependence on \(\sigma\) is omitted). Its cdf is denoted by \(\Phi_\sigma(x-\mu).\) The upper \(\alpha\)-quantile of \(\mathcal{N}(0,1)\) is denoted by \(z_\alpha\) and satisfies \(z_\alpha=\Phi^{-1}(1-\alpha).\)[^1] The shortest interval that contains \(1-\alpha\) probability of an \(X\sim\mathcal{N}(\mu,\sigma^2)\) is \((\mu-z_{\alpha/2}\sigma,\mu+z_{\alpha/2}\sigma),\) i.e., \(\mathbb{P}[X\in(\mu\pm z_{\alpha/2}\sigma)]=1-\alpha.\) Some uncentered moments of \(X\sim\mathcal{N}(\mu,\sigma^2)\) are

\[\begin{align*} \mathbb{E}[X]&=\mu,\\ \mathbb{E}[X^2]&=\mu^2+\sigma^2,\\ \mathbb{E}[X^3]&=\mu^3+3\mu\sigma^2,\\ \mathbb{E}[X^4]&=\mu^4+6\mu^2\sigma^2+3\sigma^4. \end{align*}\]
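
These moments and the shortest-interval property can be verified numerically. Below is a minimal sketch in Python, assuming SciPy is available; the values of \(\mu\) and \(\sigma\) are arbitrary illustrative choices.

```python
# A minimal numerical check of the moments above, assuming SciPy is
# available; mu = 1.5 and sigma = 2 are arbitrary illustrative values.
from scipy.stats import norm

mu, sigma = 1.5, 2.0
X = norm(loc=mu, scale=sigma)

# Closed-form uncentered moments E[X^k], k = 1, ..., 4
formulas = [
    mu,
    mu**2 + sigma**2,
    mu**3 + 3 * mu * sigma**2,
    mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4,
]
for k, value in enumerate(formulas, start=1):
    assert abs(X.moment(k) - value) < 1e-8  # k-th non-central moment

# Shortest-interval property: P[X in (mu -/+ z_{alpha/2} sigma)] = 1 - alpha
alpha = 0.05
z = norm.ppf(1 - alpha / 2)  # upper alpha/2-quantile z_{alpha/2}
print(X.cdf(mu + z * sigma) - X.cdf(mu - z * sigma))  # 0.95
```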

Remark. It is interesting to compare the length of \((\mu\pm z_{\alpha/2}\sigma)\) for \(\alpha=1/t^2\) with the one in (1.4), as this gives direct insight into how much larger the Chebyshev confidence interval (1.4) is when \(X\sim\mathcal{N}(\mu,\sigma^2).\) The table below gives the length increment factor \(t/z_{0.5/t^2}\) of the Chebyshev confidence interval.

| \(t\) | \(2\) | \(3\) | \(4\) | \(5\) | \(6\) |
|---|---|---|---|---|---|
| Guaranteed coverage | \(0.75\) | \(0.8889\) | \(0.9375\) | \(0.96\) | \(0.9722\) |
| Increment factor | \(1.7386\) | \(1.883\) | \(2.1474\) | \(2.4346\) | \(2.7268\) |

Balancing the guaranteed coverage against the increment factor, it seems reasonable to define the “\(3\sigma\)-rule” for any random variable as: “almost \(90\%\) of the values of a random variable \(X\) lie in \((\mu-3\sigma,\mu+3\sigma),\) if \(\mathbb{E}[X]=\mu\) and \(\mathbb{V}\mathrm{ar}[X]=\sigma^2<\infty\)”.
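
The entries of the table can be reproduced in a few lines. A minimal sketch, again assuming SciPy is available:

```python
# Reproduce the guaranteed coverage and increment factor of the table above
from scipy.stats import norm

for t in (2, 3, 4, 5, 6):
    coverage = 1 - 1 / t**2       # Chebyshev's guaranteed coverage
    z = norm.ppf(1 - 0.5 / t**2)  # z_{alpha/2} with alpha = 1/t^2
    print(f"t = {t}: coverage = {coverage:.4f}, factor = {t / z:.4f}")
```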

The multivariate normal is represented by \(\mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma}),\) where \(\boldsymbol{\mu}\) is a \(p\)-vector and \(\boldsymbol{\Sigma}\) is a \(p\times p\) symmetric and positive definite matrix. The pdf of a \(\mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma})\) is \(\phi_{\boldsymbol{\Sigma}}(\mathbf{x}-\boldsymbol{\mu}):=\frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}}e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})}\) and satisfies \(\phi_{\boldsymbol{\Sigma}}(\mathbf{x}-\boldsymbol{\mu})=|\boldsymbol{\Sigma}|^{-1/2}\phi\left(\boldsymbol{\Sigma}^{-1/2}(\mathbf{x}-\boldsymbol{\mu})\right)\) (if \(\boldsymbol{\Sigma}=\mathbf{I},\) the dependence on \(\boldsymbol{\Sigma}\) is omitted). The multivariate normal has an appealing linear property that stems from (1.2) and (1.3):

\[\begin{align} \mathbf{A}\mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma)+\mathbf{b}\stackrel{d}{=}\mathcal{N}_q(\mathbf{A}\boldsymbol\mu+\mathbf{b},\mathbf{A}\boldsymbol\Sigma\mathbf{A}').\tag{1.5} \end{align}\]
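
Property (1.5) can be illustrated by Monte Carlo: transformed samples should have mean close to \(\mathbf{A}\boldsymbol\mu+\mathbf{b}\) and covariance close to \(\mathbf{A}\boldsymbol\Sigma\mathbf{A}'.\) A minimal sketch, assuming NumPy is available; \(\mathbf{A},\) \(\mathbf{b},\) \(\boldsymbol\mu,\) and \(\boldsymbol\Sigma\) are arbitrary illustrative choices with \(p=3\) and \(q=2.\)

```python
# A Monte Carlo illustration of the linear property (1.5); the matrices and
# vectors below are arbitrary illustrative choices
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])   # symmetric and positive definite
A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, 1.0]])       # maps R^3 to R^2 (q = 2)
b = np.array([0.0, 3.0])

X = rng.multivariate_normal(mean=mu, cov=Sigma, size=100_000)
Y = X @ A.T + b  # samples of A N_3(mu, Sigma) + b

print(Y.mean(axis=0))           # ~ A mu + b
print(A @ mu + b)
print(np.cov(Y, rowvar=False))  # ~ A Sigma A'
print(A @ Sigma @ A.T)
```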

Exercise 1.9 The pdf of a bivariate normal (\(p=2,\) see Figure 1.1) can also be expressed as

\[\begin{align} &\phi(x_1,x_2;\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\rho):=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\tag{1.6}\\ &\;\times\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2}+\frac{(x_2-\mu_2)^2}{\sigma_2^2}-\frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}\right]\right\},\nonumber \end{align}\]

where \(\mu_1,\mu_2\in\mathbb{R},\) \(\sigma_1,\sigma_2>0,\) and \(-1<\rho<1.\) The parametrization uses \(\boldsymbol{\mu}=(\mu_1,\mu_2)'\) and \(\boldsymbol{\Sigma}=\begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}.\)[^2]

  1. Derive the pdf of \(X_1\): \(\phi(x_1;\mu_1,\sigma_1^2).\)
  2. Derive the pdf of \(X_1|X_2=x_2\): \(\phi\left(x_1;\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2),(1-\rho^2)\sigma_1^2\right).\)
  3. Derive \(\mathbb{E}[X_1|X_2=x_2]\) and \(\mathbb{V}\mathrm{ar}[X_1|X_2=x_2]\) (a numerical check of 2–3 is sketched after this exercise).
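
The conditional results in 2–3 can be checked by simulation, approximating the conditioning event \(X_2=x_2\) by the narrow window \(|X_2-x_2|<\varepsilon.\) A minimal sketch, assuming NumPy is available; the parameter values, \(x_2,\) and \(\varepsilon\) are arbitrary.

```python
# Simulation check of the conditional mean and variance of X1 | X2 = x2;
# the parameters, x2, and the window half-width eps are arbitrary choices
import numpy as np

rng = np.random.default_rng(1)
mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.0, 2.0, 0.7
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])

X = rng.multivariate_normal(mean=[mu1, mu2], cov=Sigma, size=2_000_000)

# Approximate X1 | X2 = x2 by keeping draws with X2 in a narrow window
x2, eps = 2.0, 0.01
X1_cond = X[np.abs(X[:, 1] - x2) < eps, 0]

print(X1_cond.mean())                      # ~ mu1 + rho * (s1 / s2) * (x2 - mu2)
print(mu1 + rho * (s1 / s2) * (x2 - mu2))  # = 0.35
print(X1_cond.var())                       # ~ (1 - rho^2) * s1^2
print((1 - rho**2) * s1**2)                # = 0.51
```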

1.2.2 Other distributions

  • The lognormal distribution is denoted by \(\mathcal{LN}(\mu,\sigma^2)\) and is such that \(\mathcal{LN}(\mu,\sigma^2)\stackrel{d}{=}\exp(\mathcal{N}(\mu,\sigma^2)).\) Its pdf is \(f(x;\mu,\sigma)=\frac{1}{x}\phi_\sigma(\log x-\mu)=\frac{1}{\sqrt{2\pi}\sigma x}e^{-\frac{(\log x-\mu)^2}{2\sigma^2}},\) \(x>0.\) Note that \(\mathbb{E}[\mathcal{LN}(\mu,\sigma^2)]=e^{\mu+\frac{\sigma^2}{2}}\) and \(\mathbb{V}\mathrm{ar}[\mathcal{LN}(\mu,\sigma^2)]=\left(e^{\sigma^2}-1\right)e^{2\mu+\sigma^2}.\)

  • The exponential distribution is denoted by \(\mathrm{Exp}(\lambda)\) and has pdf \(f(x;\lambda)=\lambda e^{-\lambda x},\) \(\lambda,x>0.\) Note that \(\mathbb{E}[\mathrm{Exp}(\lambda)]=\frac{1}{\lambda}\) and \(\mathbb{V}\mathrm{ar}[\mathrm{Exp}(\lambda)]=\frac{1}{\lambda^2}.\)

  • The gamma distribution is denoted by \(\Gamma(a,p)\) and has pdf \(f(x;a,p)=\frac{a^p}{\Gamma(p)} x^{p-1}e^{-a x},\) \(a,p,x>0,\) where \(\Gamma(p)=\int_0^\infty x^{p-1}e^{-x}\,\mathrm{d}x.\) The parameter \(a\) is the rate and \(p\) is the shape. It is known that \(\mathbb{E}[\Gamma(a,p)]=\frac{p}{a}\) and \(\mathbb{V}\mathrm{ar}[\Gamma(a,p)]=\frac{p}{a^2}.\)

  • The inverse gamma distribution, \(\mathrm{IG}(a,p)\stackrel{d}{=}\Gamma(a,p)^{-1},\) has pdf \(f(x;a,p)=\frac{a^p}{\Gamma(p)} x^{-p-1}e^{-\frac{a}{x}},\) \(a,p,x>0.\) It is known that \(\mathbb{E}[\mathrm{IG}(a,p)]=\frac{a}{p-1}\) for \(p>1\) and \(\mathbb{V}\mathrm{ar}[\mathrm{IG}(a,p)]=\frac{a^2}{(p-1)^2(p-2)}\) for \(p>2.\)

  • The binomial distribution is denoted by \(\mathrm{B}(n,p).\) Recall that \(\mathbb{E}[\mathrm{B}(n,p)]=np\) and \(\mathbb{V}\mathrm{ar}[\mathrm{B}(n,p)]=np(1-p).\) A \(\mathrm{B}(1,p)\) is a Bernoulli distribution, denoted by \(\mathrm{Ber}(p).\)

  • The beta distribution is denoted by \(\beta(a,b)\) and its pdf is \(f(x;a,b)=\frac{1}{\beta(a,b)}x^{a-1}(1-x)^{b-1},\) \(0<x<1,\) \(a,b>0,\) where \(\beta(a,b)=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}.\) When \(a=b=1,\) the uniform distribution \(\mathcal{U}(0,1)\) arises.

  • The Poisson distribution is denoted by \(\mathrm{Pois}(\lambda)\) and has probability mass function \(\mathbb{P}[X=x]=\frac{\lambda^x e^{-\lambda}}{x!},\) \(x=0,1,2,\ldots\) Recall that \(\mathbb{E}[\mathrm{Pois}(\lambda)]=\mathbb{V}\mathrm{ar}[\mathrm{Pois}(\lambda)]=\lambda.\)
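
The means and variances listed above can be cross-checked against SciPy's implementations. A minimal sketch, assuming SciPy is available; mind the parametrizations: scipy.stats.lognorm takes s \(=\sigma\) and scale \(=e^{\mu},\) scipy.stats.gamma takes the shape \(p\) and scale \(=1/a,\) and scipy.stats.invgamma takes the shape \(p\) and scale \(=a.\)

```python
# Cross-check the stated means and variances against SciPy; parameter
# values are arbitrary (the inverse gamma shape is taken > 2 so that the
# variance exists)
import numpy as np
from scipy import stats

mu, sigma = 0.5, 0.8    # lognormal parameters
lam = 2.0               # exponential / Poisson rate
rate, shape = 2.0, 3.0  # gamma rate a and shape p
a, b = 2.0, 5.0         # beta parameters
n, p = 10, 0.3          # binomial parameters

checks = [
    # (SciPy distribution, formula for the mean, formula for the variance)
    (stats.lognorm(s=sigma, scale=np.exp(mu)),
     np.exp(mu + sigma**2 / 2),
     (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)),
    (stats.expon(scale=1 / lam), 1 / lam, 1 / lam**2),
    (stats.gamma(a=shape, scale=1 / rate), shape / rate, shape / rate**2),
    (stats.invgamma(a=shape, scale=rate), rate / (shape - 1),
     rate**2 / ((shape - 1)**2 * (shape - 2))),
    (stats.binom(n, p), n * p, n * p * (1 - p)),
    # Beta moments (standard facts, derivable from the pdf above)
    (stats.beta(a, b), a / (a + b), a * b / ((a + b)**2 * (a + b + 1))),
    (stats.poisson(lam), lam, lam),
]

for dist, mean, var in checks:
    assert np.isclose(dist.mean(), mean) and np.isclose(dist.var(), var)
```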


[^1]: A particularly useful value for computing confidence intervals is \(z_{0.05/2}=z_{0.025}\approx 1.96\approx 2.\)

[^2]: Note that this is an immediate parametrization of a \(2\times2\) covariance matrix. The parametrization becomes cumbersome when \(p>2.\)