1.2 Facts about distributions

We will make use of certain parametric distributions, for which the following notation and facts are introduced.

1.2.1 Normal distribution

The normal distribution with mean \(\mu\) and variance \(\sigma^2\) is denoted by \(\mathcal{N}(\mu,\sigma^2).\) Its pdf is \(\phi_\sigma(x-\mu):=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}},\) \(x\in\mathbb{R},\) and satisfies \(\phi_\sigma(x-\mu)=\frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right)\) (if \(\sigma=1,\) the dependence on \(\sigma\) is omitted). Its cdf is denoted by \(\Phi_\sigma(x-\mu).\) The upper \(\alpha\)-quantile of \(\mathcal{N}(0,1)\) is denoted by \(z_\alpha\) and satisfies \(z_\alpha=\Phi^{-1}(1-\alpha).\)[^1] The shortest interval that contains \(1-\alpha\) probability of an \(X\sim\mathcal{N}(\mu,\sigma^2)\) is \((\mu-z_{\alpha/2}\sigma,\mu+z_{\alpha/2}\sigma),\) i.e., \(\mathbb{P}[X\in(\mu\pm z_{\alpha/2}\sigma)]=1-\alpha.\) Some uncentered moments of \(X\sim\mathcal{N}(\mu,\sigma^2)\) are

\[\begin{align*} \mathbb{E}[X]&=\mu,\\ \mathbb{E}[X^2]&=\mu^2+\sigma^2,\\ \mathbb{E}[X^3]&=\mu^3+3\mu\sigma^2,\\ \mathbb{E}[X^4]&=\mu^4+6\mu^2\sigma^2+3\sigma^4. \end{align*}\]
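
These moments and the shortest-interval property can be verified numerically. Below is a minimal sketch in Python, assuming SciPy is available; the values of \(\mu\) and \(\sigma\) are arbitrary illustrative choices.

```python
# A minimal numerical check of the moments above, assuming SciPy is
# available; mu = 1.5 and sigma = 2 are arbitrary illustrative values.
from scipy.stats import norm

mu, sigma = 1.5, 2.0
X = norm(loc=mu, scale=sigma)

# Closed-form uncentered moments E[X^k], k = 1, ..., 4
formulas = [
    mu,
    mu**2 + sigma**2,
    mu**3 + 3 * mu * sigma**2,
    mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4,
]
for k, value in enumerate(formulas, start=1):
    assert abs(X.moment(k) - value) < 1e-8  # k-th non-central moment

# Shortest-interval property: P[X in (mu -/+ z_{alpha/2} sigma)] = 1 - alpha
alpha = 0.05
z = norm.ppf(1 - alpha / 2)  # upper alpha/2-quantile z_{alpha/2}
print(X.cdf(mu + z * sigma) - X.cdf(mu - z * sigma))  # 0.95
```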

Remark. It is interesting to compare the length of \((\mu\pm z_{\alpha/2}\sigma)\) for \(\alpha=1/t^2\) with the one in (1.4), as this gives direct insight into how much larger the Chebyshev confidence interval (1.4) is when \(X\sim\mathcal{N}(\mu,\sigma^2).\) The table below gives the length increment factor \(t/z_{0.5/t^2}\) of the Chebyshev confidence interval.

| \(t\) | \(2\) | \(3\) | \(4\) | \(5\) | \(6\) |
|---|---|---|---|---|---|
| Guaranteed coverage | \(0.75\) | \(0.8889\) | \(0.9375\) | \(0.96\) | \(0.9722\) |
| Increment factor | \(1.7386\) | \(1.883\) | \(2.1474\) | \(2.4346\) | \(2.7268\) |

Balancing the guaranteed coverage against the increment factor, it seems reasonable to define the “\(3\sigma\)-rule” for any random variable as: “almost \(90\%\) of the values of a random variable \(X\) lie in \((\mu-3\sigma,\mu+3\sigma),\) if \(\mathbb{E}[X]=\mu\) and \(\mathbb{V}\mathrm{ar}[X]=\sigma^2<\infty\)”.
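
The entries of the table can be reproduced in a few lines. A minimal sketch, again assuming SciPy is available:

```python
# Reproduce the guaranteed coverage and increment factor of the table above
from scipy.stats import norm

for t in (2, 3, 4, 5, 6):
    coverage = 1 - 1 / t**2       # Chebyshev's guaranteed coverage
    z = norm.ppf(1 - 0.5 / t**2)  # z_{alpha/2} with alpha = 1/t^2
    print(f"t = {t}: coverage = {coverage:.4f}, factor = {t / z:.4f}")
```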

The multivariate normal is represented by \(\mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma}),\) where \(\boldsymbol{\mu}\) is a \(p\)-vector and \(\boldsymbol{\Sigma}\) is a \(p\times p\) symmetric and positive definite matrix. The pdf of a \(\mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma})\) is \(\phi_{\boldsymbol{\Sigma}}(\mathbf{x}-\boldsymbol{\mu}):=\frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}}e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})}\) and satisfies \(\phi_{\boldsymbol{\Sigma}}(\mathbf{x}-\boldsymbol{\mu})=|\boldsymbol{\Sigma}|^{-1/2}\phi\left(\boldsymbol{\Sigma}^{-1/2}(\mathbf{x}-\boldsymbol{\mu})\right)\) (if \(\boldsymbol{\Sigma}=\mathbf{I},\) the dependence on \(\boldsymbol{\Sigma}\) is omitted). The multivariate normal has an appealing linear property that stems from (1.2) and (1.3):

\[\begin{align} \mathbf{A}\mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma)+\mathbf{b}\stackrel{d}{=}\mathcal{N}_q(\mathbf{A}\boldsymbol\mu+\mathbf{b},\mathbf{A}\boldsymbol\Sigma\mathbf{A}').\tag{1.5} \end{align}\]
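
Property (1.5) can be illustrated by Monte Carlo: transformed samples should have mean close to \(\mathbf{A}\boldsymbol\mu+\mathbf{b}\) and covariance close to \(\mathbf{A}\boldsymbol\Sigma\mathbf{A}'.\) A minimal sketch, assuming NumPy is available; \(\mathbf{A},\) \(\mathbf{b},\) \(\boldsymbol\mu,\) and \(\boldsymbol\Sigma\) are arbitrary illustrative choices with \(p=3\) and \(q=2.\)

```python
# A Monte Carlo illustration of the linear property (1.5); the matrices and
# vectors below are arbitrary illustrative choices
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([1.0, -1.0, 0.5])
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])   # symmetric and positive definite
A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, 1.0]])       # maps R^3 to R^2 (q = 2)
b = np.array([0.0, 3.0])

X = rng.multivariate_normal(mean=mu, cov=Sigma, size=100_000)
Y = X @ A.T + b  # samples of A N_3(mu, Sigma) + b

print(Y.mean(axis=0))           # ~ A mu + b
print(A @ mu + b)
print(np.cov(Y, rowvar=False))  # ~ A Sigma A'
print(A @ Sigma @ A.T)
```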

Exercise 1.9 The pdf of a bivariate normal (\(p=2,\) see Figure 1.1) can also be expressed as

\[\begin{align} &\phi(x_1,x_2;\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\rho):=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\tag{1.6}\\ &\;\times\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2}+\frac{(x_2-\mu_2)^2}{\sigma_2^2}-\frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}\right]\right\},\nonumber \end{align}\]

where \(\mu_1,\mu_2\in\mathbb{R},\) \(\sigma_1,\sigma_2>0,\) and \(-1<\rho<1.\) The parametrization uses \(\boldsymbol{\mu}=(\mu_1,\mu_2)'\) and \(\boldsymbol{\Sigma}=\begin{pmatrix}\sigma_1^2 & \rho\sigma_1\sigma_2\\ \rho\sigma_1\sigma_2 & \sigma_2^2\end{pmatrix}.\)[^2]

  1. Derive the pdf of \(X_1\): \(\phi(x_1;\mu_1,\sigma_1^2).\)
  2. Derive the pdf of \(X_1|X_2=x_2\): \(\phi\left(x_1;\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2),(1-\rho^2)\sigma_1^2\right).\)
  3. Derive \(\mathbb{E}[X_1|X_2=x_2]\) and \(\mathbb{V}\mathrm{ar}[X_1|X_2=x_2]\) (a numerical check of 2–3 is sketched after this exercise).
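
The conditional results in 2–3 can be checked by simulation, approximating the conditioning event \(X_2=x_2\) by the narrow window \(|X_2-x_2|<\varepsilon.\) A minimal sketch, assuming NumPy is available; the parameter values, \(x_2,\) and \(\varepsilon\) are arbitrary.

```python
# Simulation check of the conditional mean and variance of X1 | X2 = x2;
# the parameters, x2, and the window half-width eps are arbitrary choices
import numpy as np

rng = np.random.default_rng(1)
mu1, mu2, s1, s2, rho = 0.0, 1.0, 1.0, 2.0, 0.7
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])

X = rng.multivariate_normal(mean=[mu1, mu2], cov=Sigma, size=2_000_000)

# Approximate X1 | X2 = x2 by keeping draws with X2 in a narrow window
x2, eps = 2.0, 0.01
X1_cond = X[np.abs(X[:, 1] - x2) < eps, 0]

print(X1_cond.mean())                      # ~ mu1 + rho * (s1 / s2) * (x2 - mu2)
print(mu1 + rho * (s1 / s2) * (x2 - mu2))  # = 0.35
print(X1_cond.var())                       # ~ (1 - rho^2) * s1^2
print((1 - rho**2) * s1**2)                # = 0.51
```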

1.2.2 Other distributions

  • The lognormal distribution is denoted by \(\mathcal{LN}(\mu,\sigma^2)\) and is such that \(\mathcal{LN}(\mu,\sigma^2)\stackrel{d}{=}\exp(\mathcal{N}(\mu,\sigma^2)).\) Its pdf is \(f(x;\mu,\sigma)=\frac{1}{x}\phi_\sigma(\log x-\mu)=\frac{1}{\sqrt{2\pi}\sigma x}e^{-\frac{(\log x-\mu)^2}{2\sigma^2}},\) \(x>0.\) Note that \(\mathbb{E}[\mathcal{LN}(\mu,\sigma^2)]=e^{\mu+\frac{\sigma^2}{2}}\) and \(\mathbb{V}\mathrm{ar}[\mathcal{LN}(\mu,\sigma^2)]=\left(e^{\sigma^2}-1\right)e^{2\mu+\sigma^2}.\)

  • The exponential distribution is denoted by \(\mathrm{Exp}(\lambda)\) and has pdf \(f(x;\lambda)=\lambda e^{-\lambda x},\) \(\lambda,x>0.\) Note that \(\mathbb{E}[\mathrm{Exp}(\lambda)]=\frac{1}{\lambda}\) and \(\mathbb{V}\mathrm{ar}[\mathrm{Exp}(\lambda)]=\frac{1}{\lambda^2}.\)

  • The gamma distribution is denoted by \(\Gamma(a,p)\) and has pdf \(f(x;a,p)=\frac{a^p}{\Gamma(p)} x^{p-1}e^{-a x},\) \(a,p,x>0,\) where \(\Gamma(p)=\int_0^\infty x^{p-1}e^{-x}\,\mathrm{d}x.\) The parameter \(a\) is the rate and \(p\) is the shape. It is known that \(\mathbb{E}[\Gamma(a,p)]=\frac{p}{a}\) and \(\mathbb{V}\mathrm{ar}[\Gamma(a,p)]=\frac{p}{a^2}.\)

  • The inverse gamma distribution, \(\mathrm{IG}(a,p)\stackrel{d}{=}\Gamma(a,p)^{-1},\) has pdf \(f(x;a,p)=\frac{a^p}{\Gamma(p)} x^{-p-1}e^{-\frac{a}{x}},\) \(a,p,x>0.\) It is known that \(\mathbb{E}[\mathrm{IG}(a,p)]=\frac{a}{p-1}\) for \(p>1\) and \(\mathbb{V}\mathrm{ar}[\mathrm{IG}(a,p)]=\frac{a^2}{(p-1)^2(p-2)}\) for \(p>2.\)

  • The binomial distribution is denoted by \(\mathrm{B}(n,p).\) Recall that \(\mathbb{E}[\mathrm{B}(n,p)]=np\) and \(\mathbb{V}\mathrm{ar}[\mathrm{B}(n,p)]=np(1-p).\) A \(\mathrm{B}(1,p)\) is a Bernoulli distribution, denoted by \(\mathrm{Ber}(p).\)

  • The beta distribution is denoted by \(\beta(a,b)\) and its pdf is \(f(x;a,b)=\frac{1}{\beta(a,b)}x^{a-1}(1-x)^{b-1},\) \(0<x<1,\) \(a,b>0,\) where \(\beta(a,b)=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}.\) When \(a=b=1,\) the uniform distribution \(\mathcal{U}(0,1)\) arises.

  • The Poisson distribution is denoted by \(\mathrm{Pois}(\lambda)\) and has probability mass function \(\mathbb{P}[X=x]=\frac{\lambda^x e^{-\lambda}}{x!},\) \(x=0,1,2,\ldots\) Recall that \(\mathbb{E}[\mathrm{Pois}(\lambda)]=\mathbb{V}\mathrm{ar}[\mathrm{Pois}(\lambda)]=\lambda.\)
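
The means and variances listed above can be cross-checked against SciPy's implementations. A minimal sketch, assuming SciPy is available; mind the parametrizations: scipy.stats.lognorm takes s \(=\sigma\) and scale \(=e^{\mu},\) scipy.stats.gamma takes the shape \(p\) and scale \(=1/a,\) and scipy.stats.invgamma takes the shape \(p\) and scale \(=a.\)

```python
# Cross-check the stated means and variances against SciPy; parameter
# values are arbitrary (the inverse gamma shape is taken > 2 so that the
# variance exists)
import numpy as np
from scipy import stats

mu, sigma = 0.5, 0.8    # lognormal parameters
lam = 2.0               # exponential / Poisson rate
rate, shape = 2.0, 3.0  # gamma rate a and shape p
a, b = 2.0, 5.0         # beta parameters
n, p = 10, 0.3          # binomial parameters

checks = [
    # (SciPy distribution, formula for the mean, formula for the variance)
    (stats.lognorm(s=sigma, scale=np.exp(mu)),
     np.exp(mu + sigma**2 / 2),
     (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)),
    (stats.expon(scale=1 / lam), 1 / lam, 1 / lam**2),
    (stats.gamma(a=shape, scale=1 / rate), shape / rate, shape / rate**2),
    (stats.invgamma(a=shape, scale=rate), rate / (shape - 1),
     rate**2 / ((shape - 1)**2 * (shape - 2))),
    (stats.binom(n, p), n * p, n * p * (1 - p)),
    # Beta moments (standard facts, derivable from the pdf above)
    (stats.beta(a, b), a / (a + b), a * b / ((a + b)**2 * (a + b + 1))),
    (stats.poisson(lam), lam, lam),
]

for dist, mean, var in checks:
    assert np.isclose(dist.mean(), mean) and np.isclose(dist.var(), var)
```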


[^1]: A particularly useful value for computing confidence intervals is \(z_{0.05/2}=z_{0.025}\approx 1.96\approx 2.\)

[^2]: Note that this is an immediate parametrization of a \(2\times2\) covariance matrix. The parametrization becomes cumbersome when \(p>2.\)