1.2 Facts about distributions

We will make use of certain parametric distributions. Some notation and facts are introduced as follows.

1.2.1 Normal distribution

The normal distribution with mean \mu and variance \sigma^2 is denoted by \mathcal{N}(\mu,\sigma^2). Its pdf is \phi_\sigma(x-\mu):=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}, x\in\mathbb{R}, and satisfies that \phi_\sigma(x-\mu)=\frac{1}{\sigma}\phi\big(\frac{x-\mu}{\sigma}\big) (if \sigma=1, the dependence on \sigma is omitted). Its cdf is denoted by \Phi_\sigma(x-\mu). The upper \alpha-quantile of a \mathcal{N}(0,1) is denoted by z_\alpha, so it satisfies that z_\alpha=\Phi^{-1}(1-\alpha).¹ The shortest interval that contains 1-\alpha probability of a X\sim\mathcal{N}(\mu,\sigma^2) is (\mu-z_{\alpha/2}\sigma,\mu+z_{\alpha/2}\sigma), i.e., \mathbb{P}[X\in(\mu\pm z_{\alpha/2}\sigma)]=1-\alpha. Some uncentered moments of X\sim\mathcal{N}(\mu,\sigma^2) are

\begin{align*} \mathbb{E}[X]=\mu,\quad \mathbb{E}[X^2]=\mu^2+\sigma^2,\quad \mathbb{E}[X^3]=\mu^3+3\mu\sigma^2,\quad \mathbb{E}[X^4]=\mu^4+6\mu^2\sigma^2+3\sigma^4. \end{align*}
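These facts are easy to check numerically. Below is a minimal sketch, assuming a Python environment with SciPy; the values of \mu, \sigma, and \alpha are arbitrary illustration choices.

```python
# Sketch checking the N(mu, sigma^2) facts above: the quantile z_alpha,
# the shortest (1 - alpha) interval, and the first four uncentered moments.
from scipy import stats

mu, sigma, alpha = 1.5, 2.0, 0.05  # illustrative values, not from the text
X = stats.norm(loc=mu, scale=sigma)

# Upper (alpha/2)-quantile of N(0, 1): z_{alpha/2} = Phi^{-1}(1 - alpha/2).
z = stats.norm.ppf(1 - alpha / 2)
print(z)  # ~1.96, the z_{0.025} of the footnote

# P[X in (mu - z*sigma, mu + z*sigma)] = 1 - alpha.
print(X.cdf(mu + z * sigma) - X.cdf(mu - z * sigma))  # ~0.95

# Uncentered moments E[X^k], k = 1..4, versus the closed forms above.
closed = [mu,
          mu**2 + sigma**2,
          mu**3 + 3 * mu * sigma**2,
          mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4]
for k, m in enumerate(closed, start=1):
    print(k, X.moment(k), m)  # both columns should agree
```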

Remark. It is interesting to compare the length of (\mu\pm z_{\alpha/2}\sigma) for \alpha=1/t^2 with the length of the interval in (1.4), as this gives direct insight into how much larger the Chebyshev confidence interval (1.4) is when X\sim\mathcal{N}(\mu,\sigma^2). The table below gives the length increment factor t/z_{0.5/t^2} of the Chebyshev confidence interval.

| t | 2 | 3 | 4 | 5 | 6 |
|---|---|---|---|---|---|
| Guaranteed coverage 1-1/t^2 | 0.75 | 0.8889 | 0.9375 | 0.96 | 0.9722 |
| Increment factor t/z_{0.5/t^2} | 1.7386 | 1.8830 | 2.1474 | 2.4346 | 2.7268 |
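The table can be reproduced with a few lines of code; a sketch, again assuming SciPy:

```python
# For alpha = 1/t^2, compare the Chebyshev interval (mu - t*sigma, mu + t*sigma)
# with the exact normal interval (mu - z_{alpha/2}*sigma, mu + z_{alpha/2}*sigma).
from scipy import stats

for t in [2, 3, 4, 5, 6]:
    coverage = 1 - 1 / t**2              # Chebyshev's guaranteed coverage
    z = stats.norm.ppf(1 - 0.5 / t**2)   # z_{0.5/t^2}
    print(t, round(coverage, 4), round(t / z, 4))  # increment factor t/z
```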

Balancing the guaranteed coverage against the increment factor, it seems reasonable to define the “3\sigma-rule” for any random variable as: “almost 90% of the values of a random variable X lie in (\mu-3\sigma,\mu+3\sigma), if \mathbb{E}[X]=\mu and \mathbb{V}\mathrm{ar}[X]=\sigma^2<\infty”.

The multivariate normal is represented by \mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma}), where \boldsymbol{\mu} is a p-vector and \boldsymbol{\Sigma} is a p\times p symmetric and positive definite matrix. The pdf of a \mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma}) is \phi_{\boldsymbol{\Sigma}}(\mathbf{x}-\boldsymbol{\mu}):=\frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}}e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})} and satisfies that \phi_{\boldsymbol{\Sigma}}(\mathbf{x}-\boldsymbol{\mu})=|\boldsymbol{\Sigma}|^{-1/2}\phi\big(\boldsymbol{\Sigma}^{-1/2}(\mathbf{x}-\boldsymbol{\mu})\big) (if \boldsymbol{\Sigma}=\mathbf{I}, the dependence on \boldsymbol{\Sigma} is omitted). The multivariate normal has an appealing linear property that stems from (1.2) and (1.3): for a q\times p matrix \mathbf{A} and a q-vector \mathbf{b},

\begin{align} \mathbf{A}\mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma)+\mathbf{b}\stackrel{d}{=}\mathcal{N}_q(\mathbf{A}\boldsymbol\mu+\mathbf{b},\mathbf{A}\boldsymbol\Sigma\mathbf{A}').\tag{1.5} \end{align}
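A minimal simulation sketch of (1.5), assuming a Python environment with NumPy; the \mathbf{A}, \mathbf{b}, \boldsymbol\mu, and \boldsymbol\Sigma below are arbitrary illustrative values, not from the text.

```python
# Illustrate (1.5) by simulation: A X + b with X ~ N_p(mu, Sigma) has
# mean A mu + b and covariance A Sigma A'.
import numpy as np

rng = np.random.default_rng(42)
mu = np.array([1.0, -1.0, 0.5])                  # p = 3
Sigma = np.array([[2.0, 0.5, 0.0],
                  [0.5, 1.0, 0.3],
                  [0.0, 0.3, 1.5]])
A = np.array([[1.0, 2.0, -1.0],
              [0.0, 1.0, 1.0]])                  # q x p, with q = 2
b = np.array([0.5, -0.5])

X = rng.multivariate_normal(mu, Sigma, size=100_000)
Y = X @ A.T + b                                  # rows are A x_i + b

print(Y.mean(axis=0), A @ mu + b)                # sample mean vs. A mu + b
print(np.cov(Y, rowvar=False), A @ Sigma @ A.T)  # sample cov vs. A Sigma A'
```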

Exercise 1.9 The pdf of a bivariate normal (p=2, see Figure 1.1) can also be expressed as

\begin{align} &\phi(x_1,x_2;\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\rho):=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\tag{1.6}\\ &\,\times\!\exp\left\{\!-\frac{1}{2(1-\rho^2)}\!\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2}+\frac{(x_2-\mu_2)^2}{\sigma_2^2}-\frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}\right]\!\right\}\!,\nonumber \end{align}

where \mu_1,\mu_2\in\mathbb{R}, \sigma_1,\sigma_2>0, and -1<\rho<1. The parametrization uses \boldsymbol{\mu}=(\mu_1,\mu_2)' and \boldsymbol{\Sigma}=\begin{pmatrix}\sigma_1^2&\rho\sigma_1\sigma_2\\\rho\sigma_1\sigma_2&\sigma_2^2\end{pmatrix}.² A numerical sanity check of the pdfs in this exercise is sketched after the items.

  1. Derive the pdf of X_1: \phi(x_1;\mu_1,\sigma_1^2).
  2. Derive the pdf of X_1|X_2=x_2: \phi\big(x_1;\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2),(1-\rho^2)\sigma_1^2\big).
  3. Derive \mathbb{E}[X_1|X_2=x_2] and \mathbb{V}\mathrm{ar}[X_1|X_2=x_2].
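The following sketch (assuming SciPy; parameter values are illustrative) cross-checks the conditional pdf claimed in the second item against the ratio of the joint and marginal pdfs. It complements, but does not replace, the derivations.

```python
# Check numerically that phi(x1, x2; ...) / phi(x2; mu2, sigma2^2) equals
# phi(x1; mu1 + rho*(s1/s2)*(x2 - mu2), (1 - rho^2)*s1^2).
import numpy as np
from scipy import stats

mu1, mu2, s1, s2, rho = 0.5, -1.0, 1.0, 2.0, 0.7  # illustrative values
Sigma = np.array([[s1**2, rho * s1 * s2],
                  [rho * s1 * s2, s2**2]])
joint = stats.multivariate_normal(mean=[mu1, mu2], cov=Sigma)

x1, x2 = 0.3, -0.4
# Conditional pdf as joint / marginal of X_2 ...
lhs = joint.pdf([x1, x2]) / stats.norm.pdf(x2, loc=mu2, scale=s2)
# ... versus the claimed N(mu1 + rho*(s1/s2)*(x2 - mu2), (1 - rho^2)*s1^2).
rhs = stats.norm.pdf(x1,
                     loc=mu1 + rho * s1 / s2 * (x2 - mu2),
                     scale=np.sqrt(1 - rho**2) * s1)
print(lhs, rhs)  # should coincide up to floating-point error
```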

1.2.2 Other distributions

  • The lognormal distribution is denoted by \mathcal{LN}(\mu,\sigma^2) and is such that \mathcal{LN}(\mu,\sigma^2)\stackrel{d}{=}\exp(\mathcal{N}(\mu,\sigma^2)). Its pdf is f(x;\mu,\sigma)=\frac{1}{x}\phi_\sigma(\log x-\mu)=\frac{1}{\sqrt{2\pi}\sigma x}e^{-\frac{(\log x-\mu)^2}{2\sigma^2}}, x>0. Note that \mathbb{E}[\mathcal{LN}(\mu,\sigma^2)]=e^{\mu+\frac{\sigma^2}{2}} and \mathbb{V}\mathrm{ar}[\mathcal{LN}(\mu,\sigma^2)]=\big(e^{\sigma^2}-1\big)e^{2\mu+\sigma^2}.

  • The exponential distribution is denoted by \mathrm{Exp}(\lambda) and has pdf f(x;\lambda)=\lambda e^{-\lambda x}, \lambda,x>0. It is known that \mathbb{E}[\mathrm{Exp}(\lambda)]=\frac{1}{\lambda} and \mathbb{V}\mathrm{ar}[\mathrm{Exp}(\lambda)]=\frac{1}{\lambda^2}.

  • The gamma distribution is denoted by \Gamma(a,p) and has pdf f(x;a,p)=\frac{a^p}{\Gamma(p)} x^{p-1}e^{-a x}, a,p,x>0, where \Gamma(p)=\int_0^\infty x^{p-1}e^{-x}\,\mathrm{d}x is the gamma function. The parameter a is the rate and p is the shape. It is known that \mathbb{E}[\Gamma(a,p)]=\frac{p}{a} and \mathbb{V}\mathrm{ar}[\Gamma(a,p)]=\frac{p}{a^2}.

  • The inverse gamma distribution, \mathrm{IG}(a,p)\stackrel{d}{=}\Gamma(a,p)^{-1}, has pdf f(x;a,p)=\frac{a^p}{\Gamma(p)} x^{-p-1}e^{-\frac{a}{x}}, a,p,x>0. It is known that \mathbb{E}[\mathrm{IG}(a,p)]=\frac{a}{p-1} (for p>1) and \mathbb{V}\mathrm{ar}[\mathrm{IG}(a,p)]=\frac{a^2}{(p-1)^2(p-2)} (for p>2).

  • The binomial distribution is denoted by \mathrm{B}(n,p). Recall that \mathbb{E}[\mathrm{B}(n,p)]=np and \mathbb{V}\mathrm{ar}[\mathrm{B}(n,p)]=np(1-p). A \mathrm{B}(1,p) is a Bernoulli distribution, denoted by \mathrm{Ber}(p).

  • The beta distribution is denoted by \beta(a,b) and its pdf is f(x;a,b)=\frac{1}{\beta(a,b)}x^{a-1}(1-x)^{b-1}, 0<x<1, where \beta(a,b)=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)} is the beta function. When a=b=1, the uniform distribution \mathcal{U}(0,1) arises.

  • The Poisson distribution is denoted by \mathrm{Pois}(\lambda) and has probability mass function \mathbb{P}[X=x]=\frac{\lambda^x e^{-\lambda}}{x!}, x=0,1,2,\ldots. Recall that \mathbb{E}[\mathrm{Pois}(\lambda)]=\mathbb{V}\mathrm{ar}[\mathrm{Pois}(\lambda)]=\lambda. The means and variances listed above are checked numerically in the sketch below.
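A sketch of the announced check, assuming SciPy. Note that SciPy's parametrizations differ from the ones used here: lognorm takes s=\sigma and scale e^\mu, gamma and invgamma take the shape p with scales \frac{1}{a} and a, respectively, and expon takes scale \frac{1}{\lambda}. The beta mean and variance used below, \frac{a}{a+b} and \frac{ab}{(a+b)^2(a+b+1)}, are standard facts not stated in the list; all parameter values are illustrative.

```python
# Compare the stated means/variances with SciPy's, mind the parametrizations.
import numpy as np
from scipy import stats

mu, sigma, lam, a, p, b = 0.3, 0.8, 2.0, 2.5, 4.0, 3.0  # illustrative

checks = [
    (stats.lognorm(s=sigma, scale=np.exp(mu)),          # LN(mu, sigma^2)
     np.exp(mu + sigma**2 / 2),
     (np.exp(sigma**2) - 1) * np.exp(2 * mu + sigma**2)),
    (stats.expon(scale=1 / lam), 1 / lam, 1 / lam**2),  # Exp(lambda)
    (stats.gamma(p, scale=1 / a), p / a, p / a**2),     # Gamma(a, p)
    (stats.invgamma(p, scale=a),                        # IG(a, p), p > 2
     a / (p - 1), a**2 / ((p - 1)**2 * (p - 2))),
    (stats.beta(a, b),                                  # beta(a, b)
     a / (a + b), a * b / ((a + b)**2 * (a + b + 1))),
]
for dist, mean, var in checks:
    print(dist.mean(), mean, dist.var(), var)  # pairs should match
```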


  1. A particularly useful value for computing confidence intervals is z_{0.05/2}=z_{0.025}\approx 1.96\approx 2.↩︎

  2. Note that this is an immediate parametrization of a 2\times2 covariance matrix. The parametrization becomes cumbersome when p>2.↩︎