We will make use of certain parametric distributions. Some notation and facts are introduced as follows.

### 1.2.1 Normal distribution

The normal distribution with mean $$\mu$$ and variance $$\sigma^2$$ is denoted by $$\mathcal{N}(\mu,\sigma^2)$$. Its pdf is $$\phi_\sigma(x-\mu):=\frac{1}{\sqrt{2\pi}\sigma}e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$, $$x\in\mathbb{R}$$, and satisfies $$\phi_\sigma(x-\mu)=\frac{1}{\sigma}\phi\left(\frac{x-\mu}{\sigma}\right)$$ (if $$\sigma=1$$, the dependence on $$\sigma$$ is omitted). Its cdf is denoted by $$\Phi_\sigma(x-\mu)$$. The upper $$\alpha$$-quantile of a $$\mathcal{N}(0,1)$$ is denoted by $$z_\alpha$$, so it satisfies $$z_\alpha=\Phi^{-1}(1-\alpha)$$.¹ The shortest interval that contains $$1-\alpha$$ probability of $$X\sim\mathcal{N}(\mu,\sigma^2)$$ is $$(\mu-z_{\alpha/2}\sigma,\mu+z_{\alpha/2}\sigma)$$, i.e., $$\mathbb{P}[X\in(\mu\pm z_{\alpha/2}\sigma)]=1-\alpha$$. Some uncentered moments of $$X\sim\mathcal{N}(\mu,\sigma^2)$$ are

\begin{align*} \mathbb{E}[X]&=\mu,\\ \mathbb{E}[X^2]&=\mu^2+\sigma^2,\\ \mathbb{E}[X^3]&=\mu^3+3\mu\sigma^2,\\ \mathbb{E}[X^4]&=\mu^4+6\mu^2\sigma^2+3\sigma^4. \end{align*}
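These closed-form moments can be checked numerically. The following sketch (standard-library Python only; the function names and the parameter values $$\mu=1.5$$, $$\sigma=0.7$$ are illustrative choices, not from the text) integrates $$x^k\phi_\sigma(x-\mu)$$ with the trapezoidal rule:

```python
import math

def normal_pdf(x, mu, sigma):
    # phi_sigma(x - mu) as defined in the text
    return math.exp(-((x - mu) ** 2) / (2 * sigma**2)) / (math.sqrt(2 * math.pi) * sigma)

def raw_moment(k, mu, sigma, n=20_000, half_width=12.0):
    # Trapezoidal rule on mu +/- half_width * sigma; the integrand (and all its
    # derivatives) essentially vanishes at the endpoints, so the rule is very accurate
    a, b = mu - half_width * sigma, mu + half_width * sigma
    h = (b - a) / n
    return sum((h / 2 if i in (0, n) else h)
               * (a + i * h) ** k * normal_pdf(a + i * h, mu, sigma)
               for i in range(n + 1))

mu, sigma = 1.5, 0.7
closed_form = [
    mu,
    mu**2 + sigma**2,
    mu**3 + 3 * mu * sigma**2,
    mu**4 + 6 * mu**2 * sigma**2 + 3 * sigma**4,
]
for k, target in enumerate(closed_form, start=1):
    assert abs(raw_moment(k, mu, sigma) - target) < 1e-8
```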

Remark. It is interesting to compare the length of $$(\mu\pm z_{\alpha/2}\sigma)$$ for $$\alpha=1/t^2$$ with the one in (1.4), as this gives direct insight into how much larger the Chebyshev confidence interval (1.4) is when $$X\sim\mathcal{N}(\mu,\sigma^2)$$. The table below gives the length increment factor $$t/z_{(0.5/t^2)}$$ of the Chebyshev confidence interval.

| $$t$$ | $$2$$ | $$3$$ | $$4$$ | $$5$$ | $$6$$ |
|:--|:--|:--|:--|:--|:--|
| Guaranteed coverage | $$0.75$$ | $$0.8889$$ | $$0.9375$$ | $$0.96$$ | $$0.9722$$ |
| Increment factor | $$1.7386$$ | $$1.8830$$ | $$2.1474$$ | $$2.4346$$ | $$2.7268$$ |

Balancing the guaranteed coverage against the increment factor, it seems reasonable to define the “$$3\sigma$$-rule” for any random variable as: “almost $$90\%$$ of the values of a random variable $$X$$ lie in $$(\mu-3\sigma,\mu+3\sigma)$$, provided that $$\mathbb{E}[X]=\mu$$ and $$\mathbb{V}\mathrm{ar}[X]=\sigma^2<\infty$$.”
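The coverages and increment factors in the table can be reproduced with standard-library Python. In this sketch, `chebyshev_vs_normal` is a name introduced here for illustration; the upper quantile $$z_{\alpha/2}$$ is obtained from `statistics.NormalDist`:

```python
from statistics import NormalDist

def chebyshev_vs_normal(t):
    # Chebyshev: P[|X - mu| < t * sigma] >= 1 - 1/t^2 for any X with finite variance
    alpha = 1 / t**2
    coverage = 1 - alpha
    # Normal interval with the same coverage: mu +/- z_{alpha/2} * sigma,
    # where z_p = Phi^{-1}(1 - p) is the upper p-quantile of N(0, 1)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    # Length increment factor of the Chebyshev interval relative to the normal one
    return coverage, t / z

for t in (2, 3, 4, 5, 6):
    cov, factor = chebyshev_vs_normal(t)
    print(f"t = {t}: coverage >= {cov:.4f}, increment factor = {factor:.4f}")
```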

The multivariate normal is represented by $$\mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$$, where $$\boldsymbol{\mu}$$ is a $$p$$-vector and $$\boldsymbol{\Sigma}$$ is a $$p\times p$$ symmetric and positive definite matrix. The pdf of a $$\mathcal{N}_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$$ is $$\phi_{\boldsymbol{\Sigma}}(\mathbf{x}-\boldsymbol{\mu}):=\frac{1}{(2\pi)^{p/2}|\boldsymbol{\Sigma}|^{1/2}}e^{-\frac{1}{2}(\mathbf{x}-\boldsymbol{\mu})'\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})}$$ and satisfies $$\phi_{\boldsymbol{\Sigma}}(\mathbf{x}-\boldsymbol{\mu})=|\boldsymbol{\Sigma}|^{-1/2}\phi\left(\boldsymbol{\Sigma}^{-1/2}(\mathbf{x}-\boldsymbol{\mu})\right)$$ (if $$\boldsymbol{\Sigma}=\mathbf{I}$$, the dependence on $$\boldsymbol{\Sigma}$$ is omitted). The multivariate normal has an appealing linear property that stems from (1.2) and (1.3):

\begin{align} \mathbf{A}\mathcal{N}_p(\boldsymbol\mu,\boldsymbol\Sigma)+\mathbf{b}\stackrel{d}{=}\mathcal{N}_q(\mathbf{A}\boldsymbol\mu+\mathbf{b},\mathbf{A}\boldsymbol\Sigma\mathbf{A}').\tag{1.5} \end{align}
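The parameters of the transformed distribution in (1.5) are just $$\mathbf{A}\boldsymbol\mu+\mathbf{b}$$ and $$\mathbf{A}\boldsymbol\Sigma\mathbf{A}'$$, which can be computed directly. A small pure-Python sketch (the helper functions and the particular $$\mathbf{A}$$, $$\mathbf{b}$$, $$\boldsymbol\mu$$, $$\boldsymbol\Sigma$$ below, with $$p=q=2$$, are illustrative choices):

```python
# Minimal matrix helpers (names introduced here for illustration)
def mat_vec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def mat_mul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

mu = [1.0, -2.0]
Sigma = [[2.0, 0.5], [0.5, 1.0]]
A = [[1.0, 1.0], [0.0, 2.0]]
b = [3.0, 0.0]

# Per (1.5): A * N_2(mu, Sigma) + b is N_2(A mu + b, A Sigma A')
new_mu = [m + bi for m, bi in zip(mat_vec(A, mu), b)]
new_Sigma = mat_mul(mat_mul(A, Sigma), transpose(A))
print(new_mu)     # [2.0, -4.0]
print(new_Sigma)  # [[4.0, 3.0], [3.0, 4.0]]
```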

Exercise 1.9 The pdf of a bivariate normal ($$p=2$$, see Figure 1.1) can also be expressed as

\begin{align} &\phi(x_1,x_2;\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\rho):=\frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}}\tag{1.6}\\ &\;\times\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{(x_1-\mu_1)^2}{\sigma_1^2}+\frac{(x_2-\mu_2)^2}{\sigma_2^2}-\frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2}\right]\right\},\nonumber \end{align}

where $$\mu_1,\mu_2\in\mathbb{R}$$, $$\sigma_1,\sigma_2>0$$, and $$-1<\rho<1$$. The parametrization uses $$\boldsymbol{\mu}=(\mu_1,\mu_2)'$$ and $$\boldsymbol{\Sigma}=(\sigma_1^2,\rho\sigma_1\sigma_2;\rho\sigma_1\sigma_2,\sigma_2^2)$$.²

1. Derive the pdf of $$X_1$$: $$\phi(x_1;\mu_1,\sigma_1^2)$$.
2. Derive the pdf of $$X_1|X_2=x_2$$: $$\phi\left(x_1;\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2),(1-\rho^2)\sigma_1^2\right)$$.
3. Derive $$\mathbb{E}[X_1|X_2=x_2]$$ and $$\mathbb{V}\mathrm{ar}[X_1|X_2=x_2]$$.
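The conditional results in parts 2–3 can be checked numerically: renormalizing the joint pdf (1.6) at a fixed $$x_2$$ gives the conditional pdf of $$X_1|X_2=x_2$$, whose mean and variance should match $$\mu_1+\rho\frac{\sigma_1}{\sigma_2}(x_2-\mu_2)$$ and $$(1-\rho^2)\sigma_1^2$$. A standard-library sketch with arbitrary illustrative parameter values:

```python
import math

def bvn_pdf(x1, x2, mu1, mu2, s1, s2, rho):
    # Bivariate normal pdf in the (mu1, mu2, sigma1^2, sigma2^2, rho) parametrization of (1.6)
    q = ((x1 - mu1) ** 2 / s1**2 + (x2 - mu2) ** 2 / s2**2
         - 2 * rho * (x1 - mu1) * (x2 - mu2) / (s1 * s2))
    return math.exp(-q / (2 * (1 - rho**2))) / (2 * math.pi * s1 * s2 * math.sqrt(1 - rho**2))

mu1, mu2, s1, s2, rho = 0.5, -1.0, 1.2, 0.8, 0.6
x2 = 0.3

# Trapezoidal integration over x1 with x2 fixed; the joint pdf, renormalized,
# is the conditional pdf of X1 | X2 = x2
n, half = 20_000, 12.0
a, b = mu1 - half * s1, mu1 + half * s1
h = (b - a) / n
grid = [a + i * h for i in range(n + 1)]
f = [bvn_pdf(x, x2, mu1, mu2, s1, s2, rho) for x in grid]
wts = [h if 0 < i < n else h / 2 for i in range(n + 1)]
mass = sum(w * fi for w, fi in zip(wts, f))                       # marginal pdf of X2 at x2
mean = sum(w * x * fi for w, x, fi in zip(wts, grid, f)) / mass   # E[X1 | X2 = x2]
var = sum(w * x**2 * fi for w, x, fi in zip(wts, grid, f)) / mass - mean**2

assert abs(mean - (mu1 + rho * s1 / s2 * (x2 - mu2))) < 1e-6
assert abs(var - (1 - rho**2) * s1**2) < 1e-6
```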

### 1.2.2 Other distributions

• The lognormal distribution is denoted by $$\mathcal{LN}(\mu,\sigma^2)$$ and is such that $$\mathcal{LN}(\mu,\sigma^2)\stackrel{d}{=}\exp(\mathcal{N}(\mu,\sigma^2))$$. Its pdf is $$f(x;\mu,\sigma)=\frac{1}{x}\phi_\sigma(\log x-\mu)=\frac{1}{\sqrt{2\pi}\sigma x}e^{-\frac{(\log x-\mu)^2}{2\sigma^2}}$$, $$x>0$$. Note that $$\mathbb{E}[\mathcal{LN}(\mu,\sigma^2)]=e^{\mu+\frac{\sigma^2}{2}}$$ and $$\mathbb{V}\mathrm{ar}[\mathcal{LN}(\mu,\sigma^2)]=\left(e^{\sigma^2}-1\right)e^{2\mu+\sigma^2}$$.

• The exponential distribution is denoted by $$\mathrm{Exp}(\lambda)$$ and has pdf $$f(x;\lambda)=\lambda e^{-\lambda x}$$, $$\lambda,x>0$$.

• The gamma distribution is denoted by $$\Gamma(a,p)$$ and has pdf $$f(x;a,p)=\frac{a^p}{\Gamma(p)} x^{p-1}e^{-a x}$$, $$a,p,x>0$$, where $$\Gamma(p)=\int_0^\infty x^{p-1}e^{-x}\,\mathrm{d}x$$. It is known that $$\mathbb{E}[\Gamma(a,p)]=\frac{p}{a}$$ and $$\mathbb{V}\mathrm{ar}[\Gamma(a,p)]=\frac{p}{a^2}$$.

• The inverse gamma distribution, $$\mathrm{IG}(a,p)\stackrel{d}{=}\Gamma(a,p)^{-1}$$, has pdf $$f(x;a,p)=\frac{a^p}{\Gamma(p)} x^{-p-1}e^{-\frac{a}{x}}$$, $$a,p,x>0$$. It is known that $$\mathbb{E}[\mathrm{IG}(a,p)]=\frac{a}{p-1}$$ (for $$p>1$$) and $$\mathbb{V}\mathrm{ar}[\mathrm{IG}(a,p)]=\frac{a^2}{(p-1)^2(p-2)}$$ (for $$p>2$$).

• The binomial distribution is denoted by $$\mathrm{B}(n,p)$$. Recall that $$\mathbb{E}[\mathrm{B}(n,p)]=np$$ and $$\mathbb{V}\mathrm{ar}[\mathrm{B}(n,p)]=np(1-p)$$. A $$\mathrm{B}(1,p)$$ is a Bernoulli distribution, denoted by $$\mathrm{Ber}(p)$$.

• The beta distribution is denoted by $$\beta(a,b)$$ and its pdf is $$f(x;a,b)=\frac{1}{\beta(a,b)}x^{a-1}(1-x)^{b-1}$$, $$0<x<1$$, $$a,b>0$$, where $$\beta(a,b)=\frac{\Gamma(a)\Gamma(b)}{\Gamma(a+b)}$$. When $$a=b=1$$, the uniform distribution $$\mathcal{U}(0,1)$$ arises.

• The Poisson distribution is denoted by $$\mathrm{Pois}(\lambda)$$ and has probability mass function $$\mathbb{P}[X=x]=\frac{\lambda^x e^{-\lambda}}{x!}$$, $$x=0,1,2,\ldots$$. Recall that $$\mathbb{E}[\mathrm{Pois}(\lambda)]=\mathbb{V}\mathrm{ar}[\mathrm{Pois}(\lambda)]=\lambda$$.
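The moment formulas and normalizations listed above can be sanity-checked numerically. The following standard-library Python sketch (all parameter values are arbitrary illustrative choices; the helper names are introduced here) verifies one stated fact per distribution:

```python
import math

def trapz(f, a, b, n=20_000):
    # Trapezoidal rule; accurate here because every integrand below
    # essentially vanishes at the integration limits
    h = (b - a) / n
    return sum((h / 2 if i in (0, n) else h) * f(a + i * h) for i in range(n + 1))

# Lognormal LN(0.4, 0.5^2): E[X] = e^{mu + sigma^2/2}, integrating in z = log x
mu, s = 0.4, 0.5
phi = lambda z: math.exp(-((z - mu) ** 2) / (2 * s**2)) / (math.sqrt(2 * math.pi) * s)
m1 = trapz(lambda z: math.exp(z) * phi(z), mu - 12 * s, mu + 12 * s)
assert abs(m1 - math.exp(mu + s**2 / 2)) < 1e-8

# Gamma(a = 2, p = 3): mean p/a and variance p/a^2
ga, gp = 2.0, 3.0
gpdf = lambda x: ga**gp / math.gamma(gp) * x ** (gp - 1) * math.exp(-ga * x)
gmean = trapz(lambda x: x * gpdf(x), 0.0, 40.0)
gsecond = trapz(lambda x: x**2 * gpdf(x), 0.0, 40.0)
assert abs(gmean - gp / ga) < 1e-6
assert abs(gsecond - gmean**2 - gp / ga**2) < 1e-6

# Beta(a, b): the pdf integrates to one; a = b = 1 is the uniform density
bconst = lambda a, b: math.gamma(a) * math.gamma(b) / math.gamma(a + b)
bpdf = lambda x, a, b: x ** (a - 1) * (1 - x) ** (b - 1) / bconst(a, b)
assert bpdf(0.3, 1, 1) == 1.0
assert abs(trapz(lambda x: bpdf(x, 2.5, 3.5), 0.0, 1.0) - 1.0) < 1e-8

# Poisson(lambda = 4.2): pmf sums to one, mean = variance = lambda
# (support truncated at 150; the remaining tail mass is negligible)
lam = 4.2
pmf = lambda x: lam**x * math.exp(-lam) / math.factorial(x)
pmass = sum(pmf(x) for x in range(150))
pmean = sum(x * pmf(x) for x in range(150))
pvar = sum(x**2 * pmf(x) for x in range(150)) - pmean**2
assert abs(pmass - 1.0) < 1e-9
assert abs(pmean - lam) < 1e-9
assert abs(pvar - lam) < 1e-9
```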

1. A particularly useful value for computing confidence intervals is $$z_{0.05/2}=z_{0.025}\approx 1.96\approx 2$$.↩︎

2. Note that this is an immediate parametrization of a $$2\times2$$ covariance matrix. The parametrization becomes cumbersome when $$p>2$$.↩︎