Chapter 6 Exponential Dispersion Family

Our models of functional relationships are defined by a constraint on the mean of the distribution, Equation (5.4). For them to be useful in practice, this constraint must be effective, meaning that it must be easy to relate the parameters of the probability distribution to its expectation.

In this chapter, we examine a very general class of probability distributions for which this is the case.

6.1 The Exponential Dispersion Family

6.1.1 Definition

An exponential dispersion family (EDF) of probability distributions has the following form: \[\begin{equation} P_{}\left(y |\theta, \phi\right) = \exp \Big[ \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \Big] \tag{6.1} \end{equation}\] where

  • \(\theta \in {\mathbb R}\) is the natural (or canonical) parameter,
  • \(\phi > 0\) is the dispersion parameter; in many settings it is not of direct interest, and may therefore be referred to as a “nuisance” parameter;
  • \(b: {\mathbb R}\to{\mathbb R}\) and \(c:{\mathbb R}\times{\mathbb R}_{> 0}\to{\mathbb R}\) are both functions. The function \(b\) is known as the “log normaliser”, for reasons explained in Section 6.2.

6.1.2 Examples

6.1.2.1 Poisson

The Poisson distribution for \(y\in\mathbb{N}\) is \[\begin{eqnarray} P_{}\left(y |\lambda\right) & = & \frac{\lambda^y e^{-\lambda}}{y!} \tag{6.2} \\ & = & \exp[y \log \lambda - \lambda - \log y!] \tag{6.3} \end{eqnarray}\] It is thus an EDF with

  • \(\theta = \log\lambda\);
  • \(\phi = 1\);
  • \(b(\theta) = \lambda = e^{\theta}\);
  • \(c(y, \phi) = -\log y!\).
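The identification above can be checked numerically. The following is a minimal sketch, not a definitive implementation: the helper names (`edf_log_density`, `poisson_b`, `poisson_c`, `poisson_pmf`) are hypothetical and introduced only for illustration. It evaluates the logarithm of Equation (6.1) for user-supplied \(b\) and \(c\), and compares the result with Equation (6.2).

```python
import math

def edf_log_density(y, theta, phi, b, c):
    """Logarithm of Equation (6.1): (y*theta - b(theta)) / phi + c(y, phi)."""
    return (y * theta - b(theta)) / phi + c(y, phi)

# Poisson ingredients, as identified in the list above.
def poisson_b(theta):
    return math.exp(theta)

def poisson_c(y, phi):
    return -math.lgamma(y + 1)  # -log(y!), computed via the log-gamma function

def poisson_pmf(y, lam):
    """Direct evaluation of Equation (6.2)."""
    return lam ** y * math.exp(-lam) / math.factorial(y)

lam = 3.5                 # illustrative value, chosen arbitrarily
theta = math.log(lam)     # natural parameter
for y in range(6):
    edf_value = math.exp(edf_log_density(y, theta, 1.0, poisson_b, poisson_c))
    print(y, edf_value, poisson_pmf(y, lam))
```

The two printed columns should agree up to floating-point rounding.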

6.1.2.2 Bernoulli

The Bernoulli distribution has \(y\in\left\{0, 1\right\}\). Its distribution is \[\begin{eqnarray} P_{}\left(y|\pi\right) & = & \pi^{y}\, (1 - \pi)^{1 - y} \tag{6.4} \\ & = & \exp \Big[ y \log \frac{\pi}{1 - \pi} + \log(1 - \pi) \Big] \tag{6.5} \end{eqnarray}\]

It is thus an EDF with

  • \(\theta = \log \frac{\pi}{1 - \pi}\);
  • \(\phi = 1\);
  • \(b(\theta) = -\log(1 - \pi) = \log(1 + e^{\theta})\);
  • \(c(y, \phi) = 0\).
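As a quick check of the Bernoulli identification, the sketch below compares Equation (6.4) with the EDF form built from \(\theta\) and \(b\). The function names are hypothetical and used only for illustration.

```python
import math

def bernoulli_theta(pi):
    """Natural parameter: the log-odds of pi."""
    return math.log(pi / (1.0 - pi))

def bernoulli_b(theta):
    """Log normaliser b(theta) = log(1 + exp(theta))."""
    return math.log1p(math.exp(theta))

def bernoulli_edf(y, pi):
    """EDF form with phi = 1 and c(y, phi) = 0."""
    theta = bernoulli_theta(pi)
    return math.exp(y * theta - bernoulli_b(theta))

def bernoulli_pmf(y, pi):
    """Direct evaluation of Equation (6.4)."""
    return pi ** y * (1.0 - pi) ** (1 - y)

for pi in (0.1, 0.5, 0.9):   # illustrative values
    for y in (0, 1):
        print(pi, y, bernoulli_edf(y, pi), bernoulli_pmf(y, pi))
```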

6.1.2.3 Gaussian

The normal distribution has \(y\in{\mathbb R}\). Its density is \[\begin{eqnarray} P_{}\left(y|\mu, \sigma^{2}\right) & = & \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp \Big[ -\frac{1}{2\sigma^{2}}(y - \mu)^{2} \Big] \tag{6.6} \\ & = & \exp \Big[ \frac{y\mu - \frac{1}{2}\mu^{2}}{\sigma^{2}} - \frac{y^{2}}{2\sigma^{2}} - \frac{1}{2}\log(2\pi\sigma^{2}) \Big] \tag{6.7} \end{eqnarray}\]

It is thus an EDF with

  • \(\theta = \mu\),
  • \(\phi = \sigma^{2}\);
  • \(b(\theta) = \frac{1}{2}\mu^{2} = \frac{1}{2}\theta^{2}\);
  • \(c(y, \phi) = - \frac{y^{2}}{2\sigma^{2}} - \frac{1}{2}\log(2\pi\sigma^{2}) = -\frac{y^{2}}{ 2\phi} - \frac{1}{2}\log(2\pi\phi)\).
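The Gaussian case is the first example with \(\phi \neq 1\) and with a function \(c\) that depends on \(\phi\). The sketch below, with hypothetical function names, evaluates Equation (6.1) using the four ingredients listed above and compares it with Equation (6.6).

```python
import math

def gaussian_edf_density(y, mu, sigma2):
    """Equation (6.1) with the Gaussian choices of theta, phi, b and c."""
    theta, phi = mu, sigma2
    b = 0.5 * theta ** 2
    c = -y ** 2 / (2.0 * phi) - 0.5 * math.log(2.0 * math.pi * phi)
    return math.exp((y * theta - b) / phi + c)

def gaussian_density(y, mu, sigma2):
    """Direct evaluation of Equation (6.6)."""
    return math.exp(-(y - mu) ** 2 / (2.0 * sigma2)) / math.sqrt(2.0 * math.pi * sigma2)

mu, sigma2 = 1.5, 0.8        # illustrative values
for y in (-1.0, 0.0, 1.5, 3.0):
    print(y, gaussian_edf_density(y, mu, sigma2), gaussian_density(y, mu, sigma2))
```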

6.1.2.4 Further Example Distributions

  • Exponential
  • Gamma
  • Inverse Gamma
  • Binomial
  • Chi-squared
  • Beta

The \(t\)-distribution is not an EDF distribution.
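
As a sketch of how the first entry in the list above fits Equation (6.1), consider the exponential distribution with rate \(\lambda > 0\) and \(y \geq 0\) (the rate parameterisation is an assumption made here; the list does not specify one): \[\begin{eqnarray} P_{}\left(y|\lambda\right) & = & \lambda e^{-\lambda y} \nonumber \\ & = & \exp \big[ -\lambda y + \log\lambda \big] \nonumber \end{eqnarray}\] Matching terms with Equation (6.1) suggests \(\theta = -\lambda\), \(\phi = 1\), \(b(\theta) = -\log\lambda = -\log(-\theta)\) and \(c(y, \phi) = 0\). Note that in this case the natural parameter is restricted to \(\theta < 0\).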

6.2 Properties of EDFs

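  • In order that a probability distribution be normalised, we must have \[\begin{equation} \int P_{}\left(y |\theta, \phi\right) dy = 1 \tag{6.8} \end{equation}\] For a discrete distribution, such as the Poisson or Bernoulli, the integral is replaced by a sum over the possible values of \(y\).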

  • Making use of Equation (6.8) in the context of an EDF distribution (Equation (6.1)), we have \[\begin{eqnarray} && \int \exp \Big[ \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \Big] dy = 1 \\ & \implies & \exp \left( -\frac{b(\theta)}{\phi} \right) \int \exp \left( \frac{y\theta}{\phi} + c(y, \phi) \right) \;dy = 1 \\ & \implies & \frac{b(\theta)}{\phi} = \log \int \exp \big( \frac{y\theta}{\phi} + c(y, \phi) \big) \;dy \tag{6.9} \end{eqnarray}\]

  • Equation (6.9) determines \(b\) in terms of \(\phi\) and the function \(c\), so \(b\) cannot be chosen independently of them. This is why \(b\) is known as the “log normaliser”, although note that the factor of \(1/\phi\) is needed too: it is \(b(\theta)/\phi\) that equals the logarithm of the normalising integral.
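
Equation (6.9) can be checked numerically for the Poisson case, where \(\phi = 1\), \(c(y, \phi) = -\log y!\), and the integral becomes a sum over \(y = 0, 1, 2, \ldots\). The following is a minimal sketch: the function name and the truncation of the sum at 200 terms are assumptions made for illustration. The result should match \(b(\theta) = e^{\theta}\).

```python
import math

def log_normaliser_poisson(theta, terms=200):
    """Right-hand side of Equation (6.9) for the Poisson case (phi = 1),
    with the integral replaced by a truncated sum over y = 0, ..., terms - 1."""
    log_terms = [y * theta - math.lgamma(y + 1) for y in range(terms)]
    # Log-sum-exp: subtract the largest term before exponentiating,
    # to avoid overflow.
    m = max(log_terms)
    return m + math.log(sum(math.exp(t - m) for t in log_terms))

for lam in (0.5, 2.0, 10.0):   # illustrative values
    theta = math.log(lam)
    print(lam, log_normaliser_poisson(theta), math.exp(theta))
```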

6.2.1 Mean

  • If we differentiate Equation (6.9) with respect to \(\theta\) (using the chain rule¹ and the Leibniz rule²), we find \[\begin{eqnarray} \frac{b'(\theta)}{\phi} & = & \frac{\int \frac{y}{\phi} \exp \left( \frac{y \theta}{\phi} + c(y, \phi) \right) dy }{\int \exp \left( \frac{y \theta}{\phi} + c(y, \phi) \right) dy} \tag{6.10} \end{eqnarray}\] where Equation (6.9) can now be used to substitute for the denominator on the right-hand side of Equation (6.10) to give \[\begin{eqnarray} \frac{b'(\theta)}{\phi} & = & \int \frac{y}{\phi} \frac{ \exp \left( \frac{y \theta}{\phi} + c(y, \phi) \right) }{\exp \left( \frac{b(\theta)}{\phi} \right)} dy \nonumber \\ & = & \int \frac{y}{\phi} \exp \left( \frac{y \theta - b(\theta)}{\phi} + c(y, \phi) \right) dy \tag{6.11} \\ & = & \int \frac{y}{\phi} P_{}\left(y |\theta, \phi\right) dy \nonumber \\ & = & \frac{1}{\phi} {\mathrm E}[Y |\theta, \phi] \end{eqnarray}\] so that \[\begin{equation} b'(\theta) = {\mathrm E}[Y |\theta, \phi] \end{equation}\]

  • Now, \(b'\) is invertible on its range whenever the variance of the distribution is strictly positive, because then \(b'' > 0\) and \(b'\) is strictly increasing (see Section 6.2.2). Thus, \[\begin{equation} \mu \triangleq {\mathrm E}[Y |\theta, \phi] = b'(\theta) \tag{6.12} \end{equation}\] means that \[\begin{equation} \theta = (b')^{-1}(\mu) \tag{6.13} \end{equation}\]

  • Notice that the EDF distribution can therefore be parameterised in terms of \(\theta\) or in terms of \(\mu\). Equations (6.12) and (6.13) are the crucial equations from the point of view of models of functional relationships (and GLMs in particular), as they relate the parameterisation of the distribution to its expectation in a bijective way.
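
The round trip between \(\theta\) and \(\mu\) in Equations (6.12) and (6.13) can be sketched numerically. The code below is illustrative only: the function names are hypothetical, and \(b'\) is approximated by a central finite difference rather than computed analytically. The inverses used are the log function (Poisson) and the logit function (Bernoulli), obtained by solving \(\mu = b'(\theta)\) for \(\theta\).

```python
import math

def derivative(f, x, h=1e-5):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def poisson_b(theta):
    return math.exp(theta)

def bernoulli_b(theta):
    return math.log1p(math.exp(theta))

def poisson_theta_from_mu(mu):
    """(b')^{-1} for the Poisson case: the log function."""
    return math.log(mu)

def bernoulli_theta_from_mu(mu):
    """(b')^{-1} for the Bernoulli case: the logit function."""
    return math.log(mu / (1.0 - mu))

theta = 0.7   # illustrative value
for name, b, inverse in (
    ("Poisson", poisson_b, poisson_theta_from_mu),
    ("Bernoulli", bernoulli_b, bernoulli_theta_from_mu),
):
    mu = derivative(b, theta)   # Equation (6.12): mu = b'(theta)
    theta_back = inverse(mu)    # Equation (6.13): theta = (b')^{-1}(mu)
    print(name, mu, theta_back)
```

In each case the recovered value of \(\theta\) should agree with the starting value up to the finite-difference error.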

6.2.2 Variance

  • From Equation (6.11), we have that \[\begin{eqnarray} b'(\theta) & = & \exp \left( - \frac{b(\theta)}{\phi} \right) \int y \exp \left( \frac{y \theta}{\phi} + c(y, \phi) \right) dy \end{eqnarray}\]

  • We can differentiate again (using the product rule) to obtain: \[\begin{eqnarray} b''(\theta) & = & - \frac{b'(\theta)}{\phi} b'(\theta) + \exp \left( - \frac{b(\theta)}{\phi} \right) \int \frac{y^2}{\phi} \exp \left( \frac{y \theta}{\phi} + c(y, \phi) \right) dy \nonumber \\ & = & - \frac{\mu^2}{\phi} + \frac{1}{\phi} {\mathrm E}[Y^2 |\theta, \phi] \nonumber \\ & = & \frac{1}{\phi} {\mathrm{Var}}[Y |\theta, \phi] \tag{6.14} \end{eqnarray}\] Note that Equation (6.14) shows that \(b'' \geq 0\), with equality only if the variance is zero or the dispersion is infinite.

  • We can now reparameterise in terms of \(\mu\) \[\begin{eqnarray} {\mathrm{Var}}[Y |\theta, \phi] & = & \phi \, b''(\theta) \nonumber \\ & = & \phi \, b''((b')^{-1}(\mu)) \nonumber \\ & = & \phi \, \mathcal{V}(\mu) \tag{6.15} \end{eqnarray}\]

  • The function \(\mathcal{V}(\cdot) = b''((b')^{-1}(\cdot))\) is called the variance function. Equations (6.12) and (6.15) make it clear why \(\phi\) is called the “dispersion”. Its value does not affect \(\mu = {\mathrm E}[Y |\theta, \phi]\), but it scales \({\mathrm{Var}}[Y |\theta, \phi]\).
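
Equation (6.15) can be illustrated numerically. The sketch below uses hypothetical function names and finite-difference approximations to \(b'\) and \(b''\). It compares \(\phi\, b''(\theta)\) with \(\phi\, \mathcal{V}(\mu)\), using the closed-form variance functions \(\mathcal{V}(\mu) = \mu\) (Poisson), \(\mu(1 - \mu)\) (Bernoulli) and \(1\) (Gaussian), which follow from the corresponding choices of \(b\).

```python
import math

def derivative(f, x, h=1e-5):
    """Central finite-difference approximation to f'(x)."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

def second_derivative(f, x, h=1e-4):
    """Central finite-difference approximation to f''(x)."""
    return (f(x + h) - 2.0 * f(x) + f(x - h)) / (h * h)

def poisson_b(theta):
    return math.exp(theta)

def bernoulli_b(theta):
    return math.log1p(math.exp(theta))

def gaussian_b(theta):
    return 0.5 * theta * theta

def variance(b, theta, phi):
    """Equation (6.14) rearranged: Var[Y] = phi * b''(theta)."""
    return phi * second_derivative(b, theta)

# (name, b, closed-form variance function V(mu), dispersion phi).
# phi = 1 for Poisson and Bernoulli; the Gaussian phi = 2.0 is illustrative.
cases = [
    ("Poisson", poisson_b, lambda mu: mu, 1.0),
    ("Bernoulli", bernoulli_b, lambda mu: mu * (1.0 - mu), 1.0),
    ("Gaussian", gaussian_b, lambda mu: 1.0, 2.0),
]

theta = 0.3   # illustrative value
for name, b, variance_function, phi in cases:
    mu = derivative(b, theta)   # mu = b'(theta)
    print(name, variance(b, theta, phi), phi * variance_function(mu))
```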

6.2.3 Examples

We now look at the results of the preceding section as applied to our three examples.

6.2.3.1 Poisson

We have

  • \(\theta = \log\lambda\),
  • \(\phi = 1\),
  • \(b(\theta) = e^{\theta}\).

Thus, \[\begin{alignat}{3} \mu & = b'(\theta) & = e^{\theta} \\ \mathcal{V}(\mu) & = b''(\theta) & = e^{\theta} \end{alignat}\]

meaning that \[\begin{eqnarray} {\mathrm E}[Y |\theta, \phi] & = & e^{\log\lambda} = \lambda \\ {\mathrm{Var}}[Y |\theta, \phi] & = & \phi\;e^{\log\lambda} = \lambda \end{eqnarray}\] as expected.

6.2.3.2 Bernoulli

We have

  • \(\theta = \log(\pi/(1 - \pi))\),
  • \(\phi = 1\),
  • \(b(\theta) = \log(1 + e^{\theta})\).

Thus, \[\begin{alignat}{3} \mu & = b'(\theta) & = \frac{e^{\theta}}{1 + e^{\theta}} \\ \mathcal{V}(\mu) & = b''(\theta) & = \frac{e^{\theta}}{(1 + e^{\theta})^{2}} \end{alignat}\] meaning that \[\begin{eqnarray} {\mathrm E}[Y |\theta, \phi] & = & \frac{\pi}{1 - \pi} \frac{1 - \pi}{1} = \pi \\ {\mathrm{Var}}[Y |\theta, \phi] & = & \phi\;\frac{\pi}{1 - \pi} \frac{(1 - \pi)^{2}}{1} = \pi(1 - \pi) \end{eqnarray}\] as expected.

6.2.3.3 Gaussian

We have

  • \(\theta = \mu\),
  • \(\phi = \sigma^{2}\),
  • \(b(\theta) = \tfrac{1}{2}\theta^{2}\).

Thus \[\begin{alignat}{3} \mu & = b'(\theta) & = \theta \\ \mathcal{V}(\mu) & = b''(\theta) & = 1 \end{alignat}\]

meaning that \[\begin{eqnarray} {\mathrm E}[Y |\theta, \phi] & = & \theta = \mu \\ {\mathrm{Var}}[Y |\theta, \phi] & = & \phi = \sigma^{2} \end{eqnarray}\] as expected.


  1. The chain rule is applied to \(\log u\), with \(u = \int \exp \big( \frac{y\theta}{\phi} + c(y, \phi) \big) \;dy\).

  2. The Leibniz rule allows the derivative with respect to \(\theta\) to be exchanged with the integral over \(y\).