Chapter 6 Exponential Dispersion Family
Our models of functional relationships are defined by a constraint on the mean of the distribution, Equation (5.4). In order for such models to be useful, this constraint must be effective, which in practice means that it must be easy to relate the parameters of the probability distribution to its expectation.
In this section, we examine a very general class of probability distributions for which this is the case.
6.1 The Exponential Dispersion Family
6.1.1 Definition
An exponential dispersion family (EDF) of probability distributions has the following form: \[\begin{equation} P_{}\left(y |\theta, \phi\right) = \exp \Big[ \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \Big] \tag{6.1} \end{equation}\] where
- \(\theta \in {\mathbb R}\) is the natural (or canonical) parameter;
- \(\phi > 0\) is the dispersion parameter; in many settings it is not of direct interest, and is therefore referred to as a “nuisance” parameter;
- \(b: {\mathbb R}\to{\mathbb R}\) and \(c:{\mathbb R}\times{\mathbb R}_{> 0}\to{\mathbb R}\) are known functions that specify the particular family. The function \(b\) is known as the “log normaliser” for reasons that will become clear.
6.1.2 Examples
6.1.2.1 Poisson
The Poisson distribution for \(y\in\mathbb{N}\) is \[\begin{eqnarray} P_{}\left(y |\lambda\right) & = & \frac{\lambda^y e^{-\lambda}}{y!} \tag{6.2} \\ & = & \exp[y \log \lambda - \lambda - \log y!] \tag{6.3} \end{eqnarray}\] It is thus an EDF with
- \(\theta = \log\lambda\)
- \(\phi = 1\);
- \(b(\theta) = \lambda = e^{\theta}\);
- \(c(y, \phi) = -\log y!\)
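As a numerical sanity check, the EDF form (6.1) with these ingredients reproduces the standard Poisson pmf (6.2). The following Python sketch (the function names are our own) compares the two:

```python
import math

def poisson_edf(y, lam):
    """Poisson pmf via the EDF form (6.1):
    theta = log(lam), phi = 1, b(theta) = e^theta, c(y, phi) = -log(y!)."""
    theta = math.log(lam)
    return math.exp(y * theta - math.exp(theta) - math.lgamma(y + 1))

def poisson_pmf(y, lam):
    """Standard Poisson pmf (6.2)."""
    return lam ** y * math.exp(-lam) / math.factorial(y)

for y in range(10):
    assert math.isclose(poisson_edf(y, 2.5), poisson_pmf(y, 2.5))
```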
6.1.2.2 Bernoulli
The Bernoulli distribution has \(y\in\left\{0, 1\right\}\). Its distribution is \[\begin{eqnarray} P_{}\left(y|\pi\right) & = & \pi^{y}\, (1 - \pi)^{1 - y} \tag{6.4} \\ & = & \exp \Big[ y \log \frac{\pi}{1 - \pi} + \log(1 - \pi) \Big] \tag{6.5} \end{eqnarray}\]
It is thus an EDF with
- \(\theta = \log \frac{\pi}{1 - \pi}\)
- \(\phi = 1\);
- \(b(\theta) = -\log(1 - \pi) = \log(1 + e^{\theta})\);
- \(c(y, \phi) = 0\).
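The same check for the Bernoulli case: with \(c = 0\), the EDF form returns \(\pi\) at \(y = 1\) and \(1 - \pi\) at \(y = 0\). A Python sketch (function names are our own):

```python
import math

def bernoulli_edf(y, pi):
    """Bernoulli pmf via the EDF form (6.1):
    theta = log(pi / (1 - pi)), phi = 1, b(theta) = log(1 + e^theta), c = 0."""
    theta = math.log(pi / (1 - pi))
    return math.exp(y * theta - math.log1p(math.exp(theta)))

for pi in (0.1, 0.5, 0.9):
    assert math.isclose(bernoulli_edf(1, pi), pi)
    assert math.isclose(bernoulli_edf(0, pi), 1 - pi)
```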
6.1.2.3 Gaussian
The normal distribution has \(y\in{\mathbb R}\). Its density is \[\begin{eqnarray} P_{}\left(y|\mu, \sigma^{2}\right) & = & \frac{1}{\sqrt{2\pi\sigma^{2}}} \exp \Big[ -\frac{1}{2\sigma^{2}}(y - \mu)^{2} \Big] \tag{6.6} \\ & = & \exp \Big[ \frac{y\mu - \frac{1}{2}\mu^{2}}{\sigma^{2}} - \frac{y^{2}}{2\sigma^{2}} - \frac{1}{2}\log(2\pi\sigma^{2}) \Big] \tag{6.7} \end{eqnarray}\]
It is thus an EDF with
- \(\theta = \mu\),
- \(\phi = \sigma^{2}\);
- \(b(\theta) = \frac{1}{2}\mu^{2} = \frac{1}{2}\theta^{2}\);
- \(c(y, \phi) = - \frac{y^{2}}{2\sigma^{2}} - \frac{1}{2}\log(2\pi\sigma^{2}) = -\frac{y^{2}}{ 2\phi} - \frac{1}{2}\log(2\pi\phi)\).
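Once more as a sketch (function names are our own), the EDF form with these ingredients agrees with the usual normal density (6.6):

```python
import math

def gaussian_edf(y, mu, sigma2):
    """Normal density via the EDF form (6.1):
    theta = mu, phi = sigma^2, b(theta) = theta^2 / 2,
    c(y, phi) = -y^2 / (2 phi) - log(2 pi phi) / 2."""
    c = -y ** 2 / (2 * sigma2) - 0.5 * math.log(2 * math.pi * sigma2)
    return math.exp((y * mu - 0.5 * mu ** 2) / sigma2 + c)

def gaussian_pdf(y, mu, sigma2):
    """Standard normal density (6.6)."""
    return math.exp(-(y - mu) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)

for y in (-1.0, 0.0, 1.3):
    assert math.isclose(gaussian_edf(y, 0.5, 2.0), gaussian_pdf(y, 0.5, 2.0))
```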
6.2 Properties of EDFs
In order that a probability distribution be normalised, we must have \[\begin{equation} \int P_{}\left(y |\theta, \phi\right) dy = 1 \tag{6.8} \end{equation}\] (For discrete distributions, such as the Poisson and Bernoulli, the integral is replaced by a sum; the argument below goes through unchanged.)
Making use of Equation (6.8) in the context of an EDF distribution (Equation (6.1)), we have \[\begin{eqnarray} && \int \exp \Big[ \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \Big] dy = 1 \\ & \implies & \exp \left( -\frac{b(\theta)}{\phi} \right) \int \exp \left( \frac{y\theta}{\phi} + c(y, \phi) \right) \;dy = 1 \\ & \implies & \frac{b(\theta)}{\phi} = \log \int \exp \big( \frac{y\theta}{\phi} + c(y, \phi) \big) \;dy \tag{6.9} \end{eqnarray}\]
Equation (6.9) determines \(b\) in terms of \(\phi\) and the function \(c\); it is therefore not an independently variable quantity. It also explains why \(b\) is known as the “log normaliser”: \(b(\theta)/\phi\) is the logarithm of the integral that normalises the distribution, so note that the factor of \(\phi\) is necessary too.
6.2.1 Mean
If we differentiate Equation (6.9) with respect to \(\theta\) (using the chain rule68 and Leibniz rule69), we find \[\begin{eqnarray} \frac{b'(\theta)}{\phi} & = & \frac{\int \frac{y}{\phi} \exp \left( \frac{y \theta}{\phi} + c(y, \phi) \right) dy }{\int \exp \left( \frac{y \theta}{\phi} + c(y, \phi) \right) dy} \tag{6.10} \end{eqnarray}\] where Equation (6.9) can now be used to substitute the denominator on the right hand side of Equation (6.10) to give \[\begin{eqnarray} \frac{b'(\theta)}{\phi} & = & \int \frac{y}{\phi} \frac{ \exp \left( \frac{y \theta}{\phi} + c(y, \phi) \right) }{\exp \left( \frac{b(\theta)}{\phi} \right)} dy \nonumber \\ & = & \int \frac{y}{\phi} \exp \left( \frac{y \theta - b(\theta)}{\phi} + c(y, \phi) \right) dy \tag{6.11} \\ & = & \int \frac{y}{\phi} P_{}\left(y |\theta, \phi\right) dy \nonumber \\ & = & \frac{1}{\phi} {\mathrm E}[Y |\theta, \phi] \end{eqnarray}\] so that \[\begin{equation} b'(\theta) = {\mathrm E}[Y |\theta, \phi] \end{equation}\]
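The identity \(b'(\theta) = {\mathrm E}[Y |\theta, \phi]\) can be checked numerically. In the sketch below (function names are our own, using the Poisson example from Section 6.1.2), a central difference approximates \(b'\) and is compared with the mean computed directly from the pmf:

```python
import math

def b(theta):
    """Poisson log normaliser: b(theta) = e^theta."""
    return math.exp(theta)

def pmf(y, theta):
    """Poisson pmf in EDF form, with phi = 1."""
    return math.exp(y * theta - b(theta) - math.lgamma(y + 1))

theta = math.log(2.5)                                # lambda = 2.5
h = 1e-6
b_prime = (b(theta + h) - b(theta - h)) / (2 * h)    # central difference
mean = sum(y * pmf(y, theta) for y in range(200))    # E[Y] by direct summation

assert math.isclose(b_prime, mean, rel_tol=1e-6)
assert math.isclose(mean, 2.5)                       # equals lambda, as expected
```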
Now, it turns out that \(b'\) is almost always invertible for finite parameter values, because \(b'' > 0\) except when the variance of the distribution is zero (see Section 6.2.2). Thus, \[\begin{equation} \mu \triangleq {\mathrm E}[Y |\theta, \phi] = b'(\theta) \tag{6.12} \end{equation}\] means that \[\begin{equation} \theta = (b')^{-1}(\mu) \tag{6.13} \end{equation}\]
Notice that the EDF distribution can therefore be parameterised in terms of \(\theta\) or in terms of \(\mu\). Equations (6.12) and (6.13) are the crucial equations from the point of view of models of functional relationships (and GLMs in particular), as they relate the parameterisation of the distribution to its expectation in a bijective way.
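For the Bernoulli example this bijection is familiar: \(b'\) is the logistic (sigmoid) function and \((b')^{-1}\) is the logit. A brief sketch (function names are our own) confirms the round trip:

```python
import math

def b_prime(theta):
    """Bernoulli mean map: b'(theta) = e^theta / (1 + e^theta) (the sigmoid)."""
    return 1 / (1 + math.exp(-theta))

def b_prime_inv(mu):
    """Inverse mean map: (b')^{-1}(mu) = log(mu / (1 - mu)) (the logit)."""
    return math.log(mu / (1 - mu))

# Round trips confirm the bijection between theta and mu in (0, 1).
for mu in (0.05, 0.5, 0.95):
    assert math.isclose(b_prime(b_prime_inv(mu)), mu)
for theta in (-3.0, 0.0, 3.0):
    assert math.isclose(b_prime_inv(b_prime(theta)), theta, abs_tol=1e-12)
```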
6.2.2 Variance
From Equation (6.11), we have that \[\begin{eqnarray} b'(\theta) & = & \exp \left( - \frac{b(\theta)}{\phi} \right) \int y \exp \left( \frac{y \theta}{\phi} + c(y, \phi) \right) dy \end{eqnarray}\]
We can differentiate again (using the product rule) to obtain: \[\begin{eqnarray} b''(\theta) & = & - \frac{b'(\theta)}{\phi} b'(\theta) + \exp \left( - \frac{b(\theta)}{\phi} \right) \int \frac{y^2}{\phi} \exp \left( \frac{y \theta}{\phi} + c(y, \phi) \right) dy \nonumber \\ & = & - \frac{\mu^2}{\phi} + \frac{1}{\phi} {\mathrm E}[Y^2 |\theta, \phi] \nonumber \\ & = & \frac{1}{\phi} {\mathrm{Var}}[Y |\theta, \phi] \tag{6.14} \end{eqnarray}\] Note that Equation (6.14) shows that \(b'' \geq 0\), with equality only if the variance is zero or the dispersion is infinite.
We can now reparameterise in terms of \(\mu\) \[\begin{eqnarray} {\mathrm{Var}}[Y |\theta, \phi] & = & \phi \, b''(\theta) \nonumber \\ & = & \phi \, b''((b')^{-1}(\mu)) \nonumber \\ & = & \phi \, \mathcal{V}(\mu) \tag{6.15} \end{eqnarray}\]
The function \(\mathcal{V}(\cdot) = b''((b')^{-1}(\cdot))\) is called the variance function. Equations (6.12) and (6.15) make it clear why \(\phi\) is called the “dispersion”: its value does not affect \(\mu = {\mathrm E}[Y |\theta, \phi]\), but it scales \({\mathrm{Var}}[Y |\theta, \phi]\).
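The relation \(\mathcal{V}(\mu) = b''((b')^{-1}(\mu))\) can itself be checked numerically with a second central difference of \(b\). A sketch (`var_fn` and the helper lambdas are our own names):

```python
import math

def var_fn(b, mu, b_prime_inv, h=1e-4):
    """Variance function V(mu) = b''((b')^{-1}(mu)), with b'' approximated
    by a second central difference of the log normaliser b."""
    theta = b_prime_inv(mu)
    return (b(theta + h) - 2 * b(theta) + b(theta - h)) / h ** 2

mu = 0.3
# Poisson: b = exp, (b')^{-1} = log, giving V(mu) = mu.
assert math.isclose(var_fn(math.exp, mu, math.log), mu, rel_tol=1e-4)
# Bernoulli: b = softplus, (b')^{-1} = logit, giving V(mu) = mu (1 - mu).
softplus = lambda t: math.log1p(math.exp(t))
logit = lambda m: math.log(m / (1 - m))
assert math.isclose(var_fn(softplus, mu, logit), mu * (1 - mu), rel_tol=1e-4)
# Gaussian: b(theta) = theta^2 / 2, (b')^{-1} = identity, giving V(mu) = 1.
assert math.isclose(var_fn(lambda t: t * t / 2, mu, lambda m: m), 1.0, rel_tol=1e-4)
```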
6.2.3 Examples
We now look at the results of the preceding section as applied to our three examples.
6.2.3.1 Poisson
We have
- \(\theta = \log\lambda\),
- \(\phi = 1\),
- \(b(\theta) = e^{\theta}\).
Thus, \[\begin{alignat}{3} \mu & = b'(\theta) & = e^{\theta} \\ \mathcal{V}(\mu) & = b''(\theta) & = e^{\theta} \end{alignat}\]
meaning that \[\begin{eqnarray} {\mathrm E}[Y |\theta, \phi] & = & e^{\log\lambda} = \lambda \\ {\mathrm{Var}}[Y |\theta, \phi] & = & \phi\;e^{\log\lambda} = \lambda \end{eqnarray}\] as expected.
6.2.3.2 Bernoulli
We have
- \(\theta = \log(\pi/(1 - \pi))\),
- \(\phi = 1\),
- \(b(\theta) = \log(1 + e^{\theta})\).
Thus, \[\begin{alignat}{3} \mu & = b'(\theta) & = \frac{e^{\theta}}{1 + e^{\theta}} \\ \mathcal{V}(\mu) & = b''(\theta) & = \frac{e^{\theta}}{(1 + e^{\theta})^{2}} \end{alignat}\] meaning that \[\begin{eqnarray} {\mathrm E}[Y |\theta, \phi] & = & \frac{\pi}{1 - \pi} \frac{1 - \pi}{1} = \pi \\ {\mathrm{Var}}[Y |\theta, \phi] & = & \phi\;\frac{\pi}{1 - \pi} \frac{(1 - \pi)^{2}}{1} = \pi(1 - \pi) \end{eqnarray}\] as expected.
6.2.3.3 Gaussian
We have
- \(\theta = \mu\)
- \(\phi = \sigma^{2}\),
- \(b(\theta) = \tfrac{1}{2}\theta^{2}\).
Thus \[\begin{alignat}{3} \mu & = b'(\theta) & = \theta \\ \mathcal{V}(\mu) & = b''(\theta) & = 1 \end{alignat}\]
meaning that \[\begin{eqnarray} {\mathrm E}[Y |\theta, \phi] & = & \theta = \mu \\ {\mathrm{Var}}[Y |\theta, \phi] & = & \phi = \sigma^{2} \end{eqnarray}\] as expected.