Chapter 6 Exponential Dispersion Family

Our models of functional relationships are defined by a constraint on the mean of the distribution, Equation (5.4). For these models to be useful, this constraint must be effective: in practice, it must be easy to relate the parameters of the probability distribution to its expectation.

In this chapter, we examine a very general class of probability distributions for which this is the case.

6.1 The Exponential Dispersion Family

6.1.1 Definition

An exponential dispersion family (EDF) of probability distributions has the following form: \begin{equation} P(y|\theta, \phi) = \exp \left[ \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right] \tag{6.1} \end{equation} where

  • \theta \in \mathbb{R} is the natural (or canonical) parameter;
  • \phi > 0 is the dispersion parameter; in many settings this is not of direct interest, and thus may be referred to as a “nuisance” parameter;
  • b : \mathbb{R} \to \mathbb{R} and c : \mathbb{R} \times \mathbb{R}_{>0} \to \mathbb{R} are both functions. The function b is known as the “log normaliser” for reasons that will become clear.

6.1.2 Examples

6.1.2.1 Poisson

The Poisson distribution for y \in \mathbb{N} is \begin{equation} P(y|\lambda) = \frac{\lambda^y e^{-\lambda}}{y!} = \exp \left[ y \log\lambda - \lambda - \log y! \right] \end{equation} It is thus an EDF with

  • \theta = \log\lambda;
  • \phi = 1;
  • b(\theta) = \lambda = e^{\theta};
  • c(y, \phi) = -\log y!.
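As a numerical sanity check (a sketch using `numpy` and `scipy`, which are not part of the text), we can confirm that the EDF form above reproduces the usual Poisson pmf:

```python
import numpy as np
from scipy.stats import poisson
from scipy.special import gammaln

lam = 3.5
theta = np.log(lam)      # natural parameter theta = log(lambda)
phi = 1.0                # dispersion
b = np.exp(theta)        # b(theta) = e^theta = lambda

y = np.arange(20)
c = -gammaln(y + 1)      # c(y, phi) = -log y!  (gammaln(y+1) = log y!)
edf_pmf = np.exp((y * theta - b) / phi + c)

# the EDF form agrees with the standard Poisson pmf
assert np.allclose(edf_pmf, poisson.pmf(y, lam))
```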

6.1.2.2 Bernoulli

The Bernoulli distribution has y \in \{0, 1\}. Its distribution is \begin{equation} P(y|\pi) = \pi^y (1 - \pi)^{1 - y} = \exp \left[ y \log\frac{\pi}{1 - \pi} + \log(1 - \pi) \right] \end{equation}

It is thus an EDF with

  • \theta = \log\frac{\pi}{1 - \pi};
  • \phi = 1;
  • b(\theta) = -\log(1 - \pi) = \log(1 + e^{\theta});
  • c(y, \phi) = 0.
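The same kind of check works here (again a sketch using `numpy`/`scipy`, not part of the text): the EDF form with the logit as natural parameter recovers the Bernoulli pmf.

```python
import numpy as np
from scipy.stats import bernoulli

pi = 0.3
theta = np.log(pi / (1 - pi))    # natural parameter: the logit of pi
b = np.log1p(np.exp(theta))      # b(theta) = log(1 + e^theta) = -log(1 - pi)

y = np.array([0, 1])
edf_pmf = np.exp(y * theta - b)  # phi = 1 and c(y, phi) = 0

# the EDF form agrees with the standard Bernoulli pmf
assert np.allclose(edf_pmf, bernoulli.pmf(y, pi))
```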

6.1.2.3 Gaussian

The normal distribution has y \in \mathbb{R}. Its density is \begin{equation} P(y|\mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp \left[ -\frac{1}{2\sigma^2}(y - \mu)^2 \right] = \exp \left[ \frac{y\mu - \tfrac{1}{2}\mu^2}{\sigma^2} - \frac{y^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) \right] \end{equation}

It is thus an EDF with

  • \theta = \mu;
  • \phi = \sigma^2;
  • b(\theta) = \tfrac{1}{2}\mu^2 = \tfrac{1}{2}\theta^2;
  • c(y, \phi) = -\frac{y^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2) = -\frac{y^2}{2\phi} - \frac{1}{2}\log(2\pi\phi).
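A matching sanity check for the Gaussian case (illustrative only, using `numpy`/`scipy`): here the dispersion \phi = \sigma^2 is genuinely at work, unlike in the two discrete examples.

```python
import numpy as np
from scipy.stats import norm

mu, sigma2 = 1.2, 0.7
theta, phi = mu, sigma2            # natural parameter and dispersion
b = 0.5 * theta**2                 # b(theta) = theta^2 / 2

y = np.linspace(-3.0, 5.0, 101)
c = -y**2 / (2 * phi) - 0.5 * np.log(2 * np.pi * phi)
edf_pdf = np.exp((y * theta - b) / phi + c)

# the EDF form agrees with the standard normal density
assert np.allclose(edf_pdf, norm.pdf(y, loc=mu, scale=np.sqrt(sigma2)))
```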

6.1.2.4 Further Example Distributions

  • Exponential
  • Gamma
  • Inverse Gamma
  • Binomial
  • Chi-squared
  • Beta

The t-distribution is not an EDF distribution.

6.2 Properties of EDFs

  • In order that a probability distribution be normalised, we must have \begin{equation} \int P(y|\theta, \phi) \, dy = 1 \tag{6.8} \end{equation}

  • Making use of Equation (6.8) in the context of an EDF distribution (Equation (6.1)), we have \begin{eqnarray} \int \exp \left[ \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right] dy & = & 1 \nonumber \\ \exp \left( -\frac{b(\theta)}{\phi} \right) \int \exp \left( \frac{y\theta}{\phi} + c(y, \phi) \right) dy & = & 1 \nonumber \\ \frac{b(\theta)}{\phi} & = & \log \int \exp \left( \frac{y\theta}{\phi} + c(y, \phi) \right) dy \tag{6.9} \end{eqnarray}

  • Equation (6.9) determines b in terms of ϕ and the function c. It is therefore not an independently variable quantity. The function b will be known as the “log normaliser”, although note that the factor of ϕ is necessary too.

6.2.1 Mean

  • If we differentiate Equation (6.9) with respect to \theta (using the chain rule and the Leibniz rule; see the footnotes), we find \begin{equation} \frac{b'(\theta)}{\phi} = \frac{\int \frac{y}{\phi} \exp \left( \frac{y\theta}{\phi} + c(y, \phi) \right) dy}{\int \exp \left( \frac{y\theta}{\phi} + c(y, \phi) \right) dy} \tag{6.10} \end{equation} where Equation (6.9) can now be used to substitute the denominator on the right hand side of Equation (6.10) to give \begin{eqnarray} \frac{b'(\theta)}{\phi} & = & \int \frac{y}{\phi} \exp \left( \frac{y\theta}{\phi} + c(y, \phi) \right) \exp \left( -\frac{b(\theta)}{\phi} \right) dy \nonumber \\ & = & \int \frac{y}{\phi} \exp \left( \frac{y\theta - b(\theta)}{\phi} + c(y, \phi) \right) dy \nonumber \\ & = & \int \frac{y}{\phi} P(y|\theta, \phi) \, dy \nonumber \\ & = & \frac{1}{\phi} {\mathrm E}[Y|\theta, \phi] \tag{6.11} \end{eqnarray} so that \begin{equation} b'(\theta) = {\mathrm E}[Y|\theta, \phi] \end{equation}

  • Now, it turns out that b' is almost always invertible for finite parameter values, because b'' > 0 except when the variance of the distribution is zero (see Section 6.2.2). Thus, \begin{equation} \mu \triangleq {\mathrm E}[Y |\theta, \phi] = b'(\theta) \tag{6.12} \end{equation} means that \begin{equation} \theta = (b')^{-1}(\mu) \tag{6.13} \end{equation}

  • Notice that the EDF distribution can therefore be parameterised in terms of \theta or in terms of \mu. Equations (6.12) and (6.13) are the crucial equations from the point of view of models of functional relationships (and GLMs in particular), as they relate the parameterisation of the distribution to its expectation in a bijective way.
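The bijection between \theta and \mu can be made concrete numerically. In the sketch below (illustrative `numpy` code, not part of the text), we take the Bernoulli family, where b'(\theta) is the logistic function and (b')^{-1} is the logit, and check that mapping \mu \to \theta \to \mu is the identity:

```python
import numpy as np

def b_prime(theta):
    # Bernoulli: b'(theta) = e^theta / (1 + e^theta), the logistic function
    return 1.0 / (1.0 + np.exp(-theta))

def b_prime_inv(mu):
    # its inverse, the logit
    return np.log(mu / (1.0 - mu))

mu = np.array([0.1, 0.5, 0.9])
theta = b_prime_inv(mu)                 # Equation (6.13): theta from mu
# Equation (6.12): b'(theta) recovers the mean mu
assert np.allclose(b_prime(theta), mu)
```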

6.2.2 Variance

  • From Equation (6.11), we have that \begin{eqnarray} b'(\theta) & = & \exp \left( - \frac{b(\theta)}{\phi} \right) \int y \exp \left( \frac{y \theta}{\phi} + c(y, \phi) \right) dy \end{eqnarray}

  • We can differentiate again (using the product rule) to obtain: \begin{eqnarray} b''(\theta) & = & - \frac{b'(\theta)}{\phi} b'(\theta) + \exp \left( - \frac{b(\theta)}{\phi} \right) \int \frac{y^2}{\phi} \exp \left( \frac{y \theta}{\phi} + c(y, \phi) \right) dy \nonumber \\ & = & - \frac{\mu^2}{\phi} + \frac{1}{\phi} {\mathrm E}[Y^2 |\theta, \phi] \nonumber \\ & = & \frac{1}{\phi} {\mathrm{Var}}[Y |\theta, \phi] \tag{6.14} \end{eqnarray} Note that Equation (6.14) shows that b'' \geq 0, with equality only if the variance is zero or the dispersion is infinite.

  • We can now reparameterise in terms of \mu \begin{eqnarray} {\mathrm{Var}}[Y |\theta, \phi] & = & \phi \, b''(\theta) \nonumber \\ & = & \phi \, b''((b')^{-1}(\mu)) \nonumber \\ & = & \phi \, \mathcal{V}(\mu) \tag{6.15} \end{eqnarray}

  • The function \mathcal{V}(\cdot) = b''((b')^{-1}(\cdot)) is called the variance function. Equations (6.12) and (6.15) make it clear why \phi is called the “dispersion”. Its value does not affect \mu = {\mathrm E}[Y |\theta, \phi], but it scales {\mathrm{Var}}[Y |\theta, \phi].

6.2.3 Examples

We now look at the results of the preceding section as applied to our three examples.

6.2.3.1 Poisson

We have

  • \theta = \log\lambda,
  • \phi = 1,
  • b(\theta) = e^{\theta}.

Thus, \begin{alignat}{3} \mu & = b'(\theta) & = e^{\theta} \\ \mathcal{V}(\mu) & = b''(\theta) & = e^{\theta} \end{alignat}

meaning that \begin{eqnarray} {\mathrm E}[Y |\theta, \phi] & = & e^{\log\lambda} = \lambda \\ {\mathrm{Var}}[Y |\theta, \phi] & = & \phi\;e^{\log\lambda} = \lambda \end{eqnarray} as expected.

6.2.3.2 Bernoulli

We have

  • \theta = \log(\pi/(1 - \pi)),
  • \phi = 1,
  • b(\theta) = \log(1 + e^{\theta}).

Thus, \begin{alignat}{3} \mu & = b'(\theta) & = \frac{e^{\theta}}{1 + e^{\theta}} \\ \mathcal{V}(\mu) & = b''(\theta) & = \frac{e^{\theta}}{(1 + e^{\theta})^{2}} \end{alignat} meaning that \begin{eqnarray} {\mathrm E}[Y |\theta, \phi] & = & \frac{\pi}{1 - \pi} \frac{1 - \pi}{1} = \pi \\ {\mathrm{Var}}[Y |\theta, \phi] & = & \phi\;\frac{\pi}{1 - \pi} \frac{(1 - \pi)^{2}}{1} = \pi(1 - \pi) \end{eqnarray} as expected.

6.2.3.3 Gaussian

We have

  • \theta = \mu
  • \phi = \sigma^{2},
  • b(\theta) = \tfrac{1}{2}\theta^{2}.

Thus \begin{alignat}{3} \mu & = b'(\theta) & = \theta \\ \mathcal{V}(\mu) & = b''(\theta) & = 1 \end{alignat}

meaning that \begin{eqnarray} {\mathrm E}[Y |\theta, \phi] & = & \theta = \mu \\ {\mathrm{Var}}[Y |\theta, \phi] & = & \phi = \sigma^{2} \end{eqnarray} as expected.
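The moment formulas in the three examples above can also be checked by simulation. The following sketch (illustrative `numpy` code with loose Monte Carlo tolerances, not part of the text) confirms that sample means and variances match {\mathrm E}[Y] = b'(\theta) and {\mathrm{Var}}[Y] = \phi\,\mathcal{V}(\mu):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Poisson(lambda): E = lambda, Var = lambda
lam = 3.0
y = rng.poisson(lam, n)
assert abs(y.mean() - lam) < 0.05 and abs(y.var() - lam) < 0.1

# Bernoulli(pi): E = pi, Var = pi (1 - pi)
pi = 0.25
y = rng.binomial(1, pi, n)
assert abs(y.mean() - pi) < 0.01 and abs(y.var() - pi * (1 - pi)) < 0.01

# Normal(mu, sigma^2): E = mu, Var = sigma^2
mu, sigma2 = -1.0, 2.0
y = rng.normal(mu, np.sqrt(sigma2), n)
assert abs(y.mean() - mu) < 0.02 and abs(y.var() - sigma2) < 0.05
```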


  1. with u = \int \exp \big( \frac{y\theta}{\phi} + c(y, \phi) \big) \;dy

  2. The Leibniz rule allows the interchange of differentiation and integration.