4.1 EM Algorithm for Exponential Families

Data generated from a regular exponential family distribution have a density of the form $g(x\mid\theta) = h(x) \exp(\theta^\prime t(x))/a(\theta),$ where $\theta$ is the canonical parameter, $t(x)$ is the vector of sufficient statistics, and $a(\theta) = \int h(x)\exp(\theta^\prime t(x))\,dx$ is the normalizing constant.

When thinking about the EM algorithm, the ideal scenario is that the complete data density can be written as an exponential family. In that case, for the E-step, if $y$ represents the observed component of the complete data, we can write $\begin{eqnarray*} Q(\theta\mid\theta_0) & = & \mathbb{E}[\log g(x\mid\theta)\mid y, \theta_0]\\ & = & \mathbb{E}[\log h(x)\mid y, \theta_0] + \theta^\prime \mathbb{E}[t(x)\mid y, \theta_0] - \log a(\theta) \end{eqnarray*}$ (Note: We can ignore the $\mathbb{E}[\log h(x)\mid y, \theta_0]$ term because it does not involve the $\theta$ parameter.)

In order to maximize this function with respect to $\theta$, we can take the derivative and set it equal to zero. Differentiating under the integral sign shows that the derivative of $\log a(\theta)$ is $\mathbb{E}_\theta[t(x)]$, so $Q^\prime(\theta\mid\theta_0) = \mathbb{E}[t(x)\mid y,\theta_0] - \mathbb{E}_\theta[t(x)] = 0.$ Hence, for exponential family distributions, executing the M-step is equivalent to solving $\mathbb{E}[t(x)\mid y,\theta_0] = \mathbb{E}_\theta[t(x)]$ for $\theta$, where $\mathbb{E}_\theta[t(x)]$ is the unconditional expectation of the complete data sufficient statistics under $\theta$ and $\mathbb{E}[t(x)\mid y,\theta_0]$ is the conditional expectation of those same sufficient statistics, given the observed data and the current parameter value $\theta_0$.
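To make the moment-matching form of the M-step concrete, here is a minimal Python sketch for a case that is not discussed above and is chosen purely for illustration: exponential lifetimes with rate $\lambda$, some of which are right-censored. The complete data sufficient statistic is $t(x) = \sum_i x_i$, with $\mathbb{E}_\lambda[t(x)] = n/\lambda$, and by the memoryless property a lifetime censored at time $c$ has conditional expectation $c + 1/\lambda_0$. The function name `em_censored_exponential` and its arguments are hypothetical.

```python
def em_censored_exponential(times, censored, rate0=1.0, tol=1e-10, max_iter=1000):
    """EM for the rate of an exponential distribution with right-censoring.

    times[i]    : observed time (the lifetime, or the censoring time)
    censored[i] : True if times[i] is only a lower bound on the lifetime
    Assumes at least one observation is uncensored.
    """
    n = len(times)
    rate = rate0
    for _ in range(max_iter):
        # E-step: E[t(x) | y, rate], the expected sum of complete lifetimes.
        # A lifetime censored at c has conditional mean c + 1/rate.
        expected_t = sum(
            t + 1.0 / rate if c else t for t, c in zip(times, censored)
        )
        # M-step: solve E_rate[t(x)] = n / rate = expected_t for the new rate.
        new_rate = n / expected_t
        converged = abs(new_rate - rate) < tol
        rate = new_rate
        if converged:
            break
    return rate


# Small illustrative data set (made up for this sketch).
times = [0.5, 1.2, 2.0, 2.0, 0.3]
censored = [False, False, True, True, False]
estimate = em_censored_exponential(times, censored)

# For this model the MLE is also available in closed form:
# (number of uncensored observations) / (sum of all observed times).
closed_form = censored.count(False) / sum(times)
print(estimate, closed_form)
```

In this particular model the fixed point of the iteration coincides with the closed-form maximum likelihood estimate, so EM is not needed in practice; the example is only meant to show the E-step computing $\mathbb{E}[t(x)\mid y,\theta_0]$ and the M-step matching it to $\mathbb{E}_\theta[t(x)]$.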