Chapter 13 Birth and Death Process, MCMC for Discrete Distribution (Lecture on 02/16/2021)

Last class we discussed the Poisson process and the birth and death process. In a Poisson process, \(N(t)\) is the number of events that have happened up to time \(t\), while in a birth and death process, \(N(t)\) is the number of individuals in the system at time \(t\). The definition of the birth and death process is given in Definition 12.2.

If we define \(p_j(t)=P(N(t)=j)\), then, using the results we discussed at the end of last class, we can show that \[\begin{equation} p_j^{\prime}(t)=-(\lambda_j+\mu_j)p_j(t)+\lambda_{j-1}p_{j-1}(t)+\mu_{j+1}p_{j+1}(t) \tag{13.1} \end{equation}\]

If the population starts with one individual, then \(P(N(0)=1)=1\), or \(p_1(0)=1\), is the boundary condition. Note that the boundary condition may change if we have other information about the process.
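Although a closed-form solution to (13.1) is rarely available, the system can be integrated numerically once the state space is truncated. Below is a minimal sketch in Python (not from the lecture); the linear rates \(\lambda_j=j\lambda\), \(\mu_j=j\mu\), the truncation level, and the parameter values are assumptions made for illustration, and the mean it produces anticipates a result derived later in this lecture.

```python
import numpy as np
from scipy.integrate import solve_ivp

# Truncate the state space at N_MAX so the infinite ODE system (13.1) becomes finite.
N_MAX = 200
lam, mu = 1.0, 0.5                       # assumed linear rates: lambda_j = j*lam, mu_j = j*mu

def forward_equations(t, p):
    """Right-hand side of (13.1) for j = 0, ..., N_MAX."""
    j = np.arange(N_MAX + 1)
    lam_j, mu_j = lam * j, mu * j
    dp = -(lam_j + mu_j) * p
    dp[1:] += lam_j[:-1] * p[:-1]        # inflow from state j-1 via a birth
    dp[:-1] += mu_j[1:] * p[1:]          # inflow from state j+1 via a death
    return dp

# Boundary condition: the process starts with one individual, p_1(0) = 1.
p0 = np.zeros(N_MAX + 1)
p0[1] = 1.0

sol = solve_ivp(forward_equations, (0.0, 2.0), p0, t_eval=[2.0])
p_t = sol.y[:, -1]
print("E[N(2)] approx", np.sum(np.arange(N_MAX + 1) * p_t))   # about exp((lam - mu) * 2) = 2.72
```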

The birth and death process has several important special cases:

  1. Pure birth process, \(\mu_i=0,\forall i\).

  2. If \(\mu_i=0,\forall i\) and \(\lambda_i=\lambda\) for all \(i\), then it becomes a Poisson process; only the boundary condition changes, since for the Poisson process \(N(0)=0\).

  3. If \(\lambda_i=i\lambda\) and \(\mu_i=i\mu\) for all \(i\), then it is called a linear birth and death process.

  4. If \(\mu_i=0\) and \(\lambda_i=i\lambda,\forall i\), then it is called a linear birth process.

  5. If \(\mu_i=i\mu\) and \(\lambda_i=0,\forall i\), then it is called a linear death process.

Solving (13.1) for \(p_j(t)\) with general \(\lambda_i,\mu_i\) is difficult. However, closed-form expressions exist if we concentrate on the linear birth and death process or one of its variants (for example, \(\lambda_i=i\lambda+a,\forall i\)).

Last class we discussed how to solve for \(p_j(t)\) from the differential equations. However, sometimes interest lies in \(M(t)=E[N(t)]\). We will now see techniques for finding \(M(t)\) for different variants of the birth and death process.

The simplest case of the birth and death process is the Poisson process. Recall that the differential equations for the Poisson process were \[\begin{equation} p_j^{\prime}(t)=\lambda p_{j-1}(t)-\lambda p_j(t),\quad \forall j\geq 1,p_0(0)=1 \tag{13.2} \end{equation}\]

We then have \[\begin{equation} M(t)=E(N(t))=\sum_{j=1}^{\infty}jp_j(t) \tag{13.3} \end{equation}\]

By taking derivatives with respect to \(t\) and using (13.2), we get \[\begin{equation} \begin{split} M^{\prime}(t)&=\sum_{j=1}^{\infty}jp_j^{\prime}(t)\\ &=\sum_{j=1}^{\infty}j[\lambda p_{j-1}(t)-\lambda p_j(t)]\\ &=-\lambda\sum_{j=1}^{\infty}jp_j(t)+\lambda\sum_{j=1}^{\infty}jp_{j-1}(t)\\ &=-\lambda M(t)+\lambda\sum_{j=0}^{\infty}(j+1)p_j(t)\\ &=-\lambda M(t)+\lambda\sum_{j=0}^{\infty}jp_j(t)+\lambda\sum_{j=0}^{\infty}p_j(t)\\ &=-\lambda M(t)+\lambda M(t)+\lambda \quad (\text{as}\,\sum_{j=0}^{\infty}p_j(t)=1)\\ &=\lambda \end{split} \tag{13.4} \end{equation}\]

Therefore, we have \(\int_0^tM^{\prime}(s)ds=\int_0^t\lambda ds\), which implies \(M(t)-M(0)=\lambda t\) or \[\begin{equation} M(t)=M(0)+\lambda t \tag{13.5} \end{equation}\] Note that \(M(0)=E(N(0))=0\) for a Poisson process. Thus, \[\begin{equation} \boxed{M(t)=\lambda t} \tag{13.6} \end{equation}\]

This is not a surprising result: in Theorem 12.1 we showed that \(N(t)\) follows a Poisson distribution with parameter \(\lambda t\). However, here we obtain the same result by a different approach.
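As a quick numerical sanity check (a sketch, not part of the lecture; the rate \(\lambda\), horizon \(t\), and number of replications are arbitrary choices), one can simulate Poisson arrivals through exponential inter-arrival times and compare the sample mean of \(N(t)\) with \(\lambda t\):

```python
import numpy as np

rng = np.random.default_rng(0)
lam, t_end, n_paths = 2.0, 3.0, 20_000   # assumed rate, horizon, and number of replications

def n_events(lam, t_end):
    """Count exponential(rate=lam) inter-arrival times that occur before t_end."""
    total, count = 0.0, 0
    while True:
        total += rng.exponential(1.0 / lam)
        if total > t_end:
            return count
        count += 1

counts = np.array([n_events(lam, t_end) for _ in range(n_paths)])
print(counts.mean(), "vs  lam * t =", lam * t_end)       # both should be close to 6
```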

Now let us do the same for a linear birth process. For a linear birth process, \(\lambda_j=j\lambda\) for all \(j\) and \(\mu_j=0\). Therefore, the differential equations from the birth and death process boil down to \[\begin{equation} p_j^{\prime}(t)=-\lambda jp_j(t)+\lambda(j-1)p_{j-1}(t),\quad j\geq 1 \tag{13.7} \end{equation}\] By definition, \[\begin{equation} M(t)=E(N(t))=\sum_{j=1}^{\infty}jp_j(t) \tag{13.8} \end{equation}\] Taking derivatives with respect to \(t\) and using (13.7), we have \[\begin{equation} \begin{split} M^{\prime}(t)&=\sum_{j=1}^{\infty}jp_j^{\prime}(t)\\ &=\sum_{j=1}^{\infty}j[-\lambda jp_j(t)+\lambda(j-1)p_{j-1}(t)]\\ &=-\lambda\sum_{j=1}^{\infty}j^2p_j(t)+\lambda\sum_{j=1}^{\infty}j(j-1)p_{j-1}(t) \end{split} \tag{13.9} \end{equation}\] Using the identity \(j(j-1)=(j-1)^2+(j-1)\), we can further simplify (13.9) as \[\begin{equation} \begin{split} M^{\prime}(t)&=-\lambda\sum_{j=1}^{\infty}j^2p_j(t)+\lambda\sum_{j=1}^{\infty}[(j-1)^2+(j-1)]p_{j-1}(t)\\ &=-\lambda\sum_{j=1}^{\infty}j^2p_j(t)+\lambda\sum_{j=1}^{\infty}(j-1)^2p_{j-1}(t)+\lambda\sum_{j=1}^{\infty}(j-1)p_{j-1}(t)\\ &=\lambda\sum_{j=1}^{\infty}(j-1)p_{j-1}(t)=\lambda M(t) \end{split} \tag{13.10} \end{equation}\] where the first two sums cancel after the index shift \(j-1\mapsto j\). We thus have the differential equation \[\begin{equation} M^{\prime}(t)=\lambda M(t) \tag{13.11} \end{equation}\] or \(M^{\prime}(t)-\lambda M(t)=0\). The solution again comes from an integrating factor. We have \[\begin{equation} e^{-\lambda t}M^{\prime}(t)-e^{-\lambda t}\lambda M(t)=0 \tag{13.12} \end{equation}\] which gives us \[\begin{equation} \frac{d}{dt}[e^{-\lambda t}M(t)]=0 \tag{13.13} \end{equation}\] Integrating both sides from \(0\) to \(t\), we have \[\begin{equation} \int_0^t\frac{d}{ds}[e^{-\lambda s}M(s)]ds=e^{-\lambda t}M(t)-M(0)=0 \tag{13.14} \end{equation}\] Thus, \[\begin{equation} \boxed{M(t)=M(0)e^{\lambda t}} \tag{13.15} \end{equation}\] If \(M(0)=1\), that is, the process starts with one individual, then \(M(t)=e^{\lambda t}\). In general, if \(M(0)=n_0\), then \(M(t)=n_0e^{\lambda t}\).
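The exponential growth of the mean can also be checked by simulation. Below is a minimal sketch (the rate, horizon, and initial size \(n_0\) are arbitrary choices): in state \(j\) the process waits an exponential time with rate \(j\lambda\) and then jumps to \(j+1\), and the sample mean of \(N(t)\) should be close to \(n_0e^{\lambda t}\).

```python
import numpy as np

rng = np.random.default_rng(1)
lam, t_end, n0, n_paths = 0.7, 2.0, 3, 5_000   # assumed rate, horizon, initial size, replications

def simulate_linear_birth(lam, t_end, n0):
    """In state j the holding time is Exponential(rate = j*lam); every jump is a birth."""
    t, n = 0.0, n0
    while True:
        t += rng.exponential(1.0 / (lam * n))
        if t > t_end:
            return n
        n += 1

samples = np.array([simulate_linear_birth(lam, t_end, n0) for _ in range(n_paths)])
print(samples.mean(), "vs  n0 * exp(lam * t) =", n0 * np.exp(lam * t_end))   # both close to 12.2
```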

Now consider the linear birth and death process. Since \(\lambda_j=j\lambda\) and \(\mu_j=j\mu\), the differential equation for \(p_j(t)\) is \[\begin{equation} p_j^{\prime}(t)=-j(\lambda+\mu)p_j(t)+(j-1)\lambda p_{j-1}(t)+(j+1)\mu p_{j+1}(t) \tag{13.16} \end{equation}\] Thus, using this equation we have \[\begin{equation} \begin{split} M^{\prime}(t)&=\sum_{j=1}^{\infty}jp_j^{\prime}(t)\\ &=\sum_{j=1}^{\infty}j[-j(\lambda+\mu)p_j(t)+(j-1)\lambda p_{j-1}(t)+(j+1)\mu p_{j+1}(t)]\\ &=-(\lambda+\mu)\sum_{j=1}^{\infty}j^2p_j(t)+\lambda\sum_{j=1}^{\infty}j(j-1)p_{j-1}(t)+\mu\sum_{j=1}^{\infty}j(j+1)p_{j+1}(t)\\ \end{split} \tag{13.17} \end{equation}\] Using the two identities \(\left\{\begin{aligned} j(j-1)=(j-1)^2+(j-1) \\ j(j+1)=(j+1)^2-(j+1) \end{aligned}\right.\) we further have \[\begin{equation} \begin{split} M^{\prime}(t)&=-(\lambda+\mu)\sum_{j=1}^{\infty}j^2p_j(t)+\lambda\sum_{j=1}^{\infty}[(j-1)^2+(j-1)]p_{j-1}(t)\\ &+\mu\sum_{j=1}^{\infty}[(j+1)^2-(j+1)]p_{j+1}(t)\\ &=-(\lambda+\mu)\sum_{j=1}^{\infty}j^2p_j(t)+\lambda\sum_{j=1}^{\infty}(j-1)^2p_{j-1}(t)+\lambda\sum_{j=1}^{\infty}(j-1)p_{j-1}(t)\\ &+\mu\sum_{j=1}^{\infty}(j+1)^2p_{j+1}(t)-\mu\sum_{j=1}^{\infty}(j+1)p_{j+1}(t)\\ &=-\lambda\sum_{j=1}^{\infty}j^2p_j(t)-\mu\sum_{j=1}^{\infty}j^2p_j(t)+\lambda\sum_{j=1}^{\infty}(j-1)^2p_{j-1}(t)\\ &+\lambda\sum_{j=1}^{\infty}(j-1)p_{j-1}(t)+\mu\left[\sum_{j=1}^{\infty}(j+1)^2p_{j+1}(t)+1\cdot p_1(t)\right]\\ &-\mu\left[\sum_{j=1}^{\infty}(j+1)p_{j+1}(t)+1\cdot p_1(t)\right]\\ &=\lambda\sum_{j=1}^{\infty}(j-1)p_{j-1}(t)-\mu\sum_{j=1}^{\infty}jp_{j}(t)\\ &=\lambda M(t)-\mu M(t) \end{split} \tag{13.18} \end{equation}\] where in the third equality we add and subtract \(\mu p_1(t)\) so that, after index shifts, the squared sums cancel.

Thus, the differential equation is \[\begin{equation} M^{\prime}(t)=(\lambda-\mu)M(t) \tag{13.19} \end{equation}\] Using the same integrating-factor technique as before, we finally get \[\begin{equation} \boxed{M(t)=M(0)e^{(\lambda-\mu)t}} \tag{13.20} \end{equation}\]

The result implies that if \(\lambda>\mu\), the expected population size increases over time, while if \(\lambda<\mu\), the expected population size decreases to zero, so the population goes extinct over time.
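Again, this can be checked by simulation (a sketch with arbitrary parameter values): in state \(j\) the waiting time is exponential with rate \(j(\lambda+\mu)\), the next event is a birth with probability \(\lambda/(\lambda+\mu)\) and a death otherwise, and state \(0\) is absorbing. The sample mean of \(N(t)\) should track \(M(0)e^{(\lambda-\mu)t}\).

```python
import numpy as np

rng = np.random.default_rng(2)
lam, mu, t_end, n0, n_paths = 1.0, 0.6, 2.0, 5, 10_000   # assumed parameters

def simulate_linear_bd(lam, mu, t_end, n0):
    """In state j, wait Exponential(rate = j*(lam+mu)); the event is a birth w.p. lam/(lam+mu)."""
    t, n = 0.0, n0
    while n > 0:                               # state 0 is absorbing (no births, no deaths)
        t += rng.exponential(1.0 / (n * (lam + mu)))
        if t > t_end:
            break
        n += 1 if rng.random() < lam / (lam + mu) else -1
    return n

samples = np.array([simulate_linear_bd(lam, mu, t_end, n0) for _ in range(n_paths)])
print(samples.mean(), "vs  M(0) * exp((lam - mu) * t) =", n0 * np.exp((lam - mu) * t_end))  # ~11.1
```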

Now let us check why MCMC works for sampling from a discrete distribution.

Suppose the parameter of interest is \(\theta\in\Theta\), where \(\Theta\) is a discrete set. Let \(\pi(\theta)\) denote the prior distribution; the posterior distribution is then given by \[\begin{equation} \pi(\theta|\mathbf{x})=\frac{f(\mathbf{x}|\theta)\pi(\theta)}{\sum_{\psi\in\Theta}f(\mathbf{x}|\psi)\pi(\psi)} \tag{13.21} \end{equation}\]
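For a concrete (hypothetical) illustration of (13.21), suppose \(\Theta\) is a small grid of success probabilities, the prior is uniform, and the data are binomial; the posterior can then be computed exactly by normalizing likelihood times prior over the grid. The model, grid, and data below are assumptions for illustration, not from the lecture.

```python
import numpy as np
from scipy.stats import binom

theta_grid = np.array([0.1, 0.3, 0.5, 0.7, 0.9])        # discrete parameter space Theta
prior = np.full(len(theta_grid), 1 / len(theta_grid))   # uniform prior pi(theta)

x_obs, n_trials = 7, 10                                 # hypothetical data: 7 successes in 10 trials
likelihood = binom.pmf(x_obs, n_trials, theta_grid)     # f(x | theta) for each theta in the grid

posterior = likelihood * prior
posterior /= posterior.sum()                            # the normalizing sum in (13.21)
print(dict(zip(theta_grid.tolist(), posterior.round(3))))
```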

Let \(\boldsymbol{\pi}=(\pi(\theta|\mathbf{x}):\theta\in\Theta)=(\pi_1,\pi_2,\cdots)\). We construct a Markov chain \(\{X_n\}_{n\geq 1}\). Suppose \(X_n=i\); we define \(X_{n+1}\) using the following ingredients.

  1. Let \(H=(h_{ij}:i,j\in\Theta)\) be an arbitrary stochastic matrix, called the proposal matrix, from which a proposed state \(Y\) is drawn according to \(P(Y=j|X_n=i)=h_{ij}\).

  2. Let the acceptance probability matrix \(\mathbf{A}=(a_{ij}:i,j\in\Theta)\) be a matrix with entries satisfying \(0\leq a_{ij}\leq 1\); the \(a_{ij}\) are called “acceptance probabilities” and are given by \[\begin{equation} a_{ij}=\min\{1,\frac{\pi_j h_{ji}}{\pi_i h_{ij}}\} \tag{13.22} \end{equation}\]

The algorithm is: given the proposed state \(Y=j\), we set \(X_{n+1}=\left\{\begin{aligned} & j & \text{with probability}\, a_{ij}\\ & X_n & \text{with probability}\, 1-a_{ij}\end{aligned}\right.\) Now we find the transition probability matrix \(P=(p_{ij}:i,j\in\Theta)\), where \(p_{ij}=P(X_{n+1}=j|X_n=i)\).

If \(i\neq j\), the chain moves from \(i\) to \(j\) only when state \(j\) is proposed and the proposal is accepted, so \[\begin{equation} \begin{split} p_{ij}&=P(Y=j|X_n=i)P(\text{proposal is accepted}|Y=j,X_n=i)\\ &=h_{ij}a_{ij} \end{split} \tag{13.23} \end{equation}\]

If \(i=j\), then \[\begin{equation} \begin{split} p_{ij}&=1-\sum_{k\neq i}p_{ik}\\ &=1-\sum_{k\neq i}h_{ik}a_{ik} \end{split} \tag{13.24} \end{equation}\]
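The ingredients above translate directly into a sampler. The following minimal sketch (Python) runs the chain for a small discrete target; the target vector \(\boldsymbol{\pi}\), the uniform "propose any other state" matrix \(H\), and the chain length are illustrative assumptions, not from the lecture.

```python
import numpy as np

rng = np.random.default_rng(3)

# Target: any discrete distribution pi (here an unnormalized, made-up example).
pi = np.array([1.0, 4.0, 2.0, 8.0, 5.0])
pi = pi / pi.sum()
K = len(pi)

# Proposal: pick one of the other K-1 states uniformly, so h_ij = 1/(K-1) for j != i.
def propose(i):
    j = rng.integers(K - 1)
    return j + (j >= i)                  # shift by one to skip the current state i

def mh_chain(n_steps, start=0):
    x = start
    samples = np.empty(n_steps, dtype=int)
    for n in range(n_steps):
        j = propose(x)
        # Acceptance probability (13.22); the proposal is symmetric (h_ij = h_ji),
        # so the ratio reduces to pi_j / pi_i.
        a = min(1.0, pi[j] / pi[x])
        if rng.random() < a:
            x = j                        # accept the proposed state
        samples[n] = x                   # otherwise X_{n+1} stays at X_n
    return samples

samples = mh_chain(200_000)
print(np.bincount(samples, minlength=K) / len(samples))   # empirical frequencies close to pi
```

Note that only ratios of the \(\pi_j\) enter the acceptance probability, so the intractable normalizing sum in (13.21) never needs to be computed.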

Now we claim that \(\pi_kp_{kj}=\pi_jp_{jk}\) for all \(j,k\in\Theta\) (the detailed balance equations), where \(\boldsymbol{\pi}\) is the vector of posterior probabilities. We now prove this claim.

First, suppose \(j\neq k\) and consider the case \(\pi_jh_{jk}<\pi_kh_{kj}\). Using the definition of \(a_{kj}\), we have \[\begin{equation} a_{kj}=\frac{\pi_jh_{jk}}{\pi_kh_{kj}} \tag{13.25} \end{equation}\] Therefore, by (13.23) and (13.25), \[\begin{equation} \begin{split} \pi_kp_{kj}&=\pi_kh_{kj}a_{kj}\\ &=\pi_kh_{kj}\frac{\pi_jh_{jk}}{\pi_kh_{kj}}\\ &=\pi_jh_{jk} \end{split} \tag{13.26} \end{equation}\] In addition, \[\begin{equation} \begin{split} \pi_jp_{jk}&=\pi_jh_{jk}a_{jk}\\ &=\pi_jh_{jk}=\pi_kp_{kj} \end{split} \tag{13.27} \end{equation}\] because \(a_{jk}=1\) by definition. Hence the claim holds in this case.

Now, for the other case \(\pi_jh_{jk}>\pi_kh_{kj}\), we have \(a_{jk}=\frac{\pi_kh_{kj}}{\pi_jh_{jk}}\) by definition. Then \[\begin{equation} \pi_jp_{jk}=\pi_jh_{jk}a_{jk}=\pi_kh_{kj} \tag{13.28} \end{equation}\] and \[\begin{equation} \pi_kp_{kj}=\pi_kh_{kj}a_{kj}=\pi_kh_{kj} \tag{13.29} \end{equation}\] because \(a_{kj}=1\) by definition. (If \(\pi_jh_{jk}=\pi_kh_{kj}\), then \(a_{jk}=a_{kj}=1\) and the claim holds trivially.) This completes the proof of the claim.

If a probability vector \(\boldsymbol{\pi}\) satisfies the detailed balance equations with the transition probabilities of a Markov chain, then \(\boldsymbol{\pi}\) is a stationary distribution of that chain. Therefore, the full posterior distribution is the stationary distribution of the MCMC chain. Assuming the other regularity conditions hold for the chain (e.g., irreducibility and aperiodicity), the stationary distribution is also the limiting distribution. Thus, the limiting distribution of the Markov chain is the full posterior distribution, and if you run the chain long enough, you eventually sample from the posterior distribution.
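The claim can also be verified numerically: build \(P\) from (13.22)–(13.24) for a small discrete target and check both that \(\pi_ip_{ij}=\pi_jp_{ji}\) and that \(\boldsymbol{\pi}P=\boldsymbol{\pi}\). A sketch, reusing the toy target and uniform proposal from the previous code block:

```python
import numpy as np

# Toy target and uniform "propose any other state" matrix, as in the sampler sketch above.
pi = np.array([1.0, 4.0, 2.0, 8.0, 5.0])
pi = pi / pi.sum()
K = len(pi)
H = np.full((K, K), 1.0 / (K - 1))
np.fill_diagonal(H, 0.0)

# Transition matrix P built from (13.22)-(13.24).
P = np.zeros((K, K))
for i in range(K):
    for j in range(K):
        if j != i:
            a_ij = min(1.0, (pi[j] * H[j, i]) / (pi[i] * H[i, j]))   # acceptance probability (13.22)
            P[i, j] = H[i, j] * a_ij                                 # off-diagonal entries (13.23)
    P[i, i] = 1.0 - P[i].sum()                                       # staying probability (13.24)

print(np.allclose(pi[:, None] * P, (pi[:, None] * P).T))   # detailed balance: pi_i p_ij = pi_j p_ji
print(np.allclose(pi @ P, pi))                             # stationarity: pi P = pi
```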