Chapter 10 Limiting Distribution of Markov Chain (Lecture on 02/04/2021)
Last class we started discussing the stationary distribution and the limiting distribution. This class we will discuss \(\lim_{n\to\infty}p_{ij}(n)\) for aperiodic chains.
Result:
When the chain is transient or null persistent, \(\mu_j=\infty\), therefore \(p_{ij}(n)\to 0\) as \(n\to\infty\), \(\forall i,j\).
If the chain is non-null persistent, \(p_{ij}(n)\to\frac{1}{\mu_j}=\pi_j\). That is, \(p_{ij}(n)\) converges to the probability of being in state \(j\) under the stationary distribution.
\(\lim_{n\to\infty}p_{ij}(n)\) does not depend on the starting point \(i\), which is technically stated as: the chain forgets its origin. It is now easy to check that \[\begin{equation} P(X_n=j)=\sum_{i}P(X_0=i)p_{ij}(n)\to\frac{1}{\mu_j}\sum_iP(X_0=i)=\frac{1}{\mu_j} \tag{10.1} \end{equation}\] as \(n\to\infty\). This result tells us that, irrespective of the distribution of \(X_0\), the marginal distribution of \(X_n\) converges to the stationary distribution for a non-null persistent chain.
In MCMC, we are looking for the limiting distribution of the chain: we run the chain long enough and want it to reach the limiting distribution. When we run MCMC diagnostics, we check whether the starting point influences where the chain ends up. For a well-designed chain, the initial point should have no influence. Gelman\(^{\prime}\)s diagnostic in MCMC requires us to start the chain at three different initial points and check that the chains converge to the same place. This works because the Markov chain has this origin-forgetting property.
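The origin-forgetting property can be checked numerically. A minimal sketch (the 3-state transition matrix below is a hypothetical example, not from the lecture): after many steps every row of \(P^n\) is the same, so \(p_{ij}(n)\) no longer depends on \(i\).

```python
import numpy as np

# Hypothetical irreducible aperiodic 3-state chain (all entries positive).
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

Pn = np.linalg.matrix_power(P, 50)

# All rows of P^n are (nearly) identical: the chain forgets its origin.
print(np.allclose(Pn[0], Pn[1]) and np.allclose(Pn[1], Pn[2]))  # True

# The common row is the stationary distribution, pi = pi P.
pi = Pn[0]
print(np.allclose(pi, pi @ P))  # True
```

This is exactly what Gelman\(^{\prime}\)s diagnostic exploits: chains launched from different rows (initial states) end up with the same marginal distribution.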
Proposition 10.1 (Result for periodic chains) Let \(X=\{X_n\}_{n\geq 1}\) be an irreducible chain with period \(d\). Then \(Y=\{Y_n=X_{nd}: n\geq 0\}\) is an irreducible aperiodic chain, and \(p_{jj}(nd)\to\frac{d}{\mu_j}\) as \(n\to\infty\).
In general, given an irreducible periodic chain, we first build an irreducible aperiodic chain from it and then apply the previous result to that aperiodic chain.
Definition 10.1 (Reversed Chain) Suppose \(\{X_n:0\leq n\leq N\}\) is an irreducible non-null persistent Markov chain, with transition matrix \(P\) and stationary distribution \(\boldsymbol{\pi}\). Suppose further that \(X_n\) has a marginal distribution \(\boldsymbol{\pi}\) for all \(n\).
Define the reversed chain \(Y\) by \(Y_n=X_{N-n}\) for \(0\leq n\leq N\).
Theorem 10.2 (Transition probability matrix of the reversed chain) \(Y=\{Y_n\}_{0\leq n\leq N}\) is a Markov chain with \(P(Y_{n+1}=j|Y_n=i)=\frac{\pi_jp_{ji}}{\pi_i}\).
Proof. We have
\[\begin{equation}
\begin{split}
P&(Y_{n+1}=i_{n+1}|Y_n=i_n,Y_{n-1}=i_{n-1},\cdots,Y_0=i_0)\\
&=\frac{P(Y_k=i_k,0\leq k\leq n+1)}{P(Y_k=i_k,0\leq k\leq n)}\\
&=\frac{P(X_{N-(n+1)}=i_{n+1},X_{N-n}=i_n,\cdots,X_N=i_0)}{P(X_{N-n}=i_n,\cdots,X_N=i_0)}\\
&=\frac{\pi_{i_{n+1}}p_{i_{n+1},i_n}p_{i_n,i_{n-1}}\cdots p_{i_1,i_0}}{\pi_{i_{n}}p_{i_n,i_{n-1}}\cdots p_{i_1,i_0}}\\
&=\frac{\pi_{i_{n+1}}p_{i_{n+1},i_n}}{\pi_{i_n}}
\end{split}
\tag{10.2}
\end{equation}\]
Since the conditional distribution depends on \(i_n\) only, \(\{Y_n\}_{0\leq n\leq N}\) is a Markov chain.
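Theorem 10.2 can be sanity-checked numerically. A minimal sketch, assuming a hypothetical 3-state chain: the reversed kernel \(q_{ij}=\pi_jp_{ji}/\pi_i\) is a valid transition matrix, and \(\boldsymbol{\pi}\) is stationary for it as well.

```python
import numpy as np

# Hypothetical irreducible aperiodic 3-state chain.
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Stationary distribution via a high matrix power (chain is aperiodic).
pi = np.linalg.matrix_power(P, 200)[0]

# Reversed-chain kernel from Theorem 10.2: q_ij = pi_j * p_ji / pi_i.
Q = (pi[None, :] * P.T) / pi[:, None]

print(np.allclose(Q.sum(axis=1), 1.0))  # Q is a stochastic matrix: True
print(np.allclose(pi @ Q, pi))          # pi is also stationary for Q: True
```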
Definition 10.2 (Reversibility) \(Y=\{Y_n\}_{0\leq n\leq N}\) is called the time-reversal of chain \(X\), and we say \(X\) is reversible if \(X\) and \(Y\) have the same transition probabilities.
Definition 10.3 (Formal definition of reversible chain) Let \(X=\{X_n,0\leq n\leq N\}\) be an irreducible Markov chain such that \(X_n\) has the stationary distribution \(\boldsymbol{\pi}\) for all \(n\). The chain is called reversible if the transition matrices of \(X\) and its time-reversal chain \(Y\) are the same, which is to say \(p_{ij}=\frac{\pi_j p_{ji}}{\pi_i},\forall i,j\), or written as \[\begin{equation} \pi_ip_{ij}=\pi_jp_{ji},\forall i,j \tag{10.3} \end{equation}\]
The transition probabilities and the stationary distribution of a reversible chain are related through (10.3).
Proposition 10.2 (Detailed balance implies stationarity) If a distribution \(\boldsymbol{\pi}\) satisfies (10.3) for a transition matrix \(P\), then \(\boldsymbol{\pi}\) is the stationary distribution of \(P\).
Proof. We need to show \(\boldsymbol{\pi}=\boldsymbol{\pi}P\), or equivalently, that \(\pi_j=\sum_i\pi_ip_{ij},\forall j\). From (10.3), \[\begin{equation} \sum_i\pi_ip_{ij}=\sum_i\pi_jp_{ji}=\pi_j\sum_ip_{ji}=\pi_j \tag{10.4} \end{equation}\]
Therefore, \(\boldsymbol{\pi}\) is the stationary distribution. The Metropolis-Hastings chain satisfies this reversibility property, with stationary distribution equal to the full posterior distribution. That is why Metropolis-Hastings works.
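The detailed balance condition (10.3) is easy to verify for a toy Metropolis chain. A minimal sketch (the 5-point target \(\pi\) and the nearest-neighbour random-walk proposal are assumptions for illustration, not the general Metropolis-Hastings algorithm):

```python
import numpy as np

# Hypothetical target distribution on {0,...,4}.
pi = np.array([0.1, 0.3, 0.2, 0.25, 0.15])
m = len(pi)

# Metropolis kernel: propose a neighbour w.p. 1/2 each,
# accept with probability min(1, pi_j / pi_i).
P = np.zeros((m, m))
for i in range(m):
    for j in (i - 1, i + 1):
        if 0 <= j < m:
            P[i, j] = 0.5 * min(1.0, pi[j] / pi[i])
    P[i, i] = 1.0 - P[i].sum()        # rejected proposals stay put

# Detailed balance pi_i p_ij = pi_j p_ji holds, hence pi is stationary.
balance = pi[:, None] * P
print(np.allclose(balance, balance.T))  # True
print(np.allclose(pi @ P, pi))          # True
```

For neighbours, \(\pi_ip_{ij}=\frac{1}{2}\min(\pi_i,\pi_j)\), which is symmetric in \(i,j\); this is the mechanism by which Metropolis-type chains achieve reversibility.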
Example 10.1 (Ehrenfest model of diffusion) Suppose a box is partitioned into two sub-boxes, A and B, containing a total of \(m\) balls (gas molecules). At each time step, we choose one ball uniformly at random from the \(m\) balls and move it to the other sub-box. Let \(X_n\) be the number of balls in box A at time \(n\). We have \[\begin{equation} \begin{split} &P(X_{n+1}=i+1|X_n=i)=\frac{m-i}{m}=p_{i,i+1}\\ &P(X_{n+1}=i-1|X_n=i)=\frac{i}{m}=p_{i,i-1}\\ \end{split} \tag{10.5} \end{equation}\] Also notice that \(p_{i,j}=0\) for all \(j\) such that \(|j-i|\geq 2\), for \(0\leq i\leq m\).
Claim: \(\pi_i={m \choose i}(\frac{1}{2})^m\) for \(i=0,\cdots,m\).
Note that \(\pi_ip_{i,i+1}={m \choose i}(\frac{1}{2})^m\frac{m-i}{m}=\frac{m!}{i!(m-i)!}(\frac{1}{2})^m\frac{m-i}{m}=\frac{(m-1)!}{i!(m-i-1)!}(\frac{1}{2})^m\). Similarly, \(\pi_{i+1}p_{i+1,i}={m \choose {i+1}}(\frac{1}{2})^m\frac{i+1}{m}=\frac{m!}{(i+1)!(m-i-1)!}(\frac{1}{2})^m\frac{i+1}{m}=\frac{(m-1)!}{i!(m-i-1)!}(\frac{1}{2})^m\). Therefore, \(\pi_ip_{i,i+1}=\pi_{i+1}p_{i+1,i}\). Also \(p_{ij}=0\) for all \(|i-j|\geq 2\). We finally have \(\pi_ip_{ij}=\pi_jp_{ji},\forall i,j\).
Therefore, using Proposition 10.2, \(\boldsymbol{\pi}\) is the stationary distribution of the Markov chain.
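The claim can also be verified numerically for a small \(m\), say \(m=10\): the binomial weights \(\pi_i={m \choose i}(\frac{1}{2})^m\) satisfy detailed balance with the Ehrenfest kernel.

```python
import math
import numpy as np

m = 10

# Ehrenfest kernel: p_{i,i+1} = (m-i)/m, p_{i,i-1} = i/m, zero otherwise.
P = np.zeros((m + 1, m + 1))
for i in range(m + 1):
    if i < m:
        P[i, i + 1] = (m - i) / m
    if i > 0:
        P[i, i - 1] = i / m

# Claimed stationary distribution: Binomial(m, 1/2).
pi = np.array([math.comb(m, i) for i in range(m + 1)]) / 2**m

balance = pi[:, None] * P
print(np.allclose(balance, balance.T))  # detailed balance: True
print(np.allclose(pi @ P, pi))          # stationarity: True
```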
Example 10.2 (Gambler\(^{\prime}\)s ruin problem) A gambler starts with some money and keeps gambling. At each gamble, the gambler wins \(\$ 1\) with probability \(p\) and loses \(\$ 1\) with probability \(q=1-p\). If the gambler's fortune reaches \(\$ N\), he wins the game: the game ends and he has all the money. If he loses all his money, he is broke and the game also ends.
Question: What is the probability of losing (or winning) all the money when the gambler starts with \(\$ i\)?
Let \(X_n\) be the amount of money the gambler has at time \(n\). The transition matrix \(P\in\mathbb{R}^{(N+1)\times(N+1)}\), over states \(0,1,\cdots,N\), will look like \[\begin{equation} P=\begin{pmatrix} 1 & 0 & \cdots & \cdots & \cdots & \cdots & 0 \\ q & 0 & p & 0 & 0 & \cdots & 0 \\ 0 & q & 0 & p & 0 & \cdots & 0 \\ \vdots & \vdots & & & & & \vdots\\ 0 & \cdots & \cdots & \cdots & q & 0 & p \\ 0 & \cdots & \cdots & \cdots & \cdots & 0 & 1 \end{pmatrix} \tag{10.6} \end{equation}\]
Now define \(u_i\) to be the probability of losing the game when the gambler starts with \(\$ i\). Also define \(v_i\) to be the probability of winning the game when the gambler starts with \(\$ i\). We have \(u_i=1-v_i\), so we only need to find one of them. Let us find \(u_i\).
We have
\[\begin{equation}
\begin{split}
u_i&=P(\text{chain entering state 0}|\text{start at state i})\\
&=P(\text{chain entering state 0}|\text{start at state i},\text{the first gamble is won})\\
&\times P(\text{the first gamble is won})\\
&+P(\text{chain entering state 0}|\text{start at state i},\text{the first gamble is lost})\\
&\times P(\text{the first gamble is lost})\\
&=P(\text{chain entering state 0}|\text{chain start at state i+1})P(\text{the first gamble is won})\\
&+ P(\text{chain entering state 0}|\text{chain start at state i-1})P(\text{the first gamble is lost})\\
&=u_{i+1}p+u_{i-1}q
\end{split}
\tag{10.7}
\end{equation}\]
for \(1\leq i\leq N-1\). We also have boundary conditions \(u_N=0\) and \(u_0=1\).
Since \(u_i=u_{i+1}p+u_{i-1}q\), we can write it as \((p+q)u_i=u_{i+1}p+u_{i-1}q\), or \(p(u_{i+1}-u_i)=q(u_i-u_{i-1})\). That is, \(u_{i+1}-u_i=\frac{q}{p}(u_i-u_{i-1})\). Applying this recursive relationship repeatedly, we have \[\begin{equation} u_{i+1}-u_i=\frac{q}{p}(u_i-u_{i-1})=\cdots=(\frac{q}{p})^i(u_1-u_0) \tag{10.8} \end{equation}\] Thus, we can write a sequence of equations like (10.8) as follows: \[\begin{equation} \begin{split} &u_{i+1}-u_i=(\frac{q}{p})^i(u_1-u_0)\quad\quad\quad (1)\\ &u_{i}-u_{i-1}=(\frac{q}{p})^{i-1}(u_1-u_0)\quad\quad\quad (2)\\ &\quad\vdots\\ &u_{2}-u_1=(\frac{q}{p})(u_1-u_0)\quad\quad\quad (i)\\ \end{split} \end{equation}\]
Summing (1) to (i), we obtain
\[\begin{equation}
u_{i+1}-u_1=[(\frac{q}{p})^i+\cdots+(\frac{q}{p})](u_1-u_0)
\tag{10.9}
\end{equation}\]
or by adding \((u_1-u_0)\) on both side of (10.9) we have
\[\begin{equation}
(u_{i+1}-u_1)+(u_1-u_0)=[(\frac{q}{p})^i+\cdots+(\frac{q}{p})+1](u_1-u_0)
\tag{10.10}
\end{equation}\]
Assuming \(p\neq q\) and summing the geometric series, we get
\[\begin{equation}
u_{i+1}-u_0=\frac{(\frac{q}{p})^{i+1}-1}{\frac{q}{p}-1}(u_1-u_0)
\tag{10.11}
\end{equation}\]
Let \(i=N-1\) we obtain
\[\begin{equation}
u_{N}-u_0=\frac{(\frac{q}{p})^{N}-1}{\frac{q}{p}-1}(u_1-u_0)
\tag{10.12}
\end{equation}\]
By boundary condition, \(u_N=0\) and \(u_0=1\). Thus,
\[\begin{equation}
u_0-u_1=\frac{\frac{q}{p}-1}{(\frac{q}{p})^N-1}
\tag{10.13}
\end{equation}\]
Plugging this value of \(u_1-u_0\) into the earlier equations, we obtain
\[\begin{equation}
u_i-u_0=\frac{(\frac{q}{p})^{i}-1}{\frac{q}{p}-1}\{-\frac{\frac{q}{p}-1}{(\frac{q}{p})^N-1}\}=-\frac{(\frac{q}{p})^{i}-1}{(\frac{q}{p})^N-1}
\tag{10.14}
\end{equation}\]
Since \(u_0=1\), we finally have
\[\begin{equation}
u_i=1-\frac{(\frac{q}{p})^{i}-1}{(\frac{q}{p})^N-1}
\tag{10.15}
\end{equation}\]
When \(q=p\), going back to equation (10.8), the recursive relationship becomes \(u_{i+1}-u_i=u_{i}-u_{i-1}\). Therefore, \(u_i-u_0=i(u_1-u_0)\). Letting \(i=N\), \(u_N-u_0=N(u_1-u_0)\). Using the boundary conditions, \(-1=N(u_1-u_0)\), so \(u_1-u_0=-\frac{1}{N}\). Hence we obtain \(u_i=u_0+i(u_1-u_0)=1-\frac{i}{N}\).
We have found that \[\begin{equation} \boxed{u_i=\left\{\begin{aligned} & 1-\frac{(\frac{q}{p})^{i}-1}{(\frac{q}{p})^N-1} & q\neq p\\ & 1-\frac{i}{N} & p=q \end{aligned}\right.} \tag{10.16} \end{equation}\]