Chapter 10 Limiting Distribution of Markov Chain (Lecture on 02/04/2021)
Last class we started discussing the stationary distribution and the limiting distribution. This class we will discuss the limiting distribution for aperiodic chains.
Result:
When the chain is transient or null persistent, \mu_j=\infty; therefore p_{ij}(n)\to 0 as n\to\infty, \forall i,j.
If the chain is non-null persistent, p_{ij}(n)\to\frac{1}{\mu_j}=\pi_j. That is, p_{ij}(n) converges to the probability of state j under the stationary distribution.
\lim_{n\to\infty}p_{ij}(n) does not depend on the starting point i, which is often described by saying the chain forgets its origin. It is now easy to check that \begin{equation} P(X_n=j)=\sum_{i}P(X_0=i)p_{ij}(n)\to\frac{1}{\mu_j}\sum_iP(X_0=i)=\frac{1}{\mu_j} \tag{10.1} \end{equation} as n\to\infty. This result tells us that, irrespective of the distribution of X_0, the marginal distribution of X_n converges to the stationary distribution for a non-null persistent chain.
In MCMC, we are looking for the limiting distribution of the chain: we run the chain long enough and want it to reach the limiting distribution. When we run diagnostics on MCMC, we want to see whether the starting point has any influence on the limiting distribution. Typically, if it is a well-designed chain, the initial point should have no influence. Gelman's diagnostic in MCMC requires us to start the chain at three different initial points, and the chains should converge to the same place. This can happen because the Markov chain has this property, as the sketch below illustrates.
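As a quick illustration (a minimal sketch, not Gelman's actual diagnostic, using an arbitrary made-up 3-state transition matrix), we can propagate three different point-mass initial distributions through the same chain and watch the marginal distribution of X_n converge to the same vector:

```python
import numpy as np

# An arbitrary irreducible, aperiodic 3-state transition matrix (rows sum to 1)
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Three point-mass initial distributions, i.e., three different starting states
for start in range(3):
    mu = np.zeros(3)
    mu[start] = 1.0
    dist = mu @ np.linalg.matrix_power(P, 50)   # marginal distribution of X_50
    print(f"start at state {start}: {np.round(dist, 6)}")
# All three lines print the same vector: the chain has forgotten its origin.
```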
Proposition 10.1 (Result for periodic chains) Let X=\{X_n\}_{n\geq 0} be an irreducible chain with period d. Then Y=\{Y_n=X_{nd}: n\geq 0\} is an irreducible aperiodic chain, and it follows that p_{jj}(nd)\to\frac{d}{\mu_j} as n\to\infty.
In general, if you have an irreducible periodic chain, you first build an irreducible aperiodic chain from it by observing the chain every d steps, and then apply the previous result to the aperiodic chain. A concrete check is sketched below.
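For a concrete (hypothetical) check of Proposition 10.1, take simple random walk on a cycle of four states: it is irreducible with period d=2, its stationary distribution is uniform so \mu_j=4, and p_{jj}(nd) should approach d/\mu_j=1/2:

```python
import numpy as np

# Simple random walk on a 4-cycle: irreducible with period d = 2, mu_j = 4
P = np.zeros((4, 4))
for i in range(4):
    P[i, (i + 1) % 4] = 0.5
    P[i, (i - 1) % 4] = 0.5

Pnd = np.linalg.matrix_power(P, 2 * 30)   # nd steps with n = 30, d = 2
print(np.round(np.diag(Pnd), 6))          # each p_jj(nd) is ~ 0.5 = d / mu_j
```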
Definition 10.1 (Reversed Chain) Suppose \{X_n:0\leq n\leq N\} is an irreducible non-null persistent Markov chain, with transition matrix P and stationary distribution \boldsymbol{\pi}. Suppose further that X_n has a marginal distribution \boldsymbol{\pi} for all n.
Define the reversed chain Y by Y_n=X_{N-n} for 0\leq n\leq N.
Theorem 10.2 (Transition probability matrix of the reversed chain) Y=\{Y_n\}_{0\leq n\leq N} is a Markov chain with P(Y_{n+1}=j|Y_n=i)=\frac{\pi_jp_{ji}}{\pi_i}.
Proof. We have
\begin{equation}
\begin{split}
P&(Y_{n+1}=i_{n+1}|Y_n=i_n,Y_{n-1}=i_{n-1},\cdots,Y_0=i_0)\\
&=\frac{P(Y_k=i_k,0\leq k\leq n+1)}{P(Y_k=i_k,0\leq k\leq n)}\\
&=\frac{P(X_{N-(n+1)}=i_{n+1},X_{N-n}=i_n,\cdots,X_N=i_0)}{P(X_{N-n}=i_n,\cdots,X_N=i_0)}\\
&=\frac{\pi_{i_{n+1}}p_{i_{n+1},i_n}p_{i_n,i_{n-1}}\cdots p_{i_1,i_0}}{\pi_{i_{n}}p_{i_n,i_{n-1}}\cdots p_{i_1,i_0}}\\
&=\frac{\pi_{i_{n+1}}p_{i_{n+1},i_n}}{\pi_{i_n}}
\end{split}
\tag{10.2}
\end{equation}
The third equality uses the assumption that X_n has marginal distribution \boldsymbol{\pi} for all n together with the Markov property of X. Since the conditional distribution depends on i_n only, \{Y_n\}_{0\leq n\leq N} is a Markov chain.
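Numerically, the theorem says the reversed chain's transition matrix is \hat{p}_{ij}=\pi_jp_{ji}/\pi_i; a minimal sketch (reusing an arbitrary 3-state matrix) builds it and checks that it is a valid transition matrix with the same stationary distribution:

```python
import numpy as np

# An arbitrary irreducible 3-state chain
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.6, 0.2],
              [0.3, 0.3, 0.4]])

# Stationary distribution: left eigenvector of P with eigenvalue 1
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi = pi / pi.sum()

# Reversed chain: P_rev[i, j] = pi[j] * P[j, i] / pi[i]
P_rev = (P.T * pi[None, :]) / pi[:, None]
print(np.round(P_rev.sum(axis=1), 6))   # each row sums to 1
print(np.allclose(pi @ P_rev, pi))      # pi is also stationary for the reversal
```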
Definition 10.2 (Reversibility) Y=\{Y_n\}_{0\leq n\leq N} is called the time-reversal of the chain X, and we say X is reversible if X and Y have the same transition probabilities.
Definition 10.3 (Formal definition of a reversible chain) Let X=\{X_n,0\leq n\leq N\} be an irreducible Markov chain such that X_n has the stationary distribution \boldsymbol{\pi} for all n. The chain is called reversible if the transition matrices of X and its time-reversal Y are the same, which is to say p_{ij}=\frac{\pi_j p_{ji}}{\pi_i},\forall i,j, or equivalently \begin{equation} \pi_ip_{ij}=\pi_jp_{ji},\forall i,j \tag{10.3} \end{equation}
Proposition 10.2 (Detailed balance implies stationarity) The transition probabilities and the stationary distribution of a reversible chain are related through (10.3): if a distribution \boldsymbol{\pi} satisfies (10.3), then \boldsymbol{\pi} is the stationary distribution.

Proof. We need to show \boldsymbol{\pi}=\boldsymbol{\pi}P, or equivalently, that \pi_j=\sum_i\pi_ip_{ij},\forall j. From (10.3), \begin{equation} \sum_i\pi_ip_{ij}=\sum_i\pi_jp_{ji}=\pi_j\sum_ip_{ji}=\pi_j \tag{10.4} \end{equation}
Therefore, \boldsymbol{\pi} is the stationary distribution. The Metropolis-Hastings chain satisfies this reversibility property, with the full posterior distribution as its stationary distribution. That is why Metropolis-Hastings works.
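A minimal Metropolis-Hastings sketch, assuming a standard normal target and a normal random-walk proposal (both arbitrary choices for illustration); the acceptance ratio is built from the target density, which is exactly what makes (10.3) hold with \pi equal to the target:

```python
import numpy as np

rng = np.random.default_rng(0)

def log_target(x):
    # Log density of the illustrative target (standard normal), up to a constant
    return -0.5 * x ** 2

def metropolis_hastings(n_steps, x0=0.0, step=1.0):
    x = x0
    samples = np.empty(n_steps)
    for t in range(n_steps):
        prop = x + step * rng.standard_normal()   # symmetric random-walk proposal
        # Accept with probability min(1, pi(prop)/pi(x)); this acceptance rule
        # is what makes the chain reversible with respect to pi.
        if np.log(rng.uniform()) < log_target(prop) - log_target(x):
            x = prop
        samples[t] = x
    return samples

s = metropolis_hastings(50_000)
print(s.mean(), s.var())   # close to 0 and 1, the target's moments
```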
Example 10.1 (Ehrenfest model of diffusion) Suppose a box is partitioned into two sub-boxes, A and B, and a total of m balls (gas molecules) are distributed between A and B. At each time step, you choose one ball uniformly at random from the m balls and move it to the other sub-box. Let X_n be the number of balls in box A at time n. We have \begin{equation} \begin{split} &P(X_{n+1}=i+1|X_n=i)=\frac{m-i}{m}=p_{i,i+1}\\ &P(X_{n+1}=i-1|X_n=i)=\frac{i}{m}=p_{i,i-1}\\ \end{split} \tag{10.5} \end{equation} Also notice that p_{i,j}=0 for all j such that |j-i|\geq 2, for 0\leq i\leq m.
Claim: \pi_i={m \choose i}(\frac{1}{2})^m for i=0,\cdots,m.
Note that \pi_ip_{i,i+1}={m \choose i}(\frac{1}{2})^m\frac{m-i}{m}=\frac{m!}{i!(m-i)!}(\frac{1}{2})^m\frac{m-i}{m}=\frac{(m-1)!}{i!(m-i-1)!}(\frac{1}{2})^m. Similarly, \pi_{i+1}p_{i+1,i}={m \choose {i+1}}(\frac{1}{2})^m\frac{i+1}{m}=\frac{m!}{(i+1)!(m-i-1)!}(\frac{1}{2})^m\frac{i+1}{m}=\frac{(m-1)!}{i!(m-i-1)!}(\frac{1}{2})^m. Therefore, \pi_ip_{i,i+1}=\pi_{i+1}p_{i+1,i}. Also p_{ij}=0 for all |i-j|\geq 2. We finally have \pi_ip_{ij}=\pi_jp_{ji},\forall i,j.
Therefore, by Proposition 10.2, \boldsymbol{\pi} is the stationary distribution of the Markov chain. The check below confirms this numerically.
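The algebra above is easy to verify numerically; a sketch with m = 10 (an arbitrary choice) checks both detailed balance and stationarity:

```python
import numpy as np
from math import comb

m = 10
P = np.zeros((m + 1, m + 1))
for i in range(m + 1):
    if i < m:
        P[i, i + 1] = (m - i) / m   # a ball in B is chosen and moved to A
    if i > 0:
        P[i, i - 1] = i / m         # a ball in A is chosen and moved to B

# Claimed stationary distribution: Binomial(m, 1/2)
pi = np.array([comb(m, i) for i in range(m + 1)]) * 0.5 ** m

flows = pi[:, None] * P              # flows[i, j] = pi_i * p_ij
print(np.allclose(flows, flows.T))   # detailed balance (10.3): True
print(np.allclose(pi @ P, pi))       # pi = pi P: True
```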
Example 10.2 (Gambler's ruin problem) A gambler starts with some money and keeps gambling. At each gamble, the gambler wins \$ 1 with probability p and loses \$ 1 with probability q=1-p. If the gambler reaches \$ N, he wins the game; the game ends and he has all the money. If he loses all his money, he is broke and the game also ends.
Question: What is the probability of losing (or winning) the entire fortune when the gambler starts with \$ i?
Let X_n be the amount of money the gambler has at time n. The state space is \{0,1,\cdots,N\}, so the transition matrix P\in\mathbb{R}^{(N+1)\times(N+1)} will look like \begin{equation} P=\begin{pmatrix} 1 & 0 & \cdots & \cdots & \cdots & \cdots & 0 \\ q & 0 & p & 0 & 0 & \cdots & 0 \\ 0 & q & 0 & p & 0 & \cdots & 0 \\ \vdots & \vdots & & & & & \vdots\\ 0 & \cdots & \cdots & \cdots & q & 0 & p \\ 0 & \cdots & \cdots & \cdots & \cdots & 0 & 1 \end{pmatrix} \tag{10.6} \end{equation} A sketch of this matrix in code follows.
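A minimal construction of (10.6), with illustrative (made-up) values N = 5 and p = 0.4:

```python
import numpy as np

N, p = 5, 0.4      # illustrative values
q = 1 - p

# States 0, 1, ..., N; states 0 (broke) and N (won) are absorbing
P = np.zeros((N + 1, N + 1))
P[0, 0] = P[N, N] = 1.0
for i in range(1, N):
    P[i, i + 1] = p    # win $1
    P[i, i - 1] = q    # lose $1
print(P)
```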
Now define u_i to be the probability of losing the game when the gambler starts with \$ i. Also define v_i to be the probability of winning the game when the gambler starts with \$ i. Since the game ends with probability one, u_i=1-v_i, so we only need to find one of them. Let us find u_i.
We have
\begin{equation}
\begin{split}
u_i&=P(\text{chain enters state 0}|\text{chain starts at state } i)\\
&=P(\text{chain enters state 0}|\text{chain starts at state } i,\text{the first gamble is won})\\
&\times P(\text{the first gamble is won})\\
&+P(\text{chain enters state 0}|\text{chain starts at state } i,\text{the first gamble is lost})\\
&\times P(\text{the first gamble is lost})\\
&=P(\text{chain enters state 0}|\text{chain starts at state } i+1)P(\text{the first gamble is won})\\
&+P(\text{chain enters state 0}|\text{chain starts at state } i-1)P(\text{the first gamble is lost})\\
&=u_{i+1}p+u_{i-1}q
\end{split}
\tag{10.7}
\end{equation}
for 1\leq i\leq N-1. We also have boundary conditions u_N=0 and u_0=1.
Since u_i=u_{i+1}p+u_{i-1}q, we can write it as (p+q)u_i=u_{i+1}p+u_{i-1}q, or p(u_{i+1}-u_i)=q(u_i-u_{i-1}). That is, u_{i+1}-u_i=\frac{q}{p}(u_i-u_{i-1}). Applying this recursive relationship repeatedly, we have \begin{equation} u_{i+1}-u_i=\frac{q}{p}(u_i-u_{i-1})=\cdots=(\frac{q}{p})^i(u_1-u_0) \tag{10.8} \end{equation} Thus, we can write a sequence of equations like (10.8) as follows: \begin{equation} \begin{split} &u_{i+1}-u_i=(\frac{q}{p})^i(u_1-u_0)\quad\quad\quad (1)\\ &u_{i}-u_{i-1}=(\frac{q}{p})^{i-1}(u_1-u_0)\quad\quad\quad (2)\\ &\quad\vdots\\ &u_{2}-u_1=(\frac{q}{p})(u_1-u_0)\quad\quad\quad (i)\\ \end{split} \end{equation}
Summing (1) to (i), we obtain
\begin{equation}
u_{i+1}-u_1=[(\frac{q}{p})^i+\cdots+(\frac{q}{p})](u_1-u_0)
\tag{10.9}
\end{equation}
or, by adding (u_1-u_0) to both sides of (10.9), we have
\begin{equation}
(u_{i+1}-u_1)+(u_1-u_0)=[(\frac{q}{p})^i+\cdots+(\frac{q}{p})+1](u_1-u_0)
\tag{10.10}
\end{equation}
Assuming p\neq q and summing the geometric series, we get
\begin{equation}
u_{i+1}-u_0=\frac{(\frac{q}{p})^{i+1}-1}{\frac{q}{p}-1}(u_1-u_0)
\tag{10.11}
\end{equation}
Letting i=N-1, we obtain
\begin{equation}
u_{N}-u_0=\frac{(\frac{q}{p})^{N}-1}{\frac{q}{p}-1}(u_1-u_0)
\tag{10.12}
\end{equation}
By the boundary conditions, u_N=0 and u_0=1. Thus,
\begin{equation}
u_0-u_1=\frac{\frac{q}{p}-1}{(\frac{q}{p})^N-1}
\tag{10.13}
\end{equation}
Plugging this value of u_1-u_0 into (10.11), we obtain
\begin{equation}
u_i-u_0=\frac{(\frac{q}{p})^{i}-1}{\frac{q}{p}-1}\{-\frac{\frac{q}{p}-1}{(\frac{q}{p})^N-1}\}=-\frac{(\frac{q}{p})^{i}-1}{(\frac{q}{p})^N-1}
\tag{10.14}
\end{equation}
Since u_0=1, we finally have
\begin{equation}
u_i=1-\frac{(\frac{q}{p})^{i}-1}{(\frac{q}{p})^N-1}
\tag{10.15}
\end{equation}
When q=p, going back to equation (10.8), the recursive relationship becomes u_{i+1}-u_i=u_{i}-u_{i-1}. Therefore, u_i-u_0=i(u_1-u_0). Letting i=N, u_N-u_0=N(u_1-u_0). Using the boundary conditions, we have -1=N(u_1-u_0), or u_1-u_0=-\frac{1}{N}. Hence we obtain u_i=1-\frac{i}{N}.
We have found that \begin{equation} \boxed{u_i=\left\{\begin{aligned} & 1-\frac{(\frac{q}{p})^{i}-1}{(\frac{q}{p})^N-1} & q\neq p\\ & 1-\frac{i}{N} & p=q \end{aligned}\right.} \tag{10.16} \end{equation}
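As a sanity check (a sketch with the same hypothetical N = 5 and p = 0.4 as above), the ruin probabilities can also be obtained by solving the linear system (10.7) with its boundary conditions directly, and they match (10.16):

```python
import numpy as np

N, p = 5, 0.4
q = 1 - p

# Solve u_i = p*u_{i+1} + q*u_{i-1} for 1 <= i <= N-1, with u_0 = 1, u_N = 0
A = np.zeros((N + 1, N + 1))
b = np.zeros(N + 1)
A[0, 0], b[0] = 1.0, 1.0    # boundary condition u_0 = 1
A[N, N], b[N] = 1.0, 0.0    # boundary condition u_N = 0
for i in range(1, N):
    A[i, i - 1], A[i, i], A[i, i + 1] = -q, 1.0, -p
u_numeric = np.linalg.solve(A, b)

r = q / p
u_formula = 1 - (r ** np.arange(N + 1) - 1) / (r ** N - 1)   # equation (10.16)
print(np.allclose(u_numeric, u_formula))                     # True
```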