Chapter 8 Stochastic Processes and Markov Chains

8.1 Stochastic Processes

A stochastic process is a family of random variables $\{X_t : t \in T\}$, where the index $t$ belongs to some parameter set $T$, and each of the random variables can take the same set of possible values. The set of possible values that the random variables can take is known as the state space.

Typically the index $t$ and parameter set $T$ have a time-related interpretation. With this in mind, a stochastic process can informally be thought of as the state of some system at a time $t$. When the set $T$ is discrete, for example $T = \{0, 1, 2, 3, 4, 5, \dots\}$, then the stochastic process $\{X_t : t \in T\}$ is known as a discrete-time process. When the set $T$ is continuous, for example $T = [0, \infty)$, then the stochastic process $\{X_t : t \in T\}$ is known as a continuous-time process. In DATA2002, we will predominantly consider stochastic processes that are discrete-time processes.

Furthermore, throughout the remainder of DATA2002 we will only consider stochastic processes for which the state space is discrete, that is, each of the $X_t$ are discrete random variables.

Consider a fair $6$-sided die that we roll repeatedly, keeping track of our total cumulative score. Let $X_t$ be the total of the first $t$ dice rolls. What is the parameter set $T$, and what is the state space, denoted $S$? Also calculate the PMFs of $X_1$ and $X_2$.



The index $t$ corresponds to the number of dice rolls that have been made, and so will be a positive integer. Therefore $T = \{1, 2, 3, 4, 5, \dots\}$. After $n$ dice rolls, the cumulative total $X_n$ will be an integer between $n$ and $6n$. All positive integers are included in at least one of these intervals, and since all random variables must share a common state space, it is necessary to take $S = \{1, 2, 3, 4, 5, \dots\}$. The cumulative total score after one die roll, that is the PMF $\rho_{X_1}$ of $X_1$, is given by
$$\rho_{X_1}(x) = \begin{cases} \frac{1}{6}, & \text{if } x = 1, 2, 3, 4, 5, 6, \\ 0, & \text{otherwise.} \end{cases}$$

The possible outcomes for rolling the die twice are summarized by the following table.

Therefore the cumulative total score after two dice rolls, that is the PMF $\rho_{X_2}$ of $X_2$, is given by
$$\rho_{X_2}(x) = \begin{cases} \frac{1}{36}, & \text{if } x = 2, 12, \\ \frac{1}{18}, & \text{if } x = 3, 11, \\ \frac{1}{12}, & \text{if } x = 4, 10, \\ \frac{1}{9}, & \text{if } x = 5, 9, \\ \frac{5}{36}, & \text{if } x = 6, 8, \\ \frac{1}{6}, & \text{if } x = 7, \\ 0, & \text{otherwise.} \end{cases}$$

Note from Example 8.1.2 that an element $s$ belonging to the state space $S$ is not the same as there being a non-zero probability that $X_t = s$. For example, $P(X_1 = 7) = 0$ since it is impossible to have a total of $7$ after just one roll, but $7$ does remain in the state space $S$ of $X_1$.
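The PMF of $X_2$ can be checked by brute-force enumeration of the $36$ equally likely two-roll outcomes; a minimal Python sketch (not part of the notes' examples), using exact fractions so the values match the table above:

```python
from fractions import Fraction
from collections import Counter

# Enumerate all 36 equally likely outcomes of two fair die rolls
# and tally the cumulative total X_2 = first roll + second roll.
totals = Counter(a + b for a in range(1, 7) for b in range(1, 7))

# Convert counts to exact probabilities.
pmf = {x: Fraction(count, 36) for x, count in sorted(totals.items())}

for x, p in pmf.items():
    print(f"P(X_2 = {x}) = {p}")
```

The printed values reproduce the cases of $\rho_{X_2}$, e.g. $P(X_2 = 7) = \frac{1}{6}$.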

Deja Brew and Latte Da are the two chains of cafes in Nottingham. Mary Berry travels to Nottingham and decides to try out Latte Da. Mary then falls into a habit: if she has been to Latte Da, there is a $\frac{5}{6}$ chance she chooses to go to Deja Brew on her next trip to Nottingham, and a $\frac{1}{6}$ chance she will go back to Latte Da. Alternatively, if Mary has been to Deja Brew, then there is a $\frac{1}{3}$ chance that she decides to go to Latte Da on her next day out in Nottingham, and a $\frac{2}{3}$ chance she returns to Deja Brew.

This can be summarised by the following diagram:

Mary Berry's behaviour can be modelled by a stochastic process. Let $S = \{\text{Latte Da}, \text{Deja Brew}\}$ be the state space, and let $X_t$ indicate whether Mary goes to Latte Da or Deja Brew on her $t$th subsequent trip to Nottingham, with $t \in T = \mathbb{Z}_{>0}$.

A sample path is a sequence of realised values for the random variables XtXt of a stochastic process.

Consider again the random variables $X_t$ of Example 8.1.2, obtained by calculating the cumulative score of $n$ rolls of a fair $6$-sided die. The following table outlines examples of sample paths $\omega_1, \omega_2, \omega_3, \omega_4, \omega_5$ observed for the stochastic process:

For $\omega_1$, we rolled $4, 3, 3, 6, 1, 1$ successively.
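Sample paths of this process are straightforward to generate by simulation; a short Python sketch (the function name `sample_path` is our own, not from the notes):

```python
import random

def sample_path(n_rolls, seed=None):
    """Simulate one sample path of the cumulative-total process:
    X_t = total of the first t rolls of a fair 6-sided die."""
    rng = random.Random(seed)
    path, total = [], 0
    for _ in range(n_rolls):
        total += rng.randint(1, 6)   # one fair die roll
        path.append(total)
    return path

# e.g. the rolls 4, 3, 3, 6, 1, 1 would give the path 4, 7, 10, 16, 17, 18
print(sample_path(6))
```

Each run produces a different realised sequence, exactly as in the table of sample paths.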

Using the stochastic process of Example 8.1.2, what is $P(X_1 = 1, X_4 = 4)$?

Using the stochastic process of Example 8.1.2, what is $P(X_t = 3 \text{ for some } t \in T)$?

Using the stochastic process of Example 8.1.2, what is $P(X_t \neq 10 \text{ for all } t \in T)$?

These are all examples of questions that we will learn to answer.

What should also be noted in Example 8.1.2 is that in performing the underlying experiment to observe $X_2$, we necessarily observe $X_1$ (before we can roll the die twice, we have to roll it once). Therefore it could be argued that it makes more natural sense to study $\rho_{X_1}$ and $\rho_{X_2 \mid X_1}$, rather than $\rho_{X_1}$ and $\rho_{X_2}$. This argument holds more generally, since the parameter set $T$ typically has a time-related or sequential interpretation: we can study $\rho_{X_1}, \rho_{X_2 \mid X_1}, \rho_{X_3 \mid X_1, X_2}, \dots, \rho_{X_{n+1} \mid X_1, X_2, \dots, X_n}, \dots$

This highlights a key feature of stochastic processes: the transitions from one state to another state.

8.2 Definition of Markov Chains

A Markov chain is a particular type of stochastic process.

A stochastic process $\{X_t : t \geq 0\}$ is a Markov chain if
$$P(X_{t+1} = x_{t+1} \mid X_0 = x_0, \dots, X_t = x_t) = P(X_{t+1} = x_{t+1} \mid X_t = x_t)$$
for all $t \geq 0$ and states $x_0, x_1, \dots, x_t, x_{t+1}$.

This is to say that when trying to predict the future state of a Markov chain, it is only important to know the present state, and not how the process arrived at this state.

The stochastic processes in Example 8.1.2 and Example 8.1.3 were Markov Chains.

Set $X_0 = X_1 = 1$. To define $X_t$ for $t \geq 2$, one tosses a coin. If a head is obtained then set $X_t = X_{t-1} + X_{t-2}$, and if a tail is obtained set $X_t = 0$. Explain why $\{X_t : t \geq 0\}$ is not a Markov chain.

In the event of tossing a head, $X_t$ clearly depends on the values of both $X_{t-1}$ and $X_{t-2}$, not just the current state $X_{t-1}$. This contradicts Definition 8.2.1 of a Markov chain.

Identifying Markov chains within your workplace is a key skill for DATA2002. Many Markov chains will have a great number of states.

A Markov chain $\{X_t : t \geq 0\}$ is called time-homogeneous if
$$P(X_{t+1} = j \mid X_t = i) = P(X_1 = j \mid X_0 = i)$$
for all $t \geq 0$ and for all $i, j \in S$.

Informally, time-homogeneity means that the probabilities governing transitions between states do not change with time. In this module, we only consider time-homogeneous Markov chains.

Clearly Example 8.1.3 is a time-homogeneous Markov chain.

Consider a time-homogeneous Markov chain $\{X_t : t \geq 0\}$. The transition probabilities are the values
$$p_{ij} = P(X_1 = j \mid X_0 = i).$$

8.3 Transition Matrix

The transition probabilities introduced in Definition 8.2.4 play a key role in the analysis of Markov chains. As an efficient way of storing them, we introduce the transition probability matrix.

The $|S| \times |S|$ matrix $P = (p_{ij})$, where the entry in the $i$th row and $j$th column is the transition probability $p_{ij}$, is called the transition probability matrix.

Note that any transition probability matrix $P = (p_{ij})$ satisfies:

  • $p_{ij} \geq 0$ for all $i, j \in S$, or equivalently every matrix entry is non-negative;

  • $\sum_{j \in S} p_{ij} = 1$ for all $i \in S$, or equivalently the entries in each row of the matrix sum to $1$.

Any matrix that satisfies both of these conditions is called a stochastic matrix.

What is the transition probability matrix $P$ of the stochastic process in Example 8.1.3?



Allowing the index $1$ to correspond to Latte Da and index $2$ to correspond to Deja Brew, from the description of the problem we have
$$\begin{aligned}
p_{11} &= P(X_{t+1} = \text{Latte Da} \mid X_t = \text{Latte Da}) = \tfrac{1}{6}, \\
p_{12} &= P(X_{t+1} = \text{Deja Brew} \mid X_t = \text{Latte Da}) = \tfrac{5}{6}, \\
p_{21} &= P(X_{t+1} = \text{Latte Da} \mid X_t = \text{Deja Brew}) = \tfrac{1}{3}, \\
p_{22} &= P(X_{t+1} = \text{Deja Brew} \mid X_t = \text{Deja Brew}) = \tfrac{2}{3}.
\end{aligned}$$

Therefore the transition probability matrix is given by
$$P = \begin{pmatrix} \frac{1}{6} & \frac{5}{6} \\ \frac{1}{3} & \frac{2}{3} \end{pmatrix}.$$

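As a quick check, the two defining conditions of a stochastic matrix can be verified programmatically; a Python sketch using exact fractions (the helper `is_stochastic` is our own, illustrative name):

```python
from fractions import Fraction as F

# Transition matrix for Example 8.1.3: state 1 = Latte Da, state 2 = Deja Brew.
P = [[F(1, 6), F(5, 6)],
     [F(1, 3), F(2, 3)]]

def is_stochastic(M):
    """Check the two defining conditions of a stochastic matrix:
    every entry is non-negative, and each row sums to 1."""
    return (all(p >= 0 for row in M for p in row)
            and all(sum(row) == 1 for row in M))

print(is_stochastic(P))  # True
```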
What is the transition probability matrix $P$ of the stochastic process in Example 8.1.2?

Suppose that $X_t = n$ for some positive integer $n$. Then since the fair die has an equal chance of landing on each of $1, 2, 3, 4, 5, 6$, there is a $\frac{1}{6}$ chance that $X_{t+1}$ is equal to each of $n+1$, $n+2$, $n+3$, $n+4$, $n+5$, $n+6$. Specifically,
$$\begin{aligned}
p_{n,n+1} &= P(X_{t+1} = n+1 \mid X_t = n) = \tfrac{1}{6}, \\
p_{n,n+2} &= P(X_{t+1} = n+2 \mid X_t = n) = \tfrac{1}{6}, \\
&\;\;\vdots \\
p_{n,n+6} &= P(X_{t+1} = n+6 \mid X_t = n) = \tfrac{1}{6}, \\
p_{n,j} &= P(X_{t+1} = j \mid X_t = n) = 0,
\end{aligned}$$
where $j \neq n+1, n+2, n+3, n+4, n+5, n+6$. Therefore the transition probability matrix is given by
$$P = \begin{pmatrix}
0 & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & 0 & 0 & \cdots \\
0 & 0 & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & 0 & \cdots \\
0 & 0 & 0 & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \cdots \\
\vdots & & & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots
\end{pmatrix}.$$

This notion of transition probabilities can be extended from one-step transitions to $n$-step transitions. Specifically, set
$$p^{(n)}_{ij} = P(X_n = j \mid X_0 = i).$$
Note that since the Markov chains we are considering are time-homogeneous, we have
$$P(X_n = j \mid X_0 = i) = P(X_{t+n} = j \mid X_t = i).$$
This is to say, only the number of steps between our known state and the future event we wish to find the probability of matters, not the time at which this happens.

The $|S| \times |S|$ matrix $P^{(n)} = (p^{(n)}_{ij})$, where the entry in the $i$th row and $j$th column is the $n$-step transition probability $p^{(n)}_{ij}$, is called the $n$-step transition probability matrix.

What is the $2$-step transition probability matrix $P^{(2)}$ of the stochastic process in Example 8.1.3?



Again allowing the index $1$ to correspond to Latte Da and index $2$ to correspond to Deja Brew, from the description of the problem we can calculate
$$\begin{aligned}
p^{(2)}_{11} &= P(X_{t+2} = 1 \mid X_t = 1) \\
&= P(X_{t+2} = 1, X_{t+1} = 1 \mid X_t = 1) + P(X_{t+2} = 1, X_{t+1} = 2 \mid X_t = 1) \\
&= \tfrac{1}{6} \cdot \tfrac{1}{6} + \tfrac{5}{6} \cdot \tfrac{1}{3} = \tfrac{11}{36}, \\
p^{(2)}_{12} &= P(X_{t+2} = 2 \mid X_t = 1) \\
&= P(X_{t+2} = 2, X_{t+1} = 1 \mid X_t = 1) + P(X_{t+2} = 2, X_{t+1} = 2 \mid X_t = 1) \\
&= \tfrac{1}{6} \cdot \tfrac{5}{6} + \tfrac{5}{6} \cdot \tfrac{2}{3} = \tfrac{25}{36}, \\
p^{(2)}_{21} &= P(X_{t+2} = 1 \mid X_t = 2) \\
&= P(X_{t+2} = 1, X_{t+1} = 1 \mid X_t = 2) + P(X_{t+2} = 1, X_{t+1} = 2 \mid X_t = 2) \\
&= \tfrac{1}{3} \cdot \tfrac{1}{6} + \tfrac{2}{3} \cdot \tfrac{1}{3} = \tfrac{5}{18}, \\
p^{(2)}_{22} &= P(X_{t+2} = 2 \mid X_t = 2) \\
&= P(X_{t+2} = 2, X_{t+1} = 1 \mid X_t = 2) + P(X_{t+2} = 2, X_{t+1} = 2 \mid X_t = 2) \\
&= \tfrac{1}{3} \cdot \tfrac{5}{6} + \tfrac{2}{3} \cdot \tfrac{2}{3} = \tfrac{13}{18}.
\end{aligned}$$

Therefore the $2$-step transition probability matrix is given by
$$P^{(2)} = \begin{pmatrix} \frac{11}{36} & \frac{25}{36} \\ \frac{5}{18} & \frac{13}{18} \end{pmatrix}.$$

The following theorem gives us an easy way to calculate the $n$-step transition probability matrix.

Chapman-Kolmogorov Equations: The transition probability matrix $P$ and the $n$-step transition probability matrix $P^{(n)}$ satisfy the equation
$$P^{(n)} = P^n.$$
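The theorem is easy to check numerically for Example 8.1.3: squaring $P$ should reproduce the hand-calculated $2$-step matrix. A Python sketch with exact arithmetic (the helper `matmul` is our own):

```python
from fractions import Fraction as F

# One-step transition matrix from Example 8.3.2 (1 = Latte Da, 2 = Deja Brew).
P = [[F(1, 6), F(5, 6)],
     [F(1, 3), F(2, 3)]]

def matmul(A, B):
    """Multiply two square matrices of Fractions."""
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

# Chapman-Kolmogorov: the 2-step matrix is P squared.
P2 = matmul(P, P)
print(P2)  # entries 11/36, 25/36, 5/18, 13/18, matching the hand calculation
```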

What is the $2$-step transition probability matrix $P^{(2)}$ of the stochastic process in Example 8.1.2?

By the Chapman-Kolmogorov equations, the question is equivalent to calculating $P^2$, where $P$ is the transition probability matrix calculated in Example 8.3.3. Squaring that matrix, each row of $P^{(2)}$ is the PMF of a two-roll total shifted by the current state:
$$P^{(2)} = P^2 = \begin{pmatrix}
0 & 0 & \frac{1}{36} & \frac{1}{18} & \frac{1}{12} & \frac{1}{9} & \frac{5}{36} & \frac{1}{6} & \frac{5}{36} & \frac{1}{9} & \frac{1}{12} & \frac{1}{18} & \frac{1}{36} & 0 & \cdots \\
0 & 0 & 0 & \frac{1}{36} & \frac{1}{18} & \frac{1}{12} & \frac{1}{9} & \frac{5}{36} & \frac{1}{6} & \frac{5}{36} & \frac{1}{9} & \frac{1}{12} & \frac{1}{18} & \frac{1}{36} & \cdots \\
\vdots & & & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots & \ddots
\end{pmatrix}$$
That is, $p^{(2)}_{n,n+k} = \rho_{X_2}(k)$ for $k = 2, \dots, 12$, matching the PMF computed in Example 8.1.2.
What is the $n$-step transition probability matrix $P^{(n)}$ of the stochastic process in Example 8.1.3?



By the Chapman-Kolmogorov equations, the question is equivalent to calculating $P^n$, where $P$ is the transition probability matrix calculated in Example 8.3.2. Specifically,
$$P^{(n)} = P^n = \begin{pmatrix} \frac{1}{6} & \frac{5}{6} \\ \frac{1}{3} & \frac{2}{3} \end{pmatrix}^n.$$

To calculate this high power of $P$, we first diagonalise: recall from Mathematics for Data Science that $P = MDM^{-1}$, where $D$ is the diagonal matrix of eigenvalues of $P$, and $M$ is the matrix whose $i$th column is the eigenvector of $P$ corresponding to the $i$th eigenvalue in $D$. For $P$, we can check that
$$D = \begin{pmatrix} 1 & 0 \\ 0 & -\frac{1}{6} \end{pmatrix} \quad \text{and} \quad M = \begin{pmatrix} 1 & 5 \\ 1 & -2 \end{pmatrix}.$$
It follows that
$$\begin{aligned}
P^{(n)} = P^n &= (MDM^{-1})(MDM^{-1}) \cdots (MDM^{-1}) = MD^nM^{-1} \\
&= \begin{pmatrix} 1 & 5 \\ 1 & -2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -\frac{1}{6} \end{pmatrix}^n \begin{pmatrix} \frac{2}{7} & \frac{5}{7} \\ \frac{1}{7} & -\frac{1}{7} \end{pmatrix} \\
&= \begin{pmatrix} 1 & 5 \\ 1 & -2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \left(-\frac{1}{6}\right)^n \end{pmatrix} \begin{pmatrix} \frac{2}{7} & \frac{5}{7} \\ \frac{1}{7} & -\frac{1}{7} \end{pmatrix} \\
&= \begin{pmatrix} 1 & 5\left(-\frac{1}{6}\right)^n \\ 1 & -2\left(-\frac{1}{6}\right)^n \end{pmatrix} \begin{pmatrix} \frac{2}{7} & \frac{5}{7} \\ \frac{1}{7} & -\frac{1}{7} \end{pmatrix} \\
&= \begin{pmatrix} \frac{2}{7} + \frac{5}{7}\left(-\frac{1}{6}\right)^n & \frac{5}{7} - \frac{5}{7}\left(-\frac{1}{6}\right)^n \\ \frac{2}{7} - \frac{2}{7}\left(-\frac{1}{6}\right)^n & \frac{5}{7} + \frac{2}{7}\left(-\frac{1}{6}\right)^n \end{pmatrix}.
\end{aligned}$$
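The closed form can be sanity-checked against direct matrix powers; a Python sketch with exact fractions (the helper names `mat_pow` and `closed_form` are our own):

```python
from fractions import Fraction as F

# One-step matrix for the cafe example (1 = Latte Da, 2 = Deja Brew).
P = [[F(1, 6), F(5, 6)],
     [F(1, 3), F(2, 3)]]

def mat_pow(M, n):
    """Naive repeated multiplication M^n for a 2x2 matrix of Fractions."""
    R = [[F(1), F(0)], [F(0), F(1)]]            # identity
    for _ in range(n):
        R = [[sum(R[i][k] * M[k][j] for k in range(2)) for j in range(2)]
             for i in range(2)]
    return R

def closed_form(n):
    """The diagonalisation formula for P^n derived above."""
    e = F(-1, 6) ** n
    return [[F(2, 7) + F(5, 7) * e, F(5, 7) - F(5, 7) * e],
            [F(2, 7) - F(2, 7) * e, F(5, 7) + F(2, 7) * e]]

print(mat_pow(P, 5) == closed_form(5))  # True
```

Exact fractions avoid any floating-point rounding, so the comparison is an equality rather than a tolerance check.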

Approximate the probability that Mary Berry goes to Deja Brew on some trip infinitely far in the future.

8.4 Hitting Probabilities and Hitting Times

Given a Markov chain and a current state, one can ask about the probabilities of ever achieving other states.

Given two distinct states $i, j$ of a Markov chain, the hitting probability, denoted $f_{ij}$, is the probability of ever reaching state $j$ in the future given that the current state is $i$. Mathematically,
$$f_{ij} = P(X_t = j \text{ for some } t \geq 1 \mid X_0 = i).$$

Often the best way to calculate $f_{ij}$ is to split the possible sample paths into cases based on the first step. Specifically,
$$P(X_t = j \text{ for some } t \geq 1 \mid X_0 = i) = \sum_{k \in S} P(X_t = j \text{ for some } t \geq 1, \text{ and } X_1 = k \mid X_0 = i).$$
The right hand side of this equation can then be conditioned further using the general probability identity $P(A \cap B) = P(A \mid B)P(B)$. This is fully illustrated by the following example:

A company has a new website which consists of 4 pages: a Front Page which links to all other pages, an About Us page and a Contact Us page which both link to each other and back to the Front Page, and an Our Staff page which has no links on it. A prospective customer starts at a random page, and chooses from the available links on the page with equal probability to decide which page is next. At all future times they repeat this procedure. Thinking of the above process as a Markov chain, and in doing so labelling the pages as $1, 2, 3, 4$ according to the order they are listed above, calculate the hitting probabilities $f_{12}$, $f_{13}$, $f_{23}$, $f_{32}$.

The Markov chain can be described diagrammatically:

The transition matrix is therefore
$$P = \begin{pmatrix} 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}.$$

Calculating the hitting probability using the recommended method:
$$\begin{aligned}
f_{12} &= P(X_n = 2 \text{ for some } n \geq 1 \mid X_0 = 1) \\
&= \sum_{k=1}^{4} P(X_n = 2 \text{ for some } n \geq 1, \text{ and } X_1 = k \mid X_0 = 1) \\
&= \sum_{k=1}^{4} P(X_n = 2 \text{ for some } n \geq 1 \mid X_1 = k, X_0 = 1) \, P(X_1 = k \mid X_0 = 1) \\
&= \sum_{k=1}^{4} P(X_n = 2 \text{ for some } n \geq 1 \mid X_1 = k) \, p_{1k} \\
&= f_{12} \, p_{11} + 1 \cdot p_{12} + f_{32} \, p_{13} + f_{42} \, p_{14} \\
&= f_{12} \cdot 0 + 1 \cdot \tfrac{1}{3} + f_{32} \cdot \tfrac{1}{3} + 0 \cdot \tfrac{1}{3} \\
&= \tfrac{1}{3} + \tfrac{1}{3} f_{32}. \qquad (\ast)
\end{aligned}$$

So we have an expression for $f_{12}$ in terms of $f_{32}$. Let's calculate $f_{32}$:
$$\begin{aligned}
f_{32} &= P(X_n = 2 \text{ for some } n \geq 1 \mid X_0 = 3) \\
&= \sum_{k=1}^{4} P(X_n = 2 \text{ for some } n \geq 1, \text{ and } X_1 = k \mid X_0 = 3) \\
&= \sum_{k=1}^{4} P(X_n = 2 \text{ for some } n \geq 1 \mid X_1 = k, X_0 = 3) \, P(X_1 = k \mid X_0 = 3) \\
&= \sum_{k=1}^{4} P(X_n = 2 \text{ for some } n \geq 1 \mid X_1 = k) \, p_{3k} \\
&= f_{12} \, p_{31} + 1 \cdot p_{32} + f_{32} \, p_{33} + f_{42} \, p_{34} \\
&= f_{12} \cdot \tfrac{1}{2} + 1 \cdot \tfrac{1}{2} + f_{32} \cdot 0 + 0 \cdot 0 \\
&= \tfrac{1}{2} + \tfrac{1}{2} f_{12}. \qquad (\ast\ast)
\end{aligned}$$

Substituting equation $(\ast\ast)$ into equation $(\ast)$:
$$f_{12} = \tfrac{1}{3} + \tfrac{1}{3} f_{32} = \tfrac{1}{3} + \tfrac{1}{3}\left(\tfrac{1}{2} + \tfrac{1}{2} f_{12}\right) = \tfrac{1}{3} + \tfrac{1}{6} + \tfrac{1}{6} f_{12},$$
so $\tfrac{5}{6} f_{12} = \tfrac{1}{2}$, giving $f_{12} = \tfrac{3}{5}$. Therefore
$$f_{32} = \tfrac{1}{2} + \tfrac{1}{2} f_{12} = \tfrac{1}{2} + \tfrac{1}{2} \cdot \tfrac{3}{5} = \tfrac{4}{5}.$$
By the symmetry of the structure of the website with regards to the About Us and Contact Us pages, we have
$$f_{13} = \tfrac{3}{5} \quad \text{and} \quad f_{23} = \tfrac{4}{5}.$$
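The pair of simultaneous equations for $f_{12}$ and $f_{32}$ can also be solved mechanically; a Python sketch using exact fractions and Cramer's rule (the setup below simply encodes the first-step analysis from the example):

```python
from fractions import Fraction as F

# First-step analysis gives the pair of linear equations
#   f12 = 1/3 + (1/3) f32  and  f32 = 1/2 + (1/2) f12,
# i.e. x = b + A x with x = (f12, f32), so we solve (I - A) x = b.
A = [[F(0), F(1, 3)],
     [F(1, 2), F(0)]]
b = [F(1, 3), F(1, 2)]

# Solve the 2x2 system (I - A) x = b by Cramer's rule.
M = [[1 - A[0][0], -A[0][1]],
     [-A[1][0], 1 - A[1][1]]]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
f12 = (b[0] * M[1][1] - M[0][1] * b[1]) / det
f32 = (M[0][0] * b[1] - b[0] * M[1][0]) / det

print(f12, f32)  # 3/5 4/5
```

For chains with many states, the same idea scales: one linear equation per state, solved as a single linear system.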

As a side note to Example 8.4.2, note that once one reaches the Our Staff page there is no hyperlink away from this page. Therefore $f_{41} = f_{42} = f_{43} = 0$.

Calculate $f_{14}$, $f_{24}$ and $f_{34}$ in Example 8.4.2.

Hitting probabilities can be decomposed into a sum of probabilities that the transition takes place in exactly $n$ steps.

Given two distinct states $i, j$ of a Markov chain, define $f^{(n)}_{ij}$ to be the probability of reaching state $j$ for the first time in exactly $n$ steps given that the current state is $i$. Mathematically,
$$f^{(n)}_{ij} = P(X_1 \neq j, X_2 \neq j, \dots, X_{n-1} \neq j, X_n = j \mid X_0 = i).$$

What is the difference between $f^{(n)}_{ij}$ and $p^{(n)}_{ij}$?

Definition 8.4.1 and Definition 8.4.3 collectively lead to the identity
$$f_{ij} = \sum_{n \geq 1} f^{(n)}_{ij}.$$

Consider again the company website from Example 8.4.2. Starting on the Front Page, I begin to browse the website by clicking on hyperlinks. What is the probability that I return to the Front Page for the first time after $n$ movements, where $n \geq 1$?



Mathematically, the question is asking us to calculate $f^{(n)}_{11}$. To do so, we will look at all paths starting at state $1$ which first return to state $1$ after $n$ time steps. Since there is no link leaving the Our Staff page, that is, there is no way for the Markov chain to leave state $4$, such a path will not visit this page of the website. Furthermore the path only visits state $1$, the Front Page, at the beginning and end; all intermediary states necessarily alternate between states $2$ and $3$.

We calculate
$$\begin{aligned}
f^{(1)}_{11} &= P(X_1 = 1 \mid X_0 = 1) = 0, \\
f^{(2)}_{11} &= P(X_1 \neq 1, X_2 = 1 \mid X_0 = 1) \\
&= P(X_1 = 2, X_2 = 1 \mid X_0 = 1) + P(X_1 = 3, X_2 = 1 \mid X_0 = 1) \\
&= P(X_1 = 2 \mid X_0 = 1)P(X_2 = 1 \mid X_1 = 2) + P(X_1 = 3 \mid X_0 = 1)P(X_2 = 1 \mid X_1 = 3) \\
&= \tfrac{1}{3} \cdot \tfrac{1}{2} + \tfrac{1}{3} \cdot \tfrac{1}{2} = \tfrac{1}{3}, \\
f^{(3)}_{11} &= P(X_1 \neq 1, X_2 \neq 1, X_3 = 1 \mid X_0 = 1) \\
&= P(X_1 = 2, X_2 = 3, X_3 = 1 \mid X_0 = 1) + P(X_1 = 3, X_2 = 2, X_3 = 1 \mid X_0 = 1) \\
&= P(X_1 = 2 \mid X_0 = 1)P(X_2 = 3 \mid X_1 = 2)P(X_3 = 1 \mid X_2 = 3) \\
&\qquad + P(X_1 = 3 \mid X_0 = 1)P(X_2 = 2 \mid X_1 = 3)P(X_3 = 1 \mid X_2 = 2) \\
&= \tfrac{1}{3} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} + \tfrac{1}{3} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{6}, \\
&\;\;\vdots \\
f^{(n)}_{11} &= P(X_1 \neq 1, X_2 \neq 1, \dots, X_{n-1} \neq 1, X_n = 1 \mid X_0 = 1) \\
&= P(X_1 = 2, X_2 = 3, \dots, X_n = 1 \mid X_0 = 1) + P(X_1 = 3, X_2 = 2, \dots, X_n = 1 \mid X_0 = 1) \\
&= \tfrac{1}{3} \cdot \left(\tfrac{1}{2}\right)^{n-1} + \tfrac{1}{3} \cdot \left(\tfrac{1}{2}\right)^{n-1} = \tfrac{1}{3} \cdot \left(\tfrac{1}{2}\right)^{n-2}
\end{aligned}$$
for $n \geq 2$.
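The pattern $f^{(n)}_{11} = \frac{1}{3}\left(\frac{1}{2}\right)^{n-2}$ can be verified by stepping the chain while "killing" paths that return to state $1$ too early; a Python sketch (the helper `first_return` is our own construction, not from the notes):

```python
from fractions import Fraction as F

# Transition matrix from Example 8.4.2 (states 0..3 = pages 1..4).
P = [[F(0), F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 2), F(0), F(1, 2), F(0)],
     [F(1, 2), F(1, 2), F(0), F(0)],
     [F(0), F(0), F(0), F(1)]]

def first_return(P, j, n):
    """f^(n)_jj: probability of first return to state j in exactly n steps,
    computed by stepping a distribution that is zeroed out on hitting j."""
    m = len(P)
    dist = [P[j][k] for k in range(m)]           # distribution after 1 step
    if n == 1:
        return dist[j]
    for _ in range(n - 2):                       # avoid j at intermediate steps
        dist[j] = F(0)
        dist = [sum(dist[i] * P[i][k] for i in range(m)) for k in range(m)]
    # final step into j, excluding mass already sitting at j
    return sum(dist[i] * P[i][j] for i in range(m) if i != j)

for n in range(2, 7):
    assert first_return(P, 0, n) == F(1, 3) * F(1, 2) ** (n - 2)
print("formula verified for n = 2, ..., 6")
```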

The concept of $n$-step hitting probabilities $f^{(n)}_{ij}$ leads to consideration of the random variable $T^j_i$, defined formally below, which governs when the process first hits the state $j$ starting from state $i$.

The hitting time, denoted $T^j_i$, is the random variable which returns the quantity
$$T^j_i = \inf\{n > 0 : X_n = j \text{ given } X_0 = i\}$$
for a randomly chosen sample path of the Markov chain.

If we never reach state $j$ from state $i$ in the sample path, then $T^j_i = \infty$.

Note that $f^{(n)}_{ij} = P(T^j_i = n)$.

Consider again the Markov chain of Example 8.4.2 based on traversing a company's website. Calculate $P(T^2_1 = 1)$, $P(T^2_1 = 2)$, $P(T^2_1 = 3)$.

We calculate
$$\begin{aligned}
P(T^2_1 = 1) &= f^{(1)}_{12} = p_{12} = \tfrac{1}{3}, \\
P(T^2_1 = 2) &= f^{(2)}_{12} \\
&= P(X_1 = 1, X_2 = 2 \mid X_0 = 1) + P(X_1 = 3, X_2 = 2 \mid X_0 = 1) + P(X_1 = 4, X_2 = 2 \mid X_0 = 1) \\
&= p_{11}p_{12} + p_{13}p_{32} + p_{14}p_{42} \\
&= 0 \cdot \tfrac{1}{3} + \tfrac{1}{3} \cdot \tfrac{1}{2} + \tfrac{1}{3} \cdot 0 = \tfrac{1}{6}, \\
P(T^2_1 = 3) &= f^{(3)}_{12} \\
&= p_{11}f^{(2)}_{12} + p_{13}f^{(2)}_{32} + p_{14}f^{(2)}_{42} \\
&= 0 \cdot f^{(2)}_{12} + \tfrac{1}{3} \cdot f^{(2)}_{32} + \tfrac{1}{3} \cdot 0 \\
&= \tfrac{1}{3} f^{(2)}_{32} \\
&= \tfrac{1}{3}\left(p_{31}p_{12}\right) = \tfrac{1}{3} \cdot \tfrac{1}{2} \cdot \tfrac{1}{3} = \tfrac{1}{18}.
\end{aligned}$$
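The same first-passage probabilities can be computed mechanically by stepping a distribution that is removed on hitting the target state; a Python sketch (the helper `first_passage` is our own construction):

```python
from fractions import Fraction as F

# Transition matrix from Example 8.4.2 (states 0..3 = pages 1..4).
P = [[F(0), F(1, 3), F(1, 3), F(1, 3)],
     [F(1, 2), F(0), F(1, 2), F(0)],
     [F(1, 2), F(1, 2), F(0), F(0)],
     [F(0), F(0), F(0), F(1)]]

def first_passage(P, i, j, n):
    """f^(n)_ij = P(T = n): probability of first hitting j from i in n steps,
    found by stepping a distribution that is zeroed out on hitting j."""
    m = len(P)
    dist = [F(0)] * m
    dist[i] = F(1)
    for _ in range(n - 1):
        dist = [sum(dist[a] * P[a][b] for a in range(m)) for b in range(m)]
        dist[j] = F(0)                  # kill paths that already hit j
    return sum(dist[a] * P[a][j] for a in range(m))

# Check against the hand calculation of P(T = 1), P(T = 2), P(T = 3).
for n, expected in [(1, F(1, 3)), (2, F(1, 6)), (3, F(1, 18))]:
    assert first_passage(P, 0, 1, n) == expected
print("matches the hand calculation")
```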

In the calculation of $P(T^2_1 = 2)$, why did we ignore the path where $X_0 = 1, X_1 = 2, X_2 = 2$?

Renewal Equation: Given a Markov chain with $n$-step transition probabilities $p^{(n)}_{ij}$ and first hitting probabilities $f^{(n)}_{ij}$, one has
$$p^{(n)}_{ij} = \sum_{k=1}^{n} f^{(k)}_{ij} \, p^{(n-k)}_{jj}.$$

Informally, the renewal equation states that the probability of going from state $i$ to state $j$ over $n$ steps can be found by conditioning on the time at which we first visit state $j$.
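The renewal equation can be checked numerically for the cafe chain of Example 8.1.3; a Python sketch with exact fractions (helper names are our own, and we adopt the convention $p^{(0)}_{jj} = 1$):

```python
from fractions import Fraction as F

# Cafe chain of Example 8.1.3 (states 0 = Latte Da, 1 = Deja Brew).
P = [[F(1, 6), F(5, 6)],
     [F(1, 3), F(2, 3)]]

def mat_pow(M, n):
    """M^n by repeated multiplication (n >= 0; n = 0 gives the identity)."""
    m = len(M)
    R = [[F(int(i == j)) for j in range(m)] for i in range(m)]
    for _ in range(n):
        R = [[sum(R[i][k] * M[k][j] for k in range(m)) for j in range(m)]
             for i in range(m)]
    return R

def f(P, i, j, n):
    """First-passage probability f^(n)_ij via a distribution killed at j."""
    m = len(P)
    dist = [F(0)] * m
    dist[i] = F(1)
    for _ in range(n - 1):
        dist = [sum(dist[a] * P[a][b] for a in range(m)) for b in range(m)]
        dist[j] = F(0)
    return sum(dist[a] * P[a][j] for a in range(m))

# Renewal equation: p^(n)_ij = sum_{k=1}^{n} f^(k)_ij * p^(n-k)_jj.
i, j, n = 0, 1, 5
lhs = mat_pow(P, n)[i][j]
rhs = sum(f(P, i, j, k) * mat_pow(P, n - k)[j][j] for k in range(1, n + 1))
print(lhs == rhs)  # True
```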