Chapter 8 Stochastic Processes and Markov Chains

8.1 Stochastic Processes

A stochastic process is a family of random variables $\{X_t : t \in T\}$, where the index $t$ belongs to some parameter set $T$ and each of the random variables takes values in a common set of possible values. This set of possible values is known as the state space.

Typically the index $t$ and parameter set $T$ have a time-related interpretation. With this in mind, a stochastic process can informally be thought of as the state of some system at a time $t$. When the set $T$ is discrete, for example $T = \{0, 1, 2, 3, 4, 5, \dots\}$, the stochastic process $\{X_t : t \in T\}$ is known as a discrete-time process. When the set $T$ is continuous, for example $T = [0, \infty)$, the stochastic process $\{X_t : t \in T\}$ is known as a continuous-time process. In DATA2002, we will predominantly consider stochastic processes that are discrete-time processes.

Furthermore, throughout the remainder of DATA2002 we will only consider stochastic processes for which the state space is discrete, that is, each $X_t$ is a discrete random variable.

Consider a fair 6-sided die that we roll repeatedly, keeping track of our cumulative total score. Let $X_t$ be the total of the first $t$ dice rolls. What is the parameter set $T$, and what is the state space, denoted $S$? Also calculate the PMFs of $X_1$ and $X_2$.



The index $t$ corresponds to the number of dice rolls that have been made, and so is a positive integer. Therefore $T = \{1, 2, 3, 4, 5, \dots\}$. After $n$ dice rolls, the cumulative total $X_n$ is an integer between $n$ and $6n$. Every positive integer is included in at least one of these intervals, and since all of the random variables must share a common state space, it is necessary to take $S = \{1, 2, 3, 4, 5, \dots\}$. The PMF $p_{X_1}$ of $X_1$, the cumulative total score after one die roll, is given by
\[ p_{X_1}(x) = \begin{cases} \frac{1}{6}, & \text{if } x = 1, 2, 3, 4, 5, 6, \\ 0, & \text{otherwise.} \end{cases} \]

The possible outcomes for rolling the die twice are summarized by the following table.

Therefore the PMF $p_{X_2}$ of $X_2$, the cumulative total score after two dice rolls, is given by
\[ p_{X_2}(x) = \begin{cases} \frac{1}{36}, & \text{if } x = 2, 12, \\ \frac{1}{18}, & \text{if } x = 3, 11, \\ \frac{1}{12}, & \text{if } x = 4, 10, \\ \frac{1}{9}, & \text{if } x = 5, 9, \\ \frac{5}{36}, & \text{if } x = 6, 8, \\ \frac{1}{6}, & \text{if } x = 7, \\ 0, & \text{otherwise.} \end{cases} \]
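The PMF above can be checked by direct enumeration. The following is a minimal sketch in Python (standard library only; the variable names are ours) that tallies all 36 equally likely outcomes of two rolls:

```python
from collections import Counter
from fractions import Fraction

# Tally the cumulative total X_2 = (first roll) + (second roll)
# over all 36 equally likely outcomes of two fair die rolls.
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
pmf_X2 = {total: Fraction(n, 36) for total, n in sorted(counts.items())}

for total, prob in pmf_X2.items():
    print(total, prob)   # e.g. a total of 7 has probability 1/6
```

The exact fractions printed agree with the cases listed above.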

Note from Example 8.1.2 that an element $s$ belonging to the state space $S$ is not the same as there being a non-zero probability that $X_t = s$. For example, $P(X_1 = 7) = 0$ since it is impossible to have a total of 7 after just one roll, but 7 remains in the state space $S$ of $X_1$.

Deja Brew and Latte Da are two chains of cafés in Nottingham. Mary Berry travels to Nottingham and decides to try out Latte Da. Mary then falls into a habit: if she has been to Latte Da, there is a $\frac{5}{6}$ chance she chooses to go to Deja Brew on her next trip to Nottingham, and a $\frac{1}{6}$ chance she goes back to Latte Da. Alternatively, if Mary has been to Deja Brew, there is a $\frac{1}{3}$ chance she decides to go to Latte Da on her next day out in Nottingham, and a $\frac{2}{3}$ chance she returns to Deja Brew.

This can be summarised by the following diagram:

Mary Berry's behaviour can be modelled by a stochastic process. Let $S = \{\text{Latte Da}, \text{Deja Brew}\}$ be the state space, and let $X_t$ indicate whether Mary goes to Latte Da or Deja Brew on her $t$th subsequent trip to Nottingham, with $t \in T = \mathbb{Z}_{>0}$.
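As an illustration, one can simulate realisations of this process. This is a minimal sketch (the dictionary encoding and the function name `simulate` are our own, not from the notes):

```python
import random

# Transition probabilities taken from the description above.
transitions = {
    "Latte Da":  {"Latte Da": 1/6, "Deja Brew": 5/6},
    "Deja Brew": {"Latte Da": 1/3, "Deja Brew": 2/3},
}

def simulate(n_trips, seed=None):
    """Simulate Mary's first n_trips cafe visits; she starts at Latte Da."""
    rng = random.Random(seed)
    state = "Latte Da"
    path = [state]
    for _ in range(n_trips - 1):
        options = transitions[state]
        state = rng.choices(list(options), weights=list(options.values()))[0]
        path.append(state)
    return path

print(simulate(10, seed=42))
```

Each run of `simulate` produces one possible sequence of café visits.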

A sample path is a sequence of realised values of the random variables $X_t$ of a stochastic process.

Consider again the random variables $X_t$ of Example 8.1.2, obtained as the cumulative score of successive rolls of a fair 6-sided die. The following table outlines examples of sample paths $\omega_1, \omega_2, \omega_3, \omega_4, \omega_5$ observed for the stochastic process:

For $\omega_1$, we rolled $4, 3, 3, 6, 1, 1$ successively.

Using the stochastic process of Example 8.1.2, what is $P(X_1 = 1, X_4 = 4)$?

Using the stochastic process of Example 8.1.2, what is $P(X_t = 3 \text{ for some } t \in T)$?

Using the stochastic process of Example 8.1.2, what is $P(X_t \neq 10 \text{ for all } t \in T)$?

These are all examples of questions that we will learn to answer.

What should also be noted in Example 8.1.2 is that in performing the underlying experiment to observe $X_2$, we necessarily observe $X_1$ (before we can roll the die twice, we have to roll it once). Therefore it could be argued that it is more natural to study $p_{X_1}$ and $p_{X_2 \mid X_1}$, rather than $p_{X_1}$ and $p_{X_2}$. This argument applies more generally, since the parameter set $T$ typically has a time-related or sequential interpretation: we can study
\[ p_{X_1},\ p_{X_2 \mid X_1},\ p_{X_3 \mid X_1, X_2},\ \dots,\ p_{X_{n+1} \mid X_1, X_2, \dots, X_n}. \]

This highlights a key feature of stochastic processes: the transitions from one state to another state.

8.2 Definition of Markov Chains

A Markov chain is a type of stochastic process.

A stochastic process $\{X_t : t \geq 0\}$ is a Markov chain if
\[ P(X_{t+1} = x_{t+1} \mid X_0 = x_0, \dots, X_t = x_t) = P(X_{t+1} = x_{t+1} \mid X_t = x_t) \]
for all $t \geq 0$ and all states $x_0, x_1, \dots, x_t, x_{t+1}$.

This is to say that when trying to predict the future state of a Markov chain, it is only important to know the present state, not how the process arrived at this state.

The stochastic processes in Example 8.1.2 and Example 8.1.3 were Markov Chains.

Set $X_0 = X_1 = 1$. To define $X_t$ for $t \geq 2$, one tosses a coin. If a head is obtained then set $X_t = X_{t-1} + X_{t-2}$, and if a tail is obtained set $X_t = 0$. Explain why $\{X_t : t \geq 0\}$ is not a Markov chain.

In the event of tossing a head, $X_t$ clearly depends on the values of both $X_{t-1}$ and $X_{t-2}$, not just the current state $X_{t-1}$. This contradicts Definition 8.2.1 of a Markov chain.

Identifying Markov chains within your workplace is a key skill for DATA2002. Many Markov chains will have a great number of states.

A Markov chain $\{X_t : t \geq 0\}$ is called time-homogeneous if
\[ P(X_{t+1} = j \mid X_t = i) = P(X_1 = j \mid X_0 = i) \]
for all $t \geq 0$ and for all $i, j \in S$.

Informally, time-homogeneity means that the probabilities governing transitions between states do not change with time. In this module, we only consider time-homogeneous Markov chains.

Clearly Example 8.1.3 is a time-homogeneous Markov chain.

Consider a time-homogeneous Markov chain $\{X_t : t \geq 0\}$. The transition probabilities are the values $p_{ij} = P(X_1 = j \mid X_0 = i)$.

8.3 Transition Matrix

The transition probabilities introduced in Definition 8.2.4 play a key role in the analysis of Markov chains. As an efficient way of storing them, we introduce the transition probability matrix.

The $|S| \times |S|$ matrix $P = (p_{ij})$, where the entry in the $i$th row and $j$th column is the transition probability $p_{ij}$, is called the transition probability matrix.

Note that any transition probability matrix $P = (p_{ij})$ satisfies:

  • $p_{ij} \geq 0$ for all $i, j \in S$, or equivalently every matrix entry is non-negative;

  • $\sum_{j \in S} p_{ij} = 1$ for all $i \in S$, or equivalently the entries in each row of the matrix sum to 1.

Any matrix that satisfies both of these conditions is called a stochastic matrix.
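These two conditions are easy to verify programmatically. A small sketch (the helper name `is_stochastic` is our own):

```python
def is_stochastic(P, tol=1e-12):
    """Check the two defining conditions of a stochastic matrix:
    every entry non-negative, and every row summing to 1."""
    return all(
        all(p >= 0 for p in row) and abs(sum(row) - 1) <= tol
        for row in P
    )

print(is_stochastic([[1/6, 5/6], [1/3, 2/3]]))   # True: a valid transition matrix
print(is_stochastic([[0.5, 0.6], [0.2, 0.8]]))   # False: first row sums to 1.1
```

The small tolerance `tol` accounts for floating-point rounding in the row sums.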

What is the transition probability matrix P of the stochastic process in Example 8.1.3?



Allowing the index 1 to correspond to Latte Da and index 2 to Deja Brew, from the description of the problem we have
\begin{align*}
p_{11} &= P(X_{t+1} = \text{Latte Da} \mid X_t = \text{Latte Da}) = \tfrac{1}{6}, \\
p_{12} &= P(X_{t+1} = \text{Deja Brew} \mid X_t = \text{Latte Da}) = \tfrac{5}{6}, \\
p_{21} &= P(X_{t+1} = \text{Latte Da} \mid X_t = \text{Deja Brew}) = \tfrac{1}{3}, \\
p_{22} &= P(X_{t+1} = \text{Deja Brew} \mid X_t = \text{Deja Brew}) = \tfrac{2}{3}.
\end{align*}

Therefore the transition probability matrix is given by
\[ P = \begin{pmatrix} \frac{1}{6} & \frac{5}{6} \\ \frac{1}{3} & \frac{2}{3} \end{pmatrix}. \]
What is the transition probability matrix P of the stochastic process in Example 8.1.2?

Suppose that $X_t = n$ for some positive integer $n$. Since the fair die has an equal chance of showing each of $1, 2, 3, 4, 5, 6$, there is a $\frac{1}{6}$ chance that $X_{t+1}$ equals each of $n+1, n+2, n+3, n+4, n+5, n+6$. Specifically,
\[ p_{n,n+k} = P(X_{t+1} = n + k \mid X_t = n) = \tfrac{1}{6} \quad \text{for } k = 1, 2, 3, 4, 5, 6, \]
and $p_{n,j} = P(X_{t+1} = j \mid X_t = n) = 0$ for all other $j$. Therefore the transition probability matrix is given by
\[ P = \begin{pmatrix}
0 & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & 0 & 0 & \cdots \\
0 & 0 & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & 0 & \cdots \\
0 & 0 & 0 & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \frac{1}{6} & \cdots \\
\vdots & & & \ddots & \ddots & & & & & \ddots
\end{pmatrix}. \]
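Because the state space here is infinite, any numerical work has to truncate the matrix. A sketch of building the truncated matrix in NumPy (the function name and truncation level are our own choices):

```python
import numpy as np

def die_transition_matrix(max_state):
    """Transition matrix of the cumulative-score chain restricted to
    states 1..max_state. Rows near the truncation boundary lose mass,
    so only rows with n + 6 <= max_state sum to 1."""
    P = np.zeros((max_state, max_state))
    for n in range(1, max_state + 1):      # current total n (row index n-1)
        for k in range(1, 7):              # the next roll shows k
            if n + k <= max_state:
                P[n - 1, n + k - 1] = 1 / 6
    return P

P = die_transition_matrix(12)
print(P[0, :8])   # from state 1: entry 0, then six entries of 1/6
```

Choosing `max_state` large enough means the truncation does not affect probabilities over a fixed number of steps from small starting states.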

This notion of transition probabilities can be extended from one-step transitions to $n$-step transitions. Specifically, set $p_{ij}^{(n)} = P(X_n = j \mid X_0 = i)$. Note that because the Markov chains we consider are time-homogeneous, we have $P(X_n = j \mid X_0 = i) = P(X_{t+n} = j \mid X_t = i)$. This is to say, only the number of steps between our known state and the future event we wish to find the probability of matters, not the time at which this happens.

The $|S| \times |S|$ matrix $P^{(n)} = (p_{ij}^{(n)})$, where the entry in the $i$th row and $j$th column is the $n$-step transition probability $p_{ij}^{(n)}$, is called the $n$-step transition probability matrix.

What is the 2-step transition probability matrix $P^{(2)}$ of the stochastic process in Example 8.1.3?



Again allowing the index 1 to correspond to Latte Da and index 2 to Deja Brew, from the description of the problem we can calculate
\begin{align*}
p_{11}^{(2)} &= P(X_{t+2} = 1 \mid X_t = 1) = P(X_{t+2} = 1, X_{t+1} = 1 \mid X_t = 1) + P(X_{t+2} = 1, X_{t+1} = 2 \mid X_t = 1) = \tfrac{1}{6} \cdot \tfrac{1}{6} + \tfrac{5}{6} \cdot \tfrac{1}{3} = \tfrac{11}{36}, \\
p_{12}^{(2)} &= P(X_{t+2} = 2 \mid X_t = 1) = P(X_{t+2} = 2, X_{t+1} = 1 \mid X_t = 1) + P(X_{t+2} = 2, X_{t+1} = 2 \mid X_t = 1) = \tfrac{1}{6} \cdot \tfrac{5}{6} + \tfrac{5}{6} \cdot \tfrac{2}{3} = \tfrac{25}{36}, \\
p_{21}^{(2)} &= P(X_{t+2} = 1 \mid X_t = 2) = P(X_{t+2} = 1, X_{t+1} = 1 \mid X_t = 2) + P(X_{t+2} = 1, X_{t+1} = 2 \mid X_t = 2) = \tfrac{1}{3} \cdot \tfrac{1}{6} + \tfrac{2}{3} \cdot \tfrac{1}{3} = \tfrac{5}{18}, \\
p_{22}^{(2)} &= P(X_{t+2} = 2 \mid X_t = 2) = P(X_{t+2} = 2, X_{t+1} = 1 \mid X_t = 2) + P(X_{t+2} = 2, X_{t+1} = 2 \mid X_t = 2) = \tfrac{1}{3} \cdot \tfrac{5}{6} + \tfrac{2}{3} \cdot \tfrac{2}{3} = \tfrac{13}{18}.
\end{align*}

Therefore the 2-step transition probability matrix is given by
\[ P^{(2)} = \begin{pmatrix} \frac{11}{36} & \frac{25}{36} \\ \frac{5}{18} & \frac{13}{18} \end{pmatrix}. \]

The following theorem gives us an easy way to calculate the $n$-step transition probability matrix.

Chapman–Kolmogorov Equations: The transition probability matrix $P$ and the $n$-step transition probability matrix $P^{(n)}$ satisfy the equation $P^{(n)} = P^n$.
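We can check the theorem numerically against the hand calculation for the café chain. A quick sketch with NumPy:

```python
import numpy as np

# One-step transition matrix of the cafe chain.
P = np.array([[1/6, 5/6],
              [1/3, 2/3]])

# Two-step matrix found by the first-step calculation above.
P2_by_hand = np.array([[11/36, 25/36],
                       [5/18, 13/18]])

# Chapman-Kolmogorov: P^(2) should equal the matrix product P @ P.
print(np.allclose(P @ P, P2_by_hand))   # True
```

The same check works for any $n$ via `np.linalg.matrix_power(P, n)`.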

What is the 2-step transition probability matrix $P^{(2)}$ of the stochastic process in Example 8.1.2?

By the Chapman–Kolmogorov equations, the question is equivalent to calculating $P^2$, where $P$ is the transition probability matrix calculated in Example 8.3.3. The entry $p_{n,n+k}^{(2)}$ is the probability that two rolls total $k$, which is exactly the PMF of $X_2$ from Example 8.1.2. Specifically,
\[ P^{(2)} = P^2 = \begin{pmatrix}
0 & 0 & \frac{1}{36} & \frac{1}{18} & \frac{1}{12} & \frac{1}{9} & \frac{5}{36} & \frac{1}{6} & \frac{5}{36} & \frac{1}{9} & \frac{1}{12} & \frac{1}{18} & \frac{1}{36} & 0 & \cdots \\
0 & 0 & 0 & \frac{1}{36} & \frac{1}{18} & \frac{1}{12} & \frac{1}{9} & \frac{5}{36} & \frac{1}{6} & \frac{5}{36} & \frac{1}{9} & \frac{1}{12} & \frac{1}{18} & \frac{1}{36} & \cdots \\
\vdots & & & \ddots & & & & & & & & & & & \ddots
\end{pmatrix}. \]
What is the $n$-step transition probability matrix $P^{(n)}$ of the stochastic process in Example 8.1.3?



By the Chapman–Kolmogorov equations, the question is equivalent to calculating $P^n$, where $P$ is the transition probability matrix calculated in Example 8.3.2. Specifically,
\[ P^{(n)} = P^n = \begin{pmatrix} \frac{1}{6} & \frac{5}{6} \\ \frac{1}{3} & \frac{2}{3} \end{pmatrix}^n. \]

To calculate this high power of $P$, we first diagonalise: recall from Mathematics for Data Science that $P = MDM^{-1}$, where $D$ is the diagonal matrix of eigenvalues of $P$, and $M$ is the matrix whose $i$th column is the eigenvector of $P$ corresponding to the $i$th eigenvalue in $D$. For our $P$, we can check that
\[ D = \begin{pmatrix} 1 & 0 \\ 0 & -\frac{1}{6} \end{pmatrix} \quad \text{and} \quad M = \begin{pmatrix} 1 & 5 \\ 1 & -2 \end{pmatrix}. \]
It follows that
\begin{align*}
P^{(n)} = P^n &= (MDM^{-1})(MDM^{-1}) \cdots (MDM^{-1}) = MD^nM^{-1} \\
&= \begin{pmatrix} 1 & 5 \\ 1 & -2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & -\frac{1}{6} \end{pmatrix}^n \begin{pmatrix} \frac{2}{7} & \frac{5}{7} \\ \frac{1}{7} & -\frac{1}{7} \end{pmatrix} \\
&= \begin{pmatrix} 1 & 5 \\ 1 & -2 \end{pmatrix} \begin{pmatrix} 1 & 0 \\ 0 & \left(-\frac{1}{6}\right)^n \end{pmatrix} \begin{pmatrix} \frac{2}{7} & \frac{5}{7} \\ \frac{1}{7} & -\frac{1}{7} \end{pmatrix} \\
&= \begin{pmatrix} 1 & 5\left(-\frac{1}{6}\right)^n \\ 1 & -2\left(-\frac{1}{6}\right)^n \end{pmatrix} \begin{pmatrix} \frac{2}{7} & \frac{5}{7} \\ \frac{1}{7} & -\frac{1}{7} \end{pmatrix} \\
&= \begin{pmatrix} \frac{2}{7} + \frac{5}{7}\left(-\frac{1}{6}\right)^n & \frac{5}{7} - \frac{5}{7}\left(-\frac{1}{6}\right)^n \\ \frac{2}{7} - \frac{2}{7}\left(-\frac{1}{6}\right)^n & \frac{5}{7} + \frac{2}{7}\left(-\frac{1}{6}\right)^n \end{pmatrix}.
\end{align*}
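The closed form can be sanity-checked against repeated matrix multiplication. A sketch (the helper name `P_closed_form` is ours):

```python
import numpy as np

P = np.array([[1/6, 5/6],
              [1/3, 2/3]])

def P_closed_form(n):
    """The expression for P^n obtained by diagonalisation."""
    r = (-1/6) ** n
    return np.array([[2/7 + 5/7 * r, 5/7 - 5/7 * r],
                     [2/7 - 2/7 * r, 5/7 + 2/7 * r]])

# Agreement with direct matrix powers for several n.
for n in (1, 2, 5, 20):
    assert np.allclose(np.linalg.matrix_power(P, n), P_closed_form(n))

# As n grows, (-1/6)^n vanishes and both rows approach (2/7, 5/7).
print(P_closed_form(20).round(6))
```

Since $(-\frac{1}{6})^n \to 0$ as $n \to \infty$, every row of $P^{(n)}$ converges to $(\frac{2}{7}, \frac{5}{7})$.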

Approximate the probability that Mary Berry goes to Deja Brew on some trip infinitely far in the future.

8.4 Hitting Probabilities and Hitting Times

Given a Markov chain and a current state, one can ask about the probabilities of ever reaching other states.

Given two distinct states $i, j$ of a Markov chain, the hitting probability, denoted $f_{ij}$, is the probability of ever reaching state $j$ in the future given that the current state is $i$. Mathematically,
\[ f_{ij} = P(X_t = j \text{ for some } t \geq 1 \mid X_0 = i). \]

Often the best way to calculate $f_{ij}$ is to split the possible sample paths into cases based on the first step. Specifically,
\[ P(X_t = j \text{ for some } t \geq 1 \mid X_0 = i) = \sum_{k \in S} P(X_t = j \text{ for some } t \geq 1, \text{ and } X_1 = k \mid X_0 = i). \]
The right-hand side of this equation can then be conditioned using the general probability identity $P(A \cap B) = P(A \mid B)P(B)$. This is fully illustrated by the following example:

A company has a new website which consists of 4 pages: a Front Page which links to all other pages, an About Us page and a Contact Us page which both link to each other and back to the Front Page, and an Our Staff page which has no links on it. A prospective customer starts at a random page, and chooses among the available links on the page with equal probability to decide which page to visit next. At all future times they repeat this procedure. Thinking of the above process as a Markov chain, and in doing so labelling the pages as 1, 2, 3, 4 according to the order they are listed above, calculate the hitting probabilities $f_{12}$, $f_{13}$, $f_{23}$, $f_{32}$.

The Markov chain can be described diagrammatically:

The transition matrix is therefore
\[ P = \begin{pmatrix} 0 & \frac{1}{3} & \frac{1}{3} & \frac{1}{3} \\ \frac{1}{2} & 0 & \frac{1}{2} & 0 \\ \frac{1}{2} & \frac{1}{2} & 0 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}. \]

Calculating the hitting probability using the recommended method:
\begin{align*}
f_{12} &= P(X_n = 2 \text{ for some } n \geq 1 \mid X_0 = 1) \\
&= \sum_{k=1}^{4} P(X_n = 2 \text{ for some } n \geq 1, \text{ and } X_1 = k \mid X_0 = 1) \\
&= \sum_{k=1}^{4} P(X_n = 2 \text{ for some } n \geq 1 \mid X_1 = k, X_0 = 1) P(X_1 = k \mid X_0 = 1) \\
&= \sum_{k=1}^{4} P(X_n = 2 \text{ for some } n \geq 1 \mid X_1 = k) \, p_{1k} \\
&= f_{12} p_{11} + 1 \cdot p_{12} + f_{32} p_{13} + f_{42} p_{14} \\
&= f_{12} \cdot 0 + 1 \cdot \tfrac{1}{3} + f_{32} \cdot \tfrac{1}{3} + 0 \cdot \tfrac{1}{3} \\
&= \tfrac{1}{3} + \tfrac{1}{3} f_{32}. \qquad (*)
\end{align*}

So we have an expression for $f_{12}$ in terms of $f_{32}$. Let's calculate $f_{32}$:
\begin{align*}
f_{32} &= P(X_n = 2 \text{ for some } n \geq 1 \mid X_0 = 3) \\
&= \sum_{k=1}^{4} P(X_n = 2 \text{ for some } n \geq 1, \text{ and } X_1 = k \mid X_0 = 3) \\
&= \sum_{k=1}^{4} P(X_n = 2 \text{ for some } n \geq 1 \mid X_1 = k) \, p_{3k} \\
&= f_{12} p_{31} + 1 \cdot p_{32} + f_{32} p_{33} + f_{42} p_{34} \\
&= f_{12} \cdot \tfrac{1}{2} + 1 \cdot \tfrac{1}{2} + f_{32} \cdot 0 + 0 \cdot 0 \\
&= \tfrac{1}{2} + \tfrac{1}{2} f_{12}. \qquad (**)
\end{align*}

Substituting equation $(**)$ into equation $(*)$:
\[ f_{12} = \tfrac{1}{3} + \tfrac{1}{3} f_{32} = \tfrac{1}{3} + \tfrac{1}{3}\left(\tfrac{1}{2} + \tfrac{1}{2} f_{12}\right) = \tfrac{1}{3} + \tfrac{1}{6} + \tfrac{1}{6} f_{12}, \]
so $\tfrac{5}{6} f_{12} = \tfrac{1}{2}$ and hence $f_{12} = \tfrac{3}{5}$. Therefore
\[ f_{32} = \tfrac{1}{2} + \tfrac{1}{2} f_{12} = \tfrac{1}{2} + \tfrac{1}{2} \cdot \tfrac{3}{5} = \tfrac{4}{5}. \]
By the symmetry of the structure of the website with regards to the About Us and Contact Us pages, we have
\[ f_{13} = \tfrac{3}{5} \quad \text{and} \quad f_{23} = \tfrac{4}{5}. \]
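The pair of simultaneous equations for $f_{12}$ and $f_{32}$ can equally be solved by linear algebra. A sketch, with $f_{42} = 0$ already substituted as in the working above:

```python
import numpy as np

# Rearranged first-step equations:
#   f12 - (1/3) f32 = 1/3
#  -(1/2) f12 + f32 = 1/2
A = np.array([[1.0, -1/3],
              [-1/2, 1.0]])
b = np.array([1/3, 1/2])

f12, f32 = np.linalg.solve(A, b)
print(f12, f32)   # approximately 0.6 and 0.8, i.e. 3/5 and 4/5
```

For larger chains, setting up and solving the full system of first-step equations this way is much less error-prone than substitution by hand.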

As a side note to Example 8.4.2, note that once one reaches the Our Staff page there is no hyperlink away from this page. Therefore $f_{41} = f_{42} = f_{43} = 0$.

Calculate $f_{14}$, $f_{24}$ and $f_{34}$ in Example 8.4.2.

Hitting probabilities can be decomposed into a sum of probabilities that the transition first takes place in exactly $n$ steps.

Given two distinct states $i, j$ of a Markov chain, define $f_{ij}^{(n)}$ to be the probability of reaching state $j$ for the first time in exactly $n$ steps given that the current state is $i$. Mathematically,
\[ f_{ij}^{(n)} = P(X_1 \neq j, X_2 \neq j, \dots, X_{n-1} \neq j, X_n = j \mid X_0 = i). \]

What is the difference between $f_{ij}^{(n)}$ and $p_{ij}^{(n)}$?

Definition 8.4.1 and Definition 8.4.3 collectively lead to the identity
\[ f_{ij} = \sum_{n \geq 1} f_{ij}^{(n)}. \]

Consider again the company website from Example 8.4.2. Starting on the Front Page, I begin to browse the website by clicking on hyperlinks. What is the probability that I return to the Front Page for the first time after $n$ movements, where $n \geq 1$?



Mathematically the question is asking us to calculate $f_{11}^{(n)}$. To do so, we will look at all paths starting at state 1 which first return to state 1 after $n$ time steps. Since there is no link to leave the Our Staff page, that is, there is no way for the Markov chain to leave state 4, such a path will not visit this page of the website. Furthermore the path only visits state 1, the Front Page, at the beginning and end; all intermediary states will necessarily alternate between states 2 and 3.

Calculate
\begin{align*}
f_{11}^{(1)} &= P(X_1 = 1 \mid X_0 = 1) = 0, \\
f_{11}^{(2)} &= P(X_1 \neq 1, X_2 = 1 \mid X_0 = 1) \\
&= P(X_1 = 2, X_2 = 1 \mid X_0 = 1) + P(X_1 = 3, X_2 = 1 \mid X_0 = 1) \\
&= P(X_1 = 2 \mid X_0 = 1) P(X_2 = 1 \mid X_1 = 2) + P(X_1 = 3 \mid X_0 = 1) P(X_2 = 1 \mid X_1 = 3) \\
&= \tfrac{1}{3} \cdot \tfrac{1}{2} + \tfrac{1}{3} \cdot \tfrac{1}{2} = \tfrac{1}{3}, \\
f_{11}^{(3)} &= P(X_1, X_2 \neq 1, X_3 = 1 \mid X_0 = 1) \\
&= P(X_1 = 2, X_2 = 3, X_3 = 1 \mid X_0 = 1) + P(X_1 = 3, X_2 = 2, X_3 = 1 \mid X_0 = 1) \\
&= \tfrac{1}{3} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} + \tfrac{1}{3} \cdot \tfrac{1}{2} \cdot \tfrac{1}{2} = \tfrac{1}{6}, \\
&\;\;\vdots \\
f_{11}^{(n)} &= P(X_1, X_2, \dots, X_{n-1} \neq 1, X_n = 1 \mid X_0 = 1) \\
&= P(X_1 = 2, X_2 = 3, \dots, X_n = 1 \mid X_0 = 1) + P(X_1 = 3, X_2 = 2, \dots, X_n = 1 \mid X_0 = 1) \\
&= \tfrac{1}{3} \cdot \tfrac{1}{2^{n-1}} + \tfrac{1}{3} \cdot \tfrac{1}{2^{n-1}} = \tfrac{1}{3} \cdot \tfrac{1}{2^{n-2}}.
\end{align*}
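The closed form for $f_{11}^{(n)}$ can be verified numerically by computing "taboo" probabilities: keep the chain away from state 1 for steps $1, \dots, n-1$, then return at step $n$. A sketch (the function name is our own):

```python
import numpy as np

# Transition matrix of the website chain (pages 1..4, zero-indexed here).
P = np.array([[0, 1/3, 1/3, 1/3],
              [1/2, 0, 1/2, 0],
              [1/2, 1/2, 0, 0],
              [0, 0, 0, 1]])

def first_return_prob(P, s, n):
    """P(first return to state s after exactly n steps): leave s, move among
    the other states for n-2 steps, then re-enter s."""
    if n == 1:
        return P[s, s]
    others = [i for i in range(len(P)) if i != s]
    Q = P[np.ix_(others, others)]    # moves that avoid s
    a = P[s, others]                 # step 1: leave s
    r = P[others, s]                 # final step: re-enter s
    return a @ np.linalg.matrix_power(Q, n - 2) @ r

# Compare with the formula f_11^(n) = (1/3) * (1/2)^(n-2) for n >= 2.
for n in range(2, 8):
    print(n, first_return_prob(P, 0, n), (1/3) * (1/2) ** (n - 2))
```

The two columns of output agree, matching the path-counting argument above.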

The concept of the $n$-step first-passage probabilities $f_{ij}^{(n)}$ leads to consideration of the random variable $T_{ij}$, defined formally below, which governs when the process first hits state $j$ starting from state $i$.

The hitting time, denoted $T_{ij}$, is the random variable which returns the quantity
\[ T_{ij} = \inf\{n > 0 : X_n = j \text{ given } X_0 = i\} \]
for a randomly chosen sample path of the Markov chain.

If we never reach state $j$ from state $i$ in the sample path then $T_{ij} = \infty$.

Note that $f_{ij}^{(n)} = P(T_{ij} = n)$.

Consider again the Markov chain of Example 8.4.2 based on traversing a company's website. Calculate $P(T_{12} = 1)$, $P(T_{12} = 2)$, $P(T_{12} = 3)$.

We calculate
\begin{align*}
P(T_{12} = 1) &= f_{12}^{(1)} = p_{12} = \tfrac{1}{3}, \\
P(T_{12} = 2) &= f_{12}^{(2)} \\
&= P(X_1 = 1, X_2 = 2 \mid X_0 = 1) + P(X_1 = 3, X_2 = 2 \mid X_0 = 1) + P(X_1 = 4, X_2 = 2 \mid X_0 = 1) \\
&= p_{11} p_{12} + p_{13} p_{32} + p_{14} p_{42} \\
&= 0 \cdot \tfrac{1}{3} + \tfrac{1}{3} \cdot \tfrac{1}{2} + \tfrac{1}{3} \cdot 0 = \tfrac{1}{6}, \\
P(T_{12} = 3) &= f_{12}^{(3)} = p_{11} f_{12}^{(2)} + p_{13} f_{32}^{(2)} + p_{14} f_{42}^{(2)} \\
&= 0 \cdot f_{12}^{(2)} + \tfrac{1}{3} f_{32}^{(2)} + \tfrac{1}{3} \cdot 0 \\
&= \tfrac{1}{3} f_{32}^{(2)} = \tfrac{1}{3}(p_{31} p_{12}) = \tfrac{1}{3} \cdot \tfrac{1}{2} \cdot \tfrac{1}{3} = \tfrac{1}{18}.
\end{align*}

In the calculation of $P(T_{12} = 2)$, why did we ignore the path where $X_0 = 1, X_1 = 2, X_2 = 2$?

Renewal Equation: Given a Markov chain with $n$-step transition probabilities $p_{ij}^{(n)}$ and first-passage probabilities $f_{ij}^{(k)}$, one has
\[ p_{ij}^{(n)} = \sum_{k=1}^{n} f_{ij}^{(k)} \, p_{jj}^{(n-k)}. \]

Informally the renewal equation states that the probability of going from state i to state j over n steps can be conditioned on when we first visit state j.
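The renewal equation can be checked numerically on the website chain of Example 8.4.2. A sketch, reusing the taboo-probability idea for the first-passage probabilities $f_{ij}^{(k)}$ (the function name is our own):

```python
import numpy as np

P = np.array([[0, 1/3, 1/3, 1/3],
              [1/2, 0, 1/2, 0],
              [1/2, 1/2, 0, 0],
              [0, 0, 0, 1]])

def first_passage_prob(P, i, j, n):
    """f_ij^(n): reach state j for the first time at step n, starting from i."""
    if n == 1:
        return P[i, j]
    others = [s for s in range(len(P)) if s != j]
    Q = P[np.ix_(others, others)]    # moves that avoid j
    a = P[i, others]                 # step 1: move without hitting j
    r = P[others, j]                 # final step: hit j
    return a @ np.linalg.matrix_power(Q, n - 2) @ r

# Renewal equation: p_ij^(n) = sum_{k=1}^{n} f_ij^(k) * p_jj^(n-k).
i, j, n = 0, 1, 6
lhs = np.linalg.matrix_power(P, n)[i, j]
rhs = sum(first_passage_prob(P, i, j, k) * np.linalg.matrix_power(P, n - k)[j, j]
          for k in range(1, n + 1))
print(abs(lhs - rhs) < 1e-12)   # True: the two sides agree
```

Note that `np.linalg.matrix_power(P, 0)` is the identity matrix, which correctly handles the $k = n$ term $f_{ij}^{(n)} p_{jj}^{(0)}$.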