Chapter 3 Stochastic Properties

“Time is making fools of us again.” - Albus Dumbledore






Motivation




After a couple of introductory chapters, we can finally turn to the subject at hand. In this chapter, we will define stochastic processes and discuss some of their properties. We will also review a number of simple stochastic properties and use R to better understand their behavior.




3.1 Stochastic Definitions


It can be hard to define a subject in a single sentence. Think about it: how would you succinctly define ‘mathematics,’ or even ‘statistics?’ We are lucky that a ‘stochastic process’ has a simple, one line definition:


A random variable that changes through time


We’re cheating a bit by leaning on the definition of ‘random variable,’ but still, we have a simple, seven-word definition for an immensely complicated topic!

We denote random variables with capital letters like X. We denote stochastic random variables with Xt, which simply means ‘the value of X at time t.’ We’ll start with potentially the simplest stochastic process: a Bernoulli Process.

XtBern(p)

For t=1,2,... (this is called the index of the process), and all of the Xt random variables are independent. This is a discrete stochastic process because the index is discrete; stochastic processes with continuous indices (continuous intervals of the real line) are considered continuous stochastic processes (this distinction is similar to the one we make with the supports of discrete and continuous random variables!).

Since we know that a Bernoulli is essentially a coin flip that takes on the value 1 with probability p and takes on the value 0 with probability 1p, this stochastic process is just a series of zeroes and ones. For our starting stochastic process, let’s have p=1/2.

# generate a Bernoulli process
data <- data.table(t = 1:100,
                   X = rbinom(100, 1, 1 / 2))

# visualize
ggplot(data, aes(x = t, y = X)) +
  geom_line() +
  theme_bw() +
  ggtitle("A Bernoulli Process") +
  ylim(c(-2, 2))


It’s important to remember that stochastic processes are still random variables. That is, Xt (and X4, X17 and X368, and all of the other steps in the process) has an expectation, a variance, moments, density functions, etc. In this specific case, for example, we have:

E(Xt)=12

Var(Xt)=14

P(Xt=1)=P(Xt=0)=12

For all values of t in the index {1,2,...}. Of course, this type of generality isn’t strictly necessary: we will be working with processes where we don’t have these neat solutions for all values of t in the index (i.e., where the distribution of the process changes with t, in which case we might have E(X1)E(X2), or something along these lines).




3.2 Properties


The Bernoulli Process is exceedingly simple, so we will use it to explore some of the different properties of stochastic processes.



3.2.1 Stationarity


Colloquially, a stochastic process is strongly stationary if its random properties don’t change over time. A more rigorous definition is that the joint distribution of random variables at different points is invariant to time; this is a little wordy, but we can express it like this:

Ft1,t2,...,tn=Ft1+k,t2+k,...,tn+k

For all k, n and sets of time intervals t1,t2,... in the stochastic process.

We introduced some new notation here; let’s make sure that we are familiar with all of the notation in the above equation. First, Ft1 is the CDF (cumulative distribution function) of Xt1, or the stochastic process at time t1 (remember, we simply have that Ft=P(Xt<x); this is the definition of a CDF). Then, Ft1,t2 is the joint CDF of Xt1 and Xt2, or P(Xt1<x1,Xt2<x2). What this is equation is saying, then, is that a stochastic process is strongly stationary if the joint distribution of Xt at times t1,t2,...,tn is equal to the joint distribution of Xt at times t1+k,t2+k,...,tn+k (i.e., the time indices t1,t2 etc. shifted by k. For example, we might be asking ‘is the joint distribution of the stochastic process at time points 1 and 2 the same as the joint distribution of the stochastic process at time points 3 and 4?’).


Our Bernoulli Process is an i.i.d. process; that is, the different values of Xt (at different time points t) are all independent and identically distributed Bernoulli random variables. Because the Bernoullis are independent, their joint CDF is just the product of the marginal CDFs:

Ft1,t2,...,tn=Ft1Ft2...Ftn

And, since the stochastic process has identical distributions Bern(p) at every time point, we have:

Ft1Ft2...Ftn=Ft1+kFt2+k...Ftn+k

So, the Bernoulli is strongly stationary! In fact, any i.i.d. process (because, by definition, the random variable has identical, independent distributions at different time point) is strongly stationary!


Weak stationarity, naturally, has less restrictive conditions than strong stationary. A stochastic process Xt is weakly stationary if it meets these three conditions:

  1. The mean of the process is constant. That is, E(Xt)=μ (where μ is some constant) for all values of t.
  2. The second moment of Xt, or E(X2t), is finite.
  3. We have:

Cov(Xt1,Xt2)=Cov(Xt1k,Xt2k)

For all sets of t1,t2 and all values of k. That is, the covariance between two points only changes as the distance between them changes (i.e., k), not as we move along in time. That is, Cov(X5,X8) must equal Cov(X11,X14) because X5 is as far from X8 as X11 is from X14 (three time points away).


We can quickly check if our Bernoulli Process is weakly stationary (we will again use p as the parameter of the Bernoulli to keep the notation general):

  1. E(Xt)=p, because each Xt is i.i.d. Bern(p). So, yes, the mean is constant through time!
  2. Since X2t just takes on values 1 and 0, and these values don’t change when they are squared (1 goes to 1 and 0 goes to 0), we have X2t=Xt and thus E(X2t)=E(Xt)=p. Therefore, the second moment of Xt is indeed finite (the maximum value of p is 1).
  3. Finally, since the Xt values are independent, we have that Cov(Xj,Xk)=0 for all jk.

So, a Bernoulli Process is weakly stationary!


That might have seemed like a lot of extra work. Didn’t we show that the Bernoulli Process was strongly stationary? Well, unfortunately, strong stationarity does not necessarily imply weak stationarity, and weak stationarity does not necessarily imply strong stationarity. Let’s see if we can construct counterexamples in both cases: something that is strongly stationary but not weakly stationary, and something that is weakly stationary but not strongly stationary.


Let’s define a new stochastic process Yt that takes on Xt when t is even and simply takes on the constant 1/2 when t is odd. We can simulate this easily in R (using the ‘modulo’ function which gives us the remainder of one number divided into another number):

# replicate
set.seed(0)

# generate a Bernoulli process
data <- data.table(t = 1:100,
                   X = rbinom(100, 1, 1 / 2))

# generate Y
data$Y <- data$X
data$Y[data$t %% 2 == 1] <- 1 / 2

# visualize
ggplot(data, aes(x = t, y = Y)) +
  geom_line() +
  theme_bw() +
  ggtitle("Bernoulli Process at even time points, 1/2 at odd time points") +
  ylim(c(-2, 2))


Let’s check to see if this process is weakly stationary by running through the three conditions.

  1. E(Yt)=1/2, since, when t is even, we have a Bern(1/2), and when t is odd, we simply have the value 1/2.
  2. When t is even, we have a Bern(1/2) random variable, and we have seen above that the second moment is finite (we just have that p=1/2). When t is odd, we have the constant 1/2, and the second moment of a constant is just the constant squared (1/4). So, the second moment is finite.
  3. Finally, since the Yt values are independent, we have that Cov(Yj,Yk)=0 for all jk.

So, Yt is indeed weakly stationary!


Now we can test for strong stationarity. We can check the simplest case:

F(Y1)?=F(Y2)

We know that Y1 is just a constant and that Y2Bern(1/2). These clearly have different CDFs (one quick intuitive check is that Y2 has a non-zero probability of being above 1/2 whereas Y1 can only be 1/2), so we can say:

F(Y1)F(Y2)

Therefore, Yt is not strongly stationary; the distribution changes with time (half the time the stochastic process is a Bernoulli random variable and half the time it is a constant!).


How about the other direction: a process that is strongly stationary but not weakly stationary? The simplest example is a process Xt where each value is an i.i.d. Cauchy distribution. We won’t review the Cauchy distribution here, but the key is its second moment is not finite (in general, this is a notoriously finicky distribution: the mean, variance and MGF, along with a couple of other key characteristics, are all undefined). Therefore, Xt is strongly stationary (it is an i.i.d. process), but it is not weakly stationary (the second moment is not finite, so one of the three criteria fails!).

3.2.2 Martingales


Colloquially, a stochastic process is a martingale if ‘we expect the next step to be right where we are.’

Mathematically, we say that a stochastic process Xt is a martingale if:

E(Xn+1|X1,...,Xn)=Xn

For any time n. We also require that E(|Xn|) is finite.


So, at any time n, a stochastic process is a martingale if, conditioning on the entire observed history of the stochastic process up to time n (this is often called the filtration), the expected value of the next step Xn+1 is just Xn (which has already been observed, since we are at time n).


Is our Bernoulli Process a martingale? Well, since each step is independent, we have:

E(Xn+1|X1,...,Xn)=E(Xn+1)

And since Xn+1Bern(1/2) like all of the other steps, we have:

E(Xn+1)=1/2

However, we know that Xn will either be 0 or 1, so we can say:

E(Xn+1|X1,...,Xn)Xn

And therefore our Bernoulli Process is not a martingale.


Can we construct a simple stochastic process that is a martingale? Let’s try defining Yt, where:

Y0=0 Yt=Yt1+ϵt

Where t=1,2,... and ϵt is a series of i.i.d. N(0,1) random variables (standard normal random variables).

This is called a Gaussian Random Walk. The Gaussian part means ‘Normal’ and the Random Walk part means that, well, the process looks like it’s randomly walking around! We can easily simulate this process in R:

# replicate
set.seed(0)

# generate a Gaussian Random Walk
data <- data.table(t = 1:100,
                   Y = cumsum(rnorm(100)))

# visualize
ggplot(data, aes(x = t, y = Y)) +
  geom_line() +
  theme_bw() +
  geom_hline(yintercept = 0) + 
  ggtitle("Martingale?") 


Let’s try to calculate the expectation of Yn+1 given the history of Y up to time n. We have:

E(Yn+1|Y1,...,Yn)=E(Yn+1|Yn)

Because Yn+1 is conditionally independent on Y1,Y2,...,Yn1 given Yn (i.e., if we know the value of Yn, it doesn’t matter how we got there).

We can plug in for Yn+1:

E(Yn+ϵn|Yn)=Yn

Because ϵnN(0,1) and the expected value of Yn given Yn is, well, Yn (the ‘finite expectation’ piece is also satisfied; we know that E(Yn)=0, because Yn is essentially the sum of many standard normal random variables, each of which has expectation 0).


Therefore, Yt is a martingale, because the expected value of the next step is the step that we are currently at (conditioning on the history of Yt up to the current moment). Intuitively, this is because at each step we add something (ϵt) with a mean of zero, so the expected value of the next time point in the series doesn’t move from where we are now (we are adding zero)!




3.2.2.1 The Shakespeare Problem


Martingales may seem pretty boring at first, but they allow us to solve pretty interesting problems. You’ve likely heard this before:


If you let a monkey bang on a keyboard long enough, it will write Shakespeare


We can use martingales to think about just how long this monkey will have to be ‘typing.’


Let’s make a couple of assumptions. The monkey types one letter at a time (come to think of it, is it possible to type more than one letter at a time?), and chooses each letter with equal probabilities (1/26 for each). Punctuation / cases (uppercase vs. lowercase) are irrelevant here.

Let T be the random variable that gives the number of keys that the monkey presses until he types OTHELLO (of King Othello fame). Find E(T).



This can definitely be a tricky problem. We know that E(T) is definitely pretty large (the monkey has to get seven letters in a row correct), but it can be challenging knowing where to start with this problem: there are some nasty hidden traps! For example, consider how, if the monkey successfully types OTHE, he can either type the next correct letter L, or he can type an incorrect letter. However, not all incorrect letters are created equal. If he types a non-O, he has to start over; however, if he types O, he lost his OTHE progress but at least still has made on letter of progress (the O). This changes after the monkey has typed OTHELL; after this, all incorrect letters are created equal. Either the monkey types O and ends the game, or types a non-O and starts from scratch, which is different from what we saw above. These minor differences can quickly balloon into a nightmare if we try to solve the problem with a variety of conventional methods.


Instead, then, we’re going to use two tools to solve this problem. The first will be setting up a series of gamblers (yes, gamblers) that bet on the success of the monkey. Before the monkey hits each new letter, a new gambler enters the game and ponies up $1. Let’s think about the first gambler (who bets on the monkey’s first letter): he puts up $1 and gets $26 back if the monkey hits O (the first letter in OTHELLO) including his original $1 (for a $25 profit). If the monkey hits any other letter than O, the gambler loses his dollar. In this way, it’s a fair bet: the gambler has a 1/26 probability of getting 26 times his investment, and a 25/26 probability of losing everything (the gambler profits $25 with probability 1/26 and loses his $1 with probability 25/26, so his expected profit is 25(1/26)1(25/26)=0).

A new gambler comes in before the monkey hits each letter. For example, if the monkey hits K on the first letter, the first gambler loses his dollar and goes home, and a new gambler arrives and puts a dollar on the monkey hitting O on the second letter.

It gets interesting if the monkey hits O. The active gambler gets $26, and now bets all $26 on the monkey hitting T on the next letter with the same odds: if the monkey hits T, the gambler now has 262 dollars, and if the monkey doesn’t hit T, the gambler loses everything. Again, this is a fair bet; the gambler has a 1/26 probability of profiting 26226 (the 262 dollars he gets back minus his original 26 dollars) and a 25/26 probability of losing his 26 dollars. Therefore, his expected profit is still 0:

(1/26)(26226)(25/26)26=0

In addition to the gambler who won on the O and now puts all of his 26 dollars on T coming next, a new gambler enters the game and puts one dollar on O being struck next (i.e., the monkey starts typing OTHELLO with the next letter). If O is hit, the first gambler loses his 26 dollars and the second gambler now has 26 dollars and proceeds in the same way.


This sounds like a convoluted setup, but, amazingly, it will make our problem easier. Let us now define Mn as the total winnings of all gamblers up to time n. We know that Mn is a martingale; if, for instance, the monkey has hit ten non-O’s in a row, then we have M10=10 (10 gamblers have each lost a single dollar). Conditioning on M10=10 (really, conditioning on the history of the process up to time 10, although M10 is the only point we need here), our expected value for M11 is, well, just 10. That’s because the new gambler who enters the game at the 11th letter is making a fair bet; his expected profits are zero, so our expectation for the new total of all of the gamblers winnings is just the last value in the series (the 10 plus the expected zero dollars from this new gambler).


We can now unveil our second tool, which is Doob’s Optional Stopping Theorem. This states that, for a martingale Xt with a ‘stopping time’ τ (the stopping time is just when we ‘stop’ the process) under certain general conditions (Xt is bounded, or T is bounded, or E(T)<), we have that:

E(X0)=E(Xτ)

That is, the expected value of the process at the stopping time is the same as the expected value of the process at the beginning!


We know that Mt is a martingale, and T is a stopping time (we stop once we hit OTHELLO), so we can apply Doob’s to this problem (we won’t do a formal proof for conditions, but consider why E(T) might be finite). We know, then, that:

E(M0)=E(MT)

We know that E(M0) is just, well, zero, because this is the number of winnings before any of the gamblers have actually gambled! We thus have:

E(M0)=0=E(MT)

Let’s now think about E(MT). We know that, at time T, just after OTHELLO has been hit, T gamblers have put up one dollar, so there is a T term in this expectation. We also know that one (very lucky) gambler has won 267 dollars (he won each bet that he put on) and (this is the tricky part), one gambler has won 26 dollars (the gambler that bet on a O when the monkey hit the last O in Othello!). Therefore, we simply have:

0=E(267+26T) 0=267+26E(T) E(T)=267+26


Incredible! We were able to solve this very tricky problem by setting up an (admittedly convoluted) gambling scheme and using a cool property of martingales; what’s more, the solution is broken into easily digestible parts!




3.2.3 Joint Stochastic Processes


As in most statistical fields, stochastic processes can be ‘levered up’ from the marginal (single process) to the joint (multiple processes). We will see a couple of examples of joint stochastic processes later on, so we can introduce some examples here and explore some of their key properties.


Let’s return to our Gaussian Random Walk:

Y0=0 Yt=Yt1+ϵt

Where t=1,2,... and ϵt is a series of i.i.d. N(0,1) random variables.

Let’s now define a process Zt:

Z0=0 Zt=Ytγt

Where t=1,2,... and γt is a series of i.i.d. Expo(1) random variables.

We can easily visualize a simulation of these two processes:

# generate the processes
data <- data.table(t = 1:100,
                   Y = cumsum(rnorm(100)))
data$Z <- data$Y - rexp(100, rate = 1)

# visualize
ggplot(data, aes(x = t, y = Y)) +
  geom_line(col = "black") +
  geom_line(data = data, aes(x = t, y = Z), col = "red") +
  theme_bw() +
  geom_hline(yintercept = 0) + 
  ggtitle("Joint Processes Y (black) and Z (red)") +
  ylab("")


Now is a good time to introduce the concept of independence as it relates to stochastic processes. We say that two stochastic processes Xt and Yt are independent if, for any sequence (tk,tk+1,...,tk+n) where k,n are positive integers less than , we have that the vector:

(Xtk,Xtk+1,...,Xtk+n)

…is independent of the vector…

(Ytk,Ytk+1,...,Ytk+n)

Colloquially, for Xt and Yt to be independent, we must have that any subset of Xt is independent to the same subset (on the same time points) of Yt.

What does it mean for two vectors to be independent? To really get into the weeds here we would need some measure theory, which is beyond the scope of this book. We can think of this in our parlance; the random vector for Xt above has joint density:

fxt(xtk,xtk+1,...,xtk+n)

Where fxt is just defined as the joint density of Xt (note that if each Xt was i.i.d., we would have that fxt is just the product of the marginal densities of Xt at all of the individual time points). Similarly, we have the joint density of Yt:

fyt(ytk,ytk+1,...,ytk+n)

And, similar to independence with marginal random variables, we can say that the vectors are independence if the joint density of both vectors fxt,yt is equal to the product of fxt and fyt:

fxt,yt(xtk,xtk+1,...,xtk+n,ytk,ytk+1,...,ytk+n)=fxt(xtk,xtk+1,...,xtk+n)fyt(ytk,ytk+1,...,ytk+n)


That’s certainly a mouthful! Let’s return to our example of Yt, the Gaussian Random Walk, and Zt above. We can test a very simple case; is the joint density of Y1 and Z1 equal to the product of the marginal densities of Y1 and Z1?

f(y1,z1)?=f(y1)f(z1)

Immediately, we can see this is not the case; Z1 can only be, at maximum, Y1. The density of Z1 is only defined up to Y1, and thus we can’t multiply the marginal densities to get the joint densities. Z1 and Y1 are not independent, which means, by extension, Zt and Yt are not independent.



This example may seem excessively simple, but the concept of independence is important in stochastic processes.




3.2.4 Markov Property


Colloquially, a stochastic process with the Markov property only cares about its most recent state.

Mathematically, the stochastic process Xt satisfies the Markov property if:

P(Xt|X1,X2,...,Xt1)=P(Xt|Xt1)

That is, within the filtration of Xt (all of the information before time t), Xt1 tells us all the information that we need to know about Xt (put another way, Xt is conditionally independent of X1,X2,...,Xt2 given Xt1).

Our Bernoulli Process is clearly Markov, since each step is just an i.i.d. Bernoulli, and thus:

P(Xt|X1,X2,...,Xt1)=P(Xt|Xt1)=P(Xt)

That is, conditioning on the past doesn’t even tell us anything, because each step is independent of previous steps.

We can also consider our Gaussian Random Walk:

Y0=0 Yt=Yt1+ϵt

Where ϵt is a series of i.i.d. N(0,1).

This is also Markov; given Yt1, we have all of the information about Yt that we could get from the history the process; the only random part remaining is ϵt, which is independent of Yt.




3.2.5 Reflective Property


We can now turn to a fun property of certain stochastic processes. The reflective property invokes symmetry and yields a pretty neat result; we will be dealing with the reflective process for continuous stochastic processes.


Let Xt be a continuous stochastic process such that, for all t and k<t, we have that XtXtk is a symmetric random variable about 0 (a random variable Y is symmetric about 0 if Y has the same distribution as Y). That is, the increment of our process (amount that we add to our process over every interval) is symmetric.


To explore this property, we will introduce a new stochastic process: Brownian Motion. This is a stochastic process with a relatively simple structure but many applications. Let Xt be Brownian Motion; we have that X0=0 and:

XtN(0,t)

Remember that this is a continuous process, so the index is simply t0 (that is, the positive part of the real line). Brownian motion has some key characteristics; first, increments of Brownian motion are themselves Normal random variables. For example, where s<t, we have:

XtXsN(0,ts)

This is intuitive; there is ts time from time s to time t, so we have a Brownian motion with variance ts (this also explains why XtN(0,t), because Xt is essentially XtX0, and X0 is the constant 0). Now imagine another increment XwXv that is non-overlapping (that is, either w and v are both greater than t or both w and v are both less than s, the point being that w and v aren’t in the interval between s and t). We have that XwXv is independent of XtXs. Naturally, if the intervals were overlapping, the two would be dependent.


We can clearly see that this is symmetric, by the definition of a Brownian Motion; as we stated above, the increments have a Normal distribution with mean 0, which is symmetric about 0.


Let’s generate a sample path of Brownian Motion to illustrate what the reflective property is.

Note that the way we generate Brownian Motion is similar to how we generate the Gaussian Random Walk; the difference is that, theoretically, the Brownian Motion is defined at every point on t0, and the Gaussian Random Walk is only defined on zero and positive integers. In our simulation, of course, we can’t quite get to the continuous t0, so we simply try to include more points in the same discrete interval. The result, in essence, is that we get closer to the continuous case.

We’ll be using, then, increments of .01 (we used increments of 1 in the Gaussian Random Walk case). It’s important to get the standard deviation in our rnorm call correct: remember that the variance of each interval of a Brownian Motion is simply the distance in time stamps. Therefore, we want a variance of .01 and thus use sqrt(.01) for the standard deviation argument in rnorm (remember that the standard deviation is the square root of variance!).

# replicate
set.seed(0)

# generate a Gaussian Random Walk
data <- data.table(t = c(0, seq(from = .01, to = 10, by = .01)),
                   X = c(0, cumsum(rnorm(1000, mean = 0, sd = sqrt(.01)))))


# visualize
ggplot(data, aes(x = t, y = X)) +
  geom_line() +
  geom_hline(yintercept = 0) + 
  theme_bw() + ylab("") +
  ggtitle("Brownian Motion") 



Here’s a video review of this stochastic process:

Click here to watch this video in your browser


Let us now see how we can use the reflective property to solve interesting problems relating to Brownian Motion.

# replicate
set.seed(0)

# generate a Gaussian Random Walk
data <- data.table(t = c(0, seq(from = .01, to = 10, by = .01)),
                   X = c(0, cumsum(rnorm(1000, mean = 0, sd = sqrt(.01)))))


# reflect
data$Y <- c(data$X[1], diff(data$X))
data$Y[(which(data$X > 1)[1] + 1):nrow(data)] <- -data$Y[(which(data$X > 1)[1] + 1):nrow(data)]
data$Y <- cumsum(data$Y)

# visualize
ggplot(data, aes(x = t, y = X)) +
  geom_line(col = "black") +
  geom_line(data = data, aes(x = t, y = Y), col = "red") +
  geom_hline(yintercept = data$X[which(data$X > 1)[1]], linetype = "dotted") +
  theme_bw() +
  geom_hline(yintercept = 0) + 
  ylab("") +
  ggtitle("Reflective Property") 


In this chart, we take a Brownian Motion (black line) and, the first time that this random walk breaches 1 (in this specific example, it hits 1.13 at t=4.28; remember that we can’t fully simulate the continuous case so we are just using an interval with smaller segments of length .01), we reflect the process about itself (about the line 1.13, here; in the code above, you can see how we flip the increments of Y after X breaches 1). The red line is, intuitively, this reflection (we could also simply reflect the process from the start about 0, instead of waiting for the process to hit some value).

Now, the reflection principle states that the black line has the same probability of being realized as the red line. That is, at time t=0, these two paths have the same probability of occurring. For a proof here, we can rely on symmetry: since the increment at every time step is symmetric about zero and we are simply flipping all of the increments after a certain point to get from the black line to the red line, the two paths should have the same probability (remember, for a symmetric random variable Z, we have that Z has the same distribution as Z).



This may sound like a convoluted and useless property, but we can use it to solve some pretty interesting problems. For example, consider this question for Xt, our Brownian Motion above:

What is the probability that, by time t=100, we have max(X1,X2,...,X100)>5 but X100<0? More colloquially, what is the probability that Xt hits a max above 5 but still ends below zero by time t=100?


This seems like a challenging problem. There are a lot of ways that this condition could be achieved; theoretically, Xt could breach 5 at any time (before t=100) and fall from there (although, of course, this becomes less likely if there is less time to fall!). One first reaction might be to try to integrate across the whole timespan; we could potentially find the probability that Xt breaches 5 by any point, and then the probability that, from that point, Xt falls back below 0 by the time t=100.


However, we can use the reflective property to make this problem far easier. Let us imagine a process that hits the 5, and then let us reflect that process about 5 for the rest of the period. As we saw above, these two paths have the same probability of occurring.

Now, we know that if the reflected path climbs above 10, then the original path has climbed below 0 (remember, we began to reflect the path as soon as the original path hit 5). Therefore, we simply need the probability that the Brownian motion ends above 10, or P(X10010). Since we have, by definition, that X100N(0,100), this becomes a simple calculation:

1Φ(100100)=1Φ(110)

Where Φ is the CDF of a standard Normal random variable. In R, we get:

1 - pnorm(10 / 10)
## [1] 0.1586553


We can simulate many paths of this Brownian motion to see if our answer makes sense (we will use intervals of .1 in our simulation):

# replicate
set.seed(0)
nsims <- 1000


# generate the Brownian motions
data <- replicate(cumsum(rnorm(1000, mean = 0, sd = sqrt(.1))), n = nsims)

# count how many paths hit 5 and ended below 0
results <- apply(data, 2, function(x) {
              if (max(x) >= 5 & tail(x, 1) < 0) {
                return(1)
              }
              
              return(0)
            })

mean(results)
## [1] 0.145


It looks like our empirical result is close to the analytical result we solved! You can try running more simulations (increasing nsims) to ensure that the probabilities match.


What started as a complicated problem, then, reduced drastically in complexity when we applied the reflective property; in general, the beauty of symmetry can reduce many difficult problems to smaller, much easier problems!

If you are still confused, this video may help to clear things up:

Click here to watch this video in your browser