3.3 Design and Analysis of Monte Carlo Methods

According to Murdoch [2000], the term Monte Carlo originally referred to simulations that involved random walks and was first used by John von Neumann and S. M. Ulam in the 1940s.

Monte Carlo methods involve simulating a process multiple times and using statistical inference to approximate unknown quantities.

The fundamental idea is to use randomness to solve deterministic problems.

  1. Identify an appropriate model (or models) that represents (or represent) the population of interest.

  2. Generate random observations using the model (or models).

  3. Compute the outcome based on the sampled inputs. For example, calculate the value of the statistic of interest using the random observations.

  4. Repeat Steps 2 and 3 for M trials.

  5. Aggregate the outcomes of the M trials. For example, use the M values to study the distribution of the statistic.
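As a small illustration of these five steps, here is a minimal sketch (the example itself, estimating the chance that two fair dice sum to 7, is purely hypothetical and not from the notes):

# Minimal sketch of the five steps: estimate P(sum of two fair dice = 7)
set.seed(1)
M        <- 10000                                    # Step 4: number of Monte Carlo trials
outcomes <- numeric(M)
for (m in 1:M){
    roll <- sample(1:6, size = 2, replace = TRUE)    # Steps 1-2: model and random observations
    outcomes[m] <- (sum(roll) == 7)                  # Step 3: compute the outcome
}
mean(outcomes)                                       # Step 5: aggregate; should be close to 1/6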

3.3.1 Basic Examples

R provides a rich set of functions for generating random numbers and running Monte Carlo simulations. Below are some fundamental examples.

Motivation: Estimating π using random number generation

Consider a unit circle centered at the origin with radius $r = 1$. The points that lie inside the unit circle satisfy:

$$x^2 + y^2 \le 1$$

We know that the area of a unit circle is π.

Now, let’s focus on the first quadrant ($x, y \ge 0$), where the area of the quarter circle is

$$A_{\text{quarter circle}} = \frac{\pi}{4}$$

The Monte Carlo approach approximates this area by randomly sampling points inside the unit square $[0,1] \times [0,1]$ (which has area 1), then computing the proportion of points that fall inside the quarter circle.

Then the final (estimated) value $\hat{\pi}$ is 4 times the estimated area of the quarter circle.

The following algorithm can be used to approximate π.

  1. Generate M random points (x, y), where x and y are sampled uniformly from $[0,1]$.

  2. Count how many points fall inside the quarter unit circle, i.e., satisfy

$$x^2 + y^2 \le 1$$

  3. The estimated value of π is

$$\hat{\pi} = 4 \times \frac{\#\text{ of points in the quarter circle}}{\text{total points}}$$
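The function estimate_pi() called below is not defined in this section; a minimal sketch consistent with the algorithm above (one possible implementation, not necessarily the original) is:

# Sketch of estimate_pi(): Monte Carlo approximation of pi
# using M uniform points in the unit square
estimate_pi <- function(M){
    x <- runif(M)                  # Step 1: sample x uniformly from [0, 1]
    y <- runif(M)                  #         and y uniformly from [0, 1]
    inside <- (x^2 + y^2) <= 1     # Step 2: flag points inside the quarter circle
    4 * mean(inside)               # Step 3: 4 times the proportion of points inside
}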

estimate_pi(1e5)
## [1] 3.14664

In fact, by the law of large numbers, as the number of Monte Carlo iterations M increases, the estimate of π gets closer to its actual value.
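To see this numerically, we can rerun the estimator for increasing values of M (a quick sketch; the exact values will depend on the seed and on the implementation of estimate_pi()):

# Estimates should typically get closer to pi as M grows
set.seed(142)
sapply(c(1e2, 1e3, 1e4, 1e5, 1e6), estimate_pi)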

Today, the Monte Carlo method refers to any simulation that involves the use of random numbers.

Monte Carlo Experiment: Mean of Poisson

Inferential statistics draws conclusions about a population from a random sample and assesses their reliability.

To assess reliability, the statistician must understand the distribution of any statistics that are used in the analysis.

The sampling distribution of a statistic can sometimes be derived analytically. But in many cases, this is difficult to do.

Definition 3.10 Monte Carlo experiments are a class of computational algorithms that rely on repeated random sampling to estimate numerical results. They are widely used in various fields, including statistics, physics, finance, and machine learning, to approximate complex mathematical problems that may not have closed-form solutions.

Suppose our population is best modelled by a Poisson distribution (commonly used for count data). The Poisson distribution is completely described by a single parameter, the rate λ, which is also its mean (λ = μ).

The question here is

“Is the sample mean still unbiased and does it have lower variance than other estimators, say sample median?”

To answer this question, we have two approaches:

  • prove analytically that the sample mean is unbiased and has lower variance than the sample median

  • show numerically that these statements are true

Monte Carlo experiments allow us to answer these questions using the second approach.

In this example, we will compare the bias and variance of two estimators, the sample mean and the sample median, in estimating the population mean of a Poisson-distributed population.

  1. Suppose our population of interest is an infinite population that is Poisson distributed with rate parameter λ.

  2. Obtain a sample of size n from this population.

  3. From the generated sample, we can compute the sample mean and sample median.

  4. Repeat Steps 2 and 3 M times to obtain M sample means and M sample medians.

  5. From these M values, we can compute the mean and variance of the sample mean and sample median so we can assess their bias and variance respectively.

mc_pois <- function(n, M, lambda){
    # initialization
    samp_mean   <- numeric(M)
    samp_median <- numeric(M)
    
    for (i in 1:M){
        # Step 2: generate observations
        samp <- rpois(n = n, lambda = lambda)
        
        # Step 3: compute the statistics of interest
        samp_mean[i]   <- mean(samp)
        samp_median[i] <- median(samp)
        
        # Step 4: do this M times
    }
    return(data.frame("samp_mean"   = samp_mean,
                      "samp_median" = samp_median))
}

result <- mc_pois(n = 100, M = 2500, lambda = 4)
result
##    samp_mean samp_median
## 1       4.05         4.0
## 2       3.92         4.0
## 3       4.28         4.0
## 4       4.17         4.0
## 5       3.76         4.0
## 6       3.68         4.0
## 7       3.93         4.0
## 8       3.82         4.0
## 9       3.96         4.0
## 10      3.82         4.0
## ... (first 10 of 2,500 rows shown)

Step 5: Use the M values to study the distribution of the sample mean and sample median.

## bias
sapply(result, function(x) mean(x) - 4)
##   samp_mean samp_median 
##     0.00052    -0.08120
## variance
sapply(result, var)
##   samp_mean samp_median 
##  0.03932410  0.07703737

Our results suggest that the sample mean has less bias in estimating the population mean and has smaller variance than the sample median.

Varying population mean λ

Now, let’s see how the estimators perform as we change the value of the population mean, specifically for values $\lambda = 1, 2, \ldots, 10$.

bias_mean   <- numeric(10)
vari_mean   <- numeric(10)
bias_median <- numeric(10)
vari_median <- numeric(10)
    
for (i in 1:10){
    result <- mc_pois(n = 100, M = 1500, lambda = i)
    
    bias_mean[i] <- mean(result$samp_mean) - i
    vari_mean[i] <- var(result$samp_mean)
    
    bias_median[i] <- mean(result$samp_median) - i
    vari_median[i] <- var(result$samp_median)
}

result_plot <- data.frame(pop_mean = c(1:10,1:10),
                          bias = abs(c(bias_mean, bias_median)),
                          var  = abs(c(vari_mean, vari_median)),
                          estimator = c(rep("mean", 10), 
                                        rep("median", 10))
                          )
ggplot(result_plot, aes(x = pop_mean, y = bias, color = estimator))+
    geom_line() +
    ylab("absolute bias")+
    theme_bw() +
    xlab("lambda")

From this plot, we can see that as the population mean λ increases, the bias of the sample median also increases.

ggplot(result_plot, aes(x = pop_mean, y = var, color = estimator))+
    geom_line() +
    ylab("Variance")+
    theme_bw() +
    xlab("lambda")

Both estimators have larger variance for higher values of λ, but the sample median’s variance increases faster.

Varying sample size n

In this simulation, we explore how the biases and variances of the sample mean and sample median change as we increase the sample size n.

We test for values of n=10,20,...,90,100,200,...,500, fixing the population mean λ at 5 in all iterations.

bias_mean   <- numeric(14)
vari_mean   <- numeric(14)
bias_median <- numeric(14)
vari_median <- numeric(14)
sample_size <- c(1:10*10, 2:5*100)
j <- 0
for (i in sample_size){
    result <- mc_pois(n = i, M = 1000, lambda = 5)
    
    j <- j + 1
    
    bias_mean[j] <- mean(result$samp_mean) - 5
    vari_mean[j] <- var(result$samp_mean)
    
    bias_median[j] <- mean(result$samp_median) - 5
    vari_median[j] <- var(result$samp_median)
    
}
result_plot <- data.frame(n = rep(sample_size,2),
                          bias = abs(c(bias_mean, bias_median)),
                          var  = abs(c(vari_mean, vari_median)),
                          estimator = c(rep("sample mean", 14), 
                                        rep("sample median", 14))
                          )
ggplot(result_plot, aes(x = n, y = bias, 
                        color = estimator))+
    geom_line() +
    ylab("absolute bias")+
    theme_bw() +
    xlab("sample size")

ggplot(result_plot, aes(x = n, y = var, 
                        color = estimator))+
    geom_line() +
    ylab(unname(latex2exp::TeX("Variance (in $log_{10}$ scale)"))) +
    theme_bw() +
    xlab("sample size") +
    scale_y_continuous(trans = "log10", limits =c(1e-3, 1), 
                     labels = scales::scientific, expand = c(0, 0)) +
    scale_x_continuous(limits = c(10, 500), 
                     breaks = c(10, 50, 100, 500), expand = c(0, 0))

Exercise 3.3 Suppose we want to investigate the distribution of the sample standard deviation (SD) of a random sample of size n=200 from a Normal(2,1) distribution.

  • Using Monte Carlo Simulation with M=10000, show the relative frequency histogram of the distribution of sample SD.

  • Based on the simulation, what is the mean and variance of the distribution of sample SD?


Estimating Probability using Monte Carlo Experiment

We already showed that we can check the distribution of a statistic based on a Monte Carlo Simulation.

We can also obtain probabilities using Monte Carlo simulation, based on the a posteriori approach to assigning probabilities.

Suppose you want to obtain the probability $P(X \le x)$.

The general idea for obtaining a posteriori probabilities using Monte Carlo simulation is as follows:

  1. Generate $M$ samples $x_1, x_2, \ldots, x_M$ from the population $F$.

  2. The empirical CDF is given by

$$\hat{F}(x) = \frac{1}{M}\sum_{m=1}^{M} I(x_m \le x) \approx P(X \le x)$$

Example 1: Exponential

As a quick example, suppose $X \sim \text{Exp}(\lambda = 2)$. We know that $P(X \le 1) = 1 - e^{-2 \cdot 1} \approx 0.86$.

If we draw a large number of samples from Exp(2), we expect around 86% of the generated values to be less than or equal to 1.

set.seed(142)
X <- rexp(100, rate = 2)
sum(X <= 1)/100
## [1] 0.86

We can plot the theoretical and empirical cdf to check if they really look the same.
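A quick sketch of such a plot (the original plotting code is not shown here) overlays the theoretical Exp(2) CDF on the empirical CDF of the sample:

# Compare the empirical CDF of the sample with the theoretical Exp(2) CDF
plot(ecdf(X), main = "Empirical vs theoretical CDF", xlab = "x", ylab = "F(x)")
curve(pexp(x, rate = 2), from = 0, to = max(X), add = TRUE, col = "red", lwd = 2)
legend("bottomright", legend = c("empirical", "theoretical Exp(2)"),
       col = c("black", "red"), lty = 1)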

Example 2: CLT

For another demonstration, we visualize the Central Limit theorem. Recall its (basic) definition:

Theorem 3.2 (Central Limit Theorem) Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean $\mu$ and finite variance $\sigma^2$, and let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.

For a sufficiently large sample size $n$, the statistic

$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$

approximately follows the standard normal distribution.

Suppose the population has a distribution Poisson(λ=6). The mean and variance are both λ=6.

Let’s obtain a random sample of size $n = 30$ repeatedly and examine the distribution of the statistic $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.

M   <- 1000
n   <- 30
vec <- numeric(M)
for(m in 1:M){
    set.seed(m)
    X <- rpois(n, lambda = 6)              # sample of size n from Poisson(6)
    vec[m] <- (mean(X) - 6)/sqrt(6/n)      # standardized sample mean
}
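A histogram of these standardized means can then be compared with the standard normal density; a sketch of such a plot (the original plotting code is not shown here) is:

# Relative frequency histogram of the standardized sample means,
# overlaid with the standard normal density
hist(vec, freq = FALSE, breaks = 30,
     main = "Standardized sample means", xlab = "z")
curve(dnorm(x), from = -4, to = 4, add = TRUE, col = "red", lwd = 2)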

It looks standard normal, doesn’t it?

Suppose we obtained the following sample from the population:

set.seed(142)
samp <- rpois(n = 30, lambda = 6)

Computing $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ using this sample, we have:

z <- (mean(samp) - 6)/sqrt(6/30)
z
## [1] 1.565248

By the central limit theorem, $P(Z \le 1.57) \approx \Phi(1.57) \approx 0.9412$.

pnorm(z)
## [1] 0.9412376

But using Monte Carlo simulation, we instead count how many of the simulated samples have a Z-statistic less than or equal to the observed Z-statistic:

sum(vec <= z)/M
## [1] 0.947

Close enough?

Exercise 3.4 Let $X$ be a random variable that is uniformly distributed over $[-1, 1]$ and let $Y$ be a random variable from $N(1,1)$ that is independent of $X$.

Using Monte Carlo simulation with M=1000 and seed number = 142, what is the probability that $X + Y > 1.5$?


Euler’s Method vs Monte Carlo Integration

Suppose we want to evaluate a definite integral:

$$\int_a^b f(x)\,dx$$

where $f$ is some function. For most functions, there is no closed-form expression for such definite integrals.

Numerical analysis provides various means of approximating definite integrals, starting with Euler’s method for one-dimensional integrals:

$$\int_a^b f(x)\,dx \approx \sum_{i=1}^{(b-a)/h} h\, f(a + ih)$$
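As a hedged illustration of this deterministic rule (the function name euler_integrate below is just for this sketch), here it is applied to $f(x) = \sin(x)$ on $[0, \pi]$, the same example used for Monte Carlo integration later:

# Riemann-sum ("Euler") approximation of the integral of f over [a, b] with step h
euler_integrate <- function(f, a, b, h = 1e-4){
    i <- 1:floor((b - a)/h)
    sum(h * f(a + i*h))
}
euler_integrate(sin, 0, pi)   # should be close to the exact value 2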

However, this method is very slow, especially for high-dimensional problems, since one has to evaluate the function $f$ many times. As an alternative, physicists came to rely on a random approximation scheme: the Monte Carlo method.

Recall that the expected value of $f$ with respect to a distribution with probability density $p$ and support $D$ is

$$E_p[f(X)] = \int_D f(x)\,p(x)\,dx$$

Based on the law of large numbers, for IID random variables, the sample mean converges to the expected value:

$$\frac{1}{n}\sum_{i=1}^{n} f(X_i) \longrightarrow E_p[f(X)] = \int_D f(x)\,p(x)\,dx$$

The most basic Monte Carlo method for evaluating the definite integral $\int_a^b f(x)\,dx$ is to set $p$ as the pdf of a Uniform random variable.

Draw the random variable $X$ uniformly over the domain $[a, b]$, which has pdf $\frac{1}{b-a}$, so that

$$\int_a^b f(x)\,dx = (b-a)\int_a^b f(x)\,\frac{1}{b-a}\,dx = (b-a)\,E[f(X)]$$

and by the law of large numbers,

$$\frac{b-a}{n}\sum_{i=1}^{n} f(X_i) \longrightarrow \int_a^b f(x)\,dx$$

Example 1

To demonstrate, let us perform integration on f(x)=sin(x) over [0,π]. The exact solution is

$$\int_0^{\pi} \sin(x)\,dx = 2$$

Using Monte Carlo integration, we can approximate it with these steps:

  1. Generate n=10000 values from Uniform(0,π)

  2. Evaluate f(x)=sin(x) at these points

  3. Get the average and multiply by the measure of the domain, which is $\pi - 0 = \pi$.

X <- runif(10000, 0, pi)
pi*mean(sin(X))
## [1] 2.00878

Exercise 3.5 Approximate the following definite integral using Monte Carlo Integration:

$$\int_0^1 e^{-x^2}\,dx$$

Use M=1e6

Example 2: What if the integral limits are infinite?

Sampling from the uniform distribution is useless for an integral like

$$\int_0^{\infty} x^2 e^{-x^2}\,dx$$

The trick is to introduce a pdf with the same support as the integral limits. In general,

$$\int_D f(x)\,dx = \int_D \frac{f(x)}{p(x)}\,p(x)\,dx = E_p\!\left[\frac{f(X)}{p(X)}\right]$$

And now, if Xi is generated from p,

$$\frac{1}{n}\sum_{i=1}^{n} \frac{f(X_i)}{p(X_i)} \longrightarrow E_p\!\left[\frac{f(X)}{p(X)}\right] = \int_D f(x)\,dx$$

Solution for Example 2

Suppose we have the following integral

$$\int_0^{\infty} x^2 e^{-x^2}\,dx$$

  • The exact solution for this is $\sqrt{\pi}/4$.

  • The Exponential distribution has support $[0, \infty)$ and pdf $\lambda e^{-\lambda x}$.

  • We can sample from Exp(λ=1) to solve the above integral.

$$\int_0^{\infty} x^2 e^{-x^2}\,dx = \int_0^{\infty} \frac{e^{-x}}{e^{-x}}\, x^2 e^{-x^2}\,dx = \int_0^{\infty} \frac{x^2 e^{-x^2}}{e^{-x}}\, e^{-x}\,dx = \int_0^{\infty} x^2 e^{-x^2 + x}\, e^{-x}\,dx = E\!\left(X^2 e^{-X^2 + X}\right), \quad \text{where } X \sim \text{Exp}(1)$$

X <- rexp(1e6, rate = 1)
mean(X^2*exp(-X^2+X))
## [1] 0.443195
sqrt(pi)/4
## [1] 0.4431135

Exercise 3.6 The following integral has no closed form solution:

$$\int_{-\infty}^{\infty} \frac{1}{1+x^4}\,dx$$

Approximate this integral using Monte Carlo integration. Use M=1e6

3.3.2 (Optional) Advanced Examples

Markov-Chain Monte Carlo

Definition 3.11 A Markov chain is a stochastic process that transitions from one state to another within a defined set of states, following certain probabilistic rules.

The key characteristic of a Markov chain is the Markov property, which states that the future state depends only on the present state and not on the sequence of past states. This is expressed mathematically as

$$P(X_{n+1} \mid X_n, X_{n-1}, \ldots, X_0) = P(X_{n+1} \mid X_n)$$

where $X_n$ represents the state at step $n$.

Generating a Markov Chain
  1. Set up the conditional distribution.

  2. Draw the initial state of the chain.

  3. For every additional draw, use the previous draw to inform the new one.
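A minimal sketch of these three steps (the specific chain below, a Gaussian random walk, is an illustrative choice and not from the notes):

# Generate a simple Markov chain: each new state is drawn from a normal
# distribution centered at the previous state (a Gaussian random walk)
set.seed(142)
M     <- 1000
chain <- numeric(M)
chain[1] <- 0                                        # Step 2: initial state
for (m in 2:M){
    # Steps 1 & 3: the conditional distribution is N(previous state, 1)
    chain[m] <- rnorm(1, mean = chain[m - 1], sd = 1)
}
plot(chain, type = "l", xlab = "step", ylab = "state")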

Definition 3.12 (Markov Chain Monte Carlo) MCMC is a class of algorithms that use Markov chains to generate samples from probability distributions when direct sampling is challenging.

Unlike standard Monte Carlo methods, MCMC builds a dependent sequence of samples that ultimately converges to the target distribution.

Common MCMC methods include:

  • Metropolis-Hastings Algorithm: Uses a proposal distribution and acceptance-rejection criterion.

  • Gibbs Sampling: Special case of MCMC where samples are drawn from conditional distributions iteratively.

MCMC is essential in Bayesian statistics, allowing estimation of posterior distributions for complex models.
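To make the Metropolis-Hastings idea concrete, here is a minimal sketch (the target and proposal below are illustrative choices, not from the notes): a random-walk proposal is used to sample from a standard normal target.

# Random-walk Metropolis-Hastings targeting the standard normal density
# (burn-in and tuning are ignored for simplicity)
set.seed(142)
M       <- 10000
samples <- numeric(M)
samples[1] <- 0
for (m in 2:M){
    current  <- samples[m - 1]
    proposal <- rnorm(1, mean = current, sd = 1)   # symmetric proposal
    alpha    <- dnorm(proposal)/dnorm(current)     # ratio of target densities
    if (runif(1) < alpha){
        samples[m] <- proposal                     # accept the proposal
    } else {
        samples[m] <- current                      # reject; stay at the current state
    }
}
mean(samples)   # should be close to 0
var(samples)    # should be close to 1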

Agent-based modelling

Definition 3.13 Agent-based models are computer simulations used to study the interactions between people, things, places, and time. They are stochastic models built from the bottom up, meaning individual agents (often people in epidemiology) are assigned certain attributes.

The agents are programmed to behave and interact with other agents and the environment in certain ways.

Monte Carlo methods are used to understand the stochasticity of these models.

Applications of ABM

  • Risk and disaster management: Simulating or predicting possible displacement for a given flood or typhoon scenario in an area.

  • Economics: Modeling markets, supply chains, or consumer behavior.

  • Epidemiology: Simulating the spread of diseases (e.g., COVID-19 transmission).

  • Ecology: Studying animal movements, predator-prey dynamics, or deforestation.

  • Social Sciences: Understanding crowd behavior, opinion dynamics, or urban development.

  • Computational Biology: Modeling cell interactions or immune responses.

SUMMARY:

  • When we don’t have exact probability formulas, or they don’t apply, we can simulate to get arbitrarily good approximations.

  • If we can describe the process of our model, we can set up a simulation of it.