3.3 Design and Analysis of Monte Carlo Methods
According to Murdoch [2000], the term Monte Carlo originally referred to simulations that involved random walks and was first used by John von Neumann and S. M. Ulam in the 1940s.
Monte Carlo methods involve simulating a process multiple times and using statistical inference to approximate unknown quantities.
The fundamental idea is to use randomness to solve deterministic problems.
1. Identify an appropriate model (or models) that represents the population of interest.
2. Generate random observations using the model (or models).
3. Compute the outcome based on the sampled inputs. For example, calculate the value of the statistic of interest using the random observations.
4. Repeat Steps 2 and 3 for M trials.
5. Aggregate the outcomes of the M trials. For example, use the M values to study the distribution of the statistic.
3.3.1 Basic Examples
R provides a rich set of functions for generating random numbers and running Monte Carlo simulations. Below are some fundamental examples.
Motivation: Estimating π using random number generation
Consider a unit circle centered at the origin with radius r=1. The equation for the points that lie inside the unit circle is:
$$x^2 + y^2 \le 1$$
We know that the area of a unit circle is π.
Now, let's focus on the first quadrant ($x, y \ge 0$), where the area of the quarter circle is
$$A_{\text{quarter circle}} = \frac{\pi}{4}$$
The Monte Carlo approach approximates this area by randomly sampling points inside the unit square (which has area 1) and taking the proportion of points that fall inside the quarter circle.
Then the final (estimated) value $\hat{\pi}$ is 4 times the estimated area of the quarter circle.
The following algorithm can be used to approximate π.
1. Generate M random points $(x, y)$, where $x$ and $y$ are sampled uniformly from $[0, 1]$.
2. Count how many points fall inside the quarter unit circle, i.e., satisfy $x^2 + y^2 \le 1$.
3. The estimated value of $\pi$ is
$$\pi \approx 4 \times \frac{\text{number of points in the quarter circle}}{\text{total number of points}}$$
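A minimal R sketch of this procedure (the seed and M = 1e5 below are assumptions; the exact estimate varies from run to run):

set.seed(123)                    # assumed seed, for reproducibility
M <- 1e5                         # assumed number of random points

x <- runif(M)                    # Step 1: x sampled uniformly from [0, 1]
y <- runif(M)                    #         y sampled uniformly from [0, 1]

inside <- (x^2 + y^2 <= 1)       # Step 2: is the point inside the quarter circle?
4 * mean(inside)                 # Step 3: proportion inside, times 4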
## [1] 3.14664
In fact, by the law of large numbers, as the number of Monte Carlo iterations increases, the estimate of π gets closer and closer to its actual value.
Today, the Monte Carlo method refers to any simulation that involves the use of random numbers.
Monte Carlo Experiment: Mean of Poisson
Inferential statistics draws conclusions about a population from a random sample and assesses their reliability.
To assess reliability, the statistician must understand the distribution of any statistics that are used in the analysis.
The sampling distribution of a statistic can sometimes be derived analytically, but in many cases this is difficult to do.
Definition 3.10 Monte Carlo experiments are a class of computational algorithms that rely on repeated random sampling to estimate numerical results. They are widely used in various fields, including statistics, physics, finance, and machine learning, to approximate complex mathematical problems that may not have closed-form solutions.
Suppose our population is best modelled by a Poisson distribution (commonly used for count data). The Poisson distribution is completely described by a single parameter, the rate λ, which is also equal to its mean μ.
The question here is
“Is the sample mean still unbiased and does it have lower variance than other estimators, say sample median?”
To answer this question, we have two approaches:
1. prove analytically that the sample mean is unbiased and has lower variance than the sample median, or
2. show numerically that these statements are true.
Monte Carlo experiments allow us to answer these questions using the second approach.
In this example, we will compare the bias and variance of two estimators, the sample mean and the sample median, in estimating the population mean for populations distributed as Poisson.
1. Suppose our population of interest is an infinite population that is Poisson distributed with rate parameter λ.
2. Obtain a sample of size n from this population.
3. From the generated sample, compute the sample mean and sample median.
4. Do this M times to obtain M sample means and M sample medians.
5. From these M values, compute the mean and variance of the sample mean and of the sample median so we can assess their bias and variance, respectively.
mc_pois <- function(n, M, lambda){
  # initialization
  samp_mean   <- numeric(M)
  samp_median <- numeric(M)

  for (i in 1:M){
    # Step 2: generate observations
    samp <- rpois(n = n, lambda = lambda)

    # Step 3: compute the statistics of interest
    samp_mean[i]   <- mean(samp)
    samp_median[i] <- median(samp)

    # Step 4: do this M times
  }

  return(data.frame("samp_mean"   = samp_mean,
                    "samp_median" = samp_median))
}
result <- mc_pois(n = 100, M = 2500, lambda = 4)
result
| samp_mean | samp_median |
|-----------|-------------|
| 4.05 | 4.0 |
| 3.92 | 4.0 |
| 4.28 | 4.0 |
| 4.17 | 4.0 |
| 3.76 | 4.0 |
| 3.68 | 4.0 |
| 3.93 | 4.0 |
| 3.82 | 4.0 |
| 3.96 | 4.0 |
| 3.82 | 4.0 |
Step 5: Use the M values to study the distribution of the sample mean and the sample median.
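One way these summaries could be computed from `result` (a sketch; the printed values below come from one particular run and will vary with the seed):

colMeans(result) - 4       # bias: average estimate minus the true mean (lambda = 4)
sapply(result, var)        # variance of each estimator across the M replications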
## samp_mean samp_median
## 0.00052 -0.08120
## samp_mean samp_median
## 0.03932410 0.07703737
Our results suggest that the sample mean has less bias in estimating the population mean and has smaller variance than the sample median.
Varying population mean λ
Now, let's see how the estimators perform as we change the value of the population mean, specifically for values λ = 1, 2, ..., 10.
bias_mean   <- numeric(10)
vari_mean   <- numeric(10)
bias_median <- numeric(10)
vari_median <- numeric(10)

for (i in 1:10){
  result <- mc_pois(n = 100, M = 1500, lambda = i)
  bias_mean[i]   <- mean(result$samp_mean) - i
  vari_mean[i]   <- var(result$samp_mean)
  bias_median[i] <- mean(result$samp_median) - i
  vari_median[i] <- var(result$samp_median)
}

result_plot <- data.frame(pop_mean  = c(1:10, 1:10),
                          bias      = abs(c(bias_mean, bias_median)),
                          var       = abs(c(vari_mean, vari_median)),
                          estimator = c(rep("mean", 10),
                                        rep("median", 10)))
ggplot(result_plot, aes(x = pop_mean, y = bias, color = estimator)) +
  geom_line() +
  ylab("absolute bias") +
  theme_bw() +
  xlab("lambda")
From this plot, we can see that as the population mean λ increases, the bias of the sample median also increases.
ggplot(result_plot, aes(x = pop_mean, y = var, color = estimator)) +
  geom_line() +
  ylab("Variance") +
  theme_bw() +
  xlab("lambda")
Both estimators have larger variance for higher values of λ, but the sample median’s variance increases faster.
Varying sample size n
In this simulation, we explore how the biases and variances of the sample mean and sample median change as we increase the sample size n.
We test for values of n=10,20,...,90,100,200,...,500, fixing the population mean λ at 5 in all iterations.
bias_mean   <- numeric(14)
vari_mean   <- numeric(14)
bias_median <- numeric(14)
vari_median <- numeric(14)

sample_size <- c(1:10*10, 2:5*100)
j <- 0

for (i in sample_size){
  result <- mc_pois(n = i, M = 1000, lambda = 5)
  j <- j + 1
  bias_mean[j]   <- mean(result$samp_mean) - 5
  vari_mean[j]   <- var(result$samp_mean)
  bias_median[j] <- mean(result$samp_median) - 5
  vari_median[j] <- var(result$samp_median)
}

result_plot <- data.frame(n         = rep(sample_size, 2),
                          bias      = abs(c(bias_mean, bias_median)),
                          var       = abs(c(vari_mean, vari_median)),
                          estimator = c(rep("sample mean", 14),
                                        rep("sample median", 14)))
ggplot(result_plot, aes(x = n, y = bias, color = estimator)) +
  geom_line() +
  ylab("absolute bias") +
  theme_bw() +
  xlab("sample size")
ggplot(result_plot, aes(x = n, y = var, color = estimator)) +
  geom_line() +
  ylab(unname(latex2exp::TeX(c("Variance (in $log_{10}$ scale)")))) +
  theme_bw() +
  xlab("sample size") +
  scale_y_continuous(trans = "log10", limits = c(1e-3, 1),
                     labels = scales::scientific, expand = c(0, 0)) +
  scale_x_continuous(limits = c(10, 500),
                     breaks = c(10, 50, 100, 500), expand = c(0, 0))
Exercise 3.3 Suppose we want to investigate the distribution of the sample standard deviation (SD) of a random sample of size n = 200 from a Normal(2, 1) distribution.
1. Using Monte Carlo simulation with M = 10000, show the relative frequency histogram of the distribution of the sample SD.
2. Based on the simulation, what are the mean and variance of the distribution of the sample SD?
Estimating Probability using Monte Carlo Experiment
We already showed that we can check the distribution of a statistic based on a Monte Carlo Simulation.
We can also obtain probabilities using Monte Carlo simulation, based on the a posteriori (relative frequency) approach to assigning probabilities.
Suppose you want to obtain the probability P(X≤x).
The general idea for obtaining a posteriori probabilities using Monte Carlo simulation is as follows:
1. Generate M samples $x_1, x_2, \ldots, x_M$ from the population F.
2. Compute the empirical CDF, which approximates the desired probability:
$$\hat{F}(x) = \frac{1}{M}\sum_{m=1}^{M} I(x_m \le x) \approx P(X \le x)$$
Example 1: Exponential
As a quick example, suppose $X \sim \text{Exp}(\lambda = 2)$. We know that $P(X \le 1) = 1 - e^{-2 \cdot 1} \approx 0.86$.
If we draw a large number of samples from Exp(2), we know that around 86% of the generated values will be less than or equal to 1.
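A quick sketch of this check (the seed and M below are assumptions; the printed proportion will vary slightly across runs):

set.seed(123)              # assumed seed
M <- 1e5                   # assumed number of draws
x <- rexp(M, rate = 2)     # generate M observations from Exp(2)
mean(x <= 1)               # proportion of observations <= 1, estimating P(X <= 1)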
## [1] 0.86
We can plot the theoretical and empirical CDFs to check whether they really look the same.
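One way to do this, reusing the sample `x` generated in the sketch above (the plot settings are illustrative):

plot(ecdf(x), main = "Empirical vs theoretical CDF",
     xlab = "x", ylab = "F(x)")                          # empirical CDF of the sample
curve(pexp(x, rate = 2), col = "red", add = TRUE)        # theoretical Exp(2) CDF overlaid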
Example 2: CLT
For another demonstration, we visualize the Central Limit theorem. Recall its (basic) definition:
Theorem 3.2 (Central Limit Theorem) Let $X_1, X_2, \ldots, X_n$ be a random sample from a population with mean $\mu$ and finite variance $\sigma^2$, and let $\bar{X}_n = \frac{1}{n}\sum_{i=1}^{n} X_i$.
For a sufficiently large sample size n, the statistic
$$\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$
approximately follows the standard normal distribution.
Suppose the population follows a Poisson(λ = 6) distribution. The mean and variance are both λ = 6.
Let's obtain a random sample of size n = 30 repeatedly and examine the distribution of the statistic $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$.
M <- 1000
n <- 30
vec <- numeric(M)

for (m in 1:M){
  set.seed(m)
  X <- rpois(30, lambda = 6)
  vec[m] <- (mean(X) - 6) / sqrt(6/n)
}
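One way to inspect the simulated values in `vec` (the histogram settings here are illustrative):

hist(vec, freq = FALSE, breaks = 30,
     main = "Simulated values of the statistic", xlab = "z")   # relative frequency histogram
curve(dnorm(x), col = "red", add = TRUE)                       # standard normal density overlaid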
It looks standard normal, doesn’t it?
Suppose we obtained the following sample from the population:
Computing $\frac{\bar{X} - \mu}{\sigma/\sqrt{n}}$ using this sample, we have:
## [1] 1.565248
By the central limit theorem, $P(Z \le 1.5652) \approx \Phi(1.5652) \approx 0.9412$
## [1] 0.9412376
But using Monte Carlo simulation, we instead count the proportion of simulated Z-statistics that are less than or equal to the observed Z-statistic:
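A one-line sketch, using the simulated values in `vec` and the observed statistic from above:

mean(vec <= 1.565248)      # proportion of simulated statistics at or below the observed value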
## [1] 0.947
Close enough?
Exercise 3.4 Let X be a random variable that is uniformly distributed over [−1, 1], and let Y be a random variable from N(1, 1) that is independent of X.
Using Monte Carlo simulation with M = 1000 and seed number 142, what is the probability that X + Y > 1.5?
Euler’s Method vs Monte Carlo Integration
Suppose we want to evaluate a definite integral
$$\int_a^b f(x)\,dx$$
where f is some function. For most functions, there is no closed-form expression for such definite integrals.
Numerical analysis provides various means of approximating definite integrals, starting with Euler's method for one-dimensional integrals:
$$\int_a^b f(x)\,dx \approx \sum_{i=1}^{\lfloor (b-a)/h \rfloor} h\, f(a + ih)$$
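A small sketch of this rule in R; the helper name `euler_integral` and the step size h are illustrative choices, not from the text:

euler_integral <- function(f, a, b, h = 0.001){
  i <- 1:floor((b - a) / h)            # grid points a + ih, i = 1, ..., floor((b - a)/h)
  sum(h * f(a + i * h))                # sum of rectangle areas of width h
}

euler_integral(sin, 0, pi)             # should be close to the exact value, 2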
However, this method is very slow, especially for high-dimensional integrals, because the function f has to be evaluated many times. As an alternative, physicists turned to a random approximation scheme, the Monte Carlo method.
Recall that the expected value of f with respect to a distribution with probability density p and support D is
$$E_p[f(X)] = \int_D f(x)\,p(x)\,dx$$
By the law of large numbers, for IID random variables, the sample mean converges to the expected value:
$$\frac{1}{n}\sum_{i=1}^{n} f(X_i) \to E_p[f(X)] = \int_D f(x)\,p(x)\,dx$$
The most basic Monte Carlo method for evaluating the definite integral $\int_a^b f(x)\,dx$ is to take p as the pdf of a uniform random variable.
Draw the random variable X uniformly over the domain [a, b]; its pdf is $\frac{1}{b-a}$. Then
$$\int_a^b f(x)\,dx = (b - a)\int_a^b f(x)\,\frac{1}{b-a}\,dx = (b - a)\,E[f(X)]$$
and by the law of large numbers,
$$\frac{b-a}{n}\sum_{i=1}^{n} f(X_i) \to \int_a^b f(x)\,dx$$
Example 1
To demonstrate, let us integrate $f(x) = \sin(x)$ over $[0, \pi]$. The exact solution is
$$\int_0^{\pi} \sin(x)\,dx = 2$$
Using Monte Carlo integration, we can approximate it with these steps:
1. Generate n = 10000 values from Uniform(0, π).
2. Evaluate f(x) = sin(x) at these points.
3. Get the average and multiply by the measure of the domain, which is π − 0.
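A sketch of these steps in R (the seed is an assumption; the printed estimate below comes from one particular run):

set.seed(123)                          # assumed seed
n <- 10000
u <- runif(n, min = 0, max = pi)       # Step 1: n values from Uniform(0, pi)
fx <- sin(u)                           # Step 2: evaluate f at the sampled points
(pi - 0) * mean(fx)                    # Step 3: average times the measure of the domain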
## [1] 2.00878
Exercise 3.5 Approximate the following definite integral using Monte Carlo integration:
$$\int_0^1 e^{-x^2}\,dx$$
Use M = 1e6.
Example 2: What if the integral limits are infinite?
Sampling from the uniform distribution is useless for an integral like
$$\int_0^{\infty} x^2 e^{-x^2}\,dx$$
The trick is to introduce a pdf with the same support as the integration limits. In general,
$$\int_D f(x)\,dx = \int_D \frac{f(x)}{p(x)}\,p(x)\,dx = E_p\!\left[\frac{f(X)}{p(X)}\right]$$
And now, if $X_i$ is generated from p,
$$\frac{1}{n}\sum_{i=1}^{n} \frac{f(X_i)}{p(X_i)} \to E_p\!\left[\frac{f(X)}{p(X)}\right] = \int_D f(x)\,dx$$
Solution for Example 2
Suppose we have the following integral:
$$\int_0^{\infty} x^2 e^{-x^2}\,dx$$
The exact solution is $\sqrt{\pi}/4$.
The Exponential distribution has support $[0, \infty)$ and pdf $\lambda e^{-\lambda x}$.
We can sample from Exp(λ=1) to solve the above integral.
$$\int_0^{\infty} x^2 e^{-x^2}\,dx
= \int_0^{\infty} \frac{e^{-x}}{e^{-x}}\, x^2 e^{-x^2}\,dx
= \int_0^{\infty} \frac{x^2 e^{-x^2}}{e^{-x}}\, e^{-x}\,dx
= \int_0^{\infty} x^2 e^{-x^2 + x}\, e^{-x}\,dx
= E\!\left(X^2 \exp(-X^2 + X)\right), \quad \text{where } X \sim \text{Exp}(1)$$
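A sketch of this computation, assuming n = 1e6 draws and an arbitrary seed; the Monte Carlo estimate can then be compared with the exact value $\sqrt{\pi}/4$:

set.seed(123)                          # assumed seed
x <- rexp(1e6, rate = 1)               # draws from Exp(1)
mean(x^2 * exp(-x^2 + x))              # Monte Carlo estimate of the integral
sqrt(pi) / 4                           # exact value, for comparison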
## [1] 0.443195
## [1] 0.4431135
Exercise 3.6 The following integral is difficult to evaluate by hand:
$$\int_{-\infty}^{\infty} \frac{1}{1 + x^4}\,dx$$
Approximate this integral using Monte Carlo integration. Use M = 1e6.
3.3.2 (Optional) Advanced Examples
Markov Chain Monte Carlo
Definition 3.11 A Markov chain is a stochastic process that transitions from one state to another within a defined set of states, following certain probabilistic rules.
The key characteristic of a Markov chain is the Markov property, which states that the future state depends only on the present state and not on the sequence of past states. Mathematically,
$$P(X_{n+1} \mid X_n, X_{n-1}, \ldots, X_0) = P(X_{n+1} \mid X_n)$$
where $X_n$ represents the state at step n.
Generating a Markov Chain
1. Set up the conditional distribution.
2. Draw the initial state of the chain.
3. For every additional draw, use the previous draw to inform the new one (see the sketch below).
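A minimal sketch of these steps for a simple chain whose conditional distribution is Normal centered at the previous state (all settings here are illustrative):

set.seed(123)                               # assumed seed
M <- 1000                                   # assumed chain length
chain <- numeric(M)

chain[1] <- 0                               # Step 2: initial state of the chain
for (m in 2:M){
  # Steps 1 and 3: the next state is drawn from a Normal centered at the previous state
  chain[m] <- rnorm(1, mean = chain[m - 1], sd = 1)
}

plot(chain, type = "l", xlab = "step", ylab = "state")   # trace of the generated chain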
Definition 3.12 (Markov Chain Monte Carlo) MCMC is a class of algorithms that use Markov chains to generate samples from probability distributions when direct sampling is challenging.
Unlike standard Monte Carlo methods, MCMC builds a dependent sequence of samples that ultimately converges to the target distribution.
Common MCMC methods include:
Metropolis-Hastings Algorithm: Uses a proposal distribution and acceptance-rejection criterion.
Gibbs Sampling: Special case of MCMC where samples are drawn from conditional distributions iteratively.
MCMC is essential in Bayesian statistics, allowing estimation of posterior distributions for complex models.
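To make the Metropolis-Hastings idea concrete, here is a minimal random-walk sketch targeting a standard normal density; the target, proposal scale, and chain length are all illustrative choices, not a prescribed implementation:

set.seed(123)                               # assumed seed
target <- function(x) dnorm(x)              # illustrative target density (standard normal)

M <- 5000                                   # assumed number of iterations
chain <- numeric(M)
chain[1] <- 0                               # initial state

for (m in 2:M){
  proposal <- rnorm(1, mean = chain[m - 1], sd = 1)              # symmetric random-walk proposal
  accept_prob <- min(1, target(proposal) / target(chain[m - 1])) # acceptance probability
  if (runif(1) < accept_prob){
    chain[m] <- proposal                    # accept the proposed state
  } else {
    chain[m] <- chain[m - 1]                # reject: keep the current state
  }
}

hist(chain, freq = FALSE, breaks = 40,
     main = "Metropolis-Hastings samples", xlab = "x")
curve(dnorm(x), col = "red", add = TRUE)    # target density, for comparison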
Agent-based modelling
Definition 3.13 Agent-based models are computer simulations used to study the interactions between people, things, places, and time. They are stochastic models built from the bottom up, meaning individual agents (often people in epidemiology) are assigned certain attributes.
The agents are programmed to behave and interact with other agents and the environment in certain ways.
Monte Carlo methods are used to understand the stochasticity of these models.
Applications of ABM
- Risk and disaster management: simulate or predict possible displacement for a given flood or typhoon scenario in an area.
- Economics: modeling markets, supply chains, or consumer behavior.
- Epidemiology: simulating the spread of diseases (e.g., COVID-19 transmission).
- Ecology: studying animal movements, predator-prey dynamics, or deforestation.
- Social Sciences: understanding crowd behavior, opinion dynamics, or urban development.
- Computational Biology: modeling cell interactions or immune responses.
SUMMARY:
- When we don't have exact probability formulas, or they don't apply, we can simulate to get arbitrarily good approximations.
- If we can describe the process of our model, we can set up a simulation of it.