3.3 Design and Analysis of Monte Carlo Experiments

According to Murdoch [2000], the term Monte Carlo originally referred to simulations that involved random walks and was first used by Jon von Neumann and S. M. Ulam in the 1940’s.

3.3.1 Motivation: Euler’s Method vs Monte Carlo Integration

Suppose we want to evaluate a definite integral:

baf(x)dx where f is some function. For most functions, there is no closed-form expression for such definite integrals.

Numerical analysis provides various means of approximating definite integrals, starting with Euler’s method for one-dimensional integrals:

baf(x)dx(ba)/hi=1hf(a+ih)

However, using this method is very slow, especially for high-dimensional computing. One has to evaluate the function f many times.

As an alternative, physicists has to rely on some random approximation scheme, called Monte Carlo Method, named after the European gambling resort.

Recall that the expected value of f with respect to a distribution with probability density p with a support D is

Ep[f(X)]=Df(x)p(x)dx

Based on law of large of numbers, for IID random variables, the sample mean converges to the expected value:

1nni=1f(Xi)Ep[f(X)]=Df(x)p(x)dx

The most basic Monte Carlo method for evaluating the definite integral baf(x)dx is to set p as the pdf of a Uniform random variable.

Draw the random variable X uniformly over the domain [a,b], which has a pdf 1ba

baf(x)dx=(ba)baf(X)1badx=(ba)E[f(X)]

and by law of large numbers,

banni=1f(Xi)baf(x)dx Example 1

To demonstrate, let us perform integration on f(x)=sin(x) over [0,π]. The exact solution is

π0sin(x)dx=2

Using Monte Carlo integration, we can approximate it with these steps:

  1. Generate n=10000 values from Uniform(0,π)

  2. Evaluate f(x)=sin(x) at these points

  3. Get the average and multiply by the measure of the domain, which is π0.

X <- runif(10000, 0, pi)
pi*mean(sin(X))
## [1] 2.018804

Example 2 Approximate the definite integral 10ex2dx

X <- runif(10000)
mean(exp(-X^2))
## [1] 0.7474153

What if the integral limits is infinite?

The above approach is useless for an integral like

0x2ex2dx

The trick is to introduce a pdf with the same support as the integral limits. In general,

Df(x)dx=Df(x)p(x)p(x)dx=Ep[f(X)p(X)]

And now, if Xi is generated from p,

1nni=1f(Xi)p(Xi)Ep[f(X)p(X)]=Df(x)dx

Today, the Monte Carlo method refers to any simulation that involves the use of random numbers. We have shown that this is an easy and inexpensive way of to understand a phenomenon of interest.

3.3.2 The General Strategy

  1. Identify an appropriate model (or models) that represents (or represent) the population of interest.

  2. Generate random observations using the model (or models).

  3. Calculate the value for the statistic of interest using the random observations.

  4. Repeat Steps 2 and 3 for M trials.

  5. Use the M values to study the distribution of the statistic.

3.3.3 Examples

Importance Sampling

Metropolis-Hastings Algorithm

Exercise

  1. Estimating π using Monte Carlo Simulation

    Consider a unit circle centered at the origin with radius r=1 The equation of the circle is:

    x2+y21

    We know that the area of a unit circle is π.

    Now, let’s focus on the first quadrant, (x,y0), where the area of the quarter circle is

    Aquarter circle=π4

    The Monte Carlo approach approximates this area by randomly sampling points inside the unit square (which has area = 1), then getting the proportion of points that are inside the quarter circle.

    Then the final (estimated) value ^pi is 4 times the area of the quarter circle.

    Write an R function `estimate_pi(n) that takes n as input and executes the following algorithm in R to approximate π.

    • Generate n random points (x,y) where x and y are sampled uniformly from [0,1].

    • Count how many points fall under the quarter unit circle:

    x2+y21

    • The estimated value of pi is

    π4×point in the circletotal points

    Try for different values of n

  2. Assume we want to investigate the distribution of sample standard deviation (SD) of a size n=200 random sample from a N(2,1). Use Monte Carlo Simulation with N=10000 to show the relative frequency histogram of the distribution of sample SD.

  3. What is the variance of the distribution of sample SD?

  4. Let X be a random variable that is uniformly distribution over [1,1] and Y is a random variable from N(1,1) that is independent of X. What is the probability that X + Y > 1.5?

  5. Let U and V be two independent random variables that both are from a uniform distribution over [0, 1]. Use the Monte Carlo Simulation with size