5 Monte Carlo Simulations
5.1 Review
In the theoretical part of the class you learned about statistical properties of estimators. To be precise, let ˆθn be an estimator of a parameter θ. The estimator is a function of the sample, say {X1,X2,…,Xn}. The sample consists of random variables, and any function of the sample, including ˆθn, is a random variable as well. That is, for different realizations of the data, we get different realizations of the estimator.
These arguments can be confusing because in practice we typically only have one data set and therefore only one realization of the estimator. But we could also think of all the other realizations of the data we could have obtained. For example, consider your starting salary after graduating from Bonn. Let θ be the expected starting salary, which is a fixed number that we do not know. After everyone started working and we see your actual starting salaries, we can calculate the sample average as an estimate of the expected value, but this estimate is just one potential realization. If things had turned out slightly different, your starting salaries might have been different and we would have seen a different realization of the estimator.
Since an estimator is a random variable, it has a distribution and (typically) a mean and a variance. You learned that we call an estimator unbiased if E(ˆθn)=θ. That is, the average value of the estimator over all the hypothetical data sets we could have obtained is equal to θ. Unbiasedness can hold for any sample size n and is therefore called a small-sample property. Unbiasedness does not imply that ˆθn is necessarily close to θ.
You also saw that an estimator can be weakly consistent, $\hat\theta_n \stackrel{P}{\to} \theta$, and asymptotically normally distributed, $\sqrt{n}(\hat\theta_n - \theta) \stackrel{D}{\to} N(0, v^2)$ as $n \to \infty$. These properties only hold in the limit and are called large-sample properties. Knowing the large-sample distribution is particularly useful because it allows us to test hypotheses and obtain confidence intervals. For example, let $\hat v_n$ be a weakly consistent estimator of $v$. Then $$P\left(\theta \in \left[\hat\theta_n - 1.96\frac{\hat v_n}{\sqrt{n}},\; \hat\theta_n + 1.96\frac{\hat v_n}{\sqrt{n}}\right]\right) \to 0.95$$ and the interval $\left[\hat\theta_n - 1.96\frac{\hat v_n}{\sqrt{n}},\; \hat\theta_n + 1.96\frac{\hat v_n}{\sqrt{n}}\right]$ is a 95% confidence interval for θ. That is, looking at all hypothetical data sets, θ is in 95% of the confidence intervals. Notice that in different data sets we get different intervals, but the parameter θ does not change.
The 95% coverage probability only holds in the limit; for any fixed sample size, $$P\left(\theta \in \left[\hat\theta_n - 1.96\frac{\hat v_n}{\sqrt{n}},\; \hat\theta_n + 1.96\frac{\hat v_n}{\sqrt{n}}\right]\right)$$ could be much smaller than 95%. This actual coverage probability is unknown in applications.
The goal of Monte Carlo simulations is typically to investigate small sample properties of estimators, such as the actual coverage probability of confidence intervals for fixed n. To do so, we can simulate many random samples from an underlying distribution and obtain the realization of the estimator for each sample. We can then use the realizations to approximate the actual small sample distribution of the estimator and check properties, such as coverage probabilities of confidence intervals. Of course, the properties of the estimator depend on which underlying distribution we sample from.
5.2 Simple Simulation Example
As a simple example, let {X1,X2,…,Xn} be a random sample from some distribution FX. Define θ=E(X) and let $$\hat\theta_n = \bar X_n = \frac{1}{n}\sum_{i=1}^{n} X_i \quad\text{and}\quad s_n = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar X_n\right)^2}.$$ Applications of a Law of Large Numbers and a Central Limit Theorem for iid data imply that $\hat\theta_n \stackrel{P}{\to} \theta$, $s_n \stackrel{P}{\to} sd(X)$, and $\sqrt{n}(\hat\theta_n - \theta) \stackrel{D}{\to} N(0, Var(X))$. The previously discussed confidence interval for E(X) is $$\left[\bar X_n - 1.96\frac{s_n}{\sqrt{n}},\; \bar X_n + 1.96\frac{s_n}{\sqrt{n}}\right].$$
Exercise: Suppose X∼U[0,2] where U[0,2] denotes the uniform distribution on the interval [0,2]. It is easy to show (make sure to verify this!) that E(X)=1. Now repeat the following two steps at least 1,000 times:
- Take a random sample from the distribution of X with sample size n=20.
- Calculate the sample average and a 95% confidence interval for E(X) using the sample from the first step.
You should now have 1,000 sample averages and 1,000 confidence intervals, which you can use to answer the following questions:
- What is the average of your 1,000 sample averages?
- What is the standard deviation of your 1,000 sample averages?
- How many times is E(X) in the 95% confidence interval?
The answers to these questions change when you change the underlying distribution of X or the sample size.
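The steps above can be sketched in R as follows; the seed and the choice of 1,000 repetitions are arbitrary, and any comparison values in the comments follow from E(X) = 1 and sd(X) = 2/√12:

```r
# Sketch of the exercise: S samples of size n from U[0, 2]
set.seed(123)
n <- 20
S <- 1000
X <- replicate(S, runif(n, 0, 2))    # each column is one sample
xbar <- colMeans(X)                  # S sample averages
s <- apply(X, 2, sd)                 # S sample standard deviations
lower <- xbar - 1.96 * s / sqrt(n)
upper <- xbar + 1.96 * s / sqrt(n)
mean(xbar)                           # close to E(X) = 1
sd(xbar)                             # close to sd(X)/sqrt(n), about 0.129
mean(lower <= 1 & 1 <= upper)        # empirical coverage, close to 0.95
```

Note that the empirical coverage typically falls somewhat short of 95% at n = 20, which is exactly the small-sample phenomenon discussed above.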
5.3 Consistency
The estimator ˆθn is weakly consistent if for any ε>0, $$P(|\hat\theta_n - \theta| > \varepsilon) \to 0 \quad\text{or equivalently}\quad P(|\hat\theta_n - \theta| \le \varepsilon) \to 1.$$ That is, as long as n is large enough, ˆθn is in an arbitrarily small neighborhood around θ with arbitrarily large probability.
To illustrate this property, let’s return to the example with sample averages and expectations. Let {X1,X2,…,Xn} be a random sample from the Bernoulli distribution with p=P(Xi=1)=0.7. Let θ=E(X)=p and let ˆθn be the sample average. We start with a sample size of 20 and set the parameters:
n <- 20
p <- 0.7
theta <- p
We now want to take S random samples and obtain S corresponding sample averages, where S is ideally a large number. The larger S, the more precise our results will be. It is also useful to set a seed so that the results are reproducible.
set.seed(4519)
S <- 10000
X <- replicate(S, rbinom(n, 1, p))
theta_hat <- colMeans(X)
Now for some ε>0, we can check P(|ˆθn−θ|≤ε):
eps <- 0.1
mean(abs(theta_hat - p) <= eps)
## [1] 0.647
We can also calculate this probability for different values of ε>0. I use a loop here for readability, but you can try using matrices on your own. Since I want to repeat the following steps for different sample sizes, it is useful to first define a function that performs all operations:
calc_probs_consistency <- function(n, S, p, all_eps) {
  n_eps <- length(all_eps)
  X <- replicate(S, rbinom(n, 1, p))
  theta_hat <- colMeans(X)
  probs <- rep(0, n_eps)
  for (i in 1:n_eps) {
    probs[i] <- mean(abs(theta_hat - p) <= all_eps[i])
  }
  return(probs)
}
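As an aside, the matrix-based alternative mentioned above could look like the following sketch; `calc_probs_consistency_vec` is a hypothetical name, and `outer()` builds the whole indicator matrix in one step instead of looping over ε:

```r
# Sketch of a loop-free version: outer() returns an S x n_eps matrix
# whose (j, i) entry indicates |theta_hat_j - p| <= all_eps[i];
# column means are then the desired probabilities
calc_probs_consistency_vec <- function(n, S, p, all_eps) {
  theta_hat <- colMeans(replicate(S, rbinom(n, 1, p)))
  colMeans(outer(abs(theta_hat - p), all_eps, "<="))
}
```

Both versions return one probability per value of ε; the vectorized one simply trades the explicit loop for a matrix computation.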
We can now call the function for any n, S, p, and a vector containing different values for ε. For each of these values, the function then returns a probability.
all_eps <- seq(0.01, 0.5, by = 0.01)
probs <- calc_probs_consistency(n = 20, S = 1000, p = 0.7, all_eps = all_eps)
Next, we can plot all probabilities:
plot(all_eps,probs,ylim = c(0,1),xlab="epsilon",ylab="Probability", col = 'blue')
We can see from the plot that P(|ˆθn−θ|≤ε) is quite small for small values of ε. Consistency means that for any ε>0, the probability approaches 1 as n→∞. We can now easily change the sample size and see how the results change. For example, take n=200:
probs <- calc_probs_consistency(n = 200, S = 10000, p = 0.7, all_eps = seq(0.01, 0.5, by = 0.01))
plot(all_eps, probs, ylim = c(0,1), xlab = "epsilon", ylab = "Probability", col = 'blue')
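To see the convergence for a single fixed ε, the same experiment can also be run for a sequence of growing sample sizes. The following is a sketch under the same Bernoulli(p = 0.7) design; ε = 0.1, S = 1000, and the particular sample sizes are arbitrary choices:

```r
# Sketch: P(|theta_hat - theta| <= eps) for fixed eps and growing n
set.seed(1)
p <- 0.7
eps <- 0.1
for (n in c(20, 200, 2000)) {
  theta_hat <- colMeans(replicate(1000, rbinom(n, 1, p)))
  cat("n =", n, " P =", mean(abs(theta_hat - p) <= eps), "\n")
}
```

The probability increases towards 1 as n grows, which is exactly what weak consistency predicts.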
5.4 Asymptotic Normality
We know from the Central Limit Theorem that $$\frac{\sqrt{n}\left(\bar X_n - E(X)\right)}{s_n} \stackrel{D}{\to} N(0,1)$$ where sn is the sample standard deviation. Convergence in distribution means that the distribution function of the normalized sample average converges to the distribution function of a standard normal random variable.
Again, we can use the previous simulation setup to illustrate this property. To do so, we calculate $$P\left(\frac{\sqrt{n}\left(\bar X_n - E(X)\right)}{s_n} \le z\right)$$ and compare it to the standard normal cdf evaluated at z. We then plot the results as a function of z. Just as before, we define a function that calculates the probabilities:
calc_probs_asymp_normal <- function(n, S, p, z) {
  n_z <- length(z)
  X <- replicate(S, rbinom(n, 1, p))
  theta_hat <- colMeans(X)
  stds <- apply(X, 2, sd)
  probs <- rep(0, n_z)
  for (i in 1:n_z) {
    probs[i] <- mean(sqrt(n) * (theta_hat - p) / stds <= z[i])
  }
  return(probs)
}
We can now plot the results for different sample sizes such as
z <- seq(-3, 3, by = 0.01)
probs <- calc_probs_asymp_normal(n = 20, S = 10000, p = 0.7, z = z)
plot(z, probs, ylim = c(0,1), xlab = "z", ylab = "Probability", col = 'blue')
lines(z, pnorm(z, 0, 1), col = 'red',lwd = 2)
or
probs <- calc_probs_asymp_normal(n = 2000, S = 10000, p = 0.7, z = z)
plot(z, probs, ylim = c(0,1), xlab = "z", ylab = "Probability", col = 'blue')
lines(z, pnorm(z, 0, 1), col = 'red',lwd = 2)
5.5 Testing
Let {X1,X2,…,Xn} be a random sample and define μ=E(X). Suppose we want to test H0: μ=μ0, where μ0 is a fixed number. To do so, define $$t = \frac{\sqrt{n}\left(\bar X_n - \mu_0\right)}{s_n}.$$ An immediate consequence of the Central Limit Theorem is that if H0 is true, then $$t = \frac{\sqrt{n}\left(\bar X_n - \mu_0\right)}{s_n} = \frac{\sqrt{n}\left(\bar X_n - \mu\right)}{s_n} \stackrel{D}{\to} N(0,1)$$ and therefore $$P(|t| > 1.96 \mid \mu = \mu_0) \to 0.05.$$ Hence, if we reject when |t|>1.96, we only reject a true null hypothesis with probability approaching 5%.
If the null hypothesis is false, we can write $$t = \frac{\sqrt{n}\left(\bar X_n - \mu_0\right)}{s_n} = \frac{\sqrt{n}\left(\bar X_n - \mu\right)}{s_n} + \frac{\sqrt{n}\left(\mu - \mu_0\right)}{s_n},$$ where the first term converges in distribution to N(0,1). Since in this case $\sqrt{n}|\mu - \mu_0| \to \infty$, the second term diverges and it follows that $$P(|t| > 1.96 \mid \mu \ne \mu_0) \to 1.$$ That is, we reject a false null hypothesis with probability approaching 1. However, for a small sample size, the rejection probability is typically much smaller when μ0 is close to μ.
We now want to calculate the small sample rejection probabilities using Monte Carlo simulations. We could either vary μ0 (the value we test) or μ (the true mean). We take the latter because we can then plot the power functions you have seen in the theoretical part of the class to illustrate the results.
We work with the Bernoulli distribution again and define a function that calculates certain rejection probabilities.
calc_rej_probs <- function(n, S, mus, mu0, alpha) {
  n_mus <- length(mus)
  probs <- rep(0, n_mus)
  for (i in 1:n_mus) {
    X <- replicate(S, rbinom(n, 1, mus[i]))
    theta_hat <- colMeans(X)
    stds <- apply(X, 2, sd)
    probs[i] <- mean(abs(sqrt(n) * (theta_hat - mu0) / stds) > qnorm(1 - alpha/2))
  }
  return(probs)
}
We can now call the function for different values of μ and plot the rejection probabilities:
mus <- seq(0.5, 0.9, by = 0.01)
probs <- calc_rej_probs(n = 200, S = 10000, mus = mus, mu0 = 0.7, alpha = 0.05)
plot(mus, probs, ylim = c(0,1), xlab = "True mean", ylab = "Rejection Probability", col = 'blue')
abline(h=0.05, col = 'red',lwd = 2)
Think about how the figure changes as the sample size increases.
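One way to explore this is to overlay power curves for several sample sizes. The following sketch redefines the rejection-probability function so the block is self-contained; the particular sample sizes and S = 1000 are arbitrary choices:

```r
# Sketch: power curves for several n under the same Bernoulli design
calc_rej_probs <- function(n, S, mus, mu0, alpha) {
  sapply(mus, function(mu) {
    X <- replicate(S, rbinom(n, 1, mu))
    theta_hat <- colMeans(X)
    stds <- apply(X, 2, sd)
    mean(abs(sqrt(n) * (theta_hat - mu0) / stds) > qnorm(1 - alpha/2))
  })
}
set.seed(4519)
mus <- seq(0.5, 0.9, by = 0.02)
plot(mus, calc_rej_probs(n = 50, S = 1000, mus, mu0 = 0.7, alpha = 0.05),
     type = "l", ylim = c(0, 1), xlab = "True mean", ylab = "Rejection Probability")
lines(mus, calc_rej_probs(n = 200, S = 1000, mus, mu0 = 0.7, alpha = 0.05), col = "blue")
lines(mus, calc_rej_probs(n = 800, S = 1000, mus, mu0 = 0.7, alpha = 0.05), col = "red")
abline(h = 0.05, lty = 2)
```

As n grows, the power curve becomes steeper around μ0 = 0.7 while staying near the 5% line at μ0 itself.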
5.6 Regression
In the previous sections, we considered a simple parameter (an expected value) and a simple estimator (the sample average). We can analyze analogous issues in other settings, such as the linear regression model. In this case as well, we have parameters (the regression coefficients) and estimators (the OLS estimators). The estimators are random variables and have certain known small- and large-sample properties that you discussed in the theoretical part of the class. Many other small-sample properties, such as actual coverage rates of confidence intervals, are unknown but can be studied in Monte Carlo simulations.
Here is a task for you to think about on your own. We will discuss the solutions in class.
Exercise: Let $$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i, \quad i=1,2,\ldots,n,$$ where $$\begin{pmatrix} X_{i1} \\ X_{i2} \end{pmatrix} \sim N\left(\begin{pmatrix} 2 \\ -3 \end{pmatrix}, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$$ and $(\beta_0,\beta_1,\beta_2) = (-1,2,1)$. Let $\varepsilon_i \sim U[-2,2]$ and assume that $\varepsilon_i$ is independent of $(X_{i1},X_{i2})$. Assume that the data is a random sample. Let $(\hat\beta_0,\hat\beta_1,\hat\beta_2)$ be the OLS estimator of $(\beta_0,\beta_1,\beta_2)$.
For different values of n and ρ, use simulations to calculate:
- E(ˆβ1),
- Var(ˆβ1),
- the probability that β1 is in a 95% confidence interval,
- the expected length of the confidence interval,
- the probability that we reject H0:β1=2 using a two-sided t-test and a 1% significance level, and
- the probability that we reject H0:β1=1.8 using a two-sided t-test and a 1% significance level.
Discuss the results.
How do you think the results will change if Xi1 and εi are correlated?
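As a starting point, here is one way to simulate a single data set from this design and compute the OLS estimates; the Monte Carlo loop over many repetitions is left for the exercise, and n = 100, ρ = 0.5, and the seed are arbitrary choices:

```r
# Sketch: one simulated data set from the regression design above
set.seed(1)
n <- 100
rho <- 0.5
beta <- c(-1, 2, 1)                  # (beta0, beta1, beta2)
# bivariate normal regressors with means (2, -3), unit variances,
# and correlation rho, built from independent standard normals
Z1 <- rnorm(n)
Z2 <- rnorm(n)
X1 <- 2 + Z1
X2 <- -3 + rho * Z1 + sqrt(1 - rho^2) * Z2
eps <- runif(n, -2, 2)               # U[-2, 2] errors, independent of X
Y <- beta[1] + beta[2] * X1 + beta[3] * X2 + eps
fit <- lm(Y ~ X1 + X2)
coef(fit)                            # compare to (-1, 2, 1)
confint(fit, "X1", level = 0.95)     # does it cover beta1 = 2?
```

Repeating these steps many times and recording the estimates, interval lengths, coverage indicators, and test decisions gives all the quantities asked for above.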