Chapter 3 Method of Moments
3.1 Motivation
We started working with basic estimators for parameters in Chapter 1 (e.g., the sample mean). Method of Moments, or MoM for short, provides the first type of ‘Inference’ estimator that we will look at in this course. While MoM estimators aren’t used often in practice because of their relative simplicity, they are a good tool for introducing more intricate estimation theory.
3.2 Method of Moments
Let’s discuss a new type of estimator. This is the first ‘new’ estimator learned in Inference, and, like a lot of the concepts in the book, really relies on a solid understanding of the jargon from the first chapter to nail down.
Recall from probability theory that the moments of a distribution are given by:
$$\mu_k = E(X^k)$$
Where $\mu_k$ is just our notation for the $k^{th}$ moment. So, the first moment, or $\mu$, is just $E(X)$, as we know, and the second moment, or $\mu_2$, is $E(X^2)$. Recall that we could make use of MGFs (moment generating functions) to summarize these moments; don’t worry, we won’t really deal with MGFs much here.
Instead, we are all about inference here; when we see the $k^{th}$ moment of a distribution, we should think “how could I estimate that?” For example, $\mu_3$ is just the average value of an individual observation cubed, just like $\mu$ is the average value of an individual observation. How would we then estimate these values?
In inference, we’re going to use something called sample moments. The $k^{th}$ sample moment is defined as:
$$\hat{\mu}_k = \frac{1}{n}\sum_{i=1}^{n} X_i^k$$
Where $\hat{\mu}_k$ is the $k^{th}$ sample moment (remember, we put a hat on things when we mean they are estimating something else. Here, $\hat{\mu}_k$ is estimating the same thing without a hat: $\mu_k$, or the $k^{th}$ moment).
Let’s break this down. This is basically saying that if we want $\mu_k$, or $E(X^k)$ (they are the same thing), just take a sample of $n$ people, raise each of their values to the $k^{th}$ power, add them up and divide by the number of individuals in the sample ($n$). If you wanted to estimate the fourth moment for the weight of college males, you would take a sample of some college males, raise each of their weights to the power of 4, add them up, and divide by the number of people you sampled.
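To make this concrete in code (just a quick sketch; the `sample_moment` helper and the weight numbers below are made up for illustration, not something defined elsewhere in the book), the $k^{th}$ sample moment is simply the average of the observations raised to the $k^{th}$ power:

# a k-th sample moment is just the average of the observations raised to the k-th power
sample_moment <- function(x, k) {
  mean(x ^ k)
}

# illustrative only: pretend these are sampled weights of college males (in pounds)
set.seed(0)
weights <- rnorm(30, mean = 180, sd = 20)

# estimate the fourth moment
sample_moment(weights, 4)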
Does this make sense? Well, consider the case where $k = 1$, or $\mu$. This is, of course, just the mean of a distribution. The sample moment for this first moment is given by:
$$\frac{1}{n}\sum_{i=1}^{n} X_i$$
Which we got by just plugging $k = 1$ into the above formula for sample moments. Hey-o, that’s the sample mean, or what we’ve long established is the natural estimator for the true mean! You can see how this builds, then, as we get to higher and higher moments.
Anyways, the takeaway here is that we use sample moments to estimate the actual moments of a distribution. That’s great, and we would be finished if we were asking you to estimate moments of a distribution. However, you’re rarely asked to estimate actual moments; instead, as you’ve seen, you’re generally asked to estimate parameters of a distribution.
So, we know that we can estimate moments with sample moments, and we know that we want to estimate parameters. How can we use these two facts to get what we want: a solid estimate for the parameters? Well, if we can write the parameters of a distribution in terms of that distribution’s moments, and then simply estimate those moments with the sample moments, then we have created an estimator for each parameter in terms of the sample moments.
Whoa…that’s a little crazy, and probably too much of a mouthful right now. Let’s try and learn this with a solid example of the most famous statistical distribution: the Normal.
If we’re doing estimation for a Normal, that means that we believe the underlying model for some real world data is Normal. For example, we might believe that eyelash length for men in Massachusetts is normally distributed. If we want to carry out inference, we have to estimate the parameters; here, the parameters of a Normal distribution are the mean and the variance. So, for this inferential exercise, we have to estimate the mean and the variance.
We already know, from what we learned earlier, that we have natural estimates for the moments of the Normal distribution. What we’re going to do, then, is try to write the moments of a Normal in terms of the parameters (the mean and variance). We only need to write out the first two moments, $E(X)$ and $E(X^2)$, since we have two parameters (in general, if you have $k$ parameters that you want to estimate, you need to write out $k$ moments).
Let’s go ahead and do that. How do we write $E(X)$ in terms of $\mu$ and $\sigma^2$? Well, you know quite well that $E(X)$ is just $\mu$, since they are both the mean for a Normal distribution. What about writing $E(X^2)$ in terms of $\mu$ and $\sigma^2$? Well, this takes a little bit more cleverness. Recall that $Var(X) = E(X^2) - E(X)^2$. Re-writing this yields $Var(X) + E(X)^2 = E(X^2)$. We know that, for a Normal distribution, $Var(X) = \sigma^2$ and $E(X)^2 = \mu^2$. So, we can write $E(X^2) = \mu^2 + \sigma^2$, and this yields the system of equations:
$$\mu_1 = \mu$$
$$\mu_2 = \sigma^2 + \mu^2$$
Where $\mu_1$ is notation for the first moment, $\mu_2$ is notation for the second moment, etc.
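As a quick numerical sanity check on that second equation (just a simulation sketch with arbitrary values $\mu = 5$ and $\sigma = 3$, so $\sigma^2 + \mu^2 = 34$), the average of the squared values from a large Normal sample should land close to 34:

# rough check that E(X^2) = sigma^2 + mu^2 for a Normal; arbitrary mu = 5, sigma = 3
set.seed(0)
x <- rnorm(100000, 5, 3)

mean(x ^ 2)    # empirical second moment
3 ^ 2 + 5 ^ 2  # sigma^2 + mu^2 = 34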
Well now, we’ve written our moments in terms of the parameters that we’re trying to estimate. We know that we have good estimators (the sample moments) for our moments $\mu_1$ and $\mu_2$, so let’s try to solve this system of equations for the parameters in terms of the moments. Well, we know that $\mu = \mu_1$, so we can plug in $\mu_1$ for $\mu$ in the second equation and then solve for $\sigma^2$. We get that:
$$\mu_2 = \sigma^2 + \mu_1^2 \rightarrow \sigma^2 = \mu_2 - \mu_1^2$$
So now our two equations for the parameters in terms of the moments are:
$$\mu = \mu_1$$
$$\sigma^2 = \mu_2 - \mu_1^2$$
That is, the first parameter, the mean $\mu$, is equal to the first moment of the distribution, and the second parameter, the variance $\sigma^2$, is equal to the second moment of the distribution minus the first moment of the distribution squared.
Why did we go through all of that work? Well, recall the ultimate goal of all of this: to estimate the parameters of a distribution. Recall also that we know how to estimate the moments of a distribution: with the sample moments! That is, a good estimate for $\mu_k$ is $\frac{1}{n}\sum_{i=1}^{n} X_i^k$. So, now that we know the parameters in terms of the moments, estimating the parameters is the same as estimating the moments. We can plug in our estimates for the moments and get good estimates for the parameters $\mu$ and $\sigma^2$!
So, the sample moment for $\mu_1$, by formula, is just $\frac{1}{n}\sum_{i=1}^{n} X_i$, and the sample moment for $\mu_2$ is, again by formula, $\frac{1}{n}\sum_{i=1}^{n} X_i^2$. Plugging these in for $\mu_1$ and $\mu_2$ yields:
$$\hat{\mu} = \frac{1}{n}\sum_{i=1}^{n} X_i$$
$$\hat{\sigma}^2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2 - \left(\frac{1}{n}\sum_{i=1}^{n} X_i\right)^2$$
Where $\hat{\mu}$ and $\hat{\sigma}^2$ are just estimates for the mean and variance, respectively (remember, we put hats on things to indicate that it’s an estimator). We can test this in R by generating data from, say, a $N(5, 9)$ distribution and using the above MoM estimators to see if we can get close to the original parameters (mean of 5 and variance of 9).
# replicate
set.seed(0)
n <- 20
mu <- 5
sigma <- 3

# generate
samples <- replicate(rnorm(n, mu, sigma), n = 100)

# calculate estimates
sample_means <- apply(samples, 2, mean)
sample_var <- apply(samples, 2, function(x) {
  return((1 / n) * sum(x ^ 2) - sum(x / n) ^ 2)
})

# check if estimates are close:
mean(sample_means); mean(sample_var)

## [1] 4.939076
## [1] 8.958363
Looks like we got back to the original parameters.
This is magical! We’ve just written great estimators for our parameters using just data that we’ve sampled. Hopefully you followed what we did here…if not, here’s a checklist that summarizes the process:
1. Write the moments of the distribution in terms of the parameters (if you have $k$ parameters, you will have to write out $k$ moments).
2. Solve for the parameters in terms of the moments.
3. Plug in the sample moments for the moments.
Go back and make sure you can follow these steps for what we did here with the Normal. A quick caveat: you may have noticed that we could have immediately written the second parameter, $\sigma^2$, in terms of the first and second moments, because we know $Var(X) = E(X^2) - E(X)^2$. Yes, we did do an extra step here by first writing it backwards and then solving, but that extra step will come in handy in more advanced situations, so do be sure to follow it in general. Also, although that estimator for the second parameter looks ugly, it simplifies nicely to $\left(\frac{n-1}{n}\right)s^2$, where $s^2$ is the sample variance. If you want to try and show this on your own, recall that the sample variance is given by:
$$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
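If you’d rather check that identity numerically than derive it (a rough sketch with simulated data and arbitrary parameter values; recall that R’s built-in `var()` computes $s^2$ with the $n - 1$ denominator), the two forms should agree exactly:

# compare the MoM variance estimator to ((n - 1) / n) * s^2
set.seed(0)
n <- 50
x <- rnorm(n, 5, 3)

mean(x ^ 2) - mean(x) ^ 2   # MoM estimator for sigma^2
((n - 1) / n) * var(x)      # scaled sample variance; should match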
So, why do we like MoM estimators? Well, generally, they’re pretty easy to find. They’re also generally consistent, a term which means that the estimator converges to the true value of the parameter as the sample gets larger and larger (this is the qualitative definition only). In short, they are easy and pretty good estimators!
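To get an informal feel for consistency (just a simulation sketch, not a proof; the parameter values are arbitrary), we can watch the MoM estimate of $\sigma^2 = 9$ settle toward the truth as the sample size grows:

# informal look at consistency: MoM estimate of sigma^2 = 9 as n grows
set.seed(0)
for (n in c(10, 100, 10000, 1000000)) {
  x <- rnorm(n, 5, 3)
  print(mean(x ^ 2) - mean(x) ^ 2)
}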
If the method of moments still isn’t clicking, a video review of these estimators is available in the browser version of this chapter.
And, finally, it might be helpful to try this calculation with a couple of other distributions…
3.3 Practice
Let $X \sim Gamma(a, \lambda)$. Find the MoM estimators for $a$ and $\lambda$.
Solution: This is a classic MoM question. We have two parameters, so we need two moments. We write:
$$\mu_1 = \frac{a}{\lambda}$$
$$\mu_2 = \frac{a}{\lambda^2} + \frac{a^2}{\lambda^2}$$
And we solve for $a$ and $\lambda$ in terms of $\mu_1$ and $\mu_2$. We know that $a = \lambda \mu_1$ from the first equation, so we can plug this in to the second equation:
$$\mu_2 = \frac{\lambda \mu_1}{\lambda^2} + \frac{\lambda^2 \mu_1^2}{\lambda^2}$$
$$\mu_2 = \frac{\mu_1}{\lambda} + \mu_1^2$$
$$\mu_2 - \mu_1^2 = \frac{\mu_1}{\lambda}$$
$$\lambda = \frac{\mu_1}{\mu_2 - \mu_1^2}$$
And plugging this in to $a = \lambda \mu_1$, we get:
$$a = \frac{\mu_1^2}{\mu_2 - \mu_1^2}$$
We plug in the sample moments to estimate $a$ and $\lambda$. Our estimator for $\mu_1$ is $\hat{\mu}_1 = \frac{1}{n}\sum_{i=1}^{n} X_i$, and our estimator for $\mu_2$ is $\hat{\mu}_2 = \frac{1}{n}\sum_{i=1}^{n} X_i^2$. So:
$$\hat{a}_{MoM} = \frac{\hat{\mu}_1^2}{\hat{\mu}_2 - \hat{\mu}_1^2}$$
$$\hat{\lambda}_{MoM} = \frac{\hat{\mu}_1}{\hat{\mu}_2 - \hat{\mu}_1^2}$$
We can see how our estimates do by running some simple R code for a Gamma(5,7) distribution.
# replicate
set.seed(0)
x <- rgamma(1000, 5, 7)

# calculate MoM estimates
mu_1 <- mean(x)
mu_2 <- mean(x ^ 2)

mu_1 ^ 2 / (mu_2 - mu_1 ^ 2)

## [1] 4.885506

mu_1 / (mu_2 - mu_1 ^ 2)

## [1] 7.01947
It looks like our MoM estimators get close to the original parameters of 5 and 7.
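If you want to reuse this calculation, one option (just a sketch; the function name `gamma_mom` is made up here, not a standard R function) is to wrap the two estimators into a small helper that takes a data vector and returns both estimates:

# hypothetical helper: MoM estimates for a Gamma(a, lambda) sample
gamma_mom <- function(x) {
  mu_1 <- mean(x)
  mu_2 <- mean(x ^ 2)
  c(a = mu_1 ^ 2 / (mu_2 - mu_1 ^ 2),
    lambda = mu_1 / (mu_2 - mu_1 ^ 2))
}

# example: the estimates should land near a = 5 and lambda = 7
set.seed(0)
gamma_mom(rgamma(1000, 5, 7))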