Chapter 2 Method of Moments
2.1 Introduction
Method of moments estimation is based solely on the law of large numbers, which we repeat here:
Let \(M_1, M_2,...\) be independent random variables having a common distribution with mean \(\mu_M\). Then the sample means converge to the distributional mean as the number of observations increases.
\[ \bar{M}_n = \frac{1}{n}\sum_{i=1}^n M_i \rightarrow \mu_M \text{, as } n \rightarrow\infty \]
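As a quick numerical illustration (a minimal simulation sketch, not part of the statement above; it assumes NumPy and an arbitrarily chosen exponential distribution with mean \(1/2\)), the running sample means settle near the distributional mean as \(n\) grows:

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw i.i.d. observations from a distribution with known mean mu_M;
# here Exponential with scale 0.5, so mu_M = 0.5.
m = rng.exponential(scale=0.5, size=100_000)

# Running sample means \bar{M}_n for n = 1, ..., 100000.
running_means = np.cumsum(m) / np.arange(1, m.size + 1)

# Sample means at n = 10, 100, 1000, 10000, 100000 approach 0.5.
print(running_means[[9, 99, 999, 9_999, 99_999]])
```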
To show how the method of moments determines an estimator, we first consider the case of one parameter. We start with independent random variables \(X_1, X_2,...\) chosen according to the probability density \(f_X(x|\theta)\) associated with an unknown parameter value \(\theta\). The common mean of the \(X_i\), \(\mu_X\), is a function \(k(\theta)\) of \(\theta\). For example, if the \(X_i\) are continuous random variables, then
\[ \mu_X = \int_{-\infty}^{\infty}xf_X(x|\theta) dx =k(\theta). \]
The law of large numbers states that
\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \rightarrow \mu_X \text{, as } n \rightarrow\infty \]
Thus, if the number of observations \(n\) is large, the distributional mean, \(\mu_X = k(\theta)\), should be well approximated by the sample mean, i.e.,
\[ \bar{X} \approx k(\theta) \]
This can be turned into an estimator \(\hat{\theta}\) by setting
\[ \bar{X} = k(\hat{\theta}) \]
and solving for \(\hat{\theta}\).
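For example, if the \(X_i\) are exponential random variables with density \(f_X(x|\theta) = \theta e^{-\theta x}\) for \(x \ge 0\), then
\[ k(\theta) = E_{\theta} X_1 = \frac{1}{\theta}, \]
and solving \(\bar{X} = 1/\hat{\theta}\) gives the method of moments estimator \(\hat{\theta} = 1/\bar{X}\).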
2.2 The Procedure
More generally, for independent random variables \(X_1, X_2,...\) chosen according to the probability distribution derived from the parameter value \(\theta\) and \(m\) a real-valued function, if \(k(\theta) = E_{\theta}m(X_1)\), then
\[ \frac{1}{n}\sum_{i=1}^n m(X_i) \rightarrow k(\theta) \text{, as } n \rightarrow\infty \]
The method of moments results from the choices \(m(x) = x^m\). Write
\[ \mu_m = EX_1^m = k_m(\theta). \]
for the \(m\)-th moment.
Our estimation procedure follows these four steps, linking the sample moments to the parameter estimates.
- Step 1. If the model has \(d\) parameters, we compute the functions \(k_m\) for the first \(d\) moments,
\[ \mu_1 = k_1(\theta_1, \theta_2, \dots, \theta_d), \mu_2 = k_2(\theta_1, \theta_2, \dots, \theta_d), \dots, \mu_d = k_d(\theta_1, \theta_2, \dots, \theta_d), \]
obtaining \(d\) equations in \(d\) unknowns.
- Step 2. We then solve for the \(d\) parameters as a function of the moments.
\[ \theta_1 = g_1(\mu_1, \mu_2, \dots, \mu_d), \theta_2 = g_2(\mu_1, \mu_2, \dots, \mu_d), \dots, \theta_d = g_d(\mu_1, \mu_2, \dots, \mu_d) \]
- Step 3. Now, based on the data \(x = (x_1, x_2,...,x_n)\), we compute the first \(d\) sample moments
\[ \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i, \bar{x^2} = \frac{1}{n}\sum_{i=1}^n x_i^2, \dots, \bar{x^d} = \frac{1}{n}\sum_{i=1}^n x_i^d. \]
- Step 4. We replace the distributional moments \(\mu_m\) by the sample moments \(\bar{x^m}\) to obtain the method of moments estimators \((\hat{\theta}_1,\hat{\theta}_2,\dots,\hat{\theta}_d)\). For the data \(x\), these estimates are
\[ \hat{\theta}_1(x)=g_1(\bar{x},\bar{x^2},\dots,\bar{x^d}),\hat{\theta}_2(x)=g_2(\bar{x},\bar{x^2},\dots,\bar{x^d}),\dots,\hat{\theta}_d(x)=g_d(\bar{x},\bar{x^2},\dots,\bar{x^d}). \]
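As a sketch of Steps 3 and 4 in code (assuming NumPy and SciPy are available; the two-parameter Gamma family and the function names here are illustrative choices, not taken from the text), the sample moments can be computed directly and the moment equations solved numerically when the closed-form inverses \(g_1, \dots, g_d\) are inconvenient:

```python
import numpy as np
from scipy.optimize import fsolve

def sample_moments(x, d):
    """First d sample moments: bar(x^m) = (1/n) * sum_i x_i**m."""
    x = np.asarray(x, dtype=float)
    return np.array([np.mean(x ** m) for m in range(1, d + 1)])

def moment_equations(theta, xbar):
    """k_m(theta) - bar(x^m) for the Gamma(shape, scale) family:
    mu_1 = shape*scale, mu_2 = shape*(shape + 1)*scale**2."""
    shape, scale = theta
    return [shape * scale - xbar[0],
            shape * (shape + 1) * scale ** 2 - xbar[1]]

# Illustrative usage: simulate Gamma data, then recover the parameters.
rng = np.random.default_rng(1)
x = rng.gamma(shape=3.0, scale=2.0, size=5_000)
xbar = sample_moments(x, d=2)
theta_hat = fsolve(moment_equations, x0=[1.0, 1.0], args=(xbar,))
print(theta_hat)  # roughly [3., 2.]
```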
2.3 Example
Consider the uniform distribution on the interval \([a,b]\), \(U(a,b)\). If \(W\sim U(a,b)\) then we have
\[ \mu_1 = E W = \frac{1}{2}(a+b), \quad \mu_2 = E W^2 = \frac{1}{3}(a^2 + ab + b^2). \]
Noting that \(\mu_1\) is the midpoint of the interval and that \(\mu_2 - \mu_1^2 = \mathrm{Var}(W) = (b-a)^2/12\), solving these two equations for \(a\) and \(b\) gives
\[ a = \mu_1 - \sqrt{3(\mu_2-\mu_1^2)}, \quad b = \mu_1 + \sqrt{3(\mu_2-\mu_1^2)}. \]
Given a set of samples \(w_1,w_2,\dots,w_n\), we substitute the sample moments \(\bar{w}\) and \(\bar{w^2}\) for \(\mu_1\) and \(\mu_2\) in these formulas to obtain the method of moments estimates \(\hat{a}\) and \(\hat{b}\).
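A minimal sketch of this computation in Python (assuming NumPy; the simulated data and the helper name `mom_uniform` are illustrative):

```python
import numpy as np

def mom_uniform(w):
    """Method of moments estimates of (a, b) for U(a, b) data."""
    w = np.asarray(w, dtype=float)
    m1 = np.mean(w)        # first sample moment,  bar(w)
    m2 = np.mean(w ** 2)   # second sample moment, bar(w^2)
    half_width = np.sqrt(3.0 * (m2 - m1 ** 2))
    return m1 - half_width, m1 + half_width

# Illustrative usage: data simulated from U(2, 7).
rng = np.random.default_rng(2)
w = rng.uniform(2.0, 7.0, size=10_000)
print(mom_uniform(w))  # roughly (2.0, 7.0)
```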