Chapter 1 Method of Moments

1.1 Introduction

Method of moments estimation is based solely on the law of large numbers, which we repeat here:

Let $$M_1, M_2,...$$ be independent random variables having a common distribution possessing a mean $$\mu_M$$. Then the sample means converge to the distributional mean as the number of observations increases:

$\bar{M}_n = \frac{1}{n}\sum_{i=1}^n M_i \rightarrow \mu_M \text{, as } n \rightarrow\infty$
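As a quick numerical illustration of this statement, the following sketch draws increasingly large samples and prints their sample means. The exponential distribution with mean 2, the seed, and the helper name `sample_mean` are assumptions chosen only for the demonstration; any distribution with a finite mean would do.

```python
import random


def sample_mean(draws):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    return sum(draws) / len(draws)


rng = random.Random(0)  # fixed seed so the run is reproducible
true_mean = 2.0         # assumed mean of the illustrative distribution

for n in (10, 1_000, 100_000):
    # expovariate takes the rate, which is 1 / mean for an exponential
    draws = [rng.expovariate(1.0 / true_mean) for _ in range(n)]
    print(n, sample_mean(draws))
```

For larger $$n$$ the printed sample means should settle near the assumed mean of 2, although any single run still shows some random fluctuation.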

To show how the method of moments determines an estimator, we first consider the case of one parameter. We start with independent random variables $$X_1, X_2,...$$ chosen according to the probability density $$f_X(x|\theta)$$ associated to an unknown parameter value $$\theta$$. The common mean of the $$X_i$$, $$\mu_X$$, is a function $$k(\theta)$$ of $$\theta$$. For example, if the $$X_i$$ are continuous random variables, then

$\mu_X = \int_{-\infty}^{\infty}xf_X(x|\theta) dx =k(\theta).$

The law of large numbers states that

$\bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \rightarrow \mu_X \text{, as } n \rightarrow\infty$

Thus, if the number of observations $$n$$ is large, the distributional mean, $$\mu_X = k(\theta)$$, should be well approximated by the sample mean, i.e.,

$\bar{X} \approx k(\theta)$

This can be turned into an estimator $$\hat{\theta}$$ by setting

$\bar{X} = k(\hat{\theta})$

and solving for $$\hat{\theta}$$.
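To make the one-parameter case concrete, here is a minimal sketch under an assumed model that does not come from the text: the $$X_i$$ are exponential with rate $$\theta$$, so that $$k(\theta) = 1/\theta$$ and solving $$\bar{X} = k(\hat{\theta})$$ gives $$\hat{\theta} = 1/\bar{X}$$. The function name `mom_exponential_rate` is hypothetical.

```python
import random


def mom_exponential_rate(data):
    """Method of moments estimate of an exponential rate.

    Assumes the model mean is k(theta) = 1 / theta, so solving
    x_bar = 1 / theta_hat gives theta_hat = 1 / x_bar.
    """
    x_bar = sum(data) / len(data)
    return 1.0 / x_bar


rng = random.Random(1)  # fixed seed for reproducibility
theta = 0.5             # "unknown" rate used to simulate the data
data = [rng.expovariate(theta) for _ in range(50_000)]

theta_hat = mom_exponential_rate(data)
print(theta_hat)  # should land close to the simulated rate
```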

1.2 The Procedure

More generally, for independent random variables $$X_1, X_2,...$$ chosen according to the probability distribution derived from the parameter value $$\theta$$ and $$m$$ a real-valued function, if $$k(\theta) = E_{\theta}m(X_1)$$, then

$\frac{1}{n}\sum_{i=1}^n m(X_i) \rightarrow k(\theta) \text{, as } n \rightarrow\infty$

The method of moments results from the choices $$m(x) = x^m$$ for $$m = 1, 2, \dots$$, where the exponent $$m$$ is the order of the moment. Write

$\mu_m = EX^m = k_m(\theta)$

for the $$m$$-th moment.

Our estimation procedure consists of the following four steps, which link the sample moments to parameter estimates.

• Step 1. If the model has $$d$$ parameters, we compute the functions $$k_m$$ for the first $$d$$ moments,

$\mu_1 = k_1(\theta_1, \theta_2, \dots, \theta_d), \mu_2 = k_2(\theta_1, \theta_2, \dots, \theta_d), \dots, \mu_d = k_d(\theta_1, \theta_2, \dots, \theta_d),$

obtaining $$d$$ equations in $$d$$ unknowns.

• Step 2. We then solve for the $$d$$ parameters as functions of the moments:

$\theta_1 = g_1(\mu_1, \mu_2, \dots, \mu_d), \theta_2 = g_2(\mu_1, \mu_2, \dots, \mu_d), \dots, \theta_d = g_d(\mu_1, \mu_2, \dots, \mu_d).$

• Step 3. Now, based on the data $$x = (x_1, x_2,...,x_n)$$, we compute the first $$d$$ sample moments

$\bar{x} = \frac{1}{n}\sum_{i=1}^n x_i, \bar{x^2} = \frac{1}{n}\sum_{i=1}^n x_i^2, \dots, \bar{x^d} = \frac{1}{n}\sum_{i=1}^n x_i^d.$

• Step 4. We replace the distributional moments $$\mu_m$$ by the sample moments $$\bar{x^m}$$ in the formulas from Step 2. This gives the method of moments estimators $$(\hat{\theta}_1,\hat{\theta}_2,\dots,\hat{\theta}_d)$$. For the data $$x$$, these estimates are

$\hat{\theta}_1(x)=g_1(\bar{x},\bar{x^2},\dots,\bar{x^d}), \hat{\theta}_2(x)=g_2(\bar{x},\bar{x^2},\dots,\bar{x^d}), \dots, \hat{\theta}_d(x)=g_d(\bar{x},\bar{x^2},\dots,\bar{x^d}).$
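The four steps can be sketched in code as follows. Steps 1 and 2 are algebra done by hand for a given model, so the sketch takes the solved formulas $$g_1, \dots, g_d$$ as input and carries out Steps 3 and 4. The helper names `sample_moments` and `method_of_moments`, and the normal model used to exercise them, are assumptions made for illustration and do not come from the text.

```python
import random


def sample_moments(data, d):
    """Step 3: return the first d sample moments [mean of x, mean of x^2, ...]."""
    n = len(data)
    return [sum(x ** m for x in data) / n for m in range(1, d + 1)]


def method_of_moments(data, g):
    """Step 4: plug the sample moments into the solved formulas g_1, ..., g_d.

    g is a list of d functions, each taking the d moments as arguments.
    These functions are the result of Steps 1 and 2 for the chosen model.
    """
    moments = sample_moments(data, len(g))
    return [g_j(*moments) for g_j in g]


# Illustrative model (an assumption): normal with mean mu and variance sigma2.
# Step 1: mu_1 = mu, mu_2 = sigma2 + mu^2.
# Step 2: mu = mu_1, sigma2 = mu_2 - mu_1^2.
normal_g = [
    lambda m1, m2: m1,
    lambda m1, m2: m2 - m1 ** 2,
]

rng = random.Random(2)  # fixed seed for reproducibility
data = [rng.gauss(3.0, 2.0) for _ in range(50_000)]  # mean 3, std dev 2

mu_hat, sigma2_hat = method_of_moments(data, normal_g)
print(mu_hat, sigma2_hat)  # should be near 3 and 4 respectively
```

Passing the $$g_j$$ in as functions keeps the model-specific algebra separate from the moment computation, which is the same for every model.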

1.3 Example

Consider the uniform distribution on the interval $$[a,b]$$, $$U(a,b)$$. If $$W\sim U(a,b)$$ then we have

$\mu_1 = \mathbb{E}[W] = \frac{1}{2}(a+b) \\ \mu_2 = \mathbb{E}[W^2] = \frac{1}{3}(a^2 + ab +b^2)$

Solving these equations for $$a$$ and $$b$$ (with $$a < b$$) gives

$a = \mu_1 - \sqrt{3(\mu_2-\mu_1^2)} \\ b = \mu_1 + \sqrt{3(\mu_2-\mu_1^2)}$
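
One way to carry out the solving step, written here as a short worked derivation, is to combine the two moment equations into the variance:

$\mu_2 - \mu_1^2 = \frac{1}{3}(a^2 + ab + b^2) - \frac{1}{4}(a+b)^2 = \frac{1}{12}(b-a)^2$

Since $$b > a$$, this gives $$b - a = 2\sqrt{3(\mu_2-\mu_1^2)}$$. Together with $$a + b = 2\mu_1$$ from the first moment equation, adding and subtracting these two relations yields the expressions for the two endpoints.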

Given a set of samples $$w_1,w_2,\dots,w_n$$, we can substitute the sample moments $$\hat{\mu}_1$$ and $$\hat{\mu}_2$$ into these formulas to obtain the estimates $$\hat{a}$$ and $$\hat{b}$$.
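
The uniform example can be sketched in a few lines of code. The function name `mom_uniform`, the simulated endpoints 2 and 5, and the sample size are assumptions made for illustration.

```python
import math
import random


def mom_uniform(samples):
    """Method of moments estimates (a_hat, b_hat) for U(a, b).

    Uses a = mu_1 - sqrt(3 (mu_2 - mu_1^2)) and
         b = mu_1 + sqrt(3 (mu_2 - mu_1^2)),
    with the distributional moments replaced by sample moments.
    """
    n = len(samples)
    m1 = sum(samples) / n                 # first sample moment
    m2 = sum(w * w for w in samples) / n  # second sample moment
    # Guard against a tiny negative value caused by rounding error.
    half_width = math.sqrt(3.0 * max(m2 - m1 ** 2, 0.0))
    return m1 - half_width, m1 + half_width


rng = random.Random(3)  # fixed seed for reproducibility
samples = [rng.uniform(2.0, 5.0) for _ in range(50_000)]

a_hat, b_hat = mom_uniform(samples)
print(a_hat, b_hat)  # should be close to the simulated endpoints 2 and 5
```

Nothing in these formulas forces $$\hat{a}$$ to lie below the smallest observation or $$\hat{b}$$ above the largest, so for small samples it may be worth comparing the estimates with the observed range.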