Chapter 2 Method of Moments

2.1 Introduction

Method of moments estimation is based solely on the law of large numbers, which we repeat here:

Let \(M_1, M_2, \dots\) be independent random variables having a common distribution with mean \(\mu_M\). Then the sample means converge to the distributional mean as the number of observations increases:

\[ \bar{M}_n = \frac{1}{n}\sum_{i=1}^n M_i \rightarrow \mu_M \text{, as } n \rightarrow\infty \]
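As a quick numerical illustration (not part of the argument above), the following sketch simulates this convergence for the running sample means; the exponential distribution with mean 2 and the sample sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)

# Independent observations from an arbitrary distribution
# (exponential with mean 2, chosen only for illustration).
m = rng.exponential(scale=2.0, size=100_000)

# Running sample means \bar{M}_n for increasing n.
running_means = np.cumsum(m) / np.arange(1, len(m) + 1)

for n in (10, 100, 1_000, 10_000, 100_000):
    print(f"n = {n:>6}: sample mean = {running_means[n - 1]:.4f}")
# The printed values settle near the distributional mean 2 as n grows.
```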

To show how the method of moments determines an estimator, we first consider the case of one parameter. We start with independent random variables \(X_1, X_2, \dots\) chosen according to the probability density \(f_X(x|\theta)\) associated with an unknown parameter value \(\theta\). The common mean of the \(X_i\), \(\mu_X\), is a function \(k(\theta)\) of \(\theta\). For example, if the \(X_i\) are continuous random variables, then

\[ \mu_X = \int_{-\infty}^{\infty}xf_X(x|\theta) dx =k(\theta). \]

The law of large numbers states that

\[ \bar{X}_n = \frac{1}{n}\sum_{i=1}^n X_i \rightarrow \mu_X \text{, as } n \rightarrow\infty \]

Thus, if the number of observations \(n\) is large, the distributional mean, \(\mu_X = k(\theta)\), should be well approximated by the sample mean, i.e.,

\[ \bar{X} \approx k(\theta) \]

This can be turned into an estimator \(\hat{\theta}\) by setting

\[ \bar{X} = k(\hat{\theta}) \]

and solving for \(\hat{\theta}\).
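For concreteness, here is a one-parameter instance of this recipe (our illustration, not the text's): for exponentially distributed observations with density \(f_X(x|\theta) = \theta e^{-\theta x}\), \(x \ge 0\),

\[ \mu_X = \int_0^{\infty} x\,\theta e^{-\theta x}\,dx = \frac{1}{\theta} = k(\theta), \qquad \bar{X} = k(\hat{\theta}) = \frac{1}{\hat{\theta}} \quad\Rightarrow\quad \hat{\theta} = \frac{1}{\bar{X}}. \]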

2.2 The Procedure

More generally, for independent random variables \(X_1, X_2, \dots\) chosen according to the probability distribution derived from the parameter value \(\theta\), and for a real-valued function \(h\), if \(k(\theta) = E_{\theta}h(X_1)\), then

\[ \frac{1}{n}\sum_{i=1}^n h(X_i) \rightarrow k(\theta) \text{, as } n \rightarrow\infty \]

The method of moments results from the choices \(h(x) = x^m\), \(m = 1, 2, \dots\). Write

\[ \mu_m = EX^m = k_m(\theta) \]

for the \(m\)-th moment.

Our estimation procedure follows these four steps to link the sample moments to the parameter estimates; a brief worked illustration follows the steps.

  • Step 1. If the model has \(d\) parameters, we compute the functions \(k_m\) for the first \(d\) moments,

\[ \mu_1 = k_1(\theta_1, \theta_2, \dots, \theta_d),\quad \mu_2 = k_2(\theta_1, \theta_2, \dots, \theta_d),\quad \dots,\quad \mu_d = k_d(\theta_1, \theta_2, \dots, \theta_d), \]

obtaining \(d\) equations in \(d\) unknowns.

  • Step 2. We then solve for the \(d\) parameters as a function of the moments.

\[ \theta_1 = g_1(\mu_1, \mu_2, \dots, \mu_d),\quad \theta_2 = g_2(\mu_1, \mu_2, \dots, \mu_d),\quad \dots,\quad \theta_d = g_d(\mu_1, \mu_2, \dots, \mu_d) \]

  • Step 3. Now, based on the data \(x = (x_1, x_2,...,x_n)\), we compute the first \(d\) sample moments

\[ \bar{x} = \frac{1}{n}\sum_{i=1}^n x_i,\quad \bar{x^2} = \frac{1}{n}\sum_{i=1}^n x_i^2,\quad \dots,\quad \bar{x^d} = \frac{1}{n}\sum_{i=1}^n x_i^d. \]

  • Step 4. We replace the distributional moments \(\mu_m\) by the sample moments \(\bar{x^m}\) in the formulas \(g_1, g_2, \dots, g_d\) to obtain the method of moments estimators \((\hat{\theta}_1,\hat{\theta}_2,\dots,\hat{\theta}_d)\). For the data \(x\), these estimates are

\[ \hat{\theta}_1(x)=g_1(\bar{x},\bar{x^2},\dots,\bar{x^d}),\quad \hat{\theta}_2(x)=g_2(\bar{x},\bar{x^2},\dots,\bar{x^d}),\quad \dots,\quad \hat{\theta}_d(x)=g_d(\bar{x},\bar{x^2},\dots,\bar{x^d}). \]
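As a brief worked illustration of the recipe (ours, not the text's), consider \(X_i \sim N(\mu, \sigma^2)\) with \(d = 2\) parameters. Steps 1 and 2 give

\[ \mu_1 = \mu, \quad \mu_2 = \sigma^2 + \mu^2 \qquad\Rightarrow\qquad \mu = \mu_1, \quad \sigma^2 = \mu_2 - \mu_1^2, \]

so Steps 3 and 4 yield the estimators \(\hat{\mu} = \bar{x}\) and \(\hat{\sigma}^2 = \bar{x^2} - \bar{x}^2\).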

2.3 Example

Consider the uniform distribution on the interval \([a,b]\), \(U(a,b)\). If \(W\sim U(a,b)\) then we have

\[ \mu_1 = \mathbb{E}[W] = \frac{1}{2}(a+b) \\ \mu_2 = \mathbb{E}[W^2] = \frac{1}{3}(a^2 + ab +b^2) \]

Solving these equations for \(a\) and \(b\) gives

\[ a = \mu_1 - \sqrt{3(\mu_2-\mu_1^2)} \\ b = \mu_1 + \sqrt{3(\mu_2-\mu_1^2)} \]

Given a set of samples \(w_1, w_2, \dots, w_n\), we substitute the sample moments \(\bar{w}\) and \(\bar{w^2}\) for \(\mu_1\) and \(\mu_2\) in these formulas to obtain the estimates \(\hat{a}\) and \(\hat{b}\).
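A minimal numerical sketch of this substitution, assuming simulated draws from \(U(1, 5)\) (the endpoints and the sample size are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated observations; the true interval [a, b] = [1, 5] is only for illustration.
w = rng.uniform(low=1.0, high=5.0, size=10_000)

# Step 3: first two sample moments.
m1 = w.mean()           # \bar{w}
m2 = np.mean(w ** 2)    # \bar{w^2}

# Step 4: plug the sample moments into the formulas for a and b.
half_width = np.sqrt(3.0 * (m2 - m1 ** 2))
a_hat = m1 - half_width
b_hat = m1 + half_width

print(f"a_hat = {a_hat:.3f}, b_hat = {b_hat:.3f}")  # should land near 1 and 5
```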