## 4.1 Method of moments

Consider a population $$X$$ whose distribution depends on $$K$$ unknown parameters $$\boldsymbol{\theta}=(\theta_1,\ldots,\theta_K)'.$$ If they exist, the population moments are, in general, functions of the unknown parameters $$\boldsymbol{\theta}.$$ That is,

\begin{align*} \alpha_r\equiv\alpha_r(\theta_1,\ldots,\theta_K):=\mathbb{E}[X^r], \quad r=1,2,\ldots \end{align*}

Given a srs of $$X,$$ we denote by $$a_r$$ the sample moment of order $$r$$ that estimates $$\alpha_r$$:

\begin{align*} a_r:=\bar{X^r}=\frac{1}{n}\sum_{i=1}^n X_i^r, \quad r=1,2,\ldots \end{align*}

Note that the sample moments do not depend on $$\boldsymbol{\theta}=(\theta_1,\ldots,\theta_K)'$$ but the population moments do. This is the key fact that the method of moments exploits to find the values of the parameters $$\boldsymbol{\theta}$$ that perfectly equate $$\alpha_r$$ to $$a_r$$ for as many $$r$$’s as necessary.42 The overall idea can be abstracted as matching population moments with sample moments and solving for $$\boldsymbol{\theta}$$.

Definition 4.1 (Method of moments) Let $$X\sim F_{\boldsymbol{\theta}}$$ with $$\boldsymbol{\theta}=(\theta_1,\ldots,\theta_K)'.$$ From a srs of $$X,$$ the method of moments produces the estimator $$\hat{\boldsymbol{\theta}}_{\mathrm{MM}}$$ that is the solution to the system of equations

\begin{align*} \alpha_r(\theta_1,\ldots,\theta_K)=a_r, \quad r=1,\ldots, R, \end{align*}

where $$R\geq K$$ is the lowest integer such that the system admits a unique solution and $$\theta_1,\ldots,\theta_K$$ are the variables. The estimator $$\hat{\boldsymbol{\theta}}_{\mathrm{MM}}$$ is simply referred to as the moment estimator of $$\boldsymbol{\theta}.$$

Example 4.1 Assume that we have a population with distribution $$\mathcal{N}(\mu,\sigma^2)$$ and a srs $$(X_1,\ldots,X_n)$$ from it. In this case, $$\boldsymbol{\theta}=(\mu,\sigma^2)'.$$ Let us compute the moment estimators of $$\mu$$ and $$\sigma^2.$$

For estimating two parameters, we need at least a system with two equations. We compute in the first place the first two moments of the rv $$X\sim \mathcal{N}(\mu,\sigma^2).$$ The first one is $$\alpha_1(\mu,\sigma^2)=\mathbb{E}[X]=\mu.$$ The second order moment arises from the variance $$\sigma^2$$:

\begin{align*} \alpha_2(\mu,\sigma^2)=\mathbb{E}[X^2]=\mathbb{V}\mathrm{ar}[X]+\mathbb{E}[X]^2=\sigma^2+\mu^2. \end{align*}

On the other hand, the first two sample moments are given by

\begin{align*} a_1=\bar{X}, \quad a_2=\frac{1}{n}\sum_{i=1}^n X_i^2=\bar{X^2}. \end{align*}

Then, the equations to solve in $$(\mu,\sigma^2)$$ are

\begin{align*} \begin{cases} \mu=\bar{X},\\ \sigma^2+\mu^2=\bar{X^2}. \end{cases} \end{align*}

The solution for $$\mu$$ is already given by the first equation. Substituting this value into the second equation and solving for $$\sigma^2,$$ we get the estimators

\begin{align*} \hat{\mu}_{\mathrm{MM}}=\bar{X},\quad \hat{\sigma}^2_{\mathrm{MM}}=\bar{X^2}-\hat{\mu}_{\mathrm{MM}}^2=\bar{X^2}-\bar{X}^2=S^2. \end{align*}

It turns out that the sample mean and the sample variance are precisely the moment estimators of $$(\mu,\sigma^2)'.$$
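The computation above can be checked numerically. Below is a minimal sketch in Python with NumPy; the parameter values $$\mu=2,$$ $$\sigma^2=9,$$ the seed, and the sample size are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)  # srs from N(mu = 2, sigma^2 = 9)

# Moment estimators from Example 4.1:
mu_hat = x.mean()                        # a_1 = sample mean
sigma2_hat = (x**2).mean() - mu_hat**2   # a_2 - a_1^2, i.e. the (1/n) sample variance S^2
```

Note that `sigma2_hat` coincides (up to floating-point error) with `np.var(x)`, which also divides by $$n$$ rather than $$n-1.$$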

Example 4.2 Let $$(X_1,\ldots,X_n)$$ be a srs of a rv $$X\sim\mathcal{U}(0,\theta).$$ Let us obtain the estimator of $$\theta$$ by the method of moments.

The first population moment is $$\alpha_1(\theta)=\mathbb{E}[X]=\theta/2$$ and the first sample moment is $$a_1=\bar{X}.$$ Equating both and solving for $$\theta,$$ we readily obtain $$\hat{\theta}_{\mathrm{MM}}=2\bar{X}.$$

We can observe that the estimator $$\hat{\theta}_{\mathrm{MM}}$$ of the upper range limit can actually be smaller than the maximum observation $$X_{(n)}.$$ It can also be larger than $$\theta,$$43 even though the sample cannot contain any observation above $$\theta.$$ Intuitively, then, the estimator is clearly suboptimal. This observation illustrates a more general fact: the estimators obtained by the method of moments are usually not the most efficient ones.
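Both pathologies are easy to reproduce. The first sample below is the one from the footnote ($$\theta=1,$$ $$(x_1,x_2)=(0.5,0.75)$$); the second sample is a hypothetical one chosen so that the estimate falls below the sample maximum.

```python
import numpy as np

# Footnote sample: theta = 1, yet the estimate exceeds theta
x = np.array([0.5, 0.75])
theta_hat = 2 * x.mean()        # 1.25, above theta = 1

# Hypothetical sample: the estimate falls below the largest observation,
# an impossible value for the upper limit of the range
y = np.array([0.1, 0.1, 0.9])
theta_hat_low = 2 * y.mean()    # about 0.733, below max(y) = 0.9
```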

Example 4.3 Let $$(X_1,\ldots,X_n)$$ be a srs of a rv $$X\sim\mathcal{U}(-\theta,\theta),$$ $$\theta>0.$$ Obtain the moment estimator of $$\theta.$$

The first population moment is now $$\alpha_1(\theta)=\mathbb{E}[X]=0.$$ It does not contain any information about $$\theta$$! Therefore, we need to look into the second population moment, which is $$\alpha_2(\theta)=\mathbb{E}[X^2]=\mathbb{V}\mathrm{ar}[X]+\mathbb{E}[X]^2=\mathbb{V}\mathrm{ar}[X]=(2\theta)^2/12=\theta^2/3.$$ With this moment we can solve $$\alpha_2(\theta)=a_2=\bar{X^2}$$ for $$\theta,$$ obtaining $$\hat{\theta}_{\mathrm{MM}}=\sqrt{3\bar{X^2}}.$$

This example illustrates that in certain situations it may be necessary to consider $$R>K$$ equations (here, $$R=2$$ and $$K=1$$) to compute the moment estimators, namely when some of the lower-order moments are non-informative about the parameters.
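A quick simulation confirms the formula; the value $$\theta=2,$$ the seed, and the sample size are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 2.0
x = rng.uniform(-theta, theta, size=100_000)  # srs from U(-theta, theta)

# The first sample moment is uninformative (E[X] = 0); use the second one:
theta_hat = np.sqrt(3 * (x**2).mean())
```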

Example 4.4 We know from (2.7) that $$\mathbb{E}[\chi^2_\nu]=\nu.$$ Therefore, it is immediate that $$\hat{\nu}_{\mathrm{MM}}=\bar{X}.$$ Figure 4.2 shows a visualization of how the method of moments operates in this case: it “scans” several degrees of freedom $$\nu$$ until finding that for which $$\nu=\bar{X}.$$ In this process, the method of moments only uses the information of the sample realization $$x_1,\ldots,x_n$$ that is summarized in $$\bar{X},$$ nothing else.44
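The “scan” of Figure 4.2 is not needed in practice, since the equation $$\nu=\bar{X}$$ is already solved. A minimal numerical illustration (degrees of freedom $$\nu=5,$$ seed, and sample size are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
nu = 5
x = rng.chisquare(df=nu, size=100_000)  # srs from chi^2_nu

# Since E[chi^2_nu] = nu, the moment estimator of nu is just the sample mean
nu_hat = x.mean()
```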

An important observation is that, if the parameters to be estimated, $$\theta_1,\ldots,\theta_K,$$ can be written as functions of the first $$K$$ population moments through continuous functions,

\begin{align*} \theta_k=g_k(\alpha_1,\ldots,\alpha_K), \quad k=1,\ldots,K, \end{align*}

then the estimator of $$\theta_k$$ by the method of moments simply follows by replacing $$\alpha$$’s by $$a$$’s:

\begin{align*} \hat{\theta}_{\mathrm{MM},k}=g_k(a_1,\ldots,a_K). \end{align*}

Recall that $$g_k$$ is such that $$\theta_k=g_k\left(\alpha_1(\theta_1,\ldots,\theta_K),\ldots,\alpha_K(\theta_1,\ldots,\theta_K)\right).$$ That is, $$g_k$$ is the $$k$$-th component of the inverse function of

\begin{align*} \alpha:(\theta_1,\ldots,\theta_K)\in\mathbb{R}^K\mapsto \left(\alpha_1(\theta_1,\ldots,\theta_K),\ldots,\alpha_K(\theta_1,\ldots,\theta_K)\right)\in\mathbb{R}^K. \end{align*}
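The plug-in recipe $$\hat{\theta}_{\mathrm{MM},k}=g_k(a_1,\ldots,a_K)$$ can be illustrated with a hypothetical example not discussed in the text: for $$X\sim\mathrm{Exp}(\lambda)$$ we have $$\alpha_1(\lambda)=1/\lambda,$$ so $$g(\alpha_1)=1/\alpha_1$$ and $$\hat{\lambda}_{\mathrm{MM}}=1/\bar{X}.$$ The rate $$\lambda=2,$$ seed, and sample size below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(3)
lam = 2.0
x = rng.exponential(scale=1 / lam, size=100_000)  # Exp(lambda): E[X] = 1/lambda

# g inverts alpha_1(lambda) = 1/lambda; plug the sample moment a_1 into g
lam_hat = 1 / x.mean()
```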

Proposition 4.1 (Consistency in probability of the method of moments) Let $$(X_1,\ldots,X_n)$$ be a srs of a rv $$X\sim F_{\boldsymbol{\theta}}$$ with $$\boldsymbol{\theta}=(\theta_1,\ldots,\theta_K)'$$ that verifies

\begin{align} \mathbb{E}[(X^k-\alpha_k)^2]<\infty, \quad k=1,\ldots,K.\tag{4.1} \end{align}

If $$\theta_k=g_k(\alpha_1,\ldots,\alpha_K),$$ with $$g_k$$ continuous, then the moment estimator for $$\theta_k,$$ $$\hat{\theta}_{\mathrm{MM},k}=g_k(a_1,\ldots,a_K),$$ is consistent in probability:

\begin{align*} \hat{\theta}_{\mathrm{MM},k}\stackrel{\mathbb{P}}{\longrightarrow} \theta_k,\quad k=1,\ldots,K. \end{align*}

Proof (Proof of Proposition 4.1). Thanks to condition (4.1), the LLN implies that the sample moments $$a_1,\ldots,a_K$$ are consistent in probability for estimating the population moments $$\alpha_1,\ldots,\alpha_K.$$ In addition, the functions $$g_k$$ are continuous for all $$k=1,\ldots,K,$$ hence by Theorem 3.3, $$\hat{\theta}_{\mathrm{MM},k}$$ is consistent in probability for $$\theta_k,$$ $$k=1,\ldots,K.$$
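The consistency in Proposition 4.1 can be observed empirically: for growing sample sizes, the moment estimate concentrates around the true parameter. The sketch below uses the $$\mathcal{U}(0,\theta)$$ setting of Example 4.2 with illustrative choices of $$\theta=1,$$ seed, and sample sizes.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 1.0
errors = []
for n in (10**2, 10**4, 10**6):
    x = rng.uniform(0, theta, size=n)
    errors.append(abs(2 * x.mean() - theta))  # |theta_hat_MM - theta|
# The errors shrink towards 0 as n grows, as Proposition 4.1 predicts
```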

42. The principle of equating $$\alpha_r$$ to $$a_r$$ rests upon the fact that $$a_r\stackrel{\mathbb{P}}{\longrightarrow}\alpha_r$$ when $$n\to\infty$$ (see Corollary 3.2).↩︎

43. Assume $$\theta=1$$ and $$n=2$$ with $$(x_1,x_2)=(0.5,0.75).$$ Then, $$\hat{\theta}_{\mathrm{MM}}=1.25>1=\theta,$$ even though no observation in the sample can exceed $$\theta=1.$$↩︎

44. In particular, it does not use the form of the pdf.↩︎