4.1 Method of moments

Consider a population X whose distribution depends on K unknown parameters \boldsymbol{\theta}=(\theta_1,\ldots,\theta_K)'. If they exist, the population moments are, in general, functions of the unknown parameters \boldsymbol{\theta}. That is,

\begin{align*} \alpha_r\equiv\alpha_r(\theta_1,\ldots,\theta_K):=\mathbb{E}[X^r], \quad r=1,2,\ldots \end{align*}

Given a srs of X, we denote by a_r the sample moment of order r that estimates \alpha_r:

\begin{align*} a_r:=\bar{X^r}=\frac{1}{n}\sum_{i=1}^n X_i^r, \quad r=1,2,\ldots \end{align*}

Note that the sample moments do not depend on \boldsymbol{\theta}=(\theta_1,\ldots,\theta_K)', but the population moments do. This is the key fact that the method of moments exploits: it finds the values of the parameters \boldsymbol{\theta} that exactly equate \alpha_r to a_r for as many r’s as necessary.42 The overall idea can be abstracted as matching population moments with sample moments and solving for \boldsymbol{\theta}.
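
To make the distinction concrete, here is a minimal sketch, assuming Python with NumPy, of how the sample moments a_r are computed from a simulated srs realization (the distribution, seed, and orders r below are arbitrary illustrative choices, not part of the text).

```python
import numpy as np

# Simulated realization of a srs (illustrative choice of distribution and size)
rng = np.random.default_rng(42)
x = rng.normal(loc=2.0, scale=1.5, size=500)

# Sample moments a_r = (1/n) * sum_i x_i^r for r = 1, 2, 3;
# they depend only on the data, not on any unknown parameter
a = [np.mean(x**r) for r in (1, 2, 3)]
print(a)
```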

Definition 4.1 (Method of moments) Let X\sim F_{\boldsymbol{\theta}} with \boldsymbol{\theta}=(\theta_1,\ldots,\theta_K)'. From a srs of X, the method of moments produces the estimator \hat{\boldsymbol{\theta}}_{\mathrm{MM}} that is the solution to the system of equations

\begin{align*} \alpha_r(\theta_1,\ldots,\theta_K)=a_r, \quad r=1,\ldots, R, \end{align*}

where R\geq K is the smallest integer such that the system, with \theta_1,\ldots,\theta_K as the unknowns, admits a unique solution. The estimator \hat{\boldsymbol{\theta}}_{\mathrm{MM}} is simply referred to as the moment estimator of \boldsymbol{\theta}.
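
As a minimal numerical sketch of the definition, assuming Python with NumPy and SciPy, one can solve the system \alpha_r(\theta_1,\ldots,\theta_K)=a_r with a generic root finder rather than by hand. The sketch below uses the \mathcal{N}(\mu,\sigma^2) moments derived in Example 4.1; the simulated sample, the true parameter values, and the starting point are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import fsolve

# Simulated srs with mu = 1, sigma^2 = 4 (illustrative choices)
rng = np.random.default_rng(1)
x = rng.normal(loc=1.0, scale=2.0, size=1000)
a1, a2 = np.mean(x), np.mean(x**2)  # sample moments a_1, a_2

def moment_equations(theta):
    mu, sigma2 = theta
    # alpha_1(mu, sigma^2) - a_1 and alpha_2(mu, sigma^2) - a_2
    return [mu - a1, sigma2 + mu**2 - a2]

theta_mm = fsolve(moment_equations, x0=[0.0, 1.0])
print(theta_mm)  # should be close to (1, 4)
```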

Example 4.1 Assume that we have a population with distribution \mathcal{N}(\mu,\sigma^2) and a srs (X_1,\ldots,X_n) from it. In this case, \boldsymbol{\theta}=(\mu,\sigma^2)'. Let us compute the moment estimators of \mu and \sigma^2.

To estimate two parameters, we need a system with at least two equations. We first compute the first two moments of the rv X\sim \mathcal{N}(\mu,\sigma^2). The first one is \alpha_1(\mu,\sigma^2)=\mathbb{E}[X]=\mu. The second-order moment follows from the variance \sigma^2:

\begin{align*} \alpha_2(\mu,\sigma^2)=\mathbb{E}[X^2]=\mathbb{V}\mathrm{ar}[X]+\mathbb{E}[X]^2=\sigma^2+\mu^2. \end{align*}

On the other hand, the first two sample moments are given by

\begin{align*} a_1=\bar{X}, \quad a_2=\frac{1}{n}\sum_{i=1}^n X_i^2=\bar{X^2}. \end{align*}

Then, the equations to solve in (\mu,\sigma^2) are

\begin{align*} \begin{cases} \mu=\bar{X},\\ \sigma^2+\mu^2=\bar{X^2}. \end{cases} \end{align*}

The solution for \mu is given directly by the first equation. Substituting it into the second equation and solving for \sigma^2, we get the estimators

\begin{align*} \hat{\mu}_{\mathrm{MM}}=\bar{X},\quad \hat{\sigma}^2_{\mathrm{MM}}=\bar{X^2}-\hat{\mu}_{\mathrm{MM}}^2=\bar{X^2}-\bar{X}^2=S^2. \end{align*}

It turns out that the sample mean and the sample variance are precisely the moment estimators of (\mu,\sigma^2).
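
A short numerical counterpart of this example, assuming Python with NumPy (the simulated sample and the true values \mu=1, \sigma^2=4 are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(loc=1.0, scale=2.0, size=1000)  # srs with mu = 1, sigma^2 = 4

mu_mm = np.mean(x)                    # bar(X)
sigma2_mm = np.mean(x**2) - mu_mm**2  # bar(X^2) - bar(X)^2 = S^2 (divisor n)
print(mu_mm, sigma2_mm)               # sigma2_mm equals np.var(x, ddof=0)
```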

Example 4.2 Let (X_1,\ldots,X_n) be a srs of a rv X\sim\mathcal{U}(0,\theta). Let us obtain the estimator of \theta by the method of moments.

The first population moment is \alpha_1(\theta)=\mathbb{E}[X]=\theta/2 and the first sample moment is a_1=\bar{X}. Equating both and solving for \theta, we readily obtain \hat{\theta}_{\mathrm{MM}}=2\bar{X}.

We can observe that the estimator \hat{\theta}_{\mathrm{MM}} of the upper limit of the range can actually be smaller than X_{(n)}, the maximum observation. It can also be larger than \theta,43 even though no observation in the sample can exceed \theta. Intuitively, then, the estimator is clearly suboptimal. This observation is just an illustration of a more general fact: the estimators obtained by the method of moments are usually not the most efficient ones.
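
The following sketch, assuming Python with NumPy, illustrates both pathologies by simulation: it estimates the proportion of samples in which 2\bar{X} falls below the sample maximum X_{(n)} and the proportion in which it exceeds \theta (here \theta=1, n=10, and the number of replicates are arbitrary choices).

```python
import numpy as np

rng = np.random.default_rng(3)
theta, n, reps = 1.0, 10, 10_000
x = rng.uniform(0.0, theta, size=(reps, n))  # reps samples of size n

theta_mm = 2 * x.mean(axis=1)                       # 2 * bar(X) in each sample
prop_below_max = np.mean(theta_mm < x.max(axis=1))  # estimate below X_(n)
prop_above_theta = np.mean(theta_mm > theta)        # estimate above theta
print(prop_below_max, prop_above_theta)             # both clearly positive
```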

Example 4.3 Let (X_1,\ldots,X_n) be a srs of a rv X\sim\mathcal{U}(-\theta,\theta), \theta>0. Obtain the moment estimator of \theta.

The first population moment is now \alpha_1(\theta)=\mathbb{E}[X]=0. It does not contain any information about \theta! Therefore, we need to look into the second population moment, which is \alpha_2(\theta)=\mathbb{E}[X^2]=\mathbb{V}\mathrm{ar}[X]+\mathbb{E}[X]^2=\mathbb{V}\mathrm{ar}[X]=\theta^2/3. With this moment we can now solve \alpha_2(\theta)=\bar{X^2} for \theta, obtaining \hat{\theta}_{\mathrm{MM}}=\sqrt{3\bar{X^2}}.

This example illustrates that, in certain situations, it may be necessary to consider R>K equations (here, R=2 and K=1) to compute the moment estimators, because some of the lower-order moments may be non-informative about the parameters.
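
A minimal simulation of this example, assuming Python with NumPy (\theta=2 and the sample size are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
theta = 2.0
x = rng.uniform(-theta, theta, size=1000)

theta_mm = np.sqrt(3 * np.mean(x**2))  # solves alpha_2(theta) = bar(X^2)
print(theta_mm)                        # should be close to theta = 2
```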

Example 4.4 We know from (2.7) that \mathbb{E}[\chi^2_\nu]=\nu. Therefore, it is immediate that \hat{\nu}_{\mathrm{MM}}=\bar{X}. Figure 4.2 shows a visualization of how the method of moments operates in this case: it “scans” several degrees of freedom \nu until it finds the one for which \nu=\bar{X}. In this process, the method of moments uses only the information about the sample realization x_1,\ldots,x_n that is summarized in \bar{X}, nothing else.44

Figure 4.2: \chi^2_\nu densities for several degrees of freedom \nu. Their color varies according to how far away the expectation \nu (shown in the vertical dashed lines) is from \bar{X}\approx5.5 (red vertical line): the more yellowish, the closer \nu is to \bar{X}.
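
A numerical counterpart of Example 4.4, including a crude version of the “scan” depicted in Figure 4.2, could look as follows, assuming Python with NumPy (the true \nu=5, the sample size, and the candidate grid are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(11)
nu = 5
x = rng.chisquare(df=nu, size=200)

nu_mm = np.mean(x)  # since E[chi^2_nu] = nu, matching the first moment gives bar(X)
print(nu_mm)

# The "scan" of Figure 4.2 in code: among a grid of candidate degrees of
# freedom, pick the one whose expectation nu is closest to bar(X)
grid = np.arange(1, 11)
print(grid[np.argmin(np.abs(grid - nu_mm))])
```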

An important observation is that, if the parameters to be estimated \theta_1,\ldots,\theta_K can be written as continuous functions of the first K population moments,

\begin{align*} \theta_k=g_k(\alpha_1,\ldots,\alpha_K), \quad k=1,\ldots,K, \end{align*}

then the estimator of \theta_k by the method of moments simply follows by replacing \alpha’s by a’s:

\begin{align*} \hat{\theta}_{\mathrm{MM},k}=g_k(a_1,\ldots,a_K). \end{align*}

Recall that g_k is such that \theta_k=g_k\left(\alpha_1(\theta_1,\ldots,\theta_K),\ldots,\alpha_K(\theta_1,\ldots,\theta_K)\right). That is, g_k is the k-th component of the inverse function of

\begin{align*} \alpha:(\theta_1,\ldots,\theta_K)\in\mathbb{R}^K\mapsto \left(\alpha_1(\theta_1,\ldots,\theta_K),\ldots,\alpha_K(\theta_1,\ldots,\theta_K)\right)\in\mathbb{R}^K. \end{align*}
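
For the normal case of Example 4.1, this plug-in reading can be sketched as follows, assuming Python with NumPy; the functions g_1 and g_2 below are the components of the inverse map worked out in that example, and the simulated sample is an illustrative choice.

```python
import numpy as np

# Inverse map for the N(mu, sigma^2) case: mu = g_1(alpha_1, alpha_2) = alpha_1,
# sigma^2 = g_2(alpha_1, alpha_2) = alpha_2 - alpha_1^2
def g1(alpha1, alpha2):
    return alpha1

def g2(alpha1, alpha2):
    return alpha2 - alpha1**2

rng = np.random.default_rng(13)
x = rng.normal(loc=1.0, scale=2.0, size=1000)
a1, a2 = np.mean(x), np.mean(x**2)  # sample moments replace the alpha's
print(g1(a1, a2), g2(a1, a2))       # same estimators as in Example 4.1
```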

Proposition 4.1 (Consistency in probability of the method of moments) Let (X_1,\ldots,X_n) be a srs of a rv X\sim F_{\boldsymbol{\theta}} with \boldsymbol{\theta}=(\theta_1,\ldots,\theta_K)' that verifies

\begin{align} \mathbb{E}[(X^k-\alpha_k)^2]<\infty, \quad k=1,\ldots,K.\tag{4.1} \end{align}

If \theta_k=g_k(\alpha_1,\ldots,\alpha_K), with g_k continuous, then the moment estimator for \theta_k, \hat{\theta}_{\mathrm{MM},k}=g_k(a_1,\ldots,a_K), is consistent in probability:

\begin{align*} \hat{\theta}_{\mathrm{MM},k}\stackrel{\mathbb{P}}{\longrightarrow} \theta_k,\quad k=1,\ldots,K. \end{align*}

Proof (Proof of Proposition 4.1). Thanks to condition (4.1), the LLN implies that the sample moments a_1,\ldots,a_K are consistent in probability for estimating the population moments \alpha_1,\ldots,\alpha_K. In addition, the functions g_k are continuous for all k=1,\ldots,K, hence, by Theorem 3.3, \hat{\theta}_{\mathrm{MM},k} is consistent in probability for \theta_k, k=1,\ldots,K.
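
An informal empirical check of the proposition, assuming Python with NumPy and reusing Example 4.3 with \theta=2 (the sample sizes are arbitrary choices): since \hat{\theta}_{\mathrm{MM}}=\sqrt{3\bar{X^2}} is a continuous function of a_2, the estimates should stabilize around \theta as n grows.

```python
import numpy as np

rng = np.random.default_rng(17)
theta = 2.0
for n in (10, 100, 1_000, 10_000, 100_000):
    x = rng.uniform(-theta, theta, size=n)
    print(n, np.sqrt(3 * np.mean(x**2)))  # approaches theta = 2 as n grows
```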


  42. The principle of equating \alpha_r to a_r rests upon the fact that a_r\stackrel{\mathbb{P}}{\longrightarrow}\alpha_r as n\to\infty (see Corollary 3.2).

  43. Assume \theta=1 and n=2 with (x_1,x_2)=(0.5,0.75). Then, \hat{\theta}_{\mathrm{MM}}=2\bar{x}=1.25>\theta, even though we never observed any value above 0.75 in the sample.

  44. In particular, it does not use the form of the pdf.