3.6 Efficient estimators

Definition 3.12 (Fisher information) Let \(X\sim f(\cdot;\theta)\) be a continuous rv with \(\theta\in\Theta\subset\mathbb{R}\), and such that \(\theta\mapsto f(x;\theta)\) is differentiable for all \(\theta\in\Theta\) and \(x\in\mathrm{supp}(f):=\{x\in\mathbb{R}:f(x;\theta)>0\}\) (\(\mathrm{supp}(f)\) is the support of the pdf). The Fisher information of \(X\) about \(\theta\) is defined as

\[\mathcal{I}(\theta):=\mathbb{E}\left[\left(\frac{\partial \log f(X;\theta)}{\partial \theta}\right)^2\right]. \tag{3.7}\]

When \(X\) is discrete, the Fisher information is defined analogously by just replacing the pdf \(f(\cdot;\theta)\) with the pmf \(p(\cdot;\theta)\).

Observe that the quantity

\[\left(\frac{\partial \log f(x;\theta)}{\partial \theta}\right)^2=\left(\frac{1}{f(x;\theta)}\,\frac{\partial f(x;\theta)}{\partial \theta}\right)^2 \tag{3.8}\]

is the square of the weighted rate of variation of \(\theta\mapsto f(x;\theta)\) for infinitesimal variations of \(\theta\), for the realization \(x\) of the rv \(X\). The square is meant to remove the sign from the rate of variation. Therefore, (3.8) can be interpreted as the information contained in \(x\) for discriminating the parameter \(\theta\) from close values \(\theta+\delta\). For example, if (3.8) is close to zero for \(\theta=\theta_0\), then it means that \(\theta\mapsto f(x;\theta)\) is almost flat about \(\theta=\theta_0\), so \(f(x;\theta_0)\) and \(f(x;\theta_0+\delta)\) are very similar. This means that the sample realization \(X=x\) is not informative on whether the underlying parameter is \(\theta_0\) or \(\theta_0+\delta\), because both values have almost the same likelihood.


Figure 3.4: Fisher information integrand \(\lambda\mapsto \left(\partial \log f(x;\lambda)/\partial \lambda\right)^2\) in a \(\mathrm{Pois}(\lambda_0)\) distribution with \(\lambda_0=2,5,10\). The integrands are shown in different colors, with the color transparency indicating the probability of \(x\) according to \(\mathrm{Pois}(\lambda_0)\) (the darker the color, the higher the probability). The Fisher information curve \(\lambda\mapsto \mathcal{I}(\lambda)\) is shown in black, with a black point signaling the value \(\mathcal{I}(\lambda_0)\). The colored points indicate the contribution of each \(x\) to the Fisher information.
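The computation behind Figure 3.4 can be mimicked numerically. The following Python sketch (an illustration not present in the text; it assumes NumPy and SciPy are available, and the truncation point and finite-difference step are arbitrary choices) evaluates the integrand \(\left(\partial \log p(x;\lambda)/\partial \lambda\right)^2\) by a central difference of the log-pmf and approximates \(\mathcal{I}(\lambda_0)\) by weighting it with the \(\mathrm{Pois}(\lambda_0)\) probabilities.

```python
# Numerical sketch of the Fisher information of a Poisson model (illustrative;
# assumes NumPy and SciPy; x_max and h are arbitrary choices for the example).
import numpy as np
from scipy.stats import poisson

def score_sq(x, lam, h=1e-5):
    # Squared score (d log p(x; lambda) / d lambda)^2 via a central difference
    dlogp = (poisson.logpmf(x, lam + h) - poisson.logpmf(x, lam - h)) / (2 * h)
    return dlogp ** 2

def fisher_info(lam, x_max=200):
    # I(lambda) = E[score^2], approximated by summing over x = 0, ..., x_max
    x = np.arange(x_max + 1)
    return np.sum(poisson.pmf(x, lam) * score_sq(x, lam))

for lam0 in (2, 5, 10):
    print(lam0, fisher_info(lam0))  # approximates the black points in Figure 3.4
```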

When taking the expectation in (3.7), we obtain that the Fisher information is the expected information in \(X\) that is useful for distinguishing \(\theta\) from close values. This quantity is related to the precision (i.e., the inverse of the variance) of an unbiased estimator of \(\theta\).

Example 3.30 Compute the Fisher information of a rv \(X\sim\mathrm{Pois}(\lambda)\).

The Poisson pmf is given by

\[p(x;\lambda)=\frac{\lambda^x e^{-\lambda}}{x!},\quad x=0,1,2,\ldots,\]

so its logarithm is

\[\log p(x;\lambda)=x\log\lambda-\lambda-\log(x!)\]

and its derivative with respect to λ is

\[\frac{\partial \log p(x;\lambda)}{\partial \lambda}=\frac{x}{\lambda}-1.\]

The Fisher information is then obtained by taking the expectation of¹

\[\left(\frac{\partial \log p(X;\lambda)}{\partial \lambda}\right)^2=\left(\frac{X-\lambda}{\lambda}\right)^2.\]

Noting that \(\mathbb{E}[X]=\lambda\) and, therefore, \(\mathbb{E}[(X-\lambda)^2]=\mathrm{Var}[X]=\lambda\), we obtain

\[\mathcal{I}(\lambda)=\mathbb{E}\left[\left(\frac{X-\lambda}{\lambda}\right)^2\right]=\frac{\mathrm{Var}[X]}{\lambda^2}=\frac{1}{\lambda}.\]
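As a sanity check, not part of the derivation above, a short Monte Carlo simulation in Python (assuming NumPy; the value of \(\lambda\), the seed, and the number of draws are arbitrary choices) confirms that the average of \(((X-\lambda)/\lambda)^2\) over simulated Poisson draws approaches \(1/\lambda\).

```python
# Monte Carlo check that E[((X - lambda)/lambda)^2] = 1/lambda (illustrative;
# assumes NumPy; lambda, seed, and number of draws are arbitrary choices).
import numpy as np

rng = np.random.default_rng(42)
lam = 5.0
x = rng.poisson(lam, size=1_000_000)       # draws from Pois(lambda)
print(np.mean(((x - lam) / lam) ** 2))     # should be close to 1/lambda = 0.2
print(1 / lam)
```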

The following fundamental result gives the lowest variance that a sufficiently regular unbiased estimator can attain. Hence, it establishes the lowest MSE achievable by an unbiased estimator and thus serves as a reference for comparing unbiased estimators. The inequality is also known as the information inequality.

Theorem 3.8 (Cramér–Rao inequality) Let \((X_1,\ldots,X_n)\) be a srs of a rv with pdf \(f(x;\theta)\), and let \(\mathcal{I}_n(\theta):=n\mathcal{I}(\theta)\) be the Fisher information of the sample \((X_1,\ldots,X_n)\) about \(\theta\). If \(\hat\theta\equiv\hat\theta(X_1,\ldots,X_n)\) is an unbiased estimator of \(\theta\), then, under certain general conditions,² it holds that

\[\mathrm{Var}[\hat\theta]\geq\frac{1}{\mathcal{I}_n(\theta)}.\]

Definition 3.13 (Efficient estimator) An unbiased estimator \(\hat\theta\) of \(\theta\) that satisfies \(\mathrm{Var}[\hat\theta]=(\mathcal{I}_n(\theta))^{-1}\) is said to be efficient.
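To see the inequality at work for an unbiased estimator that is not efficient, consider the following illustrative Python simulation (not from the text; it assumes NumPy, and the values of \(\lambda\), \(n\), and the number of replicates are arbitrary). In a \(\mathrm{Pois}(\lambda)\) model, the unbiased sample variance \(\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2\) also estimates \(\lambda\) without bias, since \(\mathrm{Var}[X]=\lambda\), yet its variance stays strictly above the bound \((\mathcal{I}_n(\lambda))^{-1}=\lambda/n\) obtained from Example 3.30.

```python
# Illustrative simulation of the Cramer-Rao bound in a Pois(lambda) model
# (assumes NumPy; lambda, n, M, and the seed are arbitrary choices).
# The unbiased sample variance estimates lambda without bias but does not
# attain the bound lambda / n.
import numpy as np

rng = np.random.default_rng(1)
lam, n, M = 5.0, 50, 100_000
samples = rng.poisson(lam, size=(M, n))      # M simple random samples of size n
s2 = samples.var(axis=1, ddof=1)             # unbiased sample variance per sample

print(s2.mean())    # close to lambda = 5 (unbiasedness)
print(s2.var())     # Monte Carlo variance, clearly above the bound...
print(lam / n)      # ...1 / I_n(lambda) = lambda / n = 0.1
```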

Example 3.31 Show that, for a rv \(\mathrm{Pois}(\lambda)\), the estimator \(\hat\lambda=\bar{X}\) is efficient.

We first calculate the Fisher information \(\mathcal{I}_n(\lambda)\) of the sample \((X_1,\ldots,X_n)\). Since \(\mathcal{I}(\lambda)=1/\lambda\) from Example 3.30, it follows that

\[\mathcal{I}_n(\lambda)=n\mathcal{I}(\lambda)=\frac{n}{\lambda}.\]

On the other hand, the variance of \(\hat\lambda=\bar{X}\) is

\[\mathrm{Var}[\hat\lambda]=\frac{1}{n^2}\mathrm{Var}\left[\sum_{i=1}^n X_i\right]=\frac{n\lambda}{n^2}=\frac{\lambda}{n}.\]

Therefore, \(\mathrm{Var}[\hat\lambda]=\lambda/n=(\mathcal{I}_n(\lambda))^{-1}\) and \(\hat\lambda=\bar{X}\) is efficient.
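A quick simulation (again an illustration, assuming NumPy and arbitrary values of \(\lambda\), \(n\), and the number of replicates) shows that the Monte Carlo variance of \(\hat\lambda=\bar{X}\) indeed matches the Cramér–Rao bound \(\lambda/n\).

```python
# Simulation check that Var[X_bar] attains the Cramer-Rao bound lambda / n
# (assumes NumPy; lambda, n, M, and the seed are arbitrary choices).
import numpy as np

rng = np.random.default_rng(7)
lam, n, M = 5.0, 50, 100_000
xbar = rng.poisson(lam, size=(M, n)).mean(axis=1)   # hat{lambda} for M samples

print(xbar.var())   # close to lambda / n = 0.1
print(lam / n)      # Cramer-Rao bound 1 / I_n(lambda)
```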

Example 3.32 Check that \(\hat\theta=\bar{X}\) is efficient for estimating \(\theta\) in a population \(\mathrm{Exp}(1/\theta)\).

References

Lehmann, E. L., and G. Casella. 1998. Theory of Point Estimation. Second. Springer Texts in Statistics. New York: Springer-Verlag. https://doi.org/10.1007/b98854.

  1. Note that we change \(x\) to \(X\) because now we have to work with a random variable to take an expectation.↩︎

  2. See Theorem 5.10 (Section 2.5, page 120) in Lehmann and Casella (1998).↩︎