3.6 Efficient estimators
Definition 3.12 (Fisher information) Let X∼f(⋅;θ) be a continuous rv with θ∈Θ⊂R, and such that θ↦f(x;θ) is differentiable for all θ∈Θ and x∈supp(f):={x∈R:f(x;θ)>0} (supp(f) is the support of the pdf). The Fisher information of X about θ is defined as
\[
I(\theta) := E\left[\left(\frac{\partial \log f(X;\theta)}{\partial \theta}\right)^2\right]. \tag{3.7}
\]
When X is discrete, the Fisher information is defined analogously by just replacing the pdf f(⋅;θ) by the pmf p(⋅;θ).
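To make the definition concrete, the snippet below (a minimal numerical sketch, not part of the original text; the function names are arbitrary) approximates the Fisher information of a Bernoulli(p) rv by differentiating the log-pmf numerically and summing the squared score over the support {0, 1}, comparing the result with the known closed form 1/(p(1−p)).

```python
import numpy as np

def bernoulli_logpmf(x, p):
    # log p(x; p) for x in {0, 1}
    return x * np.log(p) + (1 - x) * np.log(1 - p)

def fisher_information_bernoulli(p, h=1e-6):
    # I(p) = E[(d/dp log p(X; p))^2], the expectation being a sum over supp = {0, 1}
    info = 0.0
    for x in (0, 1):
        # squared score via a central finite difference in p
        score = (bernoulli_logpmf(x, p + h) - bernoulli_logpmf(x, p - h)) / (2 * h)
        info += score**2 * np.exp(bernoulli_logpmf(x, p))
    return info

p = 0.3
print(fisher_information_bernoulli(p))  # numerical approximation of I(p)
print(1 / (p * (1 - p)))                # closed form 1/(p(1-p)) = 4.7619...
```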
Observe that the quantity
\[
\left(\frac{\partial \log f(x;\theta)}{\partial \theta}\right)^2 = \left(\frac{1}{f(x;\theta)}\,\frac{\partial f(x;\theta)}{\partial \theta}\right)^2 \tag{3.8}
\]
is the squared rate of variation of θ↦f(x;θ), weighted by 1/f(x;θ), for infinitesimal variations of θ and for the realization x of the rv X. Squaring removes the sign of the rate of variation. Therefore, (3.8) can be interpreted as the information contained in x for discriminating the parameter θ from nearby values θ+δ. For example, if (3.8) is close to zero for θ=θ0, then θ↦f(x;θ) is almost flat about θ=θ0, so f(x;θ0) and f(x;θ0+δ) are very similar. Hence the sample realization X=x is not informative on whether the underlying parameter is θ0 or θ0+δ, because both values have almost the same likelihood.



Figure 3.4: Fisher information integrand λ↦(∂log f(x;λ)/∂λ)² in a Pois(λ0) distribution with λ0 = 2, 5, 10. The integrands are shown in different colors, with the color transparency indicating the probability of x according to Pois(λ0) (the darker the color, the higher the probability). The Fisher information curve λ↦I(λ) is shown in black, with a black point signaling the value I(λ0). The colored points indicate the contribution of each x to the Fisher information.
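A figure in this spirit can be generated along the following lines (a rough sketch assuming matplotlib and scipy are available; the panel layout and styling of the original figure are not reproduced). Each colored curve is the integrand λ↦(∂log p(x;λ)/∂λ)² for one value of x, computed by numerical differentiation, with transparency proportional to the Pois(λ0) probability of x; the black curve is the Fisher information I(λ) = 1/λ derived in Example 3.30 below.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import poisson

lam0 = 5                       # one of the panels: lambda_0 = 5
lam = np.linspace(1, 15, 300)  # grid of lambda values
h = 1e-5                       # step for the numerical derivative

for x in range(0, 16):
    # squared score (d/d lambda log p(x; lambda))^2 via central differences
    score = (poisson.logpmf(x, lam + h) - poisson.logpmf(x, lam - h)) / (2 * h)
    weight = poisson.pmf(x, lam0)  # transparency ~ probability of x under Pois(lam0)
    plt.plot(lam, score**2, color="C0", alpha=min(1.0, 5 * weight))

plt.plot(lam, 1 / lam, color="black", label=r"$I(\lambda)=1/\lambda$")  # see Example 3.30
plt.scatter([lam0], [1 / lam0], color="black")                          # I(lambda_0)
plt.xlabel(r"$\lambda$"); plt.ylabel("Squared score"); plt.ylim(0, 2); plt.legend()
plt.show()
```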
The expectation in (3.7) averages (3.8) over the realizations of X: the Fisher information is the expected information in X for distinguishing θ from nearby values. This quantity is related to the precision (i.e., the inverse of the variance) of unbiased estimators of θ.
Example 3.30 Compute the Fisher information of a rv X∼Pois(λ).
The Poisson pmf is given by
\[
p(x;\lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \ldots,
\]
so its logarithm is
\[
\log p(x;\lambda) = x \log \lambda - \lambda - \log(x!)
\]
and its derivative with respect to λ is
\[
\frac{\partial \log p(x;\lambda)}{\partial \lambda} = \frac{x}{\lambda} - 1.
\]
The Fisher information is then obtained by taking the expectation of39
\[
\left(\frac{\partial \log p(X;\lambda)}{\partial \lambda}\right)^2 = \left(\frac{X-\lambda}{\lambda}\right)^2.
\]
Noting that $E[X] = \lambda$ and, therefore, $E[(X-\lambda)^2] = \mathrm{Var}[X] = \lambda$, we obtain
\[
I(\lambda) = E\left[\left(\frac{X-\lambda}{\lambda}\right)^2\right] = \frac{\mathrm{Var}[X]}{\lambda^2} = \frac{1}{\lambda}.
\]
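This result can be double-checked by simulation (a quick Monte Carlo sketch, not part of the original text): averaging the squared score ((X−λ)/λ)² over a large number of Poisson draws should approximately recover 1/λ.

```python
import numpy as np

rng = np.random.default_rng(42)
lam = 2.0
x = rng.poisson(lam, size=1_000_000)  # large sample from Pois(lambda)
score_sq = ((x - lam) / lam) ** 2     # squared score from Example 3.30
print(score_sq.mean())                # Monte Carlo estimate of I(lambda)
print(1 / lam)                        # theoretical value 1/lambda = 0.5
```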
The following fundamental result gives a lower bound on the variance of any sufficiently regular unbiased estimator. Since the MSE of an unbiased estimator equals its variance, the bound is also the lowest MSE that an unbiased estimator can attain, and hence it serves as a benchmark against which unbiased estimators can be compared. The inequality is also known as the information inequality.
Theorem 3.8 (Cramér–Rao’s inequality) Let $(X_1,\ldots,X_n)$ be a srs of a rv with pdf $f(x;\theta)$, and let $I_n(\theta) := nI(\theta)$ be the Fisher information of the sample $(X_1,\ldots,X_n)$ about $\theta$. If $\hat\theta \equiv \hat\theta(X_1,\ldots,X_n)$ is an unbiased estimator of $\theta$, then, under certain general conditions,40 it holds that
\[
\mathrm{Var}[\hat\theta] \geq \frac{1}{I_n(\theta)}.
\]
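For instance (a standard illustration that is not among the examples below), if $X \sim \mathcal{N}(\mu, \sigma^2)$ with $\sigma$ known, then $\partial \log f(x;\mu)/\partial \mu = (x-\mu)/\sigma^2$, so $I(\mu) = E[(X-\mu)^2]/\sigma^4 = 1/\sigma^2$ and the inequality reads
\[
\mathrm{Var}[\hat\mu] \geq \frac{\sigma^2}{n},
\]
a bound that is attained by $\bar X$, since $\mathrm{Var}[\bar X] = \sigma^2/n$.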
Definition 3.13 (Efficient estimator) An unbiased estimator $\hat\theta$ of $\theta$ that satisfies $\mathrm{Var}[\hat\theta] = (I_n(\theta))^{-1}$ is said to be efficient.
Example 3.31 Show that for a rv $X \sim \mathrm{Pois}(\lambda)$ the estimator $\hat\lambda = \bar X$ is efficient.
We first compute the Fisher information $I_n(\lambda)$ of the sample $(X_1,\ldots,X_n)$. Since $I(\lambda) = 1/\lambda$ from Example 3.30, it follows that
\[
I_n(\lambda) = n I(\lambda) = \frac{n}{\lambda}.
\]
On the other hand, the variance of $\hat\lambda = \bar X$ is
\[
\mathrm{Var}[\hat\lambda] = \frac{1}{n^2}\mathrm{Var}\left[\sum_{i=1}^n X_i\right] = \frac{n\lambda}{n^2} = \frac{\lambda}{n}.
\]
Therefore, $\mathrm{Var}[\hat\lambda] = \lambda/n = 1/I_n(\lambda)$ and $\hat\lambda = \bar X$ is efficient.
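The efficiency of $\bar X$ can also be checked empirically (a simulation sketch with arbitrary choices of λ, n, and the number of replicates, not part of the original text): across many simulated samples, the variance of $\bar X$ should be close to the bound λ/n, whereas another unbiased estimator of λ, such as the sample variance with denominator n−1 (unbiased for λ because Var[X]=λ), has a strictly larger variance.

```python
import numpy as np

rng = np.random.default_rng(1)
lam, n, M = 2.0, 50, 100_000  # true lambda, sample size, number of replicates

samples = rng.poisson(lam, size=(M, n))
lam_hat = samples.mean(axis=1)           # efficient estimator: the sample mean
lam_tilde = samples.var(axis=1, ddof=1)  # alternative unbiased estimator: sample variance with n - 1

print(lam_hat.var())    # ~ lambda / n = 0.04: attains the Cramer-Rao bound
print(lam_tilde.var())  # noticeably larger: unbiased but not efficient
print(lam / n)          # the bound 1 / I_n(lambda)
```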
Example 3.32 Check that $\hat\theta = \bar X$ is efficient for estimating $\theta$ in a population $\mathrm{Exp}(1/\theta)$.
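A numerical check along the same lines is sketched below (an informal aid, not a substitute for the derivation: it interprets Exp(1/θ) as the exponential distribution with rate 1/θ, so that E[X] = θ, and estimates $I_n(\theta)$ by Monte Carlo with a numerical derivative of the log-density rather than using the analytic expression the reader is asked to obtain).

```python
import numpy as np

rng = np.random.default_rng(7)
theta, n, M, h = 3.0, 40, 100_000, 1e-5

def logpdf(x, th):
    # Exp(1/theta) density with rate 1/theta, i.e. mean theta: f(x; theta) = exp(-x/theta)/theta
    return -np.log(th) - x / th

# Monte Carlo estimate of I(theta) via the squared score (central difference in theta)
x = rng.exponential(scale=theta, size=1_000_000)
score = (logpdf(x, theta + h) - logpdf(x, theta - h)) / (2 * h)
I_n = n * np.mean(score**2)  # Fisher information of a sample of size n

# Empirical variance of the estimator over M simulated samples of size n
xbar = rng.exponential(scale=theta, size=(M, n)).mean(axis=1)
print(xbar.var())  # empirical Var[X_bar]
print(1 / I_n)     # Cramer-Rao bound; the two numbers should roughly agree
```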