Appendix
Useful probability inequalities
The following two probability inequalities are useful to show consistency in probability.
Proposition 3.2 (Markov's inequality) For any rv $X$,
$$P(|X|\geq k)\leq \frac{E[X^2]}{k^2},\quad k>0.$$
Proof (of Proposition 3.2). Assume that $X$ is a continuous rv and let $f$ be its pdf. We compute the second-order moment of $X$:
$$E[X^2]=\int_{-\infty}^{\infty}x^2 f(x)\,dx=\int_{(-\infty,-k]\cup[k,\infty)}x^2 f(x)\,dx+\int_{(-k,k)}x^2 f(x)\,dx\geq \int_{(-\infty,-k]\cup[k,\infty)}x^2 f(x)\,dx\geq k^2\int_{(-\infty,-k]\cup[k,\infty)}f(x)\,dx=k^2 P(|X|\geq k),$$
which is equivalent to
$$P(|X|\geq k)\leq \frac{E[X^2]}{k^2}.$$
The proof for a discrete rv is analogous.
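As a quick sanity check of Markov's inequality in this second-moment form, the following Python sketch (an illustration only; the standard normal is an arbitrary choice of rv with finite second moment) compares the empirical tail probability with the bound $E[X^2]/k^2$:

```python
import numpy as np

# Empirical check of P(|X| >= k) <= E[X^2] / k^2 via Monte Carlo.
# The standard normal is just an illustrative choice of rv with finite E[X^2].
rng = np.random.default_rng(42)
x = rng.normal(size=100_000)

for k in (1.0, 2.0, 3.0):
    tail_prob = np.mean(np.abs(x) >= k)   # estimate of P(|X| >= k)
    bound = np.mean(x**2) / k**2          # estimate of E[X^2] / k^2
    print(f"k = {k}: tail probability {tail_prob:.4f} <= bound {bound:.4f}")
```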
Proposition 3.3 (Chebyshev's inequality) Let $X$ be a rv with $E[X]=\mu$ and $\mathrm{Var}[X]=\sigma^2<\infty$. Then,
$$P(|X-\mu|\geq k\sigma)\leq \frac{1}{k^2},\quad k>0.$$
Proof (of Proposition 3.3). This inequality follows from Markov's inequality. Indeed, taking
$$X'=X-\mu,\qquad k'=k\sigma,$$
and replacing $X$ by $X'$ and $k$ by $k'$ in Markov's inequality, we get
$$P(|X-\mu|\geq k\sigma)\leq \frac{E[(X-\mu)^2]}{k^2\sigma^2}=\frac{\sigma^2}{k^2\sigma^2}=\frac{1}{k^2}.$$
Chebyshev's inequality is useful for obtaining probability bounds about $\hat\theta$ when its probability distribution is unknown. We only need to know the expectation and variance of $\hat\theta$. Indeed, for an unbiased estimator $\hat\theta$ (so that $E[\hat\theta]=\theta$), taking $k=2$, Chebyshev's inequality gives
$$P(|\hat\theta-\theta|\leq 2\sigma_{\hat\theta})\geq 1-\frac{1}{4}=0.75,$$
which means that at least 75% of the realized values of $\hat\theta$ fall within the interval $[\theta-2\sigma_{\hat\theta},\theta+2\sigma_{\hat\theta}]$. However, if we additionally know that the distribution of the estimator $\hat\theta$ is normal, $\hat\theta\sim\mathcal{N}(\theta,\sigma_{\hat\theta}^2)$, then we obtain the much more precise result
$$P(|\hat\theta-\theta|\leq 2\sigma_{\hat\theta})\approx 0.95>0.75.$$
The fact that the true probability, $\approx 0.95$, is in this case substantially larger than the lower bound given by Chebyshev's inequality, $0.75$, is reasonable: Chebyshev's inequality does not use any knowledge about the distribution of $\hat\theta$. Thus, the precision increases as more information about the true distribution of $\hat\theta$ becomes available.
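As a numerical illustration of this comparison (a minimal sketch assuming SciPy is available; the values of $k$ are arbitrary), the snippet below contrasts Chebyshev's lower bound $1-1/k^2$ with the exact probability under normality:

```python
from scipy.stats import norm

# Chebyshev guarantees P(|theta_hat - theta| <= k * sigma) >= 1 - 1/k^2 for any
# distribution; under normality the probability is given exactly by the N(0,1) cdf.
for k in (1.5, 2.0, 3.0):
    chebyshev_lower_bound = 1 - 1 / k**2
    normal_probability = norm.cdf(k) - norm.cdf(-k)   # P(|Z| <= k), Z ~ N(0, 1)
    print(f"k = {k}: Chebyshev bound {chebyshev_lower_bound:.3f}, "
          f"exact under normality {normal_probability:.3f}")
```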
Example 3.36 From previous experience, it is known that the time $X$ (in minutes) that a periodic check of a machine requires is distributed as $\Gamma(3,2)$. A new worker spends 19 minutes checking that machine. Is this time coherent with the previous experience?
We know that the mean and the variance of a gamma are given by
$$\mu=\alpha\beta=3\times 2=6,\qquad \sigma^2=\alpha\beta^2=3\times 2^2=12\ \Rightarrow\ \sigma\approx 3.46.$$
Then, the difference between the checking time of the new worker and $\mu$ is $19-6=13$. To see whether this difference is large or small, or, in other words, to see whether the checking time of this new worker is in line with previous checking times, we would want to know the probability
$$P(|X-\mu|\geq 13).$$
For that, we take $k\sigma=13$ in Chebyshev's inequality, so $k=13/\sigma=13/3.46\approx 3.76$, and applying the inequality we readily get
$$P(|X-\mu|\geq 13)\leq \frac{1}{k^2}=\frac{1}{3.76^2}\approx 0.071.$$
Since this probability bound is very small, the checking time is not coherent with the previous experience. Two things may have happened: either the new worker has faced a more complicated inspection or he/she is slower than the rest.
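The exact probability can also be computed directly from the $\Gamma(3,2)$ distribution and compared with Chebyshev's bound. Below is a minimal sketch assuming SciPy's shape–scale parametrization of the gamma (shape $\alpha=3$, scale $\beta=2$), which matches the mean $\alpha\beta=6$ and variance $\alpha\beta^2=12$ used above:

```python
from scipy.stats import gamma

alpha, beta = 3, 2                      # shape and scale: mean = 6, variance = 12
mu = alpha * beta
sigma = (alpha * beta**2) ** 0.5

k = 13 / sigma
chebyshev_bound = 1 / k**2              # Chebyshev bound for P(|X - mu| >= 13)

# Exact probability: mu - 13 < 0, so only the right tail P(X >= 19) contributes.
exact = gamma.sf(mu + 13, a=alpha, scale=beta)

print(f"Chebyshev bound: {chebyshev_bound:.4f}")       # roughly 0.071
print(f"Exact P(|X - mu| >= 13): {exact:.5f}")         # far below the bound
```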
Rao–Blackwell’s Theorem
Rao–Blackwell's Theorem provides an effective way of reducing the variance of an unbiased estimator by conditioning on a sufficient statistic $T$. This process is sometimes known as Rao–Blackwellization and results in a new estimator whose variance, and hence MSE, is never larger.
Theorem 3.9 (Rao–Blackwell's Theorem) Let $T$ be a sufficient statistic for $\theta$. Let $\hat\theta$ be an unbiased estimator of $\theta$. Then, the estimator
$$\hat\theta':=E[\hat\theta\,|\,T]$$
verifies:
- $\hat\theta'$ does not depend on $\theta$, so it is a genuine estimator.
- $E[\hat\theta']=\theta$, $\forall\theta\in\Theta$.
- $\mathrm{Var}[\hat\theta']\leq\mathrm{Var}[\hat\theta]$, $\forall\theta\in\Theta$.
In addition, $\mathrm{Var}[\hat\theta']=\mathrm{Var}[\hat\theta]$ if and only if $P(\hat\theta'=\hat\theta)=1$, $\forall\theta\in\Theta$.
Observe that the new estimator $\hat\theta'$ depends on the sample only through the sufficient statistic $T$; in particular, $T$ can be taken to be the minimal sufficient statistic.
Example 3.37 Let $(X_1,\ldots,X_n)$ be a srs of $X\sim\mathrm{Pois}(\lambda)$, that is, with pmf
$$p(x;\lambda)=\frac{\lambda^x e^{-\lambda}}{x!},\quad x=0,1,\ldots,\ \lambda>0.$$
Consider the parameter $\theta=p(0;\lambda)=e^{-\lambda}$. Let us perform a Rao–Blackwellization.
First, we need an unbiased estimator, for example:
$$\hat\theta=\begin{cases}1 & \text{if } X_1=0,\\ 0 & \text{if } X_1\neq 0,\end{cases}$$
since
$$E[\hat\theta]=1\times P(X_1=0)+0\times P(X_1\neq 0)=P(X_1=0)=e^{-\lambda}.$$
Now we need a sufficient statistic. Writing the pmf of the Poisson in the form of the exponential family, we obtain
$$p(x;\lambda)=e^{-\lambda}\frac{e^{x\log\lambda}}{x!}$$
and therefore $T(X_1,\ldots,X_n)=\sum_{i=1}^n X_i$ is sufficient for $\lambda$ and also for $e^{-\lambda}$.
Then, we can Rao–Blackwellize $\hat\theta$:
$$\hat\theta':=E[\hat\theta\,|\,T]=1\times P\bigg(X_1=0\,\bigg|\,\sum_{i=1}^n X_i=t\bigg)+0\times P\bigg(X_1\neq 0\,\bigg|\,\sum_{i=1}^n X_i=t\bigg)=\frac{P\big(X_1=0,\,\sum_{i=1}^n X_i=t\big)}{P\big(\sum_{i=1}^n X_i=t\big)}=\frac{P(X_1=0)\,P\big(\sum_{i=2}^n X_i=t\big)}{P\big(\sum_{i=1}^n X_i=t\big)},$$
where the last equality follows because $\{X_1=0,\ \sum_{i=1}^n X_i=t\}=\{X_1=0,\ \sum_{i=2}^n X_i=t\}$ and $X_1$ is independent of $\sum_{i=2}^n X_i$.
Now, it holds that, if $X_i\sim\mathrm{Pois}(\lambda)$, $i=1,\ldots,n$, are independent, then (see Exercise 1.20)
$$\sum_{i=1}^n X_i\sim\mathrm{Pois}(n\lambda).$$
Therefore,
$$\hat\theta'=\frac{e^{-\lambda}\,[(n-1)\lambda]^t e^{-(n-1)\lambda}/t!}{(n\lambda)^t e^{-n\lambda}/t!}=\bigg(\frac{n-1}{n}\bigg)^t.$$
Then, we have obtained the estimator
$$\hat\theta'=E[\hat\theta\,|\,T]=\bigg(\frac{n-1}{n}\bigg)^T,$$
which is unbiased and whose variance is smaller than that of $\hat\theta$. Indeed,
$$E[\hat\theta']=\sum_{t=0}^{\infty}\bigg(\frac{n-1}{n}\bigg)^t\frac{e^{-n\lambda}(n\lambda)^t}{t!}=e^{-n\lambda}\sum_{t=0}^{\infty}\frac{[(n-1)\lambda]^t}{t!}=e^{-n\lambda}e^{(n-1)\lambda}=e^{-\lambda}=\theta.$$
Therefore, $\hat\theta'$ is unbiased. We now compute its variance. For that, we first compute
$$E[\hat\theta'^2]=\sum_{t=0}^{\infty}\bigg(\frac{n-1}{n}\bigg)^{2t}\frac{e^{-n\lambda}(n\lambda)^t}{t!}=e^{-n\lambda}\sum_{t=0}^{\infty}\bigg(\frac{(n-1)^2\lambda}{n}\bigg)^t\frac{1}{t!}=e^{-n\lambda}e^{(n-1)^2\lambda/n}=e^{-2\lambda+\lambda/n}.$$
Then, the variance is
$$\mathrm{Var}[\hat\theta']=E[\hat\theta'^2]-(E[\hat\theta'])^2=e^{-2\lambda+\lambda/n}-e^{-2\lambda}=e^{-2\lambda}\big(e^{\lambda/n}-1\big).$$
We calculate the variance of $\hat\theta$ for comparison:
$$E[\hat\theta^2]=1\times P(X_1=0)=e^{-\lambda}.$$
As a consequence,
$$\mathrm{Var}[\hat\theta]=e^{-\lambda}-e^{-2\lambda}=e^{-\lambda}\big(1-e^{-\lambda}\big).$$
Therefore,
$$\frac{\mathrm{Var}[\hat\theta']}{\mathrm{Var}[\hat\theta]}=\frac{e^{-2\lambda}\big(e^{\lambda/n}-1\big)}{e^{-\lambda}\big(1-e^{-\lambda}\big)}=\frac{e^{\lambda/n}-1}{e^{\lambda}-1}\leq 1,\quad\forall n\geq 1,$$
with equality only for $n=1$ (in which case $\hat\theta'=\hat\theta$) and strict inequality for all $n\geq 2$.
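The variance reduction can also be checked by simulation. The following sketch (assuming NumPy; the values of $\lambda$, $n$, and the number of replicates are arbitrary illustrative choices) compares the empirical means and variances of $\hat\theta$ and $\hat\theta'$ with the formulas derived above:

```python
import numpy as np

# Monte Carlo comparison of theta_hat = 1{X_1 = 0} and its Rao-Blackwellization
# theta_rb = ((n - 1) / n)^T with T = sum of the sample. lam, n, M are illustrative.
rng = np.random.default_rng(1)
lam, n, M = 2.0, 10, 200_000

samples = rng.poisson(lam, size=(M, n))          # M simple random samples of size n
theta_hat = (samples[:, 0] == 0).astype(float)   # unbiased but crude estimator
theta_rb = ((n - 1) / n) ** samples.sum(axis=1)  # Rao-Blackwellized estimator

print("target theta = exp(-lambda):", np.exp(-lam))
print("means:", theta_hat.mean(), theta_rb.mean())
print("Var[theta_hat] empirical:", theta_hat.var(),
      "theory:", np.exp(-lam) * (1 - np.exp(-lam)))
print("Var[theta_rb]  empirical:", theta_rb.var(),
      "theory:", np.exp(-2 * lam) * (np.exp(lam / n) - 1))
```

Both estimators should have empirical means close to $e^{-\lambda}$, while the empirical variance of the Rao–Blackwellized estimator should match $e^{-2\lambda}(e^{\lambda/n}-1)$ and be markedly smaller.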