Appendix

Useful probability inequalities

The following two probability inequalities are useful to show consistency in probability.

Proposition 3.2 (Markov's inequality) For any rv X,

$$P(|X| \geq k) \leq \frac{E[X^2]}{k^2}, \quad \forall k > 0.$$

Proof (Proof of Proposition 3.2). Assume that X is a continuous rv and let f be its pdf. We compute the second-order moment of X:

$$\begin{aligned} E[X^2] &= \int_{-\infty}^{\infty} x^2 f(x)\,\mathrm{d}x = \int_{(-\infty,-k]\cup[k,\infty)} x^2 f(x)\,\mathrm{d}x + \int_{(-k,k)} x^2 f(x)\,\mathrm{d}x \\ &\geq \int_{(-\infty,-k]\cup[k,\infty)} x^2 f(x)\,\mathrm{d}x \geq k^2 \int_{(-\infty,-k]\cup[k,\infty)} f(x)\,\mathrm{d}x = k^2 P(|X| \geq k), \end{aligned}$$

which is equivalent to

$$P(|X| \geq k) \leq \frac{E[X^2]}{k^2}.$$

The proof for a discrete rv is analogous.
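The inequality can also be checked numerically. The following Python sketch (an illustration added here, not part of the original argument; it assumes NumPy and uses a standard normal rv purely as an example) compares a Monte Carlo estimate of $P(|X| \geq k)$ with the bound $E[X^2]/k^2$ for several values of $k$.

```python
# Monte Carlo illustration of Proposition 3.2 (Markov's inequality, second-moment form).
# Illustrative sketch: X ~ N(0, 1) is an arbitrary choice, so E[X^2] = 1.
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=1_000_000)

for k in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(x) >= k)   # Monte Carlo estimate of P(|X| >= k)
    bound = np.mean(x**2) / k**2          # sample analogue of E[X^2] / k^2
    print(f"k = {k}: P(|X| >= k) ~ {empirical:.4f} <= {bound:.4f}")
```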

Proposition 3.3 (Chebyshev's inequality) Let $X$ be a rv with $E[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2 < \infty$. Then,

$$P(|X - \mu| \geq k\sigma) \leq \frac{1}{k^2}, \quad \forall k > 0.$$

Proof (Proof of Proposition 3.3). This inequality follows from Markov’s inequality. Indeed, taking

$$X' = X - \mu, \quad k' = k\sigma,$$

and replacing $X$ by $X'$ and $k$ by $k'$ in Markov's inequality, we get

$$P(|X - \mu| \geq k\sigma) \leq \frac{\sigma^2}{k^2\sigma^2} = \frac{1}{k^2}.$$

Chebyshev’s inequality is useful for obtaining probability bounds about $\hat\theta$ when its probability distribution is unknown. We only need to know the expectation and variance of $\hat\theta$. Indeed, taking $k = 2$, Chebyshev’s inequality gives

$$P(|\hat\theta - \theta| \leq 2\sigma_{\hat\theta}) \geq 1 - \frac{1}{4} = 0.75,$$

which means that at least 75% of the realized values of $\hat\theta$ fall within the interval $[\theta - 2\sigma_{\hat\theta}, \theta + 2\sigma_{\hat\theta}]$. However, if we additionally know that the distribution of the estimator $\hat\theta$ is normal, $\hat\theta \sim \mathcal{N}(\theta, \sigma^2_{\hat\theta})$, then we obtain the much more precise result

$$P(|\hat\theta - \theta| \leq 2\sigma_{\hat\theta}) \approx 0.95 > 0.75.$$

The fact that the true probability, 0.95, is in this case substantially larger than the lower bound given by Chebyshev’s inequality, 0.75, is reasonable: Chebyshev’s inequality does not employ any knowledge about the distribution of $\hat\theta$. Thus, the precision increases as more information about the true distribution of $\hat\theta$ becomes available.
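As a small numerical illustration of this comparison (in Python, assuming SciPy is available; this snippet is not part of the original text), the distribution-free Chebyshev lower bound for $k = 2$ can be contrasted with the exact probability under normality:

```python
# Chebyshev lower bound vs. exact probability for a normal estimator (k = 2).
from scipy import stats

k = 2
chebyshev_lower_bound = 1 - 1 / k**2                  # 0.75, distribution-free
normal_prob = stats.norm.cdf(k) - stats.norm.cdf(-k)  # ~0.9545 under normality

print(chebyshev_lower_bound, round(normal_prob, 4))
```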

Example 3.36 From previous experience, it is known that the time X (in minutes) that a periodic check of a machine requires is distributed as Γ(3,2). A new worker spends 19 minutes checking that machine. Is this time coherent with the previous experience?

We know that the mean and the variance of a gamma are given by

$$\mu = \alpha\beta = 3 \times 2 = 6, \quad \sigma^2 = \alpha\beta^2 = 3 \times 2^2 = 12 \implies \sigma \approx 3.46.$$

Then, the difference between the checking time of the new worker and $\mu$ is $19 - 6 = 13$. To see whether this difference is large or small, or, in other words, to see whether the checking time of this new worker is in line with previous checking times, we would want to know the probability

$$P(|X - \mu| \geq 13).$$

For that, we take $k\sigma = 13$ in Chebyshev’s inequality, so $k = 13/\sigma = 13/3.46 \approx 3.76$, and applying the inequality we readily get

$$P(|X - \mu| \geq 13) \leq \frac{1}{k^2} = \frac{1}{3.76^2} \approx 0.071.$$

Since this probability bound is very small, the checking time is not coherent with the previous experience. Two things may have happened: either the new worker has faced a more complicated inspection or he/she is slower than the rest.
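Since in this example the distribution of $X$ is actually known, the bound can be compared with the exact probability. The following Python sketch (an added illustration assuming SciPy; it is not needed for the argument above) computes both quantities; note that, because $X \geq 0$ and $\mu - 13 < 0$, only the upper tail contributes to $P(|X - \mu| \geq 13)$.

```python
# Example 3.36: Chebyshev bound vs. exact Gamma(3, 2) tail probability.
from scipy import stats

alpha, beta = 3, 2                 # shape and scale of the gamma distribution
mu = alpha * beta                  # mean: 6
sigma = (alpha * beta**2) ** 0.5   # standard deviation: ~3.46

k = 13 / sigma
chebyshev_bound = 1 / k**2         # upper bound for P(|X - mu| >= 13), ~0.071

# Exact probability: only the upper tail matters since mu - 13 < 0 and X >= 0
exact = stats.gamma(alpha, scale=beta).sf(mu + 13)   # P(X >= 19), ~0.004

print(f"Chebyshev bound: {chebyshev_bound:.3f}, exact: {exact:.4f}")
```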

Rao–Blackwell’s Theorem

Rao–Blackwell’s Theorem provides an effective way of reducing the variance of an unbiased estimator using a sufficient statistic $T$. This process is sometimes known as Rao–Blackwellization and results in a new estimator with lower MSE.

Theorem 3.9 (Rao–Blackwell’s Theorem) Let $T$ be a sufficient statistic for $\theta$. Let $\hat\theta$ be an unbiased estimator of $\theta$. Then, the estimator

$$\hat\theta' := E[\hat\theta \mid T]$$

satisfies:

  1. $\hat\theta'$ does not depend on $\theta$, i.e., it is a genuine estimator.
  2. $E[\hat\theta'] = \theta$, $\forall \theta \in \Theta$.
  3. $\mathrm{Var}[\hat\theta'] \leq \mathrm{Var}[\hat\theta]$, $\forall \theta \in \Theta$.

In addition, $\mathrm{Var}[\hat\theta'] = \mathrm{Var}[\hat\theta]$ if and only if $P(\hat\theta' = \hat\theta) = 1$, $\forall \theta \in \Theta$.

Observe that the new estimator $\hat\theta'$ depends on the sample through the sufficient statistic $T$ and, in particular, it can be based on the minimal sufficient statistic.

Example 3.37 Let $(X_1, \ldots, X_n)$ be a srs of $X \sim \mathrm{Pois}(\lambda)$, that is, with pmf

$$p(x; \lambda) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, \ldots, \quad \lambda > 0.$$

Consider the parameter $\theta = p(0; \lambda) = e^{-\lambda}$. Let us perform a Rao–Blackwellization.

First, we need an unbiased estimator, for example:

$$\hat\theta = \begin{cases} 1 & \text{if } X_1 = 0, \\ 0 & \text{if } X_1 \neq 0, \end{cases}$$

since

$$E[\hat\theta] = 1 \times P(X_1 = 0) + 0 \times P(X_1 \neq 0) = P(X_1 = 0) = e^{-\lambda}.$$

Now we need a sufficient statistic. Writing the pmf of the Poisson in the form of the exponential family, we obtain

$$p(x; \lambda) = e^{-\lambda} \frac{e^{x \log \lambda}}{x!}$$

and therefore $T(X_1, \ldots, X_n) = \sum_{i=1}^n X_i$ is sufficient for $\lambda$ and also for $e^{-\lambda}$.

Then, we can Rao–Blackwellize $\hat\theta$:

$$\begin{aligned} \hat\theta' := E[\hat\theta \mid T] &= 1 \times P\left(X_1 = 0 \,\middle|\, \sum_{i=1}^n X_i = t\right) + 0 \times P\left(X_1 \neq 0 \,\middle|\, \sum_{i=1}^n X_i = t\right) \\ &= \frac{P\left(X_1 = 0, \sum_{i=1}^n X_i = t\right)}{P\left(\sum_{i=1}^n X_i = t\right)} = \frac{P(X_1 = 0)\, P\left(\sum_{i=2}^n X_i = t\right)}{P\left(\sum_{i=1}^n X_i = t\right)}. \end{aligned}$$

Now, it holds that, if $X_i \sim \mathrm{Pois}(\lambda)$, $i = 1, \ldots, n$, are independent, then (see Exercise 1.20)

$$\sum_{i=1}^n X_i \sim \mathrm{Pois}(n\lambda).$$

Therefore,

$$\hat\theta' = \frac{e^{-\lambda}\, [(n-1)\lambda]^t e^{-(n-1)\lambda} / t!}{(n\lambda)^t e^{-n\lambda} / t!} = \left(\frac{n-1}{n}\right)^t.$$
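This closed form is easy to check numerically. The following Python sketch (an added illustration assuming SciPy; the values of $n$, $\lambda$, and $t$ are arbitrary) verifies that the ratio of Poisson pmfs coincides with $((n-1)/n)^t$:

```python
# Sanity check of the conditional probability above:
# P(X1 = 0) * P(sum_{i=2}^n Xi = t) / P(sum_{i=1}^n Xi = t) = ((n-1)/n)^t.
from scipy.stats import poisson

n, lam = 5, 1.7                                  # arbitrary illustrative values
for t in range(6):
    lhs = (poisson.pmf(0, lam) * poisson.pmf(t, (n - 1) * lam)
           / poisson.pmf(t, n * lam))
    rhs = ((n - 1) / n) ** t
    print(t, round(lhs, 6), round(rhs, 6))       # both columns coincide
```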

Then, we have obtained the estimator

$$\hat\theta' = E[\hat\theta \mid T] = \left(\frac{n-1}{n}\right)^T,$$

which is unbiased and whose variance is smaller than that of $\hat\theta$. Indeed,

$$E[\hat\theta'] = \sum_{t=0}^{\infty} \left(\frac{n-1}{n}\right)^t \frac{e^{-n\lambda}(n\lambda)^t}{t!} = e^{-n\lambda} \sum_{t=0}^{\infty} \frac{(n-1)^t \lambda^t}{t!} = e^{-n\lambda} e^{(n-1)\lambda} = e^{-\lambda} = \theta.$$

Therefore, $\hat\theta'$ is unbiased. We now compute its variance. To that end, we first compute

$$E[\hat\theta'^2] = \sum_{t=0}^{\infty} \left(\frac{n-1}{n}\right)^{2t} \frac{e^{-n\lambda}(n\lambda)^t}{t!} = e^{-n\lambda} \sum_{t=0}^{\infty} \left(\frac{(n-1)^2\lambda}{n}\right)^t \frac{1}{t!} = e^{-n\lambda} e^{(n-1)^2\lambda/n} = e^{-2\lambda + \lambda/n}.$$

Then, the variance is

$$\mathrm{Var}[\hat\theta'] = E[\hat\theta'^2] - E^2[\hat\theta'] = e^{-2\lambda + \lambda/n} - e^{-2\lambda} = e^{-2\lambda}\left(e^{\lambda/n} - 1\right).$$

We calculate the variance of $\hat\theta$ for comparison:

$$E[\hat\theta^2] = 1 \times P(X_1 = 0) = e^{-\lambda}.$$

As a consequence,

$$\mathrm{Var}[\hat\theta] = e^{-\lambda} - e^{-2\lambda} = e^{-\lambda}\left(1 - e^{-\lambda}\right).$$

Therefore,

$$\frac{\mathrm{Var}[\hat\theta']}{\mathrm{Var}[\hat\theta]} = \frac{e^{-2\lambda}\left(e^{\lambda/n} - 1\right)}{e^{-\lambda}\left(1 - e^{-\lambda}\right)} = \frac{e^{\lambda/n} - 1}{e^{\lambda} - 1} < 1, \quad \forall n > 1.$$
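These formulas, and the variance reduction delivered by Rao–Blackwellization, can be confirmed by simulation. The following Python sketch (an added illustration assuming NumPy; $n$, $\lambda$, and the number of replicates are arbitrary choices) compares the two estimators of $\theta = e^{-\lambda}$:

```python
# Monte Carlo comparison of the crude estimator 1{X1 = 0} and its
# Rao-Blackwellized version ((n - 1) / n)^T for theta = exp(-lambda).
import numpy as np

rng = np.random.default_rng(1)
n, lam, reps = 10, 2.0, 100_000
x = rng.poisson(lam, size=(reps, n))          # reps simple random samples of size n

theta_hat = (x[:, 0] == 0).astype(float)      # crude unbiased estimator
theta_rb = ((n - 1) / n) ** x.sum(axis=1)     # Rao-Blackwellized estimator

print("true theta:", np.exp(-lam))
print("means:", theta_hat.mean(), theta_rb.mean())    # both close to exp(-lambda)
print("variances:", theta_hat.var(), theta_rb.var())  # the second is much smaller
print("theoretical variances:",
      np.exp(-lam) * (1 - np.exp(-lam)),
      np.exp(-2 * lam) * (np.exp(lam / n) - 1))
```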