3.1 Unbiased estimators
Definition 3.2 (Unbiased estimator) Given an estimator $\hat{\theta}$ of a parameter $\theta$, the quantity $\mathrm{Bias}[\hat{\theta}] := \mathbb{E}[\hat{\theta}] - \theta$ is the bias of the estimator $\hat{\theta}$. The estimator $\hat{\theta}$ is unbiased if its bias is zero, i.e., if $\mathbb{E}[\hat{\theta}] = \theta$.
Example 3.2 We saw in (2.5) and (2.6) that the sample variance $S^2$ was not an unbiased estimator of $\sigma^2$, whereas the sample quasivariance $S'^2$ was unbiased. Theorem 2.2 offers an alternative way of seeing, under the assumption of normality, that $S^2$ is indeed biased. On one hand,
$$
\mathbb{E}\left[\frac{nS^2}{\sigma^2}\right]=\mathbb{E}\left[\frac{(n-1)S'^2}{\sigma^2}\right]=\mathbb{E}\left[\chi^2_{n-1}\right]=n-1 \tag{3.1}
$$
and, on the other,
$$
\mathbb{E}\left[\frac{nS^2}{\sigma^2}\right]=\frac{n}{\sigma^2}\mathbb{E}\left[S^2\right]. \tag{3.2}
$$
Therefore, equating (3.1) and (3.2) and solving for $\mathbb{E}[S^2]$, we have
$$
\mathbb{E}\left[S^2\right]=\frac{n-1}{n}\sigma^2.
$$
We can also see that $S'^2$ is indeed unbiased. First, we have that
$$
\mathbb{E}\left[\frac{(n-1)S'^2}{\sigma^2}\right]=\frac{n-1}{\sigma^2}\mathbb{E}\left[S'^2\right].
$$
Then, equating this expectation with the mean of a $\chi^2_{n-1}$ rv, $n-1$, and solving for $\mathbb{E}[S'^2]$, it follows that $\mathbb{E}[S'^2]=\sigma^2$.
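These expectations can be checked numerically. The following is a minimal Monte Carlo sketch (in Python with NumPy; the sample size, true variance, and number of replicates are arbitrary illustrative choices) comparing the averages of $S^2$ and $S'^2$ over many simulated normal samples with $\frac{n-1}{n}\sigma^2$ and $\sigma^2$, respectively.

```python
# Monte Carlo check of the bias of S^2 and the unbiasedness of S'^2
# (illustrative sketch; constants are arbitrary choices).
import numpy as np

rng = np.random.default_rng(42)
n, sigma2, M = 10, 4.0, 100_000  # sample size, true variance, replicates

samples = rng.normal(loc=0.0, scale=np.sqrt(sigma2), size=(M, n))
S2 = samples.var(axis=1, ddof=0)        # sample variance (divides by n)
S2_quasi = samples.var(axis=1, ddof=1)  # sample quasivariance (divides by n - 1)

print(S2.mean(), (n - 1) / n * sigma2)  # both close to 3.6: S^2 is biased downwards
print(S2_quasi.mean(), sigma2)          # both close to 4.0: S'^2 is unbiased
```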
Example 3.3 Let $X\sim\mathcal{U}(0,\theta)$, that is, its pdf is $f_X(x)=1/\theta$, $0<x<\theta$. Let $(X_1,\ldots,X_n)$ be a srs of $X$. Let us obtain an unbiased estimator of $\theta$.
Since $\theta$ is the upper bound for the sample realizations, the sample value closest to $\theta$ is $X_{(n)}$, the maximum of the sample. Hence, we take $\hat{\theta}:=X_{(n)}$ as an estimator of $\theta$ and check whether it is unbiased. In order to compute its expectation, we need its pdf, which we can derive from Exercise 2.2: the cdf of $X_{(n)}$ for a srs of a rv with cdf $F_X$ is $[F_X]^n$.
The cdf of $X$ for $0<x<\theta$ is
$$
F_X(x)=\int_0^x f_X(t)\,\mathrm{d}t=\int_0^x \frac{1}{\theta}\,\mathrm{d}t=\frac{x}{\theta}.
$$
Then, the full cdf is
$$
F_X(x)=\begin{cases}
0, & x<0,\\
x/\theta, & 0\le x<\theta,\\
1, & x\ge\theta.
\end{cases}
$$
Consequently, the cdf of the maximum is
$$
F_{X_{(n)}}(x)=\begin{cases}
0, & x<0,\\
(x/\theta)^n, & 0\le x<\theta,\\
1, & x\ge\theta.
\end{cases}
$$
The density of $X_{(n)}$ follows by differentiation:
$$
f_{X_{(n)}}(x)=\frac{n}{\theta}\left(\frac{x}{\theta}\right)^{n-1},\quad x\in(0,\theta).
$$
Finally, the expectation of $\hat{\theta}=X_{(n)}$ is
$$
\mathbb{E}[\hat{\theta}]=\int_0^\theta x\,\frac{n}{\theta}\left(\frac{x}{\theta}\right)^{n-1}\mathrm{d}x=\frac{n}{\theta^n}\int_0^\theta x^n\,\mathrm{d}x=\frac{n}{\theta^n}\frac{\theta^{n+1}}{n+1}=\frac{n}{n+1}\theta\neq\theta.
$$
Therefore, $\hat{\theta}$ is not unbiased. However, it can be readily patched as
$$
\hat{\theta}':=\frac{n+1}{n}X_{(n)},
$$
which is an unbiased estimator of $\theta$:
$$
\mathbb{E}[\hat{\theta}']=\frac{n+1}{n}\frac{n}{n+1}\theta=\theta.
$$
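A short simulation illustrates both the bias of $X_{(n)}$ and the effect of the correction. The sketch below (Python with NumPy; $\theta$, $n$, and the number of replicates are illustrative choices) compares the Monte Carlo means of $\hat{\theta}$ and $\hat{\theta}'$ with $\frac{n}{n+1}\theta$ and $\theta$.

```python
# Sketch checking the bias of the maximum of a U(0, theta) sample
# and its (n + 1)/n correction (constants are illustrative).
import numpy as np

rng = np.random.default_rng(1)
theta, n, M = 5.0, 10, 100_000  # true theta, sample size, replicates

X = rng.uniform(0.0, theta, size=(M, n))
theta_hat = X.max(axis=1)                  # X_(n): maximum of each sample
theta_hat_prime = (n + 1) / n * theta_hat  # bias-corrected estimator

print(theta_hat.mean(), n / (n + 1) * theta)  # both close to 4.55: biased downwards
print(theta_hat_prime.mean(), theta)          # both close to 5.00: unbiased
```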
Example 3.4 Let $X\sim\mathrm{Exp}(\theta)$ and let $(X_1,\ldots,X_n)$ be a srs of such a rv. Let us find an unbiased estimator of $\theta$.

Since $X\sim\mathrm{Exp}(\theta)$, we know that $\mathbb{E}[X]=1/\theta$ and hence $\theta=1/\mathbb{E}[X]$. As $\bar{X}$ is an unbiased estimator of $\mathbb{E}[X]$, it is reasonable to consider $\hat{\theta}:=1/\bar{X}$ as an estimator of $\theta$. Checking whether it is unbiased requires knowing the distribution of $\hat{\theta}$. Since $\mathrm{Exp}(\theta)\stackrel{d}{=}\Gamma(1,1/\theta)$, by the additive property of the gamma distribution (see Exercise 1.21),
$$
T=\sum_{i=1}^n X_i\sim\Gamma(n,1/\theta),
$$
with pdf
$$
f_T(t)=\frac{1}{(n-1)!}\theta^n t^{n-1}e^{-\theta t},\quad t>0.
$$
Then, the expectation of the estimator $\hat{\theta}=n/T=1/\bar{X}$ is given by
$$
\mathbb{E}[\hat{\theta}]=\int_0^\infty \frac{n}{t}\,\frac{1}{(n-1)!}\theta^n t^{n-1}e^{-\theta t}\,\mathrm{d}t=\frac{n\theta}{n-1}\int_0^\infty \frac{1}{(n-2)!}\theta^{n-1} t^{(n-1)-1}e^{-\theta t}\,\mathrm{d}t=\frac{n}{n-1}\theta,
$$
where the last equality follows because the remaining integrand is the pdf of a $\Gamma(n-1,1/\theta)$ rv, which integrates to one.
Therefore, counterintuitively, $\hat{\theta}$ is not unbiased for $\theta$. However, the corrected estimator
$$
\hat{\theta}'=\frac{n-1}{n}\frac{1}{\bar{X}}
$$
is unbiased.
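The upward bias of $1/\bar{X}$ and its correction can also be verified by simulation. The following sketch (Python with NumPy; $\theta$, $n$, and the number of replicates are illustrative choices) compares the Monte Carlo mean of $1/\bar{X}$ with $\frac{n}{n-1}\theta$ and that of the corrected estimator with $\theta$.

```python
# Sketch checking that 1 / Xbar overestimates theta by the factor n / (n - 1)
# for Exp(theta) data, and that the correction removes the bias.
import numpy as np

rng = np.random.default_rng(7)
theta, n, M = 2.0, 5, 200_000  # true theta, sample size, replicates

X = rng.exponential(scale=1.0 / theta, size=(M, n))  # Exp(theta) has mean 1 / theta
theta_hat = 1.0 / X.mean(axis=1)                     # 1 / Xbar
theta_hat_prime = (n - 1) / n * theta_hat            # bias-corrected estimator

print(theta_hat.mean(), n / (n - 1) * theta)  # both close to 2.5: biased upwards
print(theta_hat_prime.mean(), theta)          # both close to 2.0: unbiased
```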
In the previous example we have seen that, even though $\bar{X}$ is unbiased for $\mathbb{E}[X]$, $1/\bar{X}$ is biased for $1/\mathbb{E}[X]$. This illustrates that, even if $\hat{\theta}$ is an unbiased estimator of $\theta$, a transformation by a function $g$ results, in general, in an estimator $g(\hat{\theta})$ that is not unbiased for $g(\theta)$.[^32]
The quantity $\hat{\theta}-\theta$ is the estimation error, which depends on the particular value of $\hat{\theta}$ for the observed (or realized) sample. Observe that the bias is the expected (or mean) estimation error across all the possible realizations of the sample, which does not depend on the actual realization of $\hat{\theta}$ for a particular sample:
$$
\mathrm{Bias}[\hat{\theta}]=\mathbb{E}[\hat{\theta}]-\theta=\mathbb{E}[\hat{\theta}-\theta].
$$
If the estimation error is measured in absolute value, $|\hat{\theta}-\theta|$, the quantity $\mathbb{E}[|\hat{\theta}-\theta|]$ is referred to as the mean absolute error. If the square is taken, $(\hat{\theta}-\theta)^2$, then we obtain the so-called Mean Squared Error (MSE)
$$
\mathrm{MSE}[\hat{\theta}]=\mathbb{E}\left[(\hat{\theta}-\theta)^2\right].
$$
The MSE is mathematically more tractable than the mean absolute error and hence is usually preferred. Since the MSE averages the squared estimation errors, it provides a performance measure for comparing two estimators $\hat{\theta}_1$ and $\hat{\theta}_2$ of a parameter $\theta$: the estimator with the lowest MSE is optimal for estimating $\theta$ according to this criterion.
A key identity for the MSE is the following bias-variance decomposition:
$$
\begin{aligned}
\mathrm{MSE}[\hat{\theta}]&=\mathbb{E}\left[(\hat{\theta}-\theta)^2\right]=\mathbb{E}\left[(\hat{\theta}-\mathbb{E}[\hat{\theta}]+\mathbb{E}[\hat{\theta}]-\theta)^2\right]\\
&=\mathbb{E}\left[(\hat{\theta}-\mathbb{E}[\hat{\theta}])^2\right]+\left(\mathbb{E}[\hat{\theta}]-\theta\right)^2+2\,\mathbb{E}\left[\hat{\theta}-\mathbb{E}[\hat{\theta}]\right]\left(\mathbb{E}[\hat{\theta}]-\theta\right)\\
&=\mathrm{Var}[\hat{\theta}]+\left(\mathbb{E}[\hat{\theta}]-\theta\right)^2\\
&=\mathrm{Bias}^2[\hat{\theta}]+\mathrm{Var}[\hat{\theta}],
\end{aligned}
$$
where the cross term vanishes because $\mathbb{E}\left[\hat{\theta}-\mathbb{E}[\hat{\theta}]\right]=0$.
This identity tells us that minimizing the MSE is not just a matter of finding an unbiased estimator: the variance contributes to the MSE exactly as much as the squared bias does. Therefore, if we search for the optimal estimator in terms of MSE, both bias and variance must be kept small.
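The decomposition can be verified empirically. In the sketch below (Python with NumPy; the estimator is $S^2$ for a normal population, and the constants are illustrative), the Monte Carlo MSE coincides with the sum of the squared empirical bias and the empirical variance.

```python
# Numerical check of MSE = Bias^2 + Var for the sample variance S^2
# of a normal population (illustrative sketch; constants are arbitrary).
import numpy as np

rng = np.random.default_rng(0)
n, sigma2, M = 10, 4.0, 200_000  # sample size, true variance, replicates

# S^2 computed on M simulated normal samples of size n
S2 = rng.normal(0.0, np.sqrt(sigma2), size=(M, n)).var(axis=1, ddof=0)

mse = np.mean((S2 - sigma2) ** 2)  # Monte Carlo MSE
bias2 = (S2.mean() - sigma2) ** 2  # squared empirical bias
var = S2.var()                     # empirical variance

print(mse, bias2 + var)  # the two numbers coincide (up to floating-point error)
```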
Figure 3.2 shows four extreme cases of the MSE decomposition: (1) low bias and low variance (ideal); (2) low bias and large variance; (3) large bias and low variance; (4) large bias and large variance (worst). A game of darts provides a good analogy for these four cases. The bullseye is $\theta$, the desired target value to hit. Each dart thrown is a realization of the estimator $\hat{\theta}$. Case (1) represents an experienced player whose darts consistently land close to the bullseye. Case (2) represents a less skilled player whose throws scatter widely about the target. Cases (3) and (4) represent players who make systematic errors, with low and high variability, respectively.
![Bias and variance of an estimator \(\hat{\theta},\) represented by the positioning of its simulated distribution (histogram) with respect to the target parameter \(\theta=0\) (red vertical line).](inference_files/figure-html/mse-1.png)
![Bias and variance of an estimator \(\hat{\theta},\) represented by the positioning of its simulated distribution (histogram) with respect to the target parameter \(\theta=0\) (red vertical line).](inference_files/figure-html/mse-2.png)
![Bias and variance of an estimator \(\hat{\theta},\) represented by the positioning of its simulated distribution (histogram) with respect to the target parameter \(\theta=0\) (red vertical line).](inference_files/figure-html/mse-3.png)
![Bias and variance of an estimator \(\hat{\theta},\) represented by the positioning of its simulated distribution (histogram) with respect to the target parameter \(\theta=0\) (red vertical line).](inference_files/figure-html/mse-4.png)
Figure 3.2: Bias and variance of an estimator $\hat{\theta}$, represented by the positioning of its simulated distribution (histogram) with respect to the target parameter $\theta=0$ (red vertical line).
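The four scenarios of Figure 3.2 can be mimicked with a simple simulation. In the hypothetical sketch below (Python with NumPy; the bias and standard deviation values are arbitrary), a normally distributed "estimator" is drawn around the target $\theta=0$ with controllable bias and spread, and its Monte Carlo MSE is compared with $\mathrm{Bias}^2+\mathrm{Var}$.

```python
# Hypothetical sketch mimicking the four bias/variance scenarios of Figure 3.2.
import numpy as np

rng = np.random.default_rng(3)
M = 10_000  # draws of the estimator in each scenario
scenarios = {
    "low bias, low variance": (0.0, 0.5),
    "low bias, large variance": (0.0, 2.0),
    "large bias, low variance": (2.0, 0.5),
    "large bias, large variance": (2.0, 2.0),
}

for name, (bias, sd) in scenarios.items():
    draws = rng.normal(loc=bias, scale=sd, size=M)  # realizations of the estimator
    mse = np.mean(draws ** 2)                       # theta = 0, so the error is draws
    print(f"{name}: MSE = {mse:.2f} vs. bias^2 + variance = {bias ** 2 + sd ** 2:.2f}")
```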
Example 3.5 Let us compute the MSE of the sample variance $S^2$ and the sample quasivariance $S'^2$ when estimating the population variance $\sigma^2$ of a normal rv (the normality assumption is fundamental for obtaining the expressions for the variances of $S^2$ and $S'^2$).
In Exercise 2.19 we saw that, for a normal population,
$$
\mathbb{E}[S^2]=\frac{n-1}{n}\sigma^2,\quad \mathrm{Var}[S^2]=\frac{2(n-1)}{n^2}\sigma^4,\quad \mathbb{E}[S'^2]=\sigma^2,\quad \mathrm{Var}[S'^2]=\frac{2}{n-1}\sigma^4.
$$
Therefore, the bias of $S^2$ is
$$
\mathrm{Bias}[S^2]=\frac{n-1}{n}\sigma^2-\sigma^2=-\frac{1}{n}\sigma^2<0
$$
and the MSE of $S^2$ for estimating $\sigma^2$ is
$$
\mathrm{MSE}[S^2]=\mathrm{Bias}^2[S^2]+\mathrm{Var}[S^2]=\frac{1}{n^2}\sigma^4+\frac{2(n-1)}{n^2}\sigma^4=\frac{2n-1}{n^2}\sigma^4.
$$
Replicating the calculations for the sample quasivariance, we have that
$$
\mathrm{MSE}[S'^2]=\mathrm{Var}[S'^2]=\frac{2}{n-1}\sigma^4.
$$
Since $n>1$, we have
$$
\frac{2}{n-1}>\frac{2n-1}{n^2}
$$
and, as a consequence,
$$
\mathrm{MSE}[S'^2]>\mathrm{MSE}[S^2].
$$
The bottom line is clear: despite $S'^2$ being unbiased and $S^2$ being biased, for normal populations $S^2$ has a lower MSE than $S'^2$ when estimating $\sigma^2$. Therefore, $S^2$ is better than $S'^2$ in terms of MSE for estimating $\sigma^2$ in normal populations. This highlights that unbiased estimators are not always to be preferred in terms of the MSE!
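The comparison can be reproduced by simulation. The following sketch (Python with NumPy; $n$, $\sigma^2$, and the number of replicates are illustrative choices) estimates the MSEs of $S^2$ and $S'^2$ by Monte Carlo and compares them with the exact expressions $\frac{2n-1}{n^2}\sigma^4$ and $\frac{2}{n-1}\sigma^4$.

```python
# Monte Carlo comparison of the MSEs of S^2 and S'^2 for a normal population
# (illustrative sketch; constants are arbitrary).
import numpy as np

rng = np.random.default_rng(5)
n, sigma2, M = 5, 1.0, 500_000  # sample size, true variance, replicates

samples = rng.normal(0.0, np.sqrt(sigma2), size=(M, n))
mse_S2 = np.mean((samples.var(axis=1, ddof=0) - sigma2) ** 2)        # MSE of S^2
mse_S2_quasi = np.mean((samples.var(axis=1, ddof=1) - sigma2) ** 2)  # MSE of S'^2

print(mse_S2, (2 * n - 1) / n ** 2 * sigma2 ** 2)  # both close to 0.36
print(mse_S2_quasi, 2 / (n - 1) * sigma2 ** 2)     # both close to 0.50, larger
```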
The use of unbiased estimators is convenient when the sample size $n$ is large, since in those cases the variance tends to be small. However, when $n$ is small, the bias is usually very small compared with the variance, so a smaller MSE can be obtained by focusing on decreasing the variance. On the other hand, it is possible that, for a given parameter and sample, no unbiased estimator exists at all, as the following example shows.
Example 3.6 We are presented with the following game. We have to pay 6 euros to participate, and the payoff is 12 euros if we obtain two heads in two tosses of a coin with heads probability $p$; we receive 0 euros otherwise. We are allowed to perform a test toss in order to estimate the winning probability $\theta=p^2$.
In the coin toss we observe the value of the rv
$$
X_1=\begin{cases}
1 & \text{if heads},\\
0 & \text{if tails}.
\end{cases}
$$
We know that $X_1\sim\mathrm{Ber}(p)$. Let
$$
\hat{\theta}:=\begin{cases}
\hat{\theta}_1 & \text{if } X_1=1,\\
\hat{\theta}_0 & \text{if } X_1=0,
\end{cases}
$$
be an estimator of $\theta=p^2$, where $\hat{\theta}_0$ and $\hat{\theta}_1$ are to be determined. Its expectation is
$$
\mathbb{E}[\hat{\theta}]=\hat{\theta}_1\,p+\hat{\theta}_0\,(1-p),
$$
which, being a polynomial of degree at most one in $p$, cannot equal $p^2$ for all $p\in(0,1)$, whatever the values of $\hat{\theta}_0$ and $\hat{\theta}_1$. Indeed, $\mathbb{E}[\hat{\theta}]=p^2$ would require $\hat{\theta}_1=p$ and $\hat{\theta}_0=0$, which is not allowed, since $\hat{\theta}$ can only depend on the sample and not on the unknown parameter. Therefore, for a sample of size $n=1$ there does not exist any unbiased estimator of $p^2$.
[^32]: Can you think of a class of transformations for which unbiasedness is actually preserved?