Chapter 13 Sufficiency and Unbiasedness (Lecture on 02/11/2020)

The main theorem in this part relates sufficient statistics to unbiased estimators.

Theorem 13.1 (Rao-Blackwell) Let \(W\) be any unbiased estimator of \(\tau(\theta)\), and let \(T\) be a sufficient statistic for \(\theta\). Define \(\phi(T)=E(W|T)\). Then \(E_{\theta}\phi(T)=\tau(\theta)\) and \(Var_{\theta}\phi(T)\leq Var_{\theta}W\) for all \(\theta\); that is, \(\phi(T)\) is a uniformly better unbiased estimator of \(\tau(\theta)\).
Proof. From the law of total expectation we have \[\begin{equation} \tau(\theta)=E_{\theta}W=E_{\theta}[E(W|T)]=E_{\theta}\phi(T) \tag{13.1} \end{equation}\]
so \(\phi(T)\) is unbiased for \(\tau(\theta)\). Also, by the law of total variance, \[\begin{equation} \begin{split} Var_{\theta}W&=Var_{\theta}[E(W|T)]+E_{\theta}[Var(W|T)]\\ &=Var_{\theta}\phi(T)+E_{\theta}[Var(W|T)]\\ &\geq Var_{\theta}\phi(T) \end{split} \tag{13.2} \end{equation}\] since \(E_{\theta}[Var(W|T)]\geq 0\). Hence, \(\phi(T)\) is uniformly better than \(W\), and it only remains to show that \(\phi(T)\) is indeed an estimator, i.e., a function of the sample alone that does not depend on \(\theta\). By the definition of sufficiency, the conditional distribution of the sample given \(T\) does not depend on \(\theta\); since \(W\) is a function only of the sample, the distribution of \(W|T\) is likewise free of \(\theta\), and therefore so is \(\phi(T)=E(W|T)\). Hence, \(\phi(T)\) is a uniformly better unbiased estimator of \(\tau(\theta)\).
  • Conditioning any unbiased estimator on a sufficient statistic results in a uniform improvement, so in searching for best unbiased estimators we need consider only statistics that are functions of a sufficient statistic.

  • Conditioning on any statistic yields the same kind of variance reduction, but unless that statistic is sufficient, the resulting conditional expectation may depend on \(\theta\) and therefore fail to be an estimator.
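To see the variance reduction of Theorem 13.1 concretely, here is a minimal simulation sketch (not from the lecture, and with an assumed Bernoulli model): take \(X_1,\cdots,X_n\) i.i.d. \(Bernoulli(\theta)\), let \(W=X_1\) (unbiased for \(\theta\)) and \(T=\sum_{i=1}^nX_i\) (sufficient), so that \(\phi(T)=E(X_1|T)=T/n=\bar{X}\).

```python
import numpy as np

# Minimal sketch (assumed Bernoulli example, not from the lecture):
# W = X_1 is unbiased for theta; T = sum(X_i) is sufficient, and the
# Rao-Blackwellized estimator is phi(T) = E(X_1 | T) = T/n = Xbar.
rng = np.random.default_rng(0)
theta, n, reps = 0.3, 10, 100_000

X = rng.binomial(1, theta, size=(reps, n))
W = X[:, 0]               # crude unbiased estimator
phi_T = X.mean(axis=1)    # conditioned on the sufficient statistic

print("mean of W      :", W.mean())      # both close to theta = 0.3
print("mean of phi(T) :", phi_T.mean())
print("var of W       :", W.var())       # about theta*(1-theta) = 0.21
print("var of phi(T)  :", phi_T.var())   # about theta*(1-theta)/n = 0.021
```

Both estimators have mean near \(\theta\), but the conditioned estimator has roughly \(1/n\) of the variance, as the theorem guarantees.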

Theorem 13.2 If \(W\) is a best unbiased estimator of \(\tau(\theta)\), then \(W\) is unique.

Proof. Suppose \(W^{\prime}\) is another best unbiased estimator, and consider the estimator \(W^{*}=\frac{1}{2}(W+W^{\prime})\). Note that \(E_{\theta}W^*=\tau(\theta)\) and \[\begin{equation} \begin{split} Var_{\theta}W^*&=Var_{\theta}(\frac{1}{2}W+\frac{1}{2}W^{\prime})\\ &=\frac{1}{4}Var_{\theta}W+\frac{1}{4}Var_{\theta}W^{\prime}+\frac{1}{2}Cov_{\theta}(W,W^{\prime})\\ &\leq \frac{1}{4}Var_{\theta}W+\frac{1}{4}Var_{\theta}W^{\prime}+\frac{1}{2}[(Var_{\theta}W)(Var_{\theta}W^{\prime})]^{1/2}\\ &=Var_{\theta}W \end{split} \tag{13.3} \end{equation}\] where the inequality is the Cauchy-Schwarz inequality applied to the covariance, and the last equality uses \(Var_{\theta}W^{\prime}=Var_{\theta}W\) (both are best unbiased).

If the above inequality is strict, then the best unbiasedness of \(W\) is contradicted, so we must have equality for all \(\theta\). Equality in the Cauchy-Schwarz inequality holds only when \(W^{\prime}\) is a linear function of \(W\), i.e., \(W^{\prime}=a(\theta)W+b(\theta)\). Now using properties of covariance, we have \[\begin{equation} \begin{split} Cov_{\theta}(W,W^{\prime})&=Cov_{\theta}[W,a(\theta)W+b(\theta)]\\ &=Cov_{\theta}[W,a(\theta)W]\\ &=a(\theta)Var_{\theta}W \end{split} \tag{13.4} \end{equation}\]

But since equality holds in (13.3), we have \(Cov_{\theta}(W,W^{\prime})=[(Var_{\theta}W)(Var_{\theta}W^{\prime})]^{1/2}=Var_{\theta}W\). Comparing with (13.4) gives \(a(\theta)=1\), and since \(E_{\theta}W^{\prime}=\tau(\theta)\), we must have \(b(\theta)=0\). Thus \(W^{\prime}=W\), and uniqueness follows.

Theorem 13.3 If \(E_{\theta}W=\tau(\theta)\), then \(W\) is the best unbiased estimator of \(\tau(\theta)\) if and only if \(W\) is uncorrelated with all unbiased estimators of 0.

Proof. Suppose first that \(W\) is best unbiased; we show that \(W\) must satisfy \(Cov_{\theta}(W,U)=0\) for all \(\theta\) and any \(U\) satisfying \(E_{\theta}U=0\). To verify this, consider \(W\) and \(U\) as unbiased estimators of \(\tau(\theta)\) and 0, respectively. Then the estimator \[\begin{equation} \phi_a=W+aU \tag{13.5} \end{equation}\] where \(a\) is a constant, is also an unbiased estimator of \(\tau(\theta)\). The variance of \(\phi_a\) is \[\begin{equation} Var_{\theta}\phi_a=Var_{\theta}(W+aU)=Var_{\theta}W+2aCov_{\theta}(W,U)+a^2Var_{\theta}U \tag{13.6} \end{equation}\]

Now if for some \(\theta=\theta_0\) we have \(Cov_{\theta_0}(W,U)<0\), then we can make \(2aCov_{\theta_0}(W,U)+a^2Var_{\theta_0}U<0\) by choosing \(a\in(0,-2Cov_{\theta_0}(W,U)/Var_{\theta_0}U)\), so that \(\phi_a\) is better than \(W\) at \(\theta=\theta_0\) and \(W\) cannot be best unbiased. Similarly, if \(Cov_{\theta_0^*}(W,U)>0\) for some \(\theta_0^*\), then choosing \(a\in(-2Cov_{\theta_0^*}(W,U)/Var_{\theta_0^*}U,0)\) makes \(\phi_a\) better than \(W\) at \(\theta=\theta_0^*\), and again \(W\) is not best unbiased. Therefore, if \(W\) is best unbiased, we must have \(Cov_{\theta}(W,U)=0\) for all \(\theta\) and any \(U\) satisfying \(E_{\theta}U=0\); this establishes necessity.
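As a quick numeric illustration of this choice of \(a\) (a minimal sketch with made-up values of the covariance and variance, not from the lecture), the extra term \(2aCov_{\theta_0}(W,U)+a^2Var_{\theta_0}U\) is indeed negative for every \(a\) in the stated interval when the covariance is negative:

```python
import numpy as np

# Minimal sketch with made-up values: when Cov(W, U) < 0, the extra variance
# term 2*a*Cov + a^2*Var(U) is negative for a in (0, -2*Cov / Var(U)),
# so phi_a = W + a*U has smaller variance than W on that interval.
cov_WU, var_U = -0.4, 2.0                          # assumed values, cov_WU < 0
a_grid = np.linspace(0.0, -2 * cov_WU / var_U, 9)  # endpoints of the interval
extra = 2 * a_grid * cov_WU + a_grid**2 * var_U
print(np.round(extra, 3))   # zero at both endpoints, negative in between
```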

Now suppose \(W\) is an unbiased estimator that is uncorrelated with all unbiased estimators of 0, and let \(W^{\prime}\) be any other unbiased estimator of \(\tau(\theta)\); we show that \(W\) is at least as good as \(W^{\prime}\). Write \[\begin{equation} W^{\prime}=W+(W^{\prime}-W) \tag{13.7} \end{equation}\] Since \(W^{\prime}-W\) is an unbiased estimator of 0, it is uncorrelated with \(W\), so the variance of \(W^{\prime}\) is \[\begin{equation} \begin{split} Var_{\theta}W^{\prime}&=Var_{\theta}W+Var_{\theta}(W^{\prime}-W)+2Cov_{\theta}(W,W^{\prime}-W)\\ &=Var_{\theta}W+Var_{\theta}(W^{\prime}-W)\geq Var_{\theta}W \end{split} \tag{13.8} \end{equation}\] Since \(W^{\prime}\) is arbitrary, it follows that \(W\) is the best unbiased estimator of \(\tau(\theta)\).
An unbiased estimator of 0 is nothing more than random noise; there is no information in an estimator of 0. If an estimator could be improved by adding random noise to it, the estimator is probably defective. This is the intuition behind Theorem 13.3.

Example 13.1 (Unbiased Estimators of Zero) Let \(X\) be a single observation from \(Unif(\theta,\theta+1)\). Then \[\begin{equation} E_{\theta}X=\int_{\theta}^{\theta+1}x\,dx=\theta+\frac{1}{2} \tag{13.9} \end{equation}\] so \(X-\frac{1}{2}\) is an unbiased estimator of \(\theta\), with \(Var_{\theta}(X-\frac{1}{2})=Var_{\theta}X=\frac{1}{12}\).

Now we proceed to find unbiased estimators of zero. If a function \(h(x)\) satisfies \[\begin{equation} \int_{\theta}^{\theta+1}h(x)dx=0,\quad \forall\theta \tag{13.10} \end{equation}\] then \[\begin{equation} 0=\frac{d}{d\theta}\int_{\theta}^{\theta+1}h(x)dx=h(\theta+1)-h(\theta),\quad\forall \theta \tag{13.11} \end{equation}\] so \(h(x)\) must be periodic with period 1. One such function is \(h(x)=\sin(2\pi x)\). Now, since \(E_{\theta}\sin(2\pi X)=0\), \[\begin{equation} \begin{split} Cov_{\theta}(X-\frac{1}{2},\sin(2\pi X))&=Cov_{\theta}(X,\sin(2\pi X))\\ &=\int_{\theta}^{\theta+1}x\sin(2\pi x)dx\\ &=-\frac{x\cos(2\pi x)}{2\pi}\Big|_{\theta}^{\theta+1}+\int_{\theta}^{\theta+1}\frac{\cos(2\pi x)}{2\pi}dx\\ &=-\frac{\cos(2\pi\theta)}{2\pi} \end{split} \tag{13.12} \end{equation}\] where the boundary term evaluates to \(-\frac{\cos(2\pi\theta)}{2\pi}\) and the remaining integral vanishes by periodicity. Hence \(X-\frac{1}{2}\) is correlated with an unbiased estimator of zero and cannot be a best unbiased estimator of \(\theta\).
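As a quick Monte Carlo check of (13.12) (a minimal sketch, not from the lecture), one can verify empirically that \(Cov_{\theta}(X-\frac{1}{2},\sin(2\pi X))\approx-\cos(2\pi\theta)/(2\pi)\) for a few values of \(\theta\):

```python
import numpy as np

# Minimal Monte Carlo sketch (not from the lecture): for X ~ Unif(theta, theta+1),
# Cov(X - 1/2, sin(2*pi*X)) should be close to -cos(2*pi*theta)/(2*pi).
rng = np.random.default_rng(1)
for theta in (0.0, 0.3, 0.7):
    X = rng.uniform(theta, theta + 1, size=1_000_000)
    emp = np.cov(X - 0.5, np.sin(2 * np.pi * X))[0, 1]   # empirical covariance
    exact = -np.cos(2 * np.pi * theta) / (2 * np.pi)     # value from (13.12)
    print(f"theta={theta}: empirical {emp: .4f}, exact {exact: .4f}")
```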
To answer the question about the existence of a best unbiased estimator, we need some characterization of all unbiased estimators of zero. One special case is when the family of p.d.f.s or p.m.f.s we are working with admits no unbiased estimator of zero other than zero itself; then any unbiased estimator is automatically uncorrelated with every unbiased estimator of zero (since \(Cov_{\theta}(W,0)=0\) for any statistic \(W\)) and is therefore best unbiased. By Definition 6.2, having only the trivial unbiased estimator of zero is exactly the condition that the family of distributions is complete.
Example 13.2 (Example 12.7 Continued) Let \(X_1,\cdots,X_n\) be i.i.d. with p.d.f. \(f(x|\theta)=1/\theta,x\in(0,\theta)\). Recall that \(\frac{n+1}{n}Y\) is an unbiased estimator of \(\theta\), where \(Y=\max\{X_1,\cdots,X_n\}\). By Example 6.6, \(Y\) is a complete sufficient statistic. This means that the family of p.d.f.s of \(Y\) is complete, so there are no nontrivial unbiased estimators of zero based on \(Y\). (By sufficiency, in the form of the Rao-Blackwell Theorem, we need consider only unbiased estimators of zero based on \(Y\).) Therefore, \(\frac{n+1}{n}Y\) is uncorrelated with all unbiased estimators of zero and thus is the best unbiased estimator of \(\theta\).
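A small simulation sketch (not from the lecture; the comparison estimator \(2\bar{X}\) is brought in only for illustration, and is also unbiased since \(E_{\theta}X_i=\theta/2\)) shows the gap in variance:

```python
import numpy as np

# Minimal sketch (not from the lecture): under Unif(0, theta), both
# (n+1)/n * max(X_i) and 2 * Xbar are unbiased for theta, but the estimator
# based on the complete sufficient statistic Y = max(X_i) has smaller variance.
rng = np.random.default_rng(2)
theta, n, reps = 5.0, 10, 200_000

X = rng.uniform(0, theta, size=(reps, n))
est_Y = (n + 1) / n * X.max(axis=1)   # best unbiased estimator from Example 13.2
est_mean = 2 * X.mean(axis=1)         # another unbiased estimator, for comparison

print("means:", est_Y.mean(), est_mean.mean())   # both close to theta = 5
print("vars :", est_Y.var(), est_mean.var())     # est_Y's variance is much smaller
```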
Note that what matters is the completeness of the family of distributions of the sufficient statistic; completeness of the original family is of no consequence. This follows from the Rao-Blackwell Theorem, which says that attention can be restricted to functions of the sufficient statistic.
Theorem 13.4 Let \(T\) be a complete sufficient statistic for a parameter \(\theta\), and let \(\phi(T)\) be any estimator based only on \(T\). Then \(\phi(T)\) is the unique best unbiased estimator of its expected value.
Corollary 13.1 If \(T\) is a complete sufficient statistic for a parameter \(\theta\) and \(h(X_1,\cdots,X_n)\) is any unbiased estimator of \(\tau(\theta)\), then \(\phi(T)=E(h(X_1,\cdots,X_n)|T)\) is the best unbiased estimator of \(\tau(\theta)\).
Proof. Because \(T\) is sufficient, \(E(h(X_1,\cdots,X_n)|T)\) is a function of \(T\) only (it does not depend on \(\theta\)), so it is an estimator. Unbiasedness follows directly from the law of total expectation, as in (13.1). Thus, by Theorem 13.4 the corollary holds.

Example 13.3 (Binomial Best Unbiased Estimation) Let \(X_1,\cdots,X_n\) be i.i.d. \(Bin(k,\theta)\) with \(k\) known. The problem is to estimate the probability of exactly one success from a \(Bin(k,\theta)\), that is, to estimate \[\begin{equation} \tau(\theta)=P_{\theta}(X=1)=k\theta(1-\theta)^{k-1} \tag{13.13} \end{equation}\] Now \(\sum_{i=1}^nX_i\sim Bin(kn,\theta)\) is a complete sufficient statistic, but no unbiased estimator based on it is immediately evident. When in this situation, try for the simplest solution. The simple-minded estimator \[\begin{equation} h(X_1)=\left\{\begin{aligned} 1 & \quad X_1=1\\ 0 & \quad \text{o.w.} \end{aligned} \right. \tag{13.14} \end{equation}\] satisfies \[\begin{equation} E_{\theta}h(X_1)=\sum_{x_1=0}^kh(x_1){k \choose x_1}\theta^{x_1}(1-\theta)^{k-x_1}=k\theta(1-\theta)^{k-1} \tag{13.15} \end{equation}\] and hence is an unbiased estimator of \(k\theta(1-\theta)^{k-1}\). By Corollary 13.1, \[\begin{equation} \phi(\sum_{i=1}^nX_i)=E(h(X_1)|\sum_{i=1}^nX_i) \tag{13.16} \end{equation}\] is the best unbiased estimator of \(k\theta(1-\theta)^{k-1}\). It remains to evaluate \(\phi\). Suppose we observe \(\sum_{i=1}^nX_i=t\); then \[\begin{equation} \begin{split} \phi(t)&=E(h(X_1)|\sum_{i=1}^nX_i=t)\\ &=P(X_1=1|\sum_{i=1}^nX_i=t)\\ &=\frac{P_{\theta}(X_1=1,\sum_{i=1}^nX_i=t)}{P_{\theta}(\sum_{i=1}^nX_i=t)}\\ &=\frac{P_{\theta}(X_1=1,\sum_{i=2}^nX_i=t-1)}{P_{\theta}(\sum_{i=1}^nX_i=t)}\\ &=\frac{P_{\theta}(X_1=1)P_{\theta}(\sum_{i=2}^nX_i=t-1)}{P_{\theta}(\sum_{i=1}^nX_i=t)} \end{split} \tag{13.17} \end{equation}\]

Now since \(X_1\sim Bin(k,\theta)\), \(\sum_{i=2}^nX_i\sim Bin(k(n-1),\theta)\) and \(\sum_{i=1}^nX_i\sim Bin(kn,\theta)\), we have

\[\begin{equation} \phi(t)=k\frac{{{k(n-1)} \choose {t-1}}}{{{kn} \choose t}} \tag{13.18} \end{equation}\]

Note that all of the \(\theta\)s cancel as they must since \(\sum_{i=1}^nX_i\) is sufficient. Hence, the best unbiased estimator of \(k\theta(1-\theta)^{k-1}\) is \[\begin{equation} \phi(\sum_{i=1}^nX_i)=k\frac{{{k(n-1)} \choose {\sum_{i=1}^nX_i-1}}}{{{kn} \choose {\sum_{i=1}^nX_i}}} \tag{13.19} \end{equation}\]
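As a quick numerical check (a minimal sketch, not from the lecture; the values of \(k\), \(n\), and \(\theta\) below are arbitrary), we can evaluate \(\phi(t)\) from (13.19) and verify that its expectation under \(\sum_{i=1}^nX_i\sim Bin(kn,\theta)\) equals \(k\theta(1-\theta)^{k-1}\):

```python
from math import comb

# Minimal sketch (arbitrary k, n, theta): evaluate phi(t) from (13.19) and
# check that sum_t phi(t) * P(T = t) equals k*theta*(1-theta)^(k-1),
# where T = sum(X_i) ~ Bin(k*n, theta).
k, n, theta = 3, 4, 0.25

def phi(t):
    # phi(t) = k * C(k(n-1), t-1) / C(kn, t); phi(0) = 0 since t = 0 rules out X_1 = 1
    if t == 0:
        return 0.0
    return k * comb(k * (n - 1), t - 1) / comb(k * n, t)

def binom_pmf(t, m, p):
    return comb(m, t) * p**t * (1 - p)**(m - t)

expectation = sum(phi(t) * binom_pmf(t, k * n, theta) for t in range(k * n + 1))
print(expectation, k * theta * (1 - theta)**(k - 1))   # the two values agree
```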