Chapter 6 Ancillary Statistics, Complete Statistics (Lecture on 01/21/2020)

Definition 6.1 (Ancillary Statistic) A statistic \(S(\mathbf{X})\) whose distribution does not depend on the parameter \(\theta\) is called an ancillary statistic.
An ancillary statistic alone contains no information about \(\theta\): it is an observation on a random variable whose distribution is fixed and known, unrelated to \(\theta\).

Example 6.1 (Uniform Ancillary Statistic) Let \(X_1,\cdots,X_n\) be i.i.d. uniform observations on the interval \((\theta,\theta+1),-\infty<\theta<\infty\). Let \(X_{(1)}<\cdots<X_{(n)}\) be the order statistics from the sample. The range statistic, \(R=X_{(n)}-X_{(1)}\), is an ancillary statistic. The cdf of each \(X_i\) is

\[\begin{equation} F(x|\theta)=\left\{\begin{aligned} & 0 &\quad x\leq\theta \\ & x-\theta & \quad \theta<x<\theta+1\\ & 1 & \quad x\geq\theta+1\end{aligned}\right. \tag{6.1} \end{equation}\]

Thus, the joint pdf of \(X_{(1)}\) and \(X_{(n)}\) is \[\begin{equation} g(x_{(1)},x_{(n)}|\theta)=\left\{ \begin{aligned} & n(n-1)(x_{(n)}-x_{(1)})^{n-2} & \quad \theta<x_{(1)}<x_{(n)}<\theta+1\\ & 0 & \quad otherwise \end{aligned} \right. \tag{6.2} \end{equation}\]

Make the transformation \(R=X_{(n)}-X_{(1)}\) and \(M=(X_{(1)}+X_{(n)})/2\), which has the inverse transformation \(X_{(1)}=(2M-R)/2\) and \(X_{(n)}=(2M+R)/2\) with Jacobian \(|J|=1\). Then the joint pdf of \(R\) and \(M\) is \[\begin{equation} h(r,m|\theta)=\left\{\begin{aligned} & n(n-1)r^{n-2} & \quad 0<r<1, \theta+(r/2)<m<\theta+1-(r/2)\\ & 0 & \quad otherwise \end{aligned} \right. \tag{6.3} \end{equation}\] Thus, the pdf of \(R\) is \[\begin{equation} h(r|\theta)=\int_{\theta+(r/2)}^{\theta+1-(r/2)}n(n-1)r^{n-2}dm=n(n-1)r^{n-2}(1-r) \tag{6.4} \end{equation}\] for \(0<r<1\). This is the pdf of a \(Beta(n-1,2)\) distribution, which does not depend on \(\theta\). Thus, \(R\) is ancillary.
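As a sanity check, the sketch below (Python with numpy and scipy; the sample size \(n=5\), the two \(\theta\) values, and the number of replications are arbitrary illustrative choices) simulates \(R\) under two different values of \(\theta\) and compares the results with the \(Beta(n-1,2)\) distribution.

```python
# Sketch: the distribution of R = X_(n) - X_(1) from Unif(theta, theta+1)
# should be Beta(n-1, 2) regardless of theta.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 100_000

def simulate_range(theta):
    # Each row is an i.i.d. sample of size n from Unif(theta, theta+1)
    x = rng.uniform(theta, theta + 1, size=(reps, n))
    return x.max(axis=1) - x.min(axis=1)

r0 = simulate_range(theta=0.0)
r5 = simulate_range(theta=5.0)

# Both empirical means should be close to the Beta(n-1, 2) mean (n-1)/(n+1)
print(r0.mean(), r5.mean(), (n - 1) / (n + 1))

# Kolmogorov-Smirnov comparison of each simulated sample against Beta(n-1, 2)
print(stats.kstest(r0, stats.beta(n - 1, 2).cdf))
print(stats.kstest(r5, stats.beta(n - 1, 2).cdf))
```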
Example 6.2 (Location Family Ancillary Statistic) Let \(X_1,\cdots,X_n\) be i.i.d. observations from a location parameter family with cdf \(F(x-\theta), -\infty<\theta<\infty\). The range \(R=X_{(n)}-X_{(1)}\) is always an ancillary statistic. Suppose \(Z_1,\cdots,Z_n\) are i.i.d. observations from \(F(x)\), with \(X_1=Z_1+\theta,\cdots,X_n=Z_n+\theta\). Thus, the cdf of the range statistic \(R\) is \[\begin{equation} \begin{split} F_R(r|\theta)&=P_{\theta}(R\leq r)=P_{\theta}(\max_i X_i-\min_i X_i\leq r)\\ &=P_{\theta}(\max_i Z_i-\min_i Z_i\leq r) \end{split} \tag{6.5} \end{equation}\] where the last equality holds because the shift \(\theta\) cancels in the difference \(X_i-X_j=Z_i-Z_j\). The last probability does not depend on \(\theta\) because the distribution of \(Z_1,\cdots,Z_n\) does not depend on \(\theta\). Thus, the cdf of \(R\) does not depend on \(\theta\) and hence \(R\) is ancillary.
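The shift argument in (6.5) can also be seen directly: the range computed from \(X_i=Z_i+\theta\) is numerically identical to the range of the \(Z_i\). A minimal sketch in Python (the standard normal is just one arbitrary choice of \(F\)):

```python
# Sketch: the range is invariant under a location shift, as in (6.5).
import numpy as np

rng = np.random.default_rng(1)
n = 10
z = rng.standard_normal(n)             # Z_1, ..., Z_n ~ F (here standard normal)
for theta in (0.0, -3.0, 7.5):
    x = z + theta                      # X_i = Z_i + theta
    print(theta, x.max() - x.min())    # the same range for every theta
```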

Example 6.3 (Scale Family Ancillary Statistic) Let \(X_1,\cdots,X_n\) be i.i.d. observations from a scale parameter family with cdf \(F(x/\sigma),\sigma>0\). Then any statistic that depends on the sample only through the \(n-1\) values \(X_1/X_n,\cdots,X_{n-1}/X_n\) is an ancillary statistic. For example, \(\frac{X_1+\cdots+X_n}{X_n}=\frac{X_1}{X_n}+\cdots+\frac{X_{n-1}}{X_n}+1\) is an ancillary statistic. Let \(Z_1,\cdots,Z_n\) be i.i.d. observations from \(F(x)\) with \(X_i=\sigma Z_i\). Since \(X_i/X_n=(\sigma Z_i)/(\sigma Z_n)=Z_i/Z_n\), the joint cdf of \(X_1/X_n,\cdots,X_{n-1}/X_n\) is \[\begin{equation} F(y_1,\cdots,y_{n-1}|\sigma)=P_{\sigma}(X_1/X_n\leq y_1,\cdots,X_{n-1}/X_n\leq y_{n-1})=P(Z_1/Z_n\leq y_1,\cdots,Z_{n-1}/Z_n\leq y_{n-1}) \tag{6.6} \end{equation}\] The last probability does not depend on \(\sigma\) because the distribution of \(Z_1,\cdots,Z_n\) does not depend on \(\sigma\). So the distribution of \(X_1/X_n,\cdots,X_{n-1}/X_n\) is independent of \(\sigma\), as is the distribution of any function of these quantities.

Note that this is actually a very general result: any ratio \(X_i/X_j\), \(i\neq j\), is an ancillary statistic for a scale family.
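A similar sketch for the scale family case (Python; the standard exponential is just one arbitrary choice of \(F\), as are the values of \(\sigma\)): the ratios \(X_1/X_n,\cdots,X_{n-1}/X_n\) are numerically the same for every \(\sigma\).

```python
# Sketch: ratios X_i / X_n are unchanged by the scale parameter, as in (6.6).
import numpy as np

rng = np.random.default_rng(2)
n = 6
z = rng.exponential(scale=1.0, size=n)       # Z_i ~ F (here standard exponential)
for sigma in (1.0, 0.1, 25.0):
    x = sigma * z                            # X_i = sigma * Z_i
    print(sigma, np.round(x[:-1] / x[-1], 4))  # identical ratios for every sigma
```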
A minimal sufficient statistic is not necessarily unrelated to an ancillary statistic. For example, consider \(X_1,\cdots,X_n\) from the \(Unif(\theta,\theta+1)\) distribution. \(T(\mathbf{X})=(X_{(1)},X_{(n)})\) is a minimal sufficient statistic in this case, and so is any one-to-one function of \((X_{(1)},X_{(n)})\), such as \((X_{(n)}-X_{(1)},(X_{(1)}+X_{(n)})/2)\). In Example 6.1 we showed that \(X_{(n)}-X_{(1)}\) is an ancillary statistic. Thus, the ancillary statistic is a function of the minimal sufficient statistic, and the two are certainly not independent.

Example 6.4 (Ancillary Precision) Let \(X_1\) and \(X_2\) be i.i.d. observations from the discrete distribution that satisfies \[\begin{equation} P_{\theta}(X=\theta)=P_{\theta}(X=\theta+1)=P_{\theta}(X=\theta+2)=\frac{1}{3} \tag{6.7} \end{equation}\] where \(\theta\) is an unknown integer. Let \(X_{(1)}\leq X_{(2)}\) be the order statistics. Then \((R,M)\), where \(R=X_{(2)}-X_{(1)}\) and \(M=(X_{(1)}+X_{(2)})/2\), is a minimal sufficient statistic. Since this is also a location family, \(R\) is an ancillary statistic. How might \(R\) give information about \(\theta\)?

Consider a sample point \((r,m)\), where \(m\) is an integer. First consider only \(m\): for this sample point to have positive probability, \(\theta\) can take only three values, \(m\), \(m-1\), or \(m-2\). Suppose we get the additional information that \(R=2\). Then it must be the case that \(X_{(1)}=m-1\) and \(X_{(2)}=m+1\), and the only possible value of \(\theta\) is \(m-1\). Thus, knowledge of the value of the ancillary statistic \(R\) has increased our knowledge about \(\theta\), even though knowing \(R\) alone would give us no information about \(\theta\).
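The bookkeeping in this example can be written out explicitly. The sketch below (Python; the helper `consistent_thetas` is hypothetical, introduced only for this illustration) lists the values of \(\theta\) consistent with an observed \((r,m)\) when \(m\) is an integer.

```python
# Sketch: for the three-point distribution in (6.7), list the values of theta
# consistent with an observed (r, m); the ancillary R sharpens the answer.
def consistent_thetas(r, m):
    x1, x2 = m - r / 2, m + r / 2      # recover the order statistics
    # theta is consistent iff both order statistics lie in {theta, theta+1, theta+2}
    return [t for t in range(int(x1) - 2, int(x1) + 1)
            if x1 in (t, t + 1, t + 2) and x2 in (t, t + 1, t + 2)]

print(consistent_thetas(r=0, m=5))     # [3, 4, 5]: m alone leaves three candidates
print(consistent_thetas(r=2, m=5))     # [4]: knowing R = 2 pins theta down to m - 1
```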
Definition 6.2 (Complete Statistic) Let \(f(t|\theta)\) be a family of pdfs or pmfs for a statistic \(T(\mathbf{X})\). The family of probability distributions is called complete if, for every function \(g\), \(E_{\theta}g(T)=0\) for all \(\theta\) implies \(P_{\theta}(g(T)=0)=1\) for all \(\theta\). Equivalently, \(T(\mathbf{X})\) is called a complete statistic.
Completeness is a property of a family of probability distributions, not of a particular distribution. It says that if a function \(g(T(\mathbf{X}))\) of the statistic \(T(\mathbf{X})\) has expectation \(0\) for every \(\theta\), then \(g(T(\mathbf{X}))=0\) almost surely for every \(\theta\).
Example 6.5 (Binomial Complete Sufficient Statistic) Suppose that \(T\sim Bin(n,p)\) with \(0<p<1\). Let \(g\) be a function such that \(E_pg(T)=0\) for all \(p\). Then \[\begin{equation} \begin{split} 0=E_pg(T)&=\sum_{t=0}^ng(t){n \choose t}p^t(1-p)^{n-t}\\ &=(1-p)^n\sum_{t=0}^ng(t){n \choose t}\left(\frac{p}{1-p}\right)^t \end{split} \tag{6.8} \end{equation}\] Since \((1-p)^n\neq0\), the sum, a polynomial of degree \(n\) in \(p/(1-p)\), must be \(0\) for every \(0<p<1\), which forces every coefficient \(g(t){n \choose t}\) to be \(0\). Thus, \(g(t)=0\) for \(t=0,1,\cdots,n\), which yields \(P_p(g(T)=0)=1\) for all \(p\). Hence, \(T\) is a complete statistic.
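Numerically, completeness of the binomial family can be illustrated by evaluating the pmf at \(n+1\) distinct values of \(p\): the resulting \((n+1)\times(n+1)\) matrix has full rank, so the only vector \((g(0),\cdots,g(n))\) satisfying \(E_pg(T)=0\) at those values is the zero vector. A sketch in Python with numpy and scipy (the choice \(n=5\) and the grid of \(p\) values are arbitrary):

```python
# Sketch: the matrix A with A[i, t] = P_{p_i}(T = t) for n+1 distinct p_i is
# nonsingular, so A g = 0 (i.e., E_p g(T) = 0 at each p_i) forces g = 0.
import numpy as np
from scipy import stats

n = 5
ps = np.linspace(0.1, 0.9, n + 1)                  # n+1 distinct values of p
A = np.array([[stats.binom.pmf(t, n, p) for t in range(n + 1)] for p in ps])
print(np.linalg.matrix_rank(A))                    # n+1, i.e., full rank
```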
Example 6.6 (Uniform Complete Sufficient Statistic) Let \(X_1,\cdots,X_n\) be i.i.d. \(Unif(0,\theta)\) observations. \(T(\mathbf{X})=\max_iX_i\) is a sufficient statistic with pdf \[\begin{equation} f(t|\theta)=\left\{\begin{aligned} & nt^{n-1}\theta^{-n} & \quad 0<t<\theta\\ & 0 & \quad otherwise \end{aligned}\right. \tag{6.9} \end{equation}\] Suppose \(g(t)\) is a function that satisfies \(E_{\theta}g(T)=0\) for all \(\theta\). Since \(E_{\theta}g(T)\) is constant at \(0\) as a function of \(\theta\), its derivative with respect to \(\theta\) is \(0\). Thus, \[\begin{equation} \begin{split} 0=\frac{d}{d\theta}E_{\theta}g(T)&=\frac{d}{d\theta}\int_0^{\theta}g(t)nt^{n-1}\theta^{-n}dt\\ &=(\theta^{-n})\frac{d}{d\theta}\int_0^{\theta}g(t)nt^{n-1}dt+\left(\frac{d}{d\theta}\theta^{-n}\right)\int_0^{\theta}g(t)nt^{n-1}dt\\ &=\theta^{-n}ng(\theta)\theta^{n-1}+0=n\theta^{-1}g(\theta) \end{split} \tag{6.10} \end{equation}\] The second term in the second line is \(0\) because the integral is, except for a constant, equal to \(E_{\theta}g(T)=0\). Since \(n\theta^{-1}\neq0\), it must be that \(g(\theta)=0\) for every \(\theta>0\). Since \(T(\mathbf{X})=\max_iX_i>0\) almost surely, \(g(T)=0\) almost surely. Hence, \(T\) is a complete statistic.
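The differentiation step in (6.10) is just the product rule; without assuming \(E_{\theta}g(T)=0\), the same computation gives \(\frac{d}{d\theta}E_{\theta}g(T)=\frac{n}{\theta}\left(g(\theta)-E_{\theta}g(T)\right)\), which reduces to \(n\theta^{-1}g(\theta)\) when \(E_{\theta}g(T)\equiv0\). The sketch below checks this identity numerically (Python with scipy; the choices \(g(t)=\sin t\), \(n=4\), and \(\theta=2\) are arbitrary):

```python
# Sketch: numerical check of d/dtheta E_theta g(T) = (n/theta) * (g(theta) - E_theta g(T))
# when T has pdf f(t|theta) = n t^(n-1) theta^(-n) on (0, theta), as in (6.9).
import numpy as np
from scipy import integrate

n = 4
g = np.sin                                           # an arbitrary smooth g

def Eg(theta):
    val, _ = integrate.quad(lambda t: g(t) * n * t**(n - 1) * theta**(-n), 0, theta)
    return val

theta, h = 2.0, 1e-5
numeric = (Eg(theta + h) - Eg(theta - h)) / (2 * h)  # finite-difference derivative
print(numeric, n / theta * (g(theta) - Eg(theta)))   # should agree up to numerical error
```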
Theorem 6.1 (Basu Theorem) If \(T(\mathbf{X})\) is a complete and minimal sufficient statistic, then \(T(\mathbf{X})\) is independent of every ancillary statistic.

Proof. We give the proof only for discrete distributions.

Let \(S(\mathbf{X})\) be any ancillary statistic. Then \(P(S(\mathbf{X})=s)\) does not depend on \(\theta\). Also, the conditional probability \[\begin{equation} P(S(\mathbf{X})=s|T(\mathbf{X})=t)=P(\mathbf{X}\in\{\mathbf{x}:S(\mathbf{x})=s\}|T(\mathbf{X})=t) \tag{6.11} \end{equation}\] does not depend on \(\theta\) because \(T(\mathbf{X})\) is a sufficient statistic. Thus, to show that \(S(\mathbf{X})\) and \(T(\mathbf{X})\) are independent, it suffices to show \[\begin{equation} P(S(\mathbf{X})=s|T(\mathbf{X})=t)=P(S(\mathbf{X})=s) \tag{6.12} \end{equation}\] for all \(t\in\mathcal{T}\). Now \[\begin{equation} P(S(\mathbf{X})=s)=\sum_{t\in\mathcal{T}}P(S(\mathbf{X})=s|T(\mathbf{X})=t)P_{\theta}(T(\mathbf{X})=t) \tag{6.13} \end{equation}\] Furthermore, since \(\sum_{t\in\mathcal{T}}P_{\theta}(T(\mathbf{X})=t)=1\), we have \[\begin{equation} P(S(\mathbf{X})=s)=P(S(\mathbf{X})=s)\sum_{t\in\mathcal{T}}P_{\theta}(T(\mathbf{X})=t)=\sum_{t\in\mathcal{T}}P(S(\mathbf{X})=s)P_{\theta}(T(\mathbf{X})=t) \tag{6.14} \end{equation}\] Therefore, if we define the statistic \[\begin{equation} g(t)=P(S(\mathbf{X})=s|T(\mathbf{X})=t)-P(S(\mathbf{X})=s) \tag{6.15} \end{equation}\] then from (6.13) and (6.14) we have \[\begin{equation} E_{\theta}g(T)=\sum_{t\in\mathcal{T}}g(t)P_{\theta}(T(\mathbf{X})=t)=0,\quad\forall \theta \end{equation}\] Since \(T(\mathbf{X})\) is a complete statistic, this implies that \(g(t)=0\) for all possible values of \(t\in\mathcal{T}\). Hence (6.12) is verified.
Theorem 6.2 (Complete Statistics in Exponential Family) Let \(X_1,\cdots,X_n\) be i.i.d. observations from an exponential family with pdf or pmf of the form \[\begin{equation} f(x|\boldsymbol{\theta})=h(x)c(\boldsymbol{\theta})\exp\left(\sum_{j=1}^k\omega_j(\boldsymbol{\theta})t_j(x)\right) \tag{6.16} \end{equation}\] where \(\boldsymbol{\theta}=(\theta_1,\cdots,\theta_k)\). Then the statistic \(T(\mathbf{X})=(\sum_{i=1}^nt_1(X_i),\cdots,\sum_{i=1}^nt_k(X_i))\) is complete as long as the natural parameter space \(\{(\omega_1(\boldsymbol{\theta}),\cdots,\omega_k(\boldsymbol{\theta})):\boldsymbol{\theta}\in\Theta\}\) contains an open set in \(\mathbb{R}^k\).
Example 6.7 (Using Basu Theorem-I) Let \(X_1,\cdots,X_n\) be i.i.d. exponential observations with mean \(\theta\). Consider computing the expected value of \(g(\mathbf{X})=\frac{X_n}{X_1+\cdots+X_n}\). By Example 6.3, \(g(\mathbf{X})\) is an ancillary statistic. The exponential distribution is also an exponential family with \(t(x)=x\), so by Theorem 6.2, \(T(\mathbf{X})=\sum_{i=1}^nX_i\) is a complete statistic, and by Theorems 5.2 and 5.3, \(T(\mathbf{X})\) is also a minimal sufficient statistic. Hence, by Basu Theorem, \(T(\mathbf{X})\) and \(g(\mathbf{X})\) are independent. Thus, we have \[\begin{equation} \theta=E_{\theta}X_n=E_{\theta}[T(\mathbf{X})g(\mathbf{X})]=(E_{\theta}T(\mathbf{X}))(E_{\theta}g(\mathbf{X}))=n\theta E_{\theta}g(\mathbf{X}) \tag{6.17} \end{equation}\] Hence, for any \(\theta\), \(E_{\theta}g(\mathbf{X})=n^{-1}\).
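A quick Monte Carlo confirmation of this example (Python; the choices \(n=4\), \(\theta=3\), and the replication count are arbitrary): the sample mean of \(g(\mathbf{X})\) should be close to \(1/n\), and the sample correlation between \(T(\mathbf{X})\) and \(g(\mathbf{X})\) should be close to \(0\), consistent with Basu Theorem.

```python
# Sketch: Monte Carlo check of Example 6.7 with exponential mean theta.
import numpy as np

rng = np.random.default_rng(3)
n, reps, theta = 4, 200_000, 3.0
x = rng.exponential(scale=theta, size=(reps, n))
T = x.sum(axis=1)                  # complete (minimal sufficient) statistic
g = x[:, -1] / T                   # ancillary statistic X_n / (X_1 + ... + X_n)

print(g.mean(), 1 / n)             # E g(X) = 1/n
print(np.corrcoef(T, g)[0, 1])     # near 0, consistent with independence (Basu)
```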
Minimality of the sufficient statistic was not used in the proof of Basu Theorem. The theorem is true with the word "minimal" omitted, because a fundamental property of a complete sufficient statistic is that it is minimal sufficient.
Theorem 6.3 If a minimal sufficient statistic exists, then any complete sufficient statistic is also a minimal sufficient statistic.