Ancillary Statistics, Complete Statistics (Lecture on 01/21/2020)
Definition 6.1 (Ancillary Statistic) A statistic S(\mathbf{X}) whose distribution does not depend on the parameter \theta is called an ancillary statistic.
An ancillary statistic alone contains no information about \theta. An ancillary statistic is an observation on a random variable whose distribution is fixed and known, unrelated to \theta.
Example 6.1 (Uniform Ancillary Statistic) Let X_1,\cdots,X_n be i.i.d. uniform observations on the interval (\theta,\theta+1),-\infty<\theta<\infty. Let X_{(1)}<\cdots<X_{(n)} be the order statistics from the sample. The range statistic, R=X_{(n)}-X_{(1)}, is an ancillary statistic. The cdf of each X_i is
\begin{equation}
F(x|\theta)=\left\{\begin{aligned} & 0 & \quad x\leq\theta\\ & x-\theta & \quad \theta<x<\theta+1\\ & 1 & \quad x\geq\theta+1 \end{aligned}\right.
\tag{6.1}
\end{equation}
Thus, the joint pdf of X_{(1)} and X_{(n)} is
\begin{equation}
g(x_{(1)},x_{(n)}|\theta)=\left\{\begin{aligned} & n(n-1)(x_{(n)}-x_{(1)})^{n-2} & \quad \theta<x_{(1)}<x_{(n)}<\theta+1\\ & 0 & \quad otherwise \end{aligned}\right.
\tag{6.2}
\end{equation}
Making the transformation R=X_{(n)}-X_{(1)} and M=(X_{(1)}+X_{(n)})/2, which has the inverse transformation X_{(1)}=(2M-R)/2 and X_{(n)}=(2M+R)/2 with Jacobian |J|=1, the joint pdf of R and M is
\begin{equation}
h(r,m|\theta)=\left\{\begin{aligned} & n(n-1)r^{n-2} & \quad 0<r<1,\theta+(r/2)<m<\theta+1-(r/2)\\ & 0 & \quad otherwise \end{aligned}\right.
\tag{6.3}
\end{equation}
Thus, the pdf of R is
\begin{equation}
h(r|\theta)=\int_{\theta+(r/2)}^{\theta+1-(r/2)}n(n-1)r^{n-2}dm=n(n-1)r^{n-2}(1-r)
\tag{6.4}
\end{equation}
for 0<r<1. This is a Beta(n-1,2) pdf and does not depend on \theta. Thus, R is ancillary.
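As a quick numerical check (not part of the original derivation), one can simulate the range for two very different values of \theta and compare the results with the Beta(n-1,2) distribution. The sketch below uses numpy and scipy; the values of n, \theta, and the number of replications are arbitrary choices for illustration.
```python
# Simulation sketch for Example 6.1: the range R = X_(n) - X_(1) of n i.i.d.
# Unif(theta, theta+1) observations should follow Beta(n-1, 2) regardless of theta.
# n, the two theta values, and the number of replications are illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 5, 100_000

def simulate_range(theta):
    """Return `reps` simulated ranges from samples of size n from Unif(theta, theta+1)."""
    x = rng.uniform(theta, theta + 1, size=(reps, n))
    return x.max(axis=1) - x.min(axis=1)

r_a = simulate_range(theta=-3.0)
r_b = simulate_range(theta=10.0)

print(np.median(r_a), np.median(r_b))        # empirical medians: nearly identical
print(stats.beta(n - 1, 2).median())         # theoretical Beta(n-1, 2) median
print(stats.ks_2samp(r_a, r_b).pvalue)       # large p-value: same distribution
```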
Example 6.2 (Location Family Ancillary Statistic) Let X_1,\cdots,X_n be i.i.d. observations from a location parameter family with cdf F(x-\theta),-\infty<\theta<\infty. The range R=X_{(n)}-X_{(1)} is always an ancillary statistic. Suppose Z_1,\cdots,Z_n are i.i.d. observations from F(x), with X_1=Z_1+\theta,\cdots,X_n=Z_n+\theta. Thus, the cdf of the range statistic R is
\begin{equation}
\begin{split}
F_R(r|\theta)=P_{\theta}(R\leq r)&=P_{\theta}(\max_iX_i-\min_iX_i\leq r)\\
&=P(\max_i(Z_i+\theta)-\min_i(Z_i+\theta)\leq r)\\
&=P(\max_iZ_i-\min_iZ_i\leq r)
\end{split}
\tag{6.5}
\end{equation}
The last probability does not depend on \theta because the distribution of Z_1,\cdots,Z_n does not depend on \theta. Thus, the cdf of R does not depend on \theta and hence R is ancillary.
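The same location-family argument can be checked by simulation with any base cdf F; the sketch below uses a standard Cauchy base distribution, with n and the two values of \theta chosen arbitrarily for illustration.
```python
# Simulation sketch for Example 6.2: in a location family X_i = Z_i + theta, the range
# depends only on the Z_i, so its distribution is the same for every theta.
# The Cauchy base distribution, n, and the theta values are illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, reps = 8, 50_000

def range_sample(theta):
    x = rng.standard_cauchy(size=(reps, n)) + theta   # location shift of i.i.d. Z_i
    return x.max(axis=1) - x.min(axis=1)

# A two-sample KS test should not reject equality of the two range distributions.
print(stats.ks_2samp(range_sample(0.0), range_sample(100.0)).pvalue)
```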
Example 6.3 (Scale Family Ancillary Statistic) Let X_1,\cdots,X_n be i.i.d. observations from a scale parameter family with cdf F(x/\sigma),\sigma>0. Then any statistic that depends on the sample only through the n-1 values X_1/X_n,\cdots,X_{n-1}/X_n is an ancillary statistic. For example, \frac{X_1+\cdots+X_n}{X_n}=\frac{X_1}{X_n}+\cdots+\frac{X_{n-1}}{X_n}+1 is an ancillary statistic. Let Z_1,\cdots,Z_n be i.i.d. observations from F(x) with X_i=\sigma Z_i. Then the joint cdf of X_1/X_n,\cdots,X_{n-1}/X_n is
\begin{equation}
F(y_1,\cdots,y_{n-1}|\sigma)=P_{\sigma}(X_1/X_n\leq y_1,\cdots,X_{n-1}/X_n\leq y_{n-1})=P(Z_1/Z_n\leq y_1,\cdots,Z_{n-1}/Z_n\leq y_{n-1})
\tag{6.6}
\end{equation}
The last probability does not depend on \sigma because X_i/X_n=(\sigma Z_i)/(\sigma Z_n)=Z_i/Z_n and the distribution of Z_1,\cdots,Z_n does not depend on \sigma. So the distribution of X_1/X_n,\cdots,X_{n-1}/X_n is independent of \sigma, as is the distribution of any function of these quantities.
Note that this is actually a very general result: any ratio X_i/X_j is an ancillary statistic for scale family distributions.
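As with the previous examples, a short simulation illustrates the claim; the sketch below uses an exponential base distribution and checks that the ratio X_1/X_n has the same distribution for two very different values of \sigma (all specific values are arbitrary illustrative choices).
```python
# Simulation sketch for Example 6.3: in a scale family X_i = sigma * Z_i, the ratio
# X_1 / X_n equals Z_1 / Z_n, so its distribution does not depend on sigma.
# The exponential base distribution, n, and the sigma values are illustrative choices.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 6, 50_000

def ratio_sample(sigma):
    x = sigma * rng.exponential(scale=1.0, size=(reps, n))
    return x[:, 0] / x[:, -1]          # the ancillary ratio X_1 / X_n

print(stats.ks_2samp(ratio_sample(0.5), ratio_sample(50.0)).pvalue)   # should be large
```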
A minimal sufficient statistic is not necessarily unrelated to an ancillary statistic. For example, consider X_1,\cdots,X_n from a Unif(\theta,\theta+1) distribution. T(\mathbf{X})=(X_{(1)},X_{(n)}) is a minimal sufficient statistic in this case, and so is any one-to-one function of (X_{(1)},X_{(n)}), such as (X_{(n)}-X_{(1)},(X_{(1)}+X_{(n)})/2). In Example 6.1 we showed that X_{(n)}-X_{(1)} is an ancillary statistic. Certainly, the ancillary statistic and the minimal sufficient statistic are not independent here: the ancillary statistic is a coordinate of a one-to-one function of the minimal sufficient statistic.
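A simulation makes the dependence concrete: given a range close to 1, the midrange (X_{(1)}+X_{(n)})/2 is forced to lie near \theta+1/2, so its conditional spread shrinks as R grows. The sketch below is an illustration with arbitrary n and \theta.
```python
# Simulation sketch: for Unif(theta, theta+1) data, the ancillary range R and the
# midrange M = (X_(1) + X_(n)) / 2 are NOT independent: given R = r, M is uniform on
# an interval of length 1 - r, so its conditional spread shrinks as r grows.
# n, theta, and the number of replications are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(3)
n, theta, reps = 5, 2.0, 200_000

x = rng.uniform(theta, theta + 1, size=(reps, n))
r = x.max(axis=1) - x.min(axis=1)          # ancillary range
m = (x.max(axis=1) + x.min(axis=1)) / 2    # midrange

# Conditional standard deviations of M given small vs. large R differ markedly,
# so R and M cannot be independent.
print(m[r < 0.2].std(), m[r > 0.8].std())
```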
Example 6.4 (Ancillary Precision) Let X_1 and X_2 be i.i.d. observations from the discrete distribution that satisfies
\begin{equation}
P_{\theta}(X=\theta)=P_{\theta}(X=\theta+1)=P_{\theta}(X=\theta+2)=\frac{1}{3}
\tag{6.7}
\end{equation}
where \theta is an unknown integer. Let X_{(1)}\leq X_{(2)} be the order statistics. Then (R,M), where R=X_{(2)}-X_{(1)} and M=(X_{(1)}+X_{(2)})/2, is a minimal sufficient statistic. Since this is also a location family, R is an ancillary statistic. How might R give information about \theta?
Consider a sample point (r,m), where m is an integer. First consider m alone: for this sample point to have positive probability, \theta can only take one of the three values m, m-1, or m-2. Now suppose we have the additional information that R=2. Then it must be the case that X_{(1)}=m-1 and X_{(2)}=m+1, and with this additional information the only possible value of \theta is m-1.
Thus, knowledge of the value of the ancillary statistic R has increased our knowledge about \theta, even though R by itself gives us no information about \theta.
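A small enumeration illustrates the point: for a fixed observed midpoint m, listing the integer values of \theta consistent with the data leaves three candidates when R=0 but only one when R=2. The helper function, search window, and value of m below are hypothetical, introduced only for illustration.
```python
# Enumeration sketch for Example 6.4: list the integer thetas consistent with an
# observed (r, m).  The function name, the search window, and the value of m are
# hypothetical illustrative choices.
from itertools import product

def consistent_thetas(r, m, search_width=5):
    """Integer thetas admitting a sample (x1, x2) with range r and midpoint m,
    where each observation lies in {theta, theta+1, theta+2}."""
    found = []
    for theta in range(int(m) - search_width, int(m) + search_width + 1):
        support = {theta, theta + 1, theta + 2}
        for x1, x2 in product(support, repeat=2):
            lo, hi = min(x1, x2), max(x1, x2)
            if hi - lo == r and (lo + hi) / 2 == m:
                found.append(theta)
                break
    return found

m = 10                                 # arbitrary integer midpoint
print(consistent_thetas(r=0, m=m))     # [8, 9, 10]: with R = 0, three candidates remain
print(consistent_thetas(r=2, m=m))     # [9]: with R = 2, theta must be m - 1
```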
Definition 6.2 (Complete Statistic) Let f(t|\theta) be a family of pdfs or pmfs for a statistic T(\mathbf{X}). The family of probability distributions is called complete if, for every function g, E_{\theta}g(T)=0 for all \theta implies P_{\theta}(g(T)=0)=1 for all \theta. Equivalently, T(\mathbf{X}) is called a complete statistic.
Completeness is a property of a family of probability distributions, not of a particular distribution. It says that for the distribution of a certain statistic T(\mathbf{X}), if a function g(T(\mathbf{X})) has expectation 0 for every \theta, then g(T(\mathbf{X}))=0 almost surely.
Example 6.5 (Binomial Complete Sufficient Statistic) Suppose that T\sim Bin(n,p) with 0<p<1. Let g be a function such that E_pg(T)=0. Then
\begin{equation}
\begin{split}
0=E_pg(T)&=\sum_{t=0}^ng(t){n \choose t}p^t(1-p)^{n-t}\\
&=(1-p)^n\sum_{t=0}^ng(t){n \choose t}(\frac{p}{1-p})^t
\end{split}
\tag{6.8}
\end{equation}
Since (1-p)^n\neq0, the last sum is a polynomial in p/(1-p) that equals 0 for every value of p/(1-p) in (0,\infty), so every coefficient g(t){n \choose t} must be 0. As {n \choose t}\neq0, we have g(t)=0 for t=0,1,\cdots,n, which yields P_p(g(T)=0)=1 for all p. Hence, T is a complete statistic.
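The same conclusion can be seen numerically: stacking the Bin(n,p) pmf at n+1 distinct values of p gives an invertible (n+1)\times(n+1) matrix, so the only vector g with E_pg(T)=0 at all of those p values is g=0. The sketch below (with an arbitrary small n and an arbitrary grid of p values) checks the full rank.
```python
# Sketch for Example 6.5: the map g -> (E_{p_0} g(T), ..., E_{p_n} g(T)) is given by
# the matrix A[i, t] = P_{p_i}(T = t) for T ~ Bin(n, p_i).  If A has full rank, then
# E_p g(T) = 0 at those n + 1 values of p already forces g = 0.
# The value of n and the grid of p values are arbitrary illustrative choices.
import numpy as np
from scipy import stats

n = 5
p_grid = np.linspace(0.1, 0.9, n + 1)                  # n + 1 distinct values of p
A = np.array([stats.binom.pmf(np.arange(n + 1), n, p) for p in p_grid])

print(np.linalg.matrix_rank(A))                        # n + 1, i.e. full rank
print(np.linalg.solve(A, np.zeros(n + 1)))             # unique solution: g = 0
```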
Example 6.6 (Uniform Complete Sufficient Statistic) Let X_1,\cdots,X_n be i.i.d. Unif(0,\theta) observations. T(\mathbf{X})=\max_iX_i is a sufficient statistic with pdf
\begin{equation}
f(t|\theta)=\left\{\begin{aligned} & nt^{n-1}\theta^{-n} & \quad 0<t<\theta\\ & 0 & \quad otherwise \end{aligned}\right.
\tag{6.9}
\end{equation}
Suppose g(t) is a function satisfying E_{\theta}g(T)=0 for all \theta. Then, since E_{\theta}g(T) is constant at 0 as a function of \theta, its derivative with respect to \theta is 0. Thus,
\begin{equation}
\begin{split}
0=\frac{d}{d\theta}E_{\theta}g(T)&=\frac{d}{d\theta}\int_0^{\theta}g(t)nt^{n-1}\theta^{-n}dt\\
&=(\theta^{-n})\frac{d}{d\theta}\int_0^{\theta}g(t)nt^{n-1}dt+(\frac{d}{d\theta}\theta^{-n})\int_0^{\theta}g(t)nt^{n-1}dt\\
&=\theta^{-n}ng(\theta)\theta^{n-1}+0=\theta^{-1}ng(\theta)
\end{split}
\tag{6.10}
\end{equation}
The second term in the second line is 0 because the integral is, except for a constant, equal to E_{\theta}g(T)=0. Since \theta^{-1}n\neq0, it must be that g(\theta)=0 for every \theta>0. Since T(\mathbf{X})=\max_iX_i>0 almost surely, g(T)=0 almost surely. Hence, T is a complete statistic.
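To see the key differentiation step in action, the sketch below numerically checks the product-rule identity \frac{d}{d\theta}E_{\theta}g(T)=\frac{n}{\theta}(g(\theta)-E_{\theta}g(T)) used above for an arbitrary test function g; when E_{\theta}g(T)\equiv0 this reduces to ng(\theta)/\theta, which can vanish only if g(\theta)=0. The test function, n, and \theta_0 are arbitrary illustrative choices.
```python
# Sketch for Example 6.6: numerically verify
#   d/dtheta E_theta[g(T)] = (n / theta) * (g(theta) - E_theta[g(T)])
# for T = max_i X_i with X_i ~ Unif(0, theta).  The test function g = sin, n, and
# theta0 are arbitrary illustrative choices.
import numpy as np
from scipy import integrate

n = 4
g = np.sin

def expected_g(theta):
    # E_theta[g(T)] = integral_0^theta g(t) * n * t**(n-1) / theta**n dt
    val, _ = integrate.quad(lambda t: g(t) * n * t ** (n - 1) / theta ** n, 0, theta)
    return val

theta0, h = 2.0, 1e-5
lhs = (expected_g(theta0 + h) - expected_g(theta0 - h)) / (2 * h)   # numerical derivative
rhs = (n / theta0) * (g(theta0) - expected_g(theta0))
print(lhs, rhs)   # the two values agree to several decimal places
```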
Theorem 6.1 (Basu Theorem) If T(\mathbf{X}) is a complete and minimal sufficient statistic, then T(\mathbf{X}) is independent of every ancillary statistic.
Proof. We give the proof only for discrete distributions. Let S(\mathbf{X}) be any ancillary statistic. Then P(S(\mathbf{X})=s) does not depend on \theta. Also, the conditional probability
\begin{equation}
P(S(\mathbf{X})=s|T(\mathbf{X})=t)=P(\mathbf{X}\in\{\mathbf{x}:S(\mathbf{x})=s\}|T(\mathbf{X})=t)
\tag{6.11}
\end{equation}
does not depend on \theta because T(\mathbf{X}) is a sufficient statistic. Thus, to show that S(\mathbf{X}) and T(\mathbf{X}) are independent, it suffices to show
\begin{equation}
P(S(\mathbf{X})=s|T(\mathbf{X})=t)=P(S(\mathbf{X})=s)
\tag{6.12}
\end{equation}
for all t\in\mathcal{T}. Now
\begin{equation}
P(S(\mathbf{X})=s)=\sum_{t\in\mathcal{T}}P(S(\mathbf{X})=s|T(\mathbf{X})=t)P_{\theta}(T(\mathbf{X})=t)
\tag{6.13}
\end{equation}
Furthermore, since \sum_{t\in\mathcal{T}}P_{\theta}(T(\mathbf{X})=t)=1, we have
\begin{equation}
P(S(\mathbf{X})=s)=P(S(\mathbf{X})=s)\sum_{t\in\mathcal{T}}P_{\theta}(T(\mathbf{X})=t)=\sum_{t\in\mathcal{T}}P(S(\mathbf{X})=s)P_{\theta}(T(\mathbf{X})=t)
\tag{6.14}
\end{equation}
Therefore, if we define the statistic
\begin{equation}
g(t)=P(S(\mathbf{X})=s|T(\mathbf{X})=t)-P(S(\mathbf{X})=s)
\tag{6.15}
\end{equation}
then from (6.13) and (6.14) we have
\begin{equation}
E_{\theta}g(T)=\sum_{t\in\mathcal{T}}g(t)P_{\theta}(T(\mathbf{X})=t)=0,\forall \theta
\tag{6.16}
\end{equation}
Since T(\mathbf{X}) is a complete statistic, this implies that g(t)=0 for all possible values of t\in\mathcal{T}. Hence (6.12) is verified.
Theorem 6.2 (Complete Statistics in Exponential Family) Let X_1,\cdots,X_n be i.i.d. observations from an exponential family with pdf or pmf of the form
\begin{equation}
f(x|\boldsymbol{\theta})=h(x)c(\boldsymbol{\theta})\exp\left(\sum_{j=1}^k\omega_j(\boldsymbol{\theta})t_j(x)\right)
\tag{6.17}
\end{equation}
where \boldsymbol{\theta}=(\theta_1,\cdots,\theta_k). Then the statistic T(\mathbf{X})=(\sum_{i=1}^nt_1(X_i),\cdots,\sum_{i=1}^nt_k(X_i)) is complete as long as the natural parameter space \{(\omega_1(\boldsymbol{\theta}),\cdots,\omega_k(\boldsymbol{\theta})):\boldsymbol{\theta}\in\Theta\} contains an open set in \mathbb{R}^k.
Example 6.7 (Using Basu Theorem-I) Let X_1,\cdots,X_n be i.i.d. exponential observations with parameter \theta. Consider computing the expected value of g(\mathbf{X})=\frac{X_n}{X_1+\cdots+X_n}. By Example 6.3, g(\mathbf{X}) is an ancillary statistic. The exponential distribution is also an exponential family with t(x)=x, so by Theorem 6.2, T(\mathbf{X})=\sum_{i=1}^nX_i is a complete statistic, and by Theorems 5.2 and 5.3, T(\mathbf{X}) is also a minimal sufficient statistic. Hence, by the Basu Theorem, T(\mathbf{X}) and g(\mathbf{X}) are independent. Thus, we have
\begin{equation}
\theta=E_{\theta}X_n=E_{\theta}T(\mathbf{X})g(\mathbf{X})=(E_{\theta}T(\mathbf{X}))(E_{\theta}g(\mathbf{X}))=n\theta E_{\theta}g(\mathbf{X})
\tag{6.18}
\end{equation}
Hence, for any \theta, E_{\theta}g(\mathbf{X})=n^{-1}.
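A short simulation can confirm both conclusions of this example: the sample mean of g(\mathbf{X}) is close to 1/n for any \theta, and T(\mathbf{X}) and g(\mathbf{X}) are essentially uncorrelated, consistent with their independence. The values of n, \theta, and the number of replications in the sketch below are arbitrary illustrative choices.
```python
# Simulation sketch for Example 6.7: for i.i.d. Exponential(theta) observations,
# g(X) = X_n / (X_1 + ... + X_n) has mean 1/n for every theta, and is independent
# of T(X) = X_1 + ... + X_n (Basu), so their sample correlation should be near 0.
# n, theta, and the number of replications are arbitrary illustrative choices.
import numpy as np

rng = np.random.default_rng(4)
n, theta, reps = 7, 3.5, 200_000

x = rng.exponential(scale=theta, size=(reps, n))
T = x.sum(axis=1)
g = x[:, -1] / T

print(g.mean(), 1 / n)             # close to each other
print(np.corrcoef(T, g)[0, 1])     # close to 0
```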
Minimality of the sufficient statistic was not used in the proof of the Basu Theorem. The theorem is true with this word omitted because a complete sufficient statistic is also minimal sufficient whenever a minimal sufficient statistic exists, as the following theorem states.
Theorem 6.3 If a minimal sufficient statistic exists, then any complete sufficient statistic is also a minimal sufficient statistic.