3.5 Minimal sufficient statistics
Intuitively, a minimal sufficient statistic for a parameter θ is one that collects all the useful information about θ contained in the sample, and only that information, excluding anything in the sample that does not help in estimating θ.
Observe that, if T is a sufficient statistic and T′=φ(T) is also a sufficient statistic, with φ a non-injective mapping38, then T′ condenses the information further. That is, the information in T cannot be recovered from that in T′ because φ cannot be inverted, yet both collect a sufficient amount of information. In this regard, φ acts as a one-way compressor of information.
A minimal sufficient statistic is a sufficient statistic that can be obtained as a (not necessarily injective, but measurable) function of any other sufficient statistic.
Definition 3.10 (Minimal sufficient statistic) A sufficient statistic T for θ is minimal sufficient if, for any other sufficient statistic $\tilde{T}$, there exists a measurable function $\varphi$ such that
\begin{align*}
T=\varphi(\tilde{T}).
\end{align*}
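To make the definition concrete, here is a toy sketch in Python (an illustration not taken from the text, using a Bernoulli-type sample and the standard fact that appending an extra coordinate to a sufficient statistic keeps it sufficient): the statistic $\tilde{T}=(\sum_i x_i, x_1)$ is sufficient but not minimal, while the minimal $T=\sum_i x_i$ is recovered from it by the measurable map $\varphi$ that projects onto the first coordinate.

```python
# Toy illustration of Definition 3.10 (not from the text): T_tilde carries
# superfluous information (the first observation), and the projection phi
# recovers the minimal sufficient statistic T from it.
def T_tilde(x):
    return (sum(x), x[0])          # sufficient, but not minimal

def phi(t_tilde):
    return t_tilde[0]              # discard the superfluous component

def T(x):
    return sum(x)                  # minimal sufficient statistic

x = [1, 0, 1, 1, 0]
print(T(x) == phi(T_tilde(x)))     # True: T is a function of T_tilde
```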
The factorization criterion of Theorem 3.5 provides an effective way of obtaining sufficient statistics that usually happen to be minimal. A guarantee of minimality is given by the next theorem.
Theorem 3.6 (Sufficient condition for minimal sufficiency) A statistic T is minimal sufficient for θ if the following property holds:
\begin{align}
\frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}\ \text{is independent of}\ \theta \iff T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n') \tag{3.5}
\end{align}
for any two sample realizations $(x_1,\ldots,x_n)$ and $(x_1',\ldots,x_n')$.
Proof (Proof of Theorem 3.6). We prove the theorem for discrete rv’s. Let T be a statistic that satisfies (3.5). Let us see that it is then minimal sufficient.
Firstly, we check that T is sufficient. Indeed, for any sample $(x_1',\ldots,x_n')$, we have that
\begin{align*}
\mathbb{P}(X_1=x_1',\ldots,X_n=x_n'\,|\,T=t)=
\begin{cases}
0 & \text{if}\ T(x_1',\ldots,x_n')\neq t,\\
\dfrac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\mathbb{P}(T=t;\theta)} & \text{if}\ T(x_1',\ldots,x_n')=t.
\end{cases}
\end{align*}
If we have a sample $(x_1',\ldots,x_n')$ such that $T(x_1',\ldots,x_n')=t$, then
\begin{align*}
\mathbb{P}(X_1=x_1',\ldots,X_n=x_n'\,|\,T=t)&=\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\mathbb{P}(T=t;\theta)}\\
&=\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\sum_{(x_1,\ldots,x_n)\in A_t}\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)},
\end{align*}
where
\begin{align*}
A_t=\{(x_1,\ldots,x_n)\in\mathbb{R}^n : T(x_1,\ldots,x_n)=t\}.
\end{align*}
Rewriting in terms of the likelihood:
\begin{align*}
\mathbb{P}(X_1=x_1',\ldots,X_n=x_n'\,|\,T=t)&=\frac{\mathcal{L}(\theta;x_1',\ldots,x_n')}{\sum_{(x_1,\ldots,x_n)\in A_t}\mathcal{L}(\theta;x_1,\ldots,x_n)}\\
&=\frac{1}{\displaystyle\sum_{(x_1,\ldots,x_n)\in A_t}\frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}}.
\end{align*}
All the samples $(x_1,\ldots,x_n)\in A_t$ share the same value of the statistic, $T(x_1,\ldots,x_n)=t$, just like $(x_1',\ldots,x_n')$. Therefore, each ratio of likelihoods in the denominator does not depend on θ because of (3.5), and hence neither does the conditional probability. Thus, T is sufficient.
We now check minimal sufficiency. Let $\tilde{T}$ be another sufficient statistic. Let us see that then $T=\varphi(\tilde{T})$ for some measurable function $\varphi$. Let $(x_1,\ldots,x_n)$ and $(x_1',\ldots,x_n')$ be two samples with the same value of this sufficient statistic:
\begin{align*}
\tilde{T}(x_1,\ldots,x_n)=\tilde{T}(x_1',\ldots,x_n')=:\tilde{t}.
\end{align*}
Then, the probabilities of such samples given $\tilde{T}=\tilde{t}$ are
\begin{align*}
\mathbb{P}(X_1=x_1,\ldots,X_n=x_n\,|\,\tilde{T}=\tilde{t})&=\frac{\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}{\mathbb{P}(\tilde{T}=\tilde{t};\theta)},\\
\mathbb{P}(X_1=x_1',\ldots,X_n=x_n'\,|\,\tilde{T}=\tilde{t})&=\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\mathbb{P}(\tilde{T}=\tilde{t};\theta)}.
\end{align*}
Both conditional probabilities are independent of θ because $\tilde{T}$ is sufficient, so the ratio
\begin{align*}
\frac{\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}=\frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}
\end{align*}
is also independent of θ. By (3.5), it follows that
\begin{align*}
T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n').
\end{align*}
We have obtained that all the samples that share the same value of $\tilde{T}$ also share the same value of T. That is, for each value $\tilde{t}$ of $\tilde{T}$ there exists a unique associated value $\varphi(\tilde{t})$ of T, and therefore $T=\varphi(\tilde{T})$. This means that T is minimal sufficient.
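The sufficiency part of this argument can be illustrated numerically. The following sketch (a minimal illustration in Python, not part of the proof; it anticipates the Bernoulli model of Example 3.24 below, and the sample is arbitrary) enumerates the set $A_t$ for three Bernoulli trials and shows that the conditional probability of a sample given $T=t$ does not change with p.

```python
# Illustration (not part of the proof): for three Bernoulli(p) trials and
# T = sum of the observations, P(X = x | T = t) is the same for every p.
from itertools import product


def joint_pmf(x, p):
    """Joint pmf of an iid Bernoulli(p) sample x."""
    return p ** sum(x) * (1 - p) ** (len(x) - sum(x))


def cond_prob_given_T(x, p):
    """P(X = x | T = t) with T = sum(X) and t = sum(x)."""
    t = sum(x)
    A_t = [y for y in product([0, 1], repeat=len(x)) if sum(y) == t]
    return joint_pmf(x, p) / sum(joint_pmf(y, p) for y in A_t)


x = (1, 0, 1)
for p in (0.2, 0.5, 0.9):
    print(p, cond_prob_given_T(x, p))  # always 1/3: independent of p
```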
Example 3.24 Let us find a minimal sufficient statistic for p in Example 3.19.
The ratio of likelihoods is
\begin{align*}
\frac{\mathcal{L}(p;x_1,\ldots,x_n)}{\mathcal{L}(p;x_1',\ldots,x_n')}&=\frac{p^{\sum_{i=1}^n x_i}(1-p)^{n-\sum_{i=1}^n x_i}}{p^{\sum_{i=1}^n x_i'}(1-p)^{n-\sum_{i=1}^n x_i'}}=\frac{(1-p)^n\left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i}}{(1-p)^n\left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i'}}\\
&=\left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i-\sum_{i=1}^n x_i'}.
\end{align*}
The ratio is independent of p if and only if $\sum_{i=1}^n x_i=\sum_{i=1}^n x_i'$. Therefore, $T=\sum_{i=1}^n X_i$ is minimal sufficient for p.
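A quick numerical counterpart of this computation (a sketch assuming NumPy is available; the particular samples are arbitrary) evaluates the likelihood ratio on a grid of p values and confirms that it is constant exactly when the two samples share the same value of T.

```python
# Numerical check of the criterion of Theorem 3.6 for the Bernoulli model of
# Example 3.24: the likelihood ratio is constant in p iff the sums coincide.
import numpy as np


def bernoulli_lik(p, x):
    """Likelihood of an iid Bernoulli(p) sample x."""
    x = np.asarray(x)
    return p ** x.sum() * (1 - p) ** (x.size - x.sum())


p_grid = np.linspace(0.05, 0.95, 50)
x = [1, 0, 1, 1, 0]          # T(x) = 3
x_same = [0, 1, 1, 0, 1]     # T = 3 as well -> constant ratio
x_diff = [1, 1, 1, 1, 0]     # T = 4 -> ratio varies with p

for x_prime in (x_same, x_diff):
    ratio = bernoulli_lik(p_grid, x) / bernoulli_lik(p_grid, x_prime)
    print(np.allclose(ratio, ratio[0]))  # True, then False
```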
The exponential family is a family of probability distributions sharing a common structure that gives them excellent properties. In particular, minimal sufficient statistics for parameters of distributions within the exponential family are trivial to obtain!
Definition 3.11 (Exponential family) A rv X belongs to the (univariate) exponential family with parameter θ if its pmf or pdf, denoted by f(⋅;θ), can be expressed as
\begin{align}
f(x;\theta)=c(\theta)h(x)\exp\{w(\theta)t(x)\}, \tag{3.6}
\end{align}
where $c,w:\Theta\rightarrow\mathbb{R}$ and $h,t:\mathbb{R}\rightarrow\mathbb{R}$.
Example 3.25 Let us check that a rv X∼Bin(n,θ) belongs to the exponential family.
Writing the pmf of the binomial as
\begin{align*} p(x;\theta) &=\binom{n}{x} \theta^x (1-\theta)^{n-x}=(1-\theta)^n\binom{n}{x} \left(\frac{\theta}{1-\theta}\right)^x \\ &=(1-\theta)^n\binom{n}{x} \exp\left\{x\log\left(\frac{\theta}{1-\theta}\right)\right\} \end{align*}
we can see that it has the shape (3.6) of the exponential family, with $c(\theta)=(1-\theta)^n$, $h(x)=\binom{n}{x}$, $w(\theta)=\log\left(\frac{\theta}{1-\theta}\right)$, and $t(x)=x$.
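As a sanity check (a sketch assuming SciPy is available; n = 10 and θ = 0.3 are arbitrary choices), the rewritten expression can be compared numerically with the binomial pmf:

```python
# Verify that the exponential-family rewriting of Example 3.25 reproduces the
# binomial pmf: c(theta) * h(x) * exp{w(theta) * t(x)}.
import numpy as np
from scipy.stats import binom
from scipy.special import comb

n, theta = 10, 0.3
x = np.arange(n + 1)

# c(theta) = (1 - theta)^n, h(x) = C(n, x),
# w(theta) = log(theta / (1 - theta)), t(x) = x.
expo_form = (1 - theta) ** n * comb(n, x) * np.exp(x * np.log(theta / (1 - theta)))

print(np.allclose(expo_form, binom.pmf(x, n, theta)))  # True
```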
Example 3.26 Let us check that a rv X\sim \Gamma(\theta,3) belongs to the exponential family.
Again, writing the pdf of a gamma as
\begin{align*} f(x;\theta)=\frac{1}{\Gamma(\theta)3^{\theta}}x^{\theta-1}e^{-x/3}=\frac{1}{\Gamma(\theta)3^{\theta}} e^{-x/3} \exp\{(\theta-1)\log x\} \end{align*}
it readily follows that it belongs to the exponential family, with $c(\theta)=\frac{1}{\Gamma(\theta)3^{\theta}}$, $h(x)=e^{-x/3}$, $w(\theta)=\theta-1$, and $t(x)=\log x$.
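An analogous numerical check (again a sketch assuming SciPy; θ = 2.5 and the evaluation grid are arbitrary) compares the exponential-family form with the Γ(θ, 3) pdf, parametrized with shape θ and scale 3:

```python
# Verify that the exponential-family form of Example 3.26 matches the
# Gamma(theta, 3) pdf (shape theta, scale 3).
import numpy as np
from scipy.stats import gamma as gamma_dist
from scipy.special import gamma as gamma_fun

theta = 2.5
x = np.linspace(0.1, 20, 200)

# c(theta) = 1 / (Gamma(theta) * 3^theta), h(x) = exp(-x / 3),
# w(theta) = theta - 1, t(x) = log(x).
expo_form = np.exp(-x / 3) / (gamma_fun(theta) * 3 ** theta) * np.exp((theta - 1) * np.log(x))

print(np.allclose(expo_form, gamma_dist.pdf(x, a=theta, scale=3)))  # True
```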
Example 3.27 Let us see that a rv X\sim \mathcal{U}(0,\theta) does not belong to the exponential family.
The pdf
\begin{align*} f(x;\theta)=\begin{cases} 1/\theta & \text{if} \ x\in(0,\theta),\\ 0 & \text{if} \ x\notin (0,\theta) \end{cases} \end{align*}
can be expressed as
\begin{align*} f(x;\theta)=\frac{1}{\theta}1_{\{x\in(0,\theta)\}}. \end{align*}
Since the indicator depends on $x$ and $\theta$ at the same time, the support of the pdf, $(0,\theta)$, changes with $\theta$. This is incompatible with (3.6): there, the set where $f(x;\theta)>0$ is $\{x:h(x)>0\}$, which does not depend on $\theta$, so the indicator cannot be absorbed into the exponential factorization. We conclude that X does not belong to the exponential family.
Theorem 3.7 (Minimal sufficient statistics in the exponential family) In a distribution within the exponential family (3.6) with parameter \theta, the statistic
\begin{align*} T(X_1,\ldots,X_n)=\sum_{i=1}^n t(X_i) \end{align*}
is minimal sufficient for \theta.
Proof (Proof of Theorem 3.7). First, we prove that T(X_1,\ldots,X_n)=\sum_{i=1}^n t(X_i) is sufficient. The likelihood function is given by
\begin{align*} \mathcal{L}(\theta;x_1,\ldots,x_n)=[c(\theta)]^n \prod_{i=1}^n h(x_i)\exp\left\{w(\theta)\sum_{i=1}^n t(x_i)\right\}. \end{align*}
Applying Theorem 3.5, we have that
\begin{align*} h(x_1,\ldots,x_n)=\prod_{i=1}^n h(x_i), \quad g(t,\theta)=[c(\theta)]^n\exp\left\{w(\theta)\sum_{i=1}^n t(x_i)\right\}, \end{align*}
and we can see that g(t,\theta) depends on the sample through \sum_{i=1}^n t(x_i). Therefore, T=\sum_{i=1}^n t(X_i) is sufficient for \theta.
To check that it is minimal sufficient, we apply Theorem 3.6:
\begin{align*} \frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}&=\frac{[c(\theta)]^n \prod_{i=1}^n h(x_i)\exp\{w(\theta)\sum_{i=1}^n t(x_i)\}}{[c(\theta)]^n \prod_{i=1}^n h(x_i')\exp\{w(\theta)\sum_{i=1}^n t(x_i')\}} \\ & =\exp\left\{w(\theta)\left[T(x_1,\ldots,x_n)-T(x_1',\ldots,x_n')\right]\right\}\prod_{i=1}^n\frac{h(x_i)}{h(x_i')}. \end{align*}
The product of the ratios h(x_i)/h(x_i') does not involve \theta, so the ratio is independent of \theta if and only if
\begin{align*} T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n'). \end{align*}
Example 3.28 A minimal sufficient statistic for \theta in Example 3.25 is
\begin{align*} T=\sum_{i=1}^n X_i. \end{align*}
Example 3.29 A minimal sufficient statistic for \theta in Example 3.26 is
\begin{align*} T=\sum_{i=1}^n \log X_i. \end{align*}
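The following sketch (assuming NumPy and SciPy; the sample values are arbitrary) checks the criterion of Theorem 3.6 numerically for this statistic: permuting the sample leaves $T=\sum_{i=1}^n \log X_i$, and hence the likelihood ratio, constant across θ, whereas perturbing an observation makes the ratio depend on θ.

```python
# Numerical check of Example 3.29: the likelihood ratio of two Gamma(theta, 3)
# samples is constant in theta exactly when they share T = sum of log-observations.
import numpy as np
from scipy.special import gamma as gamma_fun


def gamma_lik(theta, x):
    """Likelihood of an iid Gamma(theta, 3) sample x (shape theta, scale 3)."""
    x = np.asarray(x)
    return np.prod(x ** (theta - 1) * np.exp(-x / 3)) / (gamma_fun(theta) * 3 ** theta) ** x.size


theta_grid = np.linspace(0.5, 5, 20)
x = np.array([1.2, 0.7, 3.4, 2.1])
x_same_T = x[::-1]                      # same sum of logs -> constant ratio
x_diff_T = x * np.array([2, 1, 1, 1])   # different sum of logs -> ratio varies

for x_prime in (x_same_T, x_diff_T):
    ratio = np.array([gamma_lik(t, x) / gamma_lik(t, x_prime) for t in theta_grid])
    print(np.allclose(ratio, ratio[0]))  # True, then False
```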
In a non-injective mapping, \varphi(x)=\varphi(y) does not imply that x=y. There might be different elements x and y having the same image under \varphi.↩︎