3.5 Minimal sufficient statistics

Intuitively, a minimal sufficient statistic for a parameter θ is one that collects all the useful information in the sample about θ, but only the essential part, excluding any superfluous information in the sample that does not help in the estimation of θ.

Observe that, if T is a sufficient statistic and T'=\varphi(T) is also a sufficient statistic, with \varphi a non-injective mapping³⁸, then T' condenses the information further. That is, the information in T cannot be obtained from that in T' because \varphi cannot be inverted, yet both still collect a sufficient amount of information. In this regard, \varphi acts as a one-way compressor of information.

A minimal sufficient statistic is a sufficient statistic that can be obtained as a (not necessarily injective, but measurable) function of any other sufficient statistic.

Definition 3.10 (Minimal sufficient statistic) A sufficient statistic T for θ is minimal sufficient if, for any other sufficient statistic \tilde{T}, there exists a measurable function \varphi such that

\begin{align*}
T=\varphi(\tilde{T}).
\end{align*}

The factorization criterion of Theorem 3.5 provides an effective way of obtaining sufficient statistics that usually happen to be minimal. A guarantee of minimality is given by the next theorem.

Theorem 3.6 (Sufficient condition for minimal sufficiency) A statistic T is minimal sufficient for θ if the following property holds:

\begin{align}
\frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}\ \text{is independent of}\ \theta \iff T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n') \tag{3.5}
\end{align}

for any two sample realizations (x_1,\ldots,x_n) and (x_1',\ldots,x_n').

Proof (Proof of Theorem 3.6). We prove the theorem for discrete rv’s. Let T be a statistic that satisfies (3.5). Let us see that it is then minimal sufficient.

Firstly, we check that T is sufficient. Indeed, for any sample (x_1,\ldots,x_n), we have that

\begin{align*}
\mathbb{P}(X_1=x_1,\ldots,X_n=x_n\,|\,T=t)=\begin{cases}
0 & \text{if}\ T(x_1,\ldots,x_n)\neq t,\\
\dfrac{\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}{\mathbb{P}(T=t;\theta)} & \text{if}\ T(x_1,\ldots,x_n)=t.
\end{cases}
\end{align*}

If we have a sample (x_1,\ldots,x_n) such that T(x_1,\ldots,x_n)=t, then

\begin{align*}
\mathbb{P}(X_1=x_1,\ldots,X_n=x_n\,|\,T=t)&=\frac{\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}{\mathbb{P}(T=t;\theta)}\\
&=\frac{\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}{\sum_{(x_1',\ldots,x_n')\in A_t}\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)},
\end{align*}

where

\begin{align*}
A_t=\{(x_1',\ldots,x_n')\in\mathbb{R}^n: T(x_1',\ldots,x_n')=t\}.
\end{align*}

Rewriting in terms of the likelihood:

\begin{align*}
\mathbb{P}(X_1=x_1,\ldots,X_n=x_n\,|\,T=t)&=\frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\sum_{(x_1',\ldots,x_n')\in A_t}\mathcal{L}(\theta;x_1',\ldots,x_n')}\\
&=\frac{1}{\sum_{(x_1',\ldots,x_n')\in A_t}\dfrac{\mathcal{L}(\theta;x_1',\ldots,x_n')}{\mathcal{L}(\theta;x_1,\ldots,x_n)}}.
\end{align*}

All the samples (x_1',\ldots,x_n')\in A_t share the same value of the statistic, T(x_1',\ldots,x_n')=t, just like (x_1,\ldots,x_n). Therefore, each ratio of likelihoods in the denominator does not depend on θ because of (3.5). Thus, T is sufficient.

We now check minimal sufficiency. Let \tilde{T} be another sufficient statistic. Let us see that, then, it has to be that T=\varphi(\tilde{T}). Let (x_1,\ldots,x_n) and (x_1',\ldots,x_n') be two samples with the same value for the new sufficient statistic:

\begin{align*}
\tilde{T}(x_1,\ldots,x_n)=\tilde{T}(x_1',\ldots,x_n')=:\tilde{t}.
\end{align*}

Then, the probabilities of such samples given \tilde{T}=\tilde{t} are

\begin{align*}
\mathbb{P}(X_1=x_1,\ldots,X_n=x_n\,|\,\tilde{T}=\tilde{t})&=\frac{\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}{\mathbb{P}(\tilde{T}=\tilde{t};\theta)},\\
\mathbb{P}(X_1=x_1',\ldots,X_n=x_n'\,|\,\tilde{T}=\tilde{t})&=\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\mathbb{P}(\tilde{T}=\tilde{t};\theta)}.
\end{align*}

Both conditional probabilities are independent of θ, since \tilde{T} is sufficient. Dividing them, \mathbb{P}(\tilde{T}=\tilde{t};\theta) cancels, so the ratio

\begin{align*}
\frac{\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}=\frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}
\end{align*}

is also independent of θ. By (3.5), it follows that

\begin{align*}
T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n').
\end{align*}

We have obtained that all the samples that share the same value of \tilde{T} also share the same value of T; that is, for each value \tilde{t} of \tilde{T}, there exists a unique value \varphi(\tilde{t}) of T, and therefore T=\varphi(\tilde{T}). This means that T is minimal sufficient.
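The last step of the proof can be visualized computationally. The sketch below is only an illustration, not part of the proof: it assumes a binary (Bernoulli-type) sample of size n=4 and the sufficient statistic \tilde{T}(x)=(x_1+x_2,\,x_3+x_4), which is finer than T(x)=\sum_{i=1}^4 x_i, and it builds the map \varphi empirically by checking that samples sharing \tilde{T} also share T.

```python
from itertools import product

# Sketch: with a binary sample of size n = 4, T_tilde(x) = (x1 + x2, x3 + x4)
# is a sufficient statistic finer than T(x) = x1 + x2 + x3 + x4.
# We recover T as a function phi of T_tilde by enumerating all samples.
T = lambda x: sum(x)
T_tilde = lambda x: (x[0] + x[1], x[2] + x[3])

phi = {}
for x in product([0, 1], repeat=4):  # all 2^4 possible binary samples
    t_tilde, t = T_tilde(x), T(x)
    # Samples sharing the value of T_tilde must share the value of T,
    # so an existing dictionary entry is never contradicted.
    assert phi.setdefault(t_tilde, t) == t

print(phi)  # {(0, 0): 0, (0, 1): 1, ..., (2, 2): 4}, i.e., phi(t1, t2) = t1 + t2
```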

Example 3.24 Let us find a minimal sufficient statistic for p in Example 3.19.

The ratio of likelihoods is

\begin{align*}
\frac{\mathcal{L}(p;x_1,\ldots,x_n)}{\mathcal{L}(p;x_1',\ldots,x_n')}&=\frac{p^{\sum_{i=1}^n x_i}(1-p)^{n-\sum_{i=1}^n x_i}}{p^{\sum_{i=1}^n x_i'}(1-p)^{n-\sum_{i=1}^n x_i'}}=\frac{(1-p)^n\left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i}}{(1-p)^n\left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i'}}\\
&=\left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i-\sum_{i=1}^n x_i'}.
\end{align*}

The ratio is independent of p if and only if \sum_{i=1}^n x_i=\sum_{i=1}^n x_i'. Therefore, T=\sum_{i=1}^n X_i is minimal sufficient for p.
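This equivalence is easy to corroborate numerically. The following sketch assumes, as the likelihood above indicates, Bernoulli-type data; the specific samples and the grid of p values are arbitrary choices for illustration. The ratio of likelihoods is flat in p exactly when the two samples share \sum_{i=1}^n x_i.

```python
import numpy as np

# Minimal numerical check of Theorem 3.6 for Example 3.24, assuming
# Bernoulli-type data; samples and the grid of p values are illustrative.
def lik(p, x):
    x = np.asarray(x)
    return p ** x.sum() * (1 - p) ** (x.size - x.sum())

ps = np.linspace(0.05, 0.95, 50)
x, x_same, x_diff = [1, 0, 1, 0], [0, 0, 1, 1], [1, 1, 1, 0]

# Equal sums: the ratio does not vary with p (range of the ratios ~ 0).
print(np.ptp([lik(p, x) / lik(p, x_same) for p in ps]))
# Different sums: the ratio varies with p (strictly positive range).
print(np.ptp([lik(p, x) / lik(p, x_diff) for p in ps]))
```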

The exponential family is a family of probability distributions sharing a common structure that gives them excellent properties. In particular, minimal sufficient statistics for parameters of distributions within the exponential family are trivial to obtain!

Definition 3.11 (Exponential family) A rv X belongs to the (univariate) exponential family with parameter θ if its pmf or pdf, denoted by f(\cdot;\theta), can be expressed as

\begin{align}
f(x;\theta)=c(\theta)h(x)\exp\{w(\theta)t(x)\}, \tag{3.6}
\end{align}

where c,w:\Theta\rightarrow\mathbb{R} and h,t:\mathbb{R}\rightarrow\mathbb{R}.

Example 3.25 Let us check that a rv X\sim\mathrm{Bin}(n,\theta) belongs to the exponential family.

Writing the pmf of the binomial as

\begin{align*} p(x;\theta) &=\binom{n}{x} \theta^x (1-\theta)^{n-x}=(1-\theta)^n\binom{n}{x} \left(\frac{\theta}{1-\theta}\right)^x \\ &=(1-\theta)^n\binom{n}{x} \exp\left\{x\log\left(\frac{\theta}{1-\theta}\right)\right\} \end{align*}

we can see that it has the form (3.6) of the exponential family, with c(\theta)=(1-\theta)^n, h(x)=\binom{n}{x}, w(\theta)=\log\left(\frac{\theta}{1-\theta}\right), and t(x)=x.
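As a sanity check, one can verify numerically that this decomposition reproduces the binomial pmf. The sketch below uses scipy.stats.binom for the reference pmf; the values n=10 and \theta=0.3 are arbitrary illustrative choices.

```python
from math import comb, exp, log
from scipy.stats import binom

n, theta = 10, 0.3                 # illustrative values
c = (1 - theta) ** n               # c(theta)
w = log(theta / (1 - theta))       # w(theta)
for x in range(n + 1):
    h, t = comb(n, x), x           # h(x) and t(x)
    # c(theta) * h(x) * exp(w(theta) * t(x)) should equal the binomial pmf
    assert abs(c * h * exp(w * t) - binom.pmf(x, n, theta)) < 1e-12
```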

Example 3.26 Let us check that a rv X\sim \Gamma(\theta,3) belongs to the exponential family.

Again, writing the pdf of a gamma as

\begin{align*} f(x;\theta)=\frac{1}{\Gamma(\theta)3^{\theta}}x^{\theta-1}e^{-x/3}=\frac{1}{\Gamma(\theta)3^{\theta}} e^{-x/3} \exp\{(\theta-1)\log x\} \end{align*}

it readily follows that it belongs to the exponential family, with c(\theta)=\frac{1}{\Gamma(\theta)3^{\theta}}, h(x)=e^{-x/3}, w(\theta)=\theta-1, and t(x)=\log x.
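Analogously, a quick numerical check (with the arbitrary illustrative value \theta=2.5, and the scale fixed to 3 as in the example) confirms that this decomposition matches the \Gamma(\theta,3) pdf computed with scipy.stats.gamma.

```python
from math import exp, log
from math import gamma as gamma_fun       # Euler's gamma function
from scipy.stats import gamma             # gamma distribution

theta = 2.5                               # illustrative shape value; scale is 3
c = 1 / (gamma_fun(theta) * 3 ** theta)   # c(theta)
w = theta - 1                             # w(theta)
for x in [0.5, 1.0, 2.0, 5.0, 10.0]:
    h, t = exp(-x / 3), log(x)            # h(x) and t(x)
    # c(theta) * h(x) * exp(w(theta) * t(x)) should equal the Gamma(theta, 3) pdf
    assert abs(c * h * exp(w * t) - gamma.pdf(x, a=theta, scale=3)) < 1e-12
```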

Example 3.27 Let us see that a rv X\sim \mathcal{U}(0,\theta) does not belong to the exponential family.

The pdf

\begin{align*} f(x;\theta)=\begin{cases} 1/\theta & \text{if} \ x\in(0,\theta),\\ 0 & \text{if} \ x\notin (0,\theta) \end{cases} \end{align*}

can be expressed as

\begin{align*} f(x;\theta)=\frac{1}{\theta}1_{\{x\in(0,\theta)\}}. \end{align*}

Since the indicator is a function of x and \theta at the same time, and it cannot be expressed in the form \exp\{w(\theta)t(x)\}, we conclude that X does not belong to the exponential family.

Theorem 3.7 (Minimal sufficient statistics in the exponential family) In a distribution within the exponential family (3.6) with parameter \theta, the statistic

\begin{align*} T(X_1,\ldots,X_n)=\sum_{i=1}^n t(X_i) \end{align*}

is minimal sufficient for \theta.

Proof (Proof of Theorem 3.7). First, we prove that T(X_1,\ldots,X_n)=\sum_{i=1}^n t(X_i) is sufficient. The likelihood function is given by

\begin{align*} \mathcal{L}(\theta;x_1,\ldots,x_n)=[c(\theta)]^n \prod_{i=1}^n h(x_i)\exp\left\{w(\theta)\sum_{i=1}^n t(x_i)\right\}. \end{align*}

Applying Theorem 3.5, we have that

\begin{align*} h(x_1,\ldots,x_n)=\prod_{i=1}^n h(x_i), \quad g(t,\theta)=[c(\theta)]^n\exp\left\{w(\theta)\sum_{i=1}^n t(x_i)\right\}, \end{align*}

and we can see that g(t,\theta) depends on the sample only through \sum_{i=1}^n t(x_i). Therefore, T=\sum_{i=1}^n t(X_i) is sufficient for \theta.

To check that it is minimal sufficient, we apply Theorem 3.6:

\begin{align*} \frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}&=\frac{[c(\theta)]^n \prod_{i=1}^n h(x_i)\exp\{w(\theta)\sum_{i=1}^n t(x_i)\}}{[c(\theta)]^n \prod_{i=1}^n h(x_i')\exp\{w(\theta)\sum_{i=1}^n t(x_i')\}} \\ & =\exp\left\{w(\theta)\left[T(x_1,\ldots,x_n)-T(x_1',\ldots,x_n')\right]\right\}\prod_{i=1}^n\frac{h(x_i)}{h(x_i')}. \end{align*}

The ratio is independent of \theta if and only if

\begin{align*} T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n'). \end{align*}

Example 3.28 A minimal sufficient statistic for \theta in Example 3.25 is

\begin{align*} T=\sum_{i=1}^n X_i. \end{align*}

Example 3.29 A minimal sufficient statistic for \theta in Example 3.26 is

\begin{align*} T=\sum_{i=1}^n \log X_i. \end{align*}
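In practice, these minimal sufficient statistics are one-line computations on the data. The sketch below simulates samples from the two distributions (the parameter values and sample size are arbitrary choices for illustration) and evaluates the corresponding T.

```python
import numpy as np

rng = np.random.default_rng(42)

# Illustrative samples (parameter values and sample size are arbitrary choices)
x_bin = rng.binomial(n=10, p=0.3, size=50)        # Bin(10, 0.3), Example 3.28
x_gam = rng.gamma(shape=2.5, scale=3.0, size=50)  # Gamma(2.5, 3), Example 3.29

T_bin = x_bin.sum()           # T = sum_i X_i:      minimal sufficient in Bin(n, theta)
T_gam = np.log(x_gam).sum()   # T = sum_i log(X_i): minimal sufficient in Gamma(theta, 3)
print(T_bin, T_gam)
```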


  1. In a non-injective mapping, \varphi(x)=\varphi(y) does not imply that x=y. There might be different elements x and y having the same image under \varphi.↩︎