3.5 Minimal sufficient statistics

Intuitively, a minimal sufficient statistic for the parameter \(\theta\) is one that retains all the useful information in the sample about \(\theta\), and only that information, discarding anything in the sample that does not help in the estimation of \(\theta.\)

Observe that, if \(T\) is a sufficient statistic and \(T'=\varphi(T)\) is also a sufficient statistic, where \(\varphi\) is a non-injective mapping¹, then \(T'\) condenses the information further. That is, the information in \(T\) cannot be recovered from that in \(T'\) because \(\varphi\) cannot be inverted, yet both still collect a sufficient amount of information. In this regard, \(\varphi\) acts as a one-way compressor of information.
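
To make this one-way compression concrete, the following Python sketch (a minimal illustration assuming a Bernoulli(\(p\)) sample, in the spirit of Example 3.24; the split into two half-sums is an arbitrary choice) builds a two-component sufficient statistic and collapses it, through a non-injective \(\varphi,\) into the total sum, which is still sufficient:

```python
import numpy as np

# A Bernoulli(p) sample; p = 0.3 is chosen just for the illustration
rng = np.random.default_rng(42)
x = rng.binomial(1, 0.3, size=10)

# T = (sum of the first half, sum of the second half) is sufficient for p,
# since the likelihood p^(s1 + s2) (1 - p)^(n - s1 - s2) depends on the
# sample only through (s1, s2)
T = (x[:5].sum(), x[5:].sum())


def phi(s1, s2):  # non-injective: different pairs can map to the same sum
    return s1 + s2


T_prime = phi(*T)  # the total sum, which is still sufficient for p

print(T, T_prime)  # the pair T cannot be recovered from the single number T_prime
```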

A minimal sufficient statistic is a sufficient statistic that can be obtained as a (not necessarily injective, but measurable) function of any other sufficient statistic.

Definition 3.10 (Minimal sufficient statistic) A sufficient statistic \(T\) for \(\theta\) is minimal sufficient if, for any other sufficient statistic \(\tilde{T},\) there exists a measurable function \(\varphi\) such that

\[\begin{align*} T=\varphi(\tilde{T}). \end{align*}\]

The factorization criterion of Theorem 3.5 provides an effective way of obtaining sufficient statistics that usually happen to be minimal. A guarantee of minimality is given by the next theorem.

Theorem 3.6 (Sufficient condition for minimal sufficiency) A statistic \(T\) is minimal sufficient for \(\theta\) if the following property holds:

\[\begin{equation} \begin{split} \frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}\ &\text{is independent of $\theta$} \\ &\iff T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n') \end{split} \tag{3.5} \end{equation}\]

for any sample realizations \((x_1,\ldots,x_n)\) and \((x_1',\ldots,x_n').\)

Proof (Proof of Theorem 3.6). We prove the theorem for discrete rv’s. Let \(T\) be a statistic that satisfies (3.5). Let us show that it is then minimal sufficient.

Firstly, we check that \(T\) is sufficient. Indeed, for any sample \((x_1',\ldots,x_n'),\) we have that

\[\begin{align*} \mathbb{P}(X_1=x_1',\ldots,X_n=x_n'|T=t)=\begin{cases} 0 & \text{if} \ T(x_1',\ldots,x_n')\neq t,\\ {\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\mathbb{P}(T=t;\theta)}} & \text{if} \ T(x_1',\ldots,x_n')= t. \end{cases} \end{align*}\]

If we have a sample \((x_1',\ldots,x_n')\) such that \(T(x_1',\ldots,x_n')=t,\) then

\[\begin{align*} \mathbb{P}(X_1=x_1',\ldots,X_n=x_n'|T=t)&=\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\mathbb{P}(T=t;\theta)}\\ &=\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\sum_{(x_1,\ldots,x_n)\in A_t} \mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}, \end{align*}\]

where

\[\begin{align*} A_t=\{(x_1,\ldots,x_n)\in\mathbb{R}^n:T(x_1,\ldots,x_n)=t\}. \end{align*}\]

Rewriting in terms of the likelihood:

\[\begin{align*} \mathbb{P}(X_1=x_1',\ldots,X_n=x_n'|T=t)&=\frac{\mathcal{L}(\theta;x_1',\ldots,x_n')}{{\sum_{(x_1,\ldots,x_n)\in A_t}} \mathcal{L}(\theta;x_1,\ldots,x_n)}\\ &=\frac{1}{{\sum_{(x_1,\ldots,x_n)\in A_t}}{\frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}}}. \end{align*}\]

All the samples \((x_1,\ldots,x_n)\in A_t\) share the same value of the statistic, \(T(x_1,\ldots,x_n)=t,\) which is also the value attained at \((x_1',\ldots,x_n').\) Therefore, by (3.5), each ratio of likelihoods in the denominator does not depend on \(\theta,\) and hence neither does \(\mathbb{P}(X_1=x_1',\ldots,X_n=x_n'|T=t).\) Thus, \(T\) is sufficient.

We now check minimal sufficiency. Let \(\tilde{T}\) be another sufficient statistic. Let us show that then \(T=\varphi(\tilde{T})\) for some measurable function \(\varphi.\) Let \((x_1,\ldots,x_n)\) and \((x_1',\ldots,x_n')\) be two samples with the same value of \(\tilde{T}\):

\[\begin{align*} \tilde{T}(x_1,\ldots,x_n)=\tilde{T}(x_1',\ldots,x_n')=:\tilde{t}. \end{align*}\]

Then, the probabilities of such samples given \(\tilde{T}=\tilde{t}\) are

\[\begin{align*} \mathbb{P}(X_1=x_1,\ldots,X_n=x_n|\tilde{T}=\tilde{t})&=\frac{\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}{\mathbb{P}(\tilde{T}=\tilde{t};\theta)},\\ \mathbb{P}(X_1=x_1',\ldots,X_n=x_n'|\tilde{T}=\tilde{t})&=\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\mathbb{P}(\tilde{T}=\tilde{t};\theta)}. \end{align*}\]

Both conditional probabilities are independent of \(\theta,\) because \(\tilde{T}\) is sufficient, so the ratio

\[\begin{align*} \frac{\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}=\frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')} \end{align*}\]

is also independent of \(\theta.\) By (3.5), it follows that

\[\begin{align*} T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n'). \end{align*}\]

We have obtained that all the samples sharing the same value of \(\tilde{T}\) also share the same value of \(T.\) That is, for each value \(\tilde{t}\) of \(\tilde{T}\) there is a single value of \(T,\) which we denote by \(\varphi(\tilde{t}),\) and therefore \(T=\varphi(\tilde{T}).\) This means that \(T\) is minimal sufficient.

Example 3.24 Let us find a minimal sufficient statistic for \(p\) in Example 3.19.

The ratio of likelihoods is

\[\begin{align*} \frac{\mathcal{L}(p;x_1,\ldots,x_n)}{\mathcal{L}(p;x_1',\ldots,x_n')}&=\frac{p^{\sum_{i=1}^n x_i}(1-p)^{n-\sum_{i=1}^n x_i}}{p^{\sum_{i=1}^n x_i'}(1-p)^{n-\sum_{i=1}^n x_i'}}\\ &=\frac{(1-p)^n \left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i}}{(1-p)^n \left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i'}}\\ &=\left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i-\sum_{i=1}^n x_i'}. \end{align*}\]

The ratio is independent of \(p\) if and only if \(\sum_{i=1}^n x_i=\sum_{i=1}^n x_i'.\) Therefore, \(T=\sum_{i=1}^n X_i\) is minimal sufficient for \(p.\)
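
This equivalence can be checked numerically. The following Python sketch (the samples and the grid of \(p\) values are arbitrary choices for the illustration) evaluates the likelihood ratio over a grid of \(p\) values: it is constant when the two samples share \(\sum_{i=1}^n x_i\) and varies otherwise.

```python
import numpy as np


def bern_lik(p, x):
    """Bernoulli likelihood L(p; x_1, ..., x_n) = p^sum(x) (1 - p)^(n - sum(x))."""
    x = np.asarray(x)
    return p ** x.sum() * (1 - p) ** (x.size - x.sum())


p_grid = np.linspace(0.05, 0.95, 19)

x  = [1, 0, 1, 1, 0]   # sum = 3
xp = [0, 1, 1, 0, 1]   # sum = 3 -> ratio should be constant in p
xq = [1, 1, 1, 1, 0]   # sum = 4 -> ratio should vary with p

ratio_same = bern_lik(p_grid, x) / bern_lik(p_grid, xp)
ratio_diff = bern_lik(p_grid, x) / bern_lik(p_grid, xq)

print(np.allclose(ratio_same, ratio_same[0]))  # True: independent of p
print(np.allclose(ratio_diff, ratio_diff[0]))  # False: depends on p
```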

The exponential family is a family of probability distributions sharing a common structure that gives them excellent properties. In particular, minimal sufficient statistics for parameters of distributions within the exponential family are trivial to obtain!

Definition 3.11 (Exponential family) A rv \(X\) belongs to the (univariate) exponential family with parameter \(\theta\) if its pmf or pdf, denoted by \(f(\cdot;\theta),\) can be expressed as

\[\begin{align} f(x;\theta)=c(\theta)h(x)\exp\{w(\theta)t(x)\}, \tag{3.6} \end{align}\]

where \(c,w:\Theta\rightarrow\mathbb{R}\) and \(h,t:\mathbb{R}\rightarrow\mathbb{R}.\)

Example 3.25 Let us check that a rv \(X\sim \mathrm{Bin}(n,\theta)\) belongs to the exponential family.

Writing the pmf of the binomial as

\[\begin{align*} p(x;\theta) &=\binom{n}{x} \theta^x (1-\theta)^{n-x}=(1-\theta)^n\binom{n}{x} \left(\frac{\theta}{1-\theta}\right)^x \\ &=(1-\theta)^n\binom{n}{x} \exp\left\{x\log\left(\frac{\theta}{1-\theta}\right)\right\} \end{align*}\]

we can see that it has the form (3.6) of the exponential family.
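
Indeed, matching the last expression with (3.6), one identification of the components is

\[\begin{align*} c(\theta)=(1-\theta)^n,\quad h(x)=\binom{n}{x},\quad w(\theta)=\log\left(\frac{\theta}{1-\theta}\right),\quad t(x)=x. \end{align*}\]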

Example 3.26 Let us check that a rv \(X\sim \Gamma(\theta,3)\) belongs to the exponential family.

Again, writing the pdf of a gamma as

\[\begin{align*} f(x;\theta)=\frac{1}{\Gamma(\theta)3^{\theta}}x^{\theta-1}e^{-x/3}=\frac{1}{\Gamma(\theta)3^{\theta}} e^{-x/3} \exp\{(\theta-1)\log x\} \end{align*}\]

it readily follows that it belongs to the exponential family.
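
Here, a possible identification of the components in (3.6), for \(x>0,\) is

\[\begin{align*} c(\theta)=\frac{1}{\Gamma(\theta)3^{\theta}},\quad h(x)=e^{-x/3},\quad w(\theta)=\theta-1,\quad t(x)=\log x. \end{align*}\]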

Example 3.27 Let us see that a rv \(X\sim \mathcal{U}(0,\theta)\) does not belong to the exponential family.

The pdf

\[\begin{align*} f(x;\theta)=\begin{cases} 1/\theta & \text{if} \ x\in(0,\theta),\\ 0 & \text{if} \ x\notin (0,\theta) \end{cases} \end{align*}\]

can be expressed as

\[\begin{align*} f(x;\theta)=\frac{1}{\theta}1_{\{x\in(0,\theta)\}}. \end{align*}\]

Since the indicator depends on \(x\) and \(\theta\) simultaneously, and it cannot be split into the product form \(c(\theta)h(x)\exp\{w(\theta)t(x)\}\) of (3.6), we conclude that \(X\) does not belong to the exponential family.

Theorem 3.7 (Minimal sufficient statistics in the exponential family) For a distribution within the exponential family (3.6) with parameter \(\theta,\) the statistic

\[\begin{align*} T(X_1,\ldots,X_n)=\sum_{i=1}^n t(X_i) \end{align*}\]

is minimal sufficient for \(\theta.\)

Proof (Proof of Theorem 3.7). First, we prove that \(T(X_1,\ldots,X_n)=\sum_{i=1}^n t(X_i)\) is sufficient. The likelihood function is given by

\[\begin{align*} \mathcal{L}(\theta;x_1,\ldots,x_n)=[c(\theta)]^n \prod_{i=1}^n h(x_i)\exp\left\{w(\theta)\sum_{i=1}^n t(x_i)\right\}. \end{align*}\]

Applying Theorem 3.5, we have that

\[\begin{align*} h(x_1,\ldots,x_n)=\prod_{i=1}^n h(x_i), \quad g(t,\theta)=[c(\theta)]^n\exp\left\{w(\theta)\sum_{i=1}^n t(x_i)\right\}, \end{align*}\]

and we can see that \(g(t,\theta)\) depends on the sample only through \(t=\sum_{i=1}^n t(x_i).\) Therefore, \(T=\sum_{i=1}^n t(X_i)\) is sufficient for \(\theta.\)

To check that it is minimal sufficient, we apply Theorem 3.6:

\[\begin{align*} \frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}&=\frac{[c(\theta)]^n \prod_{i=1}^n h(x_i)\exp\{w(\theta)\sum_{i=1}^n t(x_i)\}}{[c(\theta)]^n \prod_{i=1}^n h(x_i')\exp\{w(\theta)\sum_{i=1}^n t(x_i')\}} \\ & =\exp\left\{w(\theta)\left[T(x_1,\ldots,x_n)-T(x_1',\ldots,x_n')\right]\right\}\prod_{i=1}^n\frac{h(x_i)}{h(x_i')}. \end{align*}\]

Provided that \(w\) is not constant, the ratio is independent of \(\theta\) if and only if

\[\begin{align*} T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n'), \end{align*}\]

so (3.5) holds and, by Theorem 3.6, \(T\) is minimal sufficient.

Example 3.28 A minimal sufficient statistic for \(\theta\) in Example 3.25 is

\[\begin{align*} T=\sum_{i=1}^n X_i. \end{align*}\]

Example 3.29 A minimal sufficient statistic for \(\theta\) in Example 3.26 is

\[\begin{align*} T=\sum_{i=1}^n \log X_i. \end{align*}\]
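
As a numerical illustration of how criterion (3.5) singles out this statistic, the following Python sketch (the samples and the \(\theta\) grid are arbitrary choices for the check) evaluates the likelihood ratio for the \(\Gamma(\theta,3)\) model of Example 3.26 over a grid of \(\theta\) values: it is constant when the two samples share \(\sum_{i=1}^n \log x_i\) and varies otherwise.

```python
import numpy as np
from math import gamma as gamma_fn


def gamma_lik(theta, x):
    """Likelihood L(theta; x_1, ..., x_n) for the Gamma(theta, 3) density
    f(x; theta) = x^(theta - 1) exp(-x / 3) / (Gamma(theta) 3^theta)."""
    x = np.asarray(x, dtype=float)
    return np.prod(x ** (theta - 1) * np.exp(-x / 3) / (gamma_fn(theta) * 3 ** theta))


theta_grid = np.linspace(0.5, 5, 10)

x  = [1.0, 2.0, 4.0]   # sum of logs = log(8)
xp = [8.0, 1.0, 1.0]   # sum of logs = log(8) as well -> ratio constant in theta
xq = [2.0, 2.0, 4.0]   # sum of logs = log(16)        -> ratio varies with theta

r_same = np.array([gamma_lik(t, x) / gamma_lik(t, xp) for t in theta_grid])
r_diff = np.array([gamma_lik(t, x) / gamma_lik(t, xq) for t in theta_grid])

print(np.allclose(r_same, r_same[0]))   # True: ratio free of theta
print(np.allclose(r_diff, r_diff[0]))   # False: ratio depends on theta
```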


  1. In a non-injective mapping, \(\varphi(x)=\varphi(y)\) does not imply that \(x=y:\) there may be different elements \(x\) and \(y\) with the same image under \(\varphi.\)↩︎