## 3.5 Minimal sufficient statistics

Intuitively, a minimal sufficient statistic for a parameter $$\theta$$ is one that collects all the useful information in the sample about $$\theta,$$ and only that information, discarding anything in the sample that does not help in the estimation of $$\theta.$$

Observe that, if $$T$$ is a sufficient statistic and $$T'=\varphi(T)$$ is also a sufficient statistic, where $$\varphi$$ is a non-injective mapping, then $$T'$$ condenses the information further. That is, the information in $$T$$ cannot be recovered from that in $$T'$$ because $$\varphi$$ cannot be inverted, yet both collect a sufficient amount of information. In this regard, $$\varphi$$ acts as a one-way compressor of information.
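As a toy illustration of this one-way compression (our own sketch, with ad-hoc names), take $$\varphi=\mathrm{sum}$$ acting on Bernoulli samples: distinct samples collapse onto the same image, so $$\varphi$$ cannot be inverted, yet for the Bernoulli model the sum still retains all the relevant information about $$p$$:

```python
# Two distinct Bernoulli samples with the same image under phi = sum:
# phi is non-injective, so the original sample cannot be recovered from phi.
x = (1, 0, 1, 0)
y = (0, 1, 1, 0)
phi = sum
assert x != y and phi(x) == phi(y)  # phi compresses irreversibly
```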

A minimal sufficient statistic is a sufficient statistic that can be obtained by means of a (not necessarily injective, but measurable) function of any other sufficient statistic.

Definition 3.10 (Minimal sufficient statistic) A sufficient statistic $$T$$ for $$\theta$$ is minimal sufficient if, for any other sufficient statistic $$\tilde{T},$$ there exists a measurable function $$\varphi$$ such that

\begin{align*} T=\varphi(\tilde{T}). \end{align*}

The factorization criterion of Theorem 3.5 provides an effective way of obtaining sufficient statistics that usually happen to be minimal. A guarantee of minimality is given by the next theorem.

Theorem 3.6 (Sufficient condition for minimal sufficiency) A statistic $$T$$ is minimal sufficient for $$\theta$$ if the following property holds:

\begin{align} \begin{split} \frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}\ &\text{is independent of } \theta \\ &\iff T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n') \end{split} \tag{3.5} \end{align}

for any sample realizations $$(x_1,\ldots,x_n)$$ and $$(x_1',\ldots,x_n').$$

Proof (Proof of Theorem 3.6). We prove the theorem for discrete rv’s. Let $$T$$ be a statistic that satisfies (3.5). Let us see that, then, it is minimal sufficient.

Firstly, we check that $$T$$ is sufficient. Indeed, for any sample $$(x_1',\ldots,x_n'),$$ we have that

\begin{align*} \mathbb{P}(X_1=x_1',\ldots,X_n=x_n'|T=t)=\begin{cases} 0 & \text{if} \ T(x_1',\ldots,x_n')\neq t,\\ {\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\mathbb{P}(T=t;\theta)}} & \text{if} \ T(x_1',\ldots,x_n')= t. \end{cases} \end{align*}

If we have a sample $$(x_1',\ldots,x_n')$$ such that $$T(x_1',\ldots,x_n')=t,$$ then

\begin{align*} \mathbb{P}(X_1=x_1',\ldots,X_n=x_n'|T=t)&=\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\mathbb{P}(T=t;\theta)}\\ &=\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\sum_{(x_1,\ldots,x_n)\in A_t} \mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}, \end{align*}

where

\begin{align*} A_t=\{(x_1,\ldots,x_n)\in\mathbb{R}^n:T(x_1,\ldots,x_n)=t\}. \end{align*}

Rewriting in terms of the likelihood:

\begin{align*} \mathbb{P}(X_1=x_1',\ldots,X_n=x_n'|T=t)&=\frac{\mathcal{L}(\theta;x_1',\ldots,x_n')}{{\sum_{(x_1,\ldots,x_n)\in A_t}} \mathcal{L}(\theta;x_1,\ldots,x_n)}\\ &=\frac{1}{{\sum_{(x_1,\ldots,x_n)\in A_t}}{\frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}}}. \end{align*}

All the samples $$(x_1,\ldots,x_n)\in A_t$$ share the same value of the statistic, $$T(x_1,\ldots,x_n)=t,$$ just like $$(x_1',\ldots,x_n').$$ Therefore, the ratio of likelihoods in the denominator does not depend on $$\theta$$ because of (3.5). Thus, $$T$$ is sufficient.

We now check minimal sufficiency. Let $$\tilde{T}$$ be any other sufficient statistic. Let us see that then $$T=\varphi(\tilde{T})$$ for some measurable function $$\varphi.$$ Let $$(x_1,\ldots,x_n)$$ and $$(x_1',\ldots,x_n')$$ be two samples with the same value of this sufficient statistic:

\begin{align*} \tilde{T}(x_1,\ldots,x_n)=\tilde{T}(x_1',\ldots,x_n')=:\tilde{t}. \end{align*}

Then, the probabilities of such samples given $$\tilde{T}=\tilde{t}$$ are

\begin{align*} \mathbb{P}(X_1=x_1,\ldots,X_n=x_n|\tilde{T}=\tilde{t})&=\frac{\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}{\mathbb{P}(\tilde{T}=\tilde{t};\theta)},\\ \mathbb{P}(X_1=x_1',\ldots,X_n=x_n'|\tilde{T}=\tilde{t})&=\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\mathbb{P}(\tilde{T}=\tilde{t};\theta)}. \end{align*}

Both conditional probabilities are independent of $$\theta,$$ since $$\tilde{T}$$ is sufficient, so the ratio

\begin{align*} \frac{\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}=\frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')} \end{align*}

is also independent of $$\theta.$$ By (3.5), it follows that

\begin{align*} T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n'). \end{align*}

We have obtained that all the samples sharing the same value of $$\tilde{T}$$ also share the same value of $$T.$$ That is, for each value $$\tilde{t}$$ of $$\tilde{T},$$ there is a unique associated value of $$T,$$ which we may denote by $$\varphi(\tilde{t}),$$ and therefore $$T=\varphi(\tilde{T}).$$ This means that $$T$$ is minimal sufficient.

Example 3.24 Let us find a minimal sufficient statistic for $$p$$ in Example 3.19.

The ratio of likelihoods is

\begin{align*} \frac{\mathcal{L}(p;x_1,\ldots,x_n)}{\mathcal{L}(p;x_1',\ldots,x_n')}&=\frac{p^{\sum_{i=1}^n x_i}(1-p)^{n-\sum_{i=1}^n x_i}}{p^{\sum_{i=1}^n x_i'}(1-p)^{n-\sum_{i=1}^n x_i'}}\\ &=\frac{(1-p)^n \left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i}}{(1-p)^n \left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i'}}\\ &=\left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i-\sum_{i=1}^n x_i'}. \end{align*}

The ratio is independent of $$p$$ if and only if $$\sum_{i=1}^n x_i=\sum_{i=1}^n x_i'.$$ Therefore, $$T=\sum_{i=1}^n X_i$$ is minimal sufficient for $$p.$$
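This equivalence can be checked numerically. The following sketch (our own illustration; `bern_lik` is an ad-hoc helper) evaluates the likelihood ratio on a grid of values of $$p$$: it is constant when the two samples share $$\sum_{i=1}^n x_i$$ and varies with $$p$$ otherwise.

```python
def bern_lik(p, x):
    # Likelihood of an iid Bernoulli(p) sample x
    s = sum(x)
    return p**s * (1 - p)**(len(x) - s)

x1 = [1, 0, 1, 0, 0]  # sum = 2
x2 = [0, 1, 0, 0, 1]  # sum = 2, same value of T
x3 = [1, 1, 1, 0, 0]  # sum = 3, different value of T

grid = [0.1, 0.25, 0.5, 0.75, 0.9]
ratio_same = [bern_lik(p, x1) / bern_lik(p, x2) for p in grid]
ratio_diff = [bern_lik(p, x1) / bern_lik(p, x3) for p in grid]

# The ratio is constant (equal to 1) iff the sums coincide;
# otherwise it equals (1 - p) / p, which varies with p.
assert all(abs(r - 1.0) < 1e-12 for r in ratio_same)
assert max(ratio_diff) - min(ratio_diff) > 1.0
```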

The exponential family is a family of probability distributions sharing a common structure that gives them excellent properties. In particular, minimal sufficient statistics for parameters of distributions within the exponential family are trivial to obtain!

Definition 3.11 (Exponential family) A rv $$X$$ belongs to the (univariate) exponential family with parameter $$\theta$$ if its pmf or pdf, denoted by $$f(\cdot;\theta),$$ can be expressed as

\begin{align} f(x;\theta)=c(\theta)h(x)\exp\{w(\theta)t(x)\}, \tag{3.6} \end{align}

where $$c,w:\Theta\rightarrow\mathbb{R}$$ and $$h,t:\mathbb{R}\rightarrow\mathbb{R}.$$

Example 3.25 Let us check that a rv $$X\sim \mathrm{Bin}(n,\theta)$$ belongs to the exponential family.

Writing the pmf of the binomial as

\begin{align*} p(x;\theta) &=\binom{n}{x} \theta^x (1-\theta)^{n-x}=(1-\theta)^n\binom{n}{x} \left(\frac{\theta}{1-\theta}\right)^x \\ &=(1-\theta)^n\binom{n}{x} \exp\left\{x\log\left(\frac{\theta}{1-\theta}\right)\right\} \end{align*}

we can see that it has the form (3.6) with $$c(\theta)=(1-\theta)^n,$$ $$h(x)=\binom{n}{x},$$ $$w(\theta)=\log\left(\frac{\theta}{1-\theta}\right),$$ and $$t(x)=x.$$
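The exponential-family form of the binomial pmf can be checked numerically. In this sketch (our own, with ad-hoc helper names), the factorization $$c(\theta)h(x)\exp\{w(\theta)t(x)\}$$ is evaluated against the direct pmf:

```python
import math

def binom_pmf(x, n, theta):
    # Binomial pmf, direct formula
    return math.comb(n, x) * theta**x * (1 - theta)**(n - x)

def expfam_pmf(x, n, theta):
    # Exponential-family factorization (3.6):
    # c(theta) = (1 - theta)^n, h(x) = C(n, x),
    # w(theta) = log(theta / (1 - theta)), t(x) = x
    c = (1 - theta)**n
    h = math.comb(n, x)
    w = math.log(theta / (1 - theta))
    return c * h * math.exp(w * x)

n, theta = 10, 0.3
for x in range(n + 1):
    assert math.isclose(binom_pmf(x, n, theta), expfam_pmf(x, n, theta))
```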

Example 3.26 Let us check that a rv $$X\sim \Gamma(\theta,3)$$ belongs to the exponential family.

Again, writing the pdf of a gamma as

\begin{align*} f(x;\theta)=\frac{1}{\Gamma(\theta)3^{\theta}}x^{\theta-1}e^{-x/3}=\frac{1}{\Gamma(\theta)3^{\theta}} e^{-x/3} \exp\{(\theta-1)\log x\} \end{align*}

it readily follows that it belongs to the exponential family, with $$c(\theta)=\frac{1}{\Gamma(\theta)3^{\theta}},$$ $$h(x)=e^{-x/3},$$ $$w(\theta)=\theta-1,$$ and $$t(x)=\log x.$$
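The same kind of numerical check applies here (again an illustrative sketch of ours): the rewriting $$x^{\theta-1}=\exp\{(\theta-1)\log x\}$$ makes the gamma pdf match the exponential-family factorization at every $$x>0.$$

```python
import math

def gamma_pdf(x, theta, scale=3.0):
    # Gamma(theta, 3) pdf, direct formula (for x > 0)
    return x**(theta - 1) * math.exp(-x / scale) / (math.gamma(theta) * scale**theta)

def expfam_pdf(x, theta, scale=3.0):
    # Factorization (3.6): c(theta) = 1 / (Gamma(theta) 3^theta),
    # h(x) = e^{-x/3}, w(theta) = theta - 1, t(x) = log(x)
    c = 1.0 / (math.gamma(theta) * scale**theta)
    h = math.exp(-x / scale)
    return c * h * math.exp((theta - 1) * math.log(x))

for x in (0.5, 1.0, 2.5, 7.0):
    assert math.isclose(gamma_pdf(x, 2.3), expfam_pdf(x, 2.3))
```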

Example 3.27 Let us see that a rv $$X\sim \mathcal{U}(0,\theta)$$ does not belong to the exponential family.

The pdf

\begin{align*} f(x;\theta)=\begin{cases} 1/\theta & \text{if} \ x\in(0,\theta),\\ 0 & \text{if} \ x\notin (0,\theta) \end{cases} \end{align*}

can be expressed as

\begin{align*} f(x;\theta)=\frac{1}{\theta}1_{\{x\in(0,\theta)\}}. \end{align*}

Since the indicator depends on $$x$$ and $$\theta$$ jointly, the support of the pdf varies with $$\theta,$$ and the indicator cannot be split into the factors $$c(\theta),$$ $$h(x),$$ and $$\exp\{w(\theta)t(x)\}$$ of (3.6). We conclude that $$X$$ does not belong to the exponential family.

Theorem 3.7 (Minimal sufficient statistics in the exponential family) In a distribution within the exponential family (3.6) with parameter $$\theta$$ the statistic

\begin{align*} T(X_1,\ldots,X_n)=\sum_{i=1}^n t(X_i) \end{align*}

is minimal sufficient for $$\theta.$$

Proof (Proof of Theorem 3.7). First, we prove that $$T(X_1,\ldots,X_n)=\sum_{i=1}^n t(X_i)$$ is sufficient. The likelihood function is given by

\begin{align*} \mathcal{L}(\theta;x_1,\ldots,x_n)=[c(\theta)]^n \prod_{i=1}^n h(x_i)\exp\left\{w(\theta)\sum_{i=1}^n t(x_i)\right\}. \end{align*}

Applying Theorem 3.5, we have that

\begin{align*} h(x_1,\ldots,x_n)=\prod_{i=1}^n h(x_i), \quad g(t,\theta)=[c(\theta)]^n\exp\left\{w(\theta)\sum_{i=1}^n t(x_i)\right\}, \end{align*}

and we can see that $$g(t,\theta)$$ depends on the sample through $$\sum_{i=1}^n t(x_i).$$ Therefore, $$T=\sum_{i=1}^n t(X_i)$$ is sufficient for $$\theta.$$

To check that it is minimal sufficient, we apply Theorem 3.6:

\begin{align*} \frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}&=\frac{[c(\theta)]^n \prod_{i=1}^n h(x_i)\exp\{w(\theta)\sum_{i=1}^n t(x_i)\}}{[c(\theta)]^n \prod_{i=1}^n h(x_i')\exp\{w(\theta)\sum_{i=1}^n t(x_i')\}} \\ & =\exp\left\{w(\theta)\left[T(x_1,\ldots,x_n)-T(x_1',\ldots,x_n')\right]\right\}\prod_{i=1}^n\frac{h(x_i)}{h(x_i')}. \end{align*}

The factor $$\prod_{i=1}^n h(x_i)/h(x_i')$$ does not involve $$\theta,$$ so, provided that $$w$$ is non-constant, the ratio is independent of $$\theta$$ if and only if

\begin{align*} T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n'). \end{align*}

Example 3.28 A minimal sufficient statistic for $$\theta$$ in Example 3.25 is

\begin{align*} T=\sum_{i=1}^n X_i. \end{align*}

Example 3.29 A minimal sufficient statistic for $$\theta$$ in Example 3.26 is

\begin{align*} T=\sum_{i=1}^n \log X_i. \end{align*}
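As a sanity check of the criterion (3.5) for this statistic (our own numerical sketch, with ad-hoc helper names), the log-likelihood difference of two samples with the same $$\sum_{i=1}^n \log x_i$$ is constant in $$\theta,$$ while it varies with $$\theta$$ for samples with different values of $$T$$:

```python
import math

def gamma_loglik(theta, xs, scale=3.0):
    # Log-likelihood of an iid Gamma(theta, 3) sample xs
    return sum((theta - 1) * math.log(x) - x / scale for x in xs) \
        - len(xs) * (math.lgamma(theta) + theta * math.log(scale))

x1 = [1.0, 4.0]  # sum of logs = log 4
x2 = [2.0, 2.0]  # sum of logs = log 4, same value of T
x3 = [1.0, 5.0]  # sum of logs = log 5, different value of T

thetas = [0.5, 1.0, 2.0, 3.5]
diff_same = [gamma_loglik(t, x1) - gamma_loglik(t, x2) for t in thetas]
diff_diff = [gamma_loglik(t, x1) - gamma_loglik(t, x3) for t in thetas]

# Equal T: the log-ratio is constant in theta; different T: it is not.
assert max(diff_same) - min(diff_same) < 1e-12
assert max(diff_diff) - min(diff_diff) > 1e-3
```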

1. In a non-injective mapping, $$\varphi(x)=\varphi(y)$$ does not imply that $$x=y:$$ there may be different elements $$x$$ and $$y$$ with the same image under $$\varphi.$$