3.5 Minimal sufficient statistics
Intuitively, a minimal sufficient statistic for a parameter θ is one that collects all the useful information about θ contained in the sample, and only that information, excluding anything in the sample that does not help in estimating θ.
Observe that, if T is a sufficient statistic and T′=φ(T) is also a sufficient statistic, with φ a non-injective mapping38, then T′ condenses the information further. That is, the information in T cannot be recovered from that in T′ because φ cannot be inverted, yet both collect a sufficient amount of information. In this regard, φ acts as a one-way compressor of information.
A minimal sufficient statistic is a sufficient statistic that can be obtained as a (not necessarily injective, but measurable) function of any other sufficient statistic.
Definition 3.10 (Minimal sufficient statistic) A sufficient statistic T for θ is minimal sufficient if, for any other sufficient statistic $\tilde{T}$, there exists a measurable function $\varphi$ such that
\begin{align*}
T=\varphi(\tilde{T}).
\end{align*}
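To make the definition concrete, here is a toy sketch in Python (an illustration not taken from the text, using a Bernoulli-type sample and the standard fact that appending an extra coordinate to a sufficient statistic keeps it sufficient): the statistic $\tilde{T}=(\sum_i x_i, x_1)$ is sufficient but not minimal, while the minimal $T=\sum_i x_i$ is recovered from it by the measurable map $\varphi$ that projects onto the first coordinate.

```python
# Toy illustration of Definition 3.10 (not from the text): T_tilde carries
# superfluous information (the first observation), and the projection phi
# recovers the minimal sufficient statistic T from it.
def T_tilde(x):
    return (sum(x), x[0])          # sufficient, but not minimal

def phi(t_tilde):
    return t_tilde[0]              # discard the superfluous component

def T(x):
    return sum(x)                  # minimal sufficient statistic

x = [1, 0, 1, 1, 0]
print(T(x) == phi(T_tilde(x)))     # True: T is a function of T_tilde
```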
The factorization criterion of Theorem 3.5 provides an effective way of obtaining sufficient statistics that usually happen to be minimal. A guarantee of minimality is given by the next theorem.
Theorem 3.6 (Sufficient condition for minimal sufficiency) A statistic T is minimal sufficient for θ if the following property holds:
\begin{align}
\frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}\ \text{is independent of}\ \theta \iff T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n') \tag{3.5}
\end{align}
for any two sample realizations $(x_1,\ldots,x_n)$ and $(x_1',\ldots,x_n')$.
Proof (Proof of Theorem 3.6). We prove the theorem for discrete rv’s. Let T be a statistic that satisfies (3.5). Let us see that it is then minimal sufficient.
Firstly, we check that T is sufficient. Indeed, for any sample $(x_1',\ldots,x_n')$, we have that
\begin{align*}
\mathbb{P}(X_1=x_1',\ldots,X_n=x_n'\,|\,T=t)=
\begin{cases}
0 & \text{if}\ T(x_1',\ldots,x_n')\neq t,\\
\dfrac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\mathbb{P}(T=t;\theta)} & \text{if}\ T(x_1',\ldots,x_n')=t.
\end{cases}
\end{align*}
If we have a sample $(x_1',\ldots,x_n')$ such that $T(x_1',\ldots,x_n')=t$, then
\begin{align*}
\mathbb{P}(X_1=x_1',\ldots,X_n=x_n'\,|\,T=t)&=\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\mathbb{P}(T=t;\theta)}\\
&=\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\sum_{(x_1,\ldots,x_n)\in A_t}\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)},
\end{align*}
where
\begin{align*}
A_t=\{(x_1,\ldots,x_n)\in\mathbb{R}^n : T(x_1,\ldots,x_n)=t\}.
\end{align*}
Rewriting in terms of the likelihood:
\begin{align*}
\mathbb{P}(X_1=x_1',\ldots,X_n=x_n'\,|\,T=t)&=\frac{\mathcal{L}(\theta;x_1',\ldots,x_n')}{\sum_{(x_1,\ldots,x_n)\in A_t}\mathcal{L}(\theta;x_1,\ldots,x_n)}\\
&=\frac{1}{\displaystyle\sum_{(x_1,\ldots,x_n)\in A_t}\frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}}.
\end{align*}
All the samples $(x_1,\ldots,x_n)\in A_t$ share the same value of the statistic, $T(x_1,\ldots,x_n)=t$, just like $(x_1',\ldots,x_n')$. Therefore, each ratio of likelihoods in the denominator does not depend on θ because of (3.5), and hence neither does the conditional probability. Thus, T is sufficient.
We now check minimal sufficiency. Let $\tilde{T}$ be another sufficient statistic. Let us see that then $T=\varphi(\tilde{T})$ for some measurable function $\varphi$. Let $(x_1,\ldots,x_n)$ and $(x_1',\ldots,x_n')$ be two samples with the same value of this sufficient statistic:
\begin{align*}
\tilde{T}(x_1,\ldots,x_n)=\tilde{T}(x_1',\ldots,x_n')=:\tilde{t}.
\end{align*}
Then, the probabilities of such samples given $\tilde{T}=\tilde{t}$ are
\begin{align*}
\mathbb{P}(X_1=x_1,\ldots,X_n=x_n\,|\,\tilde{T}=\tilde{t})&=\frac{\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}{\mathbb{P}(\tilde{T}=\tilde{t};\theta)},\\
\mathbb{P}(X_1=x_1',\ldots,X_n=x_n'\,|\,\tilde{T}=\tilde{t})&=\frac{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}{\mathbb{P}(\tilde{T}=\tilde{t};\theta)}.
\end{align*}
Both conditional probabilities are independent of θ because $\tilde{T}$ is sufficient, so the ratio
\begin{align*}
\frac{\mathbb{P}(X_1=x_1,\ldots,X_n=x_n;\theta)}{\mathbb{P}(X_1=x_1',\ldots,X_n=x_n';\theta)}=\frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}
\end{align*}
is also independent of θ. By (3.5), it follows that
\begin{align*}
T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n').
\end{align*}
We have obtained that all the samples that share the same value of $\tilde{T}$ also share the same value of T. That is, for each value $\tilde{t}$ of $\tilde{T}$ there exists a unique associated value $\varphi(\tilde{t})$ of T, and therefore $T=\varphi(\tilde{T})$. This means that T is minimal sufficient.
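The sufficiency part of this argument can be illustrated numerically. The following sketch (a minimal illustration in Python, not part of the proof; it anticipates the Bernoulli model of Example 3.24 below, and the sample is arbitrary) enumerates the set $A_t$ for three Bernoulli trials and shows that the conditional probability of a sample given $T=t$ does not change with p.

```python
# Illustration (not part of the proof): for three Bernoulli(p) trials and
# T = sum of the observations, P(X = x | T = t) is the same for every p.
from itertools import product


def joint_pmf(x, p):
    """Joint pmf of an iid Bernoulli(p) sample x."""
    return p ** sum(x) * (1 - p) ** (len(x) - sum(x))


def cond_prob_given_T(x, p):
    """P(X = x | T = t) with T = sum(X) and t = sum(x)."""
    t = sum(x)
    A_t = [y for y in product([0, 1], repeat=len(x)) if sum(y) == t]
    return joint_pmf(x, p) / sum(joint_pmf(y, p) for y in A_t)


x = (1, 0, 1)
for p in (0.2, 0.5, 0.9):
    print(p, cond_prob_given_T(x, p))  # always 1/3: independent of p
```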
Example 3.24 Let us find a minimal sufficient statistic for p in Example 3.19.
The ratio of likelihoods is
\begin{align*}
\frac{\mathcal{L}(p;x_1,\ldots,x_n)}{\mathcal{L}(p;x_1',\ldots,x_n')}&=\frac{p^{\sum_{i=1}^n x_i}(1-p)^{n-\sum_{i=1}^n x_i}}{p^{\sum_{i=1}^n x_i'}(1-p)^{n-\sum_{i=1}^n x_i'}}=\frac{(1-p)^n\left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i}}{(1-p)^n\left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i'}}\\
&=\left(\frac{p}{1-p}\right)^{\sum_{i=1}^n x_i-\sum_{i=1}^n x_i'}.
\end{align*}
The ratio is independent of p if and only if $\sum_{i=1}^n x_i=\sum_{i=1}^n x_i'$. Therefore, $T=\sum_{i=1}^n X_i$ is minimal sufficient for p.
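A quick numerical counterpart of this computation (a sketch assuming NumPy is available; the particular samples are arbitrary) evaluates the likelihood ratio on a grid of p values and confirms that it is constant exactly when the two samples share the same value of T.

```python
# Numerical check of the criterion of Theorem 3.6 for the Bernoulli model of
# Example 3.24: the likelihood ratio is constant in p iff the sums coincide.
import numpy as np


def bernoulli_lik(p, x):
    """Likelihood of an iid Bernoulli(p) sample x."""
    x = np.asarray(x)
    return p ** x.sum() * (1 - p) ** (x.size - x.sum())


p_grid = np.linspace(0.05, 0.95, 50)
x = [1, 0, 1, 1, 0]          # T(x) = 3
x_same = [0, 1, 1, 0, 1]     # T = 3 as well -> constant ratio
x_diff = [1, 1, 1, 1, 0]     # T = 4 -> ratio varies with p

for x_prime in (x_same, x_diff):
    ratio = bernoulli_lik(p_grid, x) / bernoulli_lik(p_grid, x_prime)
    print(np.allclose(ratio, ratio[0]))  # True, then False
```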
The exponential family is a family of probability distributions sharing a common structure that gives them excellent properties. In particular, minimal sufficient statistics for parameters of distributions within the exponential family are trivial to obtain!
Definition 3.11 (Exponential family) A rv X belongs to the (univariate) exponential family with parameter θ if its pmf or pdf, denoted by f(⋅;θ), can be expressed as
\begin{align}
f(x;\theta)=c(\theta)h(x)\exp\{w(\theta)t(x)\}, \tag{3.6}
\end{align}
where $c,w:\Theta\rightarrow\mathbb{R}$ and $h,t:\mathbb{R}\rightarrow\mathbb{R}$.
Example 3.25 Let us check that a rv X∼Bin(n,θ) belongs to the exponential family.
Writing the pmf of the binomial as
\begin{align*} p(x;\theta) &=\binom{n}{x} \theta^x (1-\theta)^{n-x}=(1-\theta)^n\binom{n}{x} \left(\frac{\theta}{1-\theta}\right)^x \\ &=(1-\theta)^n\binom{n}{x} \exp\left\{x\log\left(\frac{\theta}{1-\theta}\right)\right\} \end{align*}
we can see that it has the shape (3.6) of the exponential family, with $c(\theta)=(1-\theta)^n$, $h(x)=\binom{n}{x}$, $w(\theta)=\log\left(\frac{\theta}{1-\theta}\right)$, and $t(x)=x$.
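As a sanity check (a sketch assuming SciPy is available; n = 10 and θ = 0.3 are arbitrary choices), the rewritten expression can be compared numerically with the binomial pmf:

```python
# Verify that the exponential-family rewriting of Example 3.25 reproduces the
# binomial pmf: c(theta) * h(x) * exp{w(theta) * t(x)}.
import numpy as np
from scipy.stats import binom
from scipy.special import comb

n, theta = 10, 0.3
x = np.arange(n + 1)

# c(theta) = (1 - theta)^n, h(x) = C(n, x),
# w(theta) = log(theta / (1 - theta)), t(x) = x.
expo_form = (1 - theta) ** n * comb(n, x) * np.exp(x * np.log(theta / (1 - theta)))

print(np.allclose(expo_form, binom.pmf(x, n, theta)))  # True
```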
Example 3.26 Let us check that a rv X\sim \Gamma(\theta,3) belongs to the exponential family.
Again, writing the pdf of a gamma as
\begin{align*} f(x;\theta)=\frac{1}{\Gamma(\theta)3^{\theta}}x^{\theta-1}e^{-x/3}=\frac{1}{\Gamma(\theta)3^{\theta}} e^{-x/3} \exp\{(\theta-1)\log x\} \end{align*}
it readily follows that it belongs to the exponential family, with $c(\theta)=\frac{1}{\Gamma(\theta)3^{\theta}}$, $h(x)=e^{-x/3}$, $w(\theta)=\theta-1$, and $t(x)=\log x$.
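An analogous numerical check (again a sketch assuming SciPy; θ = 2.5 and the evaluation grid are arbitrary) compares the exponential-family form with the Γ(θ, 3) pdf, parametrized with shape θ and scale 3:

```python
# Verify that the exponential-family form of Example 3.26 matches the
# Gamma(theta, 3) pdf (shape theta, scale 3).
import numpy as np
from scipy.stats import gamma as gamma_dist
from scipy.special import gamma as gamma_fun

theta = 2.5
x = np.linspace(0.1, 20, 200)

# c(theta) = 1 / (Gamma(theta) * 3^theta), h(x) = exp(-x / 3),
# w(theta) = theta - 1, t(x) = log(x).
expo_form = np.exp(-x / 3) / (gamma_fun(theta) * 3 ** theta) * np.exp((theta - 1) * np.log(x))

print(np.allclose(expo_form, gamma_dist.pdf(x, a=theta, scale=3)))  # True
```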
Example 3.27 Let us see that a rv X\sim \mathcal{U}(0,\theta) does not belong to the exponential family.
The pdf
\begin{align*} f(x;\theta)=\begin{cases} 1/\theta & \text{if} \ x\in(0,\theta),\\ 0 & \text{if} \ x\notin (0,\theta) \end{cases} \end{align*}
can be expressed as
\begin{align*} f(x;\theta)=\frac{1}{\theta}1_{\{x\in(0,\theta)\}}. \end{align*}
Since the indicator depends on $x$ and $\theta$ at the same time, the support of the pdf, $(0,\theta)$, changes with $\theta$. This is incompatible with (3.6): there, the set where $f(x;\theta)>0$ is $\{x:h(x)>0\}$, which does not depend on $\theta$, so the indicator cannot be absorbed into the exponential factorization. We conclude that X does not belong to the exponential family.
Theorem 3.7 (Minimal sufficient statistics in the exponential family) In a distribution within the exponential family (3.6) with parameter \theta, the statistic
\begin{align*} T(X_1,\ldots,X_n)=\sum_{i=1}^n t(X_i) \end{align*}
is minimal sufficient for \theta.
Proof (Proof of Theorem 3.7). First, we prove that T(X_1,\ldots,X_n)=\sum_{i=1}^n t(X_i) is sufficient. The likelihood function is given by
\begin{align*} \mathcal{L}(\theta;x_1,\ldots,x_n)=[c(\theta)]^n \prod_{i=1}^n h(x_i)\exp\left\{w(\theta)\sum_{i=1}^n t(x_i)\right\}. \end{align*}
Applying Theorem 3.5, we have that
\begin{align*} h(x_1,\ldots,x_n)=\prod_{i=1}^n h(x_i), \quad g(t,\theta)=[c(\theta)]^n\exp\left\{w(\theta)\sum_{i=1}^n t(x_i)\right\}, \end{align*}
and we can see that g(t,\theta) depends on the sample through \sum_{i=1}^n t(x_i). Therefore, T=\sum_{i=1}^n t(X_i) is sufficient for \theta.
To check that it is minimal sufficient, we apply Theorem 3.6:
\begin{align*} \frac{\mathcal{L}(\theta;x_1,\ldots,x_n)}{\mathcal{L}(\theta;x_1',\ldots,x_n')}&=\frac{[c(\theta)]^n \prod_{i=1}^n h(x_i)\exp\{w(\theta)\sum_{i=1}^n t(x_i)\}}{[c(\theta)]^n \prod_{i=1}^n h(x_i')\exp\{w(\theta)\sum_{i=1}^n t(x_i')\}} \\ & =\exp\left\{w(\theta)\left[T(x_1,\ldots,x_n)-T(x_1',\ldots,x_n')\right]\right\}\prod_{i=1}^n\frac{h(x_i)}{h(x_i')}. \end{align*}
The product of the ratios h(x_i)/h(x_i') does not involve \theta, so the ratio is independent of \theta if and only if
\begin{align*} T(x_1,\ldots,x_n)=T(x_1',\ldots,x_n'). \end{align*}
Example 3.28 A minimal sufficient statistic for \theta in Example 3.25 is
\begin{align*} T=\sum_{i=1}^n X_i. \end{align*}
Example 3.29 A minimal sufficient statistic for \theta in Example 3.26 is
\begin{align*} T=\sum_{i=1}^n \log X_i. \end{align*}
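The following sketch (assuming NumPy and SciPy; the sample values are arbitrary) checks the criterion of Theorem 3.6 numerically for this statistic: permuting the sample leaves $T=\sum_{i=1}^n \log X_i$, and hence the likelihood ratio, constant across θ, whereas perturbing an observation makes the ratio depend on θ.

```python
# Numerical check of Example 3.29: the likelihood ratio of two Gamma(theta, 3)
# samples is constant in theta exactly when they share T = sum of log-observations.
import numpy as np
from scipy.special import gamma as gamma_fun


def gamma_lik(theta, x):
    """Likelihood of an iid Gamma(theta, 3) sample x (shape theta, scale 3)."""
    x = np.asarray(x)
    return np.prod(x ** (theta - 1) * np.exp(-x / 3)) / (gamma_fun(theta) * 3 ** theta) ** x.size


theta_grid = np.linspace(0.5, 5, 20)
x = np.array([1.2, 0.7, 3.4, 2.1])
x_same_T = x[::-1]                      # same sum of logs -> constant ratio
x_diff_T = x * np.array([2, 1, 1, 1])   # different sum of logs -> ratio varies

for x_prime in (x_same_T, x_diff_T):
    ratio = np.array([gamma_lik(t, x) / gamma_lik(t, x_prime) for t in theta_grid])
    print(np.allclose(ratio, ratio[0]))  # True, then False
```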
In a non-injective mapping, \varphi(x)=\varphi(y) does not imply that x=y. There might be different elements x and y having the same image under \varphi.↩︎