Chapter 8 Homework 2: Principles of Data Reduction: Problems and Solutions

Exercise 8.1 (Casella and Berger 6.1) Let \(X\) be one observation from a \(N(0,\sigma^2)\) population. Is \(|X|\) a sufficient statistic?
Proof. Consider the pdf given by \[\begin{equation} f(x|\sigma^2)=(2\pi\sigma^2)^{-1/2}exp(-\frac{x^2}{2\sigma^2}) =(2\pi\sigma^2)^{-1/2}exp(-\frac{|x|^2}{2\sigma^2}) \tag{8.1} \end{equation}\] Define \(g(t|\sigma^2)=(2\pi\sigma^2)^{-1/2}exp(-\frac{t^2}{2\sigma^2})\), where \(t=|x|\), and \(h(x)=1\); then by the Factorization Theorem, \(|X|\) is a sufficient statistic.
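
A quick numerical sanity check (not part of the formal argument, with an arbitrary observation and grid of \(\sigma\) values): the likelihood depends on the data only through \(|x|\), so two observations with the same absolute value are equally likely under every \(\sigma^2\).

```python
import numpy as np
from scipy.stats import norm

# Two observations with the same absolute value.
x1, x2 = 1.7, -1.7

# The N(0, sigma^2) likelihood agrees at x1 and x2 for every sigma,
# consistent with |X| being sufficient (Factorization Theorem with h(x) = 1).
for sigma in [0.5, 1.0, 2.0, 5.0]:
    assert np.isclose(norm.pdf(x1, scale=sigma), norm.pdf(x2, scale=sigma))
print("likelihood depends on x only through |x|")
```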

Exercise 8.2 (Casella and Berger 6.2) Let \(X_1,\cdots,X_n\) be independent random variables with densities \[\begin{equation} f_{X_i}(x|\theta)=\left\{\begin{aligned} & e^{i\theta-x} & \quad x\geq i\theta \\ & 0 & \quad x<i\theta \end{aligned} \right. \tag{8.2} \end{equation}\]

Prove that \(T=\min_i(X_i/i)\) is a sufficient statistic for \(\theta\).
Proof. Consider the joint pdf \[\begin{equation} \begin{split} f(\mathbf{x}|\theta)&=\prod_{i=1}^n e^{i\theta-x_i} I_{x_i\geq i\theta}(x_i)\\ &=exp(\theta\sum_{i=1}^ni-\sum_{i=1}^nx_i)I_{\min_i(x_i/i)\geq\theta}(\mathbf{x})\\ &=exp(\theta\sum_{i=1}^ni)I_{\min_i(x_i/i)\geq\theta}(\mathbf{x})\cdot exp(-\sum_{i=1}^nx_i)\\ &=g(T(\mathbf{x})|\theta)h(\mathbf{x}) \end{split} \tag{8.3} \end{equation}\] Hence, by the Factorization Theorem, \(T=\min_i(X_i/i)\) is a sufficient statistic for \(\theta\).
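
A hedged numerical illustration of the factorization (8.3), with an arbitrary sample: the joint density equals \(g(T(\mathbf{x})|\theta)h(\mathbf{x})\) with \(T(\mathbf{x})=\min_i(x_i/i)\), \(g(t|\theta)=exp(\theta\sum_i i)I_{t\geq\theta}\) and \(h(\mathbf{x})=exp(-\sum_i x_i)\).

```python
import numpy as np

def likelihood(x, theta):
    """Joint density prod_i exp(i*theta - x_i) * 1{x_i >= i*theta}."""
    i = np.arange(1, len(x) + 1)
    return np.prod(np.exp(i * theta - x) * (x >= i * theta))

def g(t, theta, n):
    i = np.arange(1, n + 1)
    return np.exp(theta * i.sum()) * (t >= theta)

def h(x):
    return np.exp(-x.sum())

x = np.array([1.3, 2.9, 4.1, 6.0])        # arbitrary sample values
T = np.min(x / np.arange(1, len(x) + 1))  # sufficient statistic min_i(x_i / i)
for theta in [0.2, 0.5, 1.0, 1.4]:
    assert np.isclose(likelihood(x, theta), g(T, theta, len(x)) * h(x))
print("factorization f(x|theta) = g(T(x)|theta) h(x) verified on a grid of theta")
```
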
Exercise 8.3 (Casella and Berger 6.5) Let \(X_1,\cdots,X_n\) be independent random variables with pdfs \[\begin{equation} f(x_i|\theta)=\left\{\begin{aligned} & \frac{1}{2i\theta} & \quad -i(\theta-1)< x_i<i(\theta+1) \\ & 0 & \quad otherwise \end{aligned} \right. \tag{8.4} \end{equation}\] where \(\theta>0\). Find a two-dimensional sufficient statistic for \(\theta\).
Proof. Consider the joint pdf \[\begin{equation} \begin{split} f(\mathbf{x}|\theta)&=\prod_{i=1}^n \frac{1}{2i\theta} I_{-i(\theta-1)< x_i<i(\theta+1)}(x_i)\\ &=(2\theta)^{-n}I_{\min_i\frac{x_i}{i}>1-\theta}(\mathbf{x})I_{\max_i\frac{x_i}{i}<1+\theta}(\mathbf{x}) \prod_{i=1}^ni^{-1}\\ &=g(T(\mathbf{x})|\theta)h(\mathbf{x}) \end{split} \tag{8.5} \end{equation}\] Hence \(T(\mathbf{X})=(\min_{i}(\frac{X_i}{i}),\max_{i}(\frac{X_i}{i}))\) is a two-dimensional sufficient statistic for \(\theta\).
Exercise 8.4 (Casella and Berger 6.6) Let \(X_1,\cdots,X_n\) be a random sample from a \(Gamma(\alpha,\beta)\) population. Find a two-dimensional sufficient statistic for \((\alpha,\beta)\).
Proof. Considering the joint pdf, we have \[\begin{equation} \begin{split} f(\mathbf{x}|\alpha,\beta)&=\prod_{i=1}^n \frac{\beta^\alpha}{\Gamma(\alpha)} x_i^{\alpha-1}e^{-\beta x_i}\\ &=(\frac{\beta^\alpha}{\Gamma(\alpha)})^n (\prod_{i=1}^n x_i)^{\alpha-1}exp(-\beta(\sum_{i=1}^n x_i))\\ &=g(T(\mathbf{x})|\alpha,\beta)h(\mathbf{x}) \end{split} \tag{8.6} \end{equation}\] Hence \(T(\mathbf{X})=(\prod_{i=1}^n X_i,\sum_{i=1}^n X_i)\) is a two-dimensional sufficient statistic for \((\alpha,\beta)\).
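
A hedged numerical illustration (with arbitrarily chosen samples and parameter values): two samples sharing the same product and the same sum have identical \(Gamma(\alpha,\beta)\) likelihoods for every \((\alpha,\beta)\), which is exactly what sufficiency of \(T=(\prod X_i,\sum X_i)\) asserts.

```python
import numpy as np
from scipy.stats import gamma

# Two different samples with the same sum (12) and the same product (30).
x = np.array([1.0, 5.0, 6.0])
y = np.array([2.0, 5.0 + np.sqrt(10), 5.0 - np.sqrt(10)])
assert np.isclose(x.sum(), y.sum()) and np.isclose(x.prod(), y.prod())

# The likelihoods coincide for every (alpha, beta), since the joint pdf
# depends on the data only through (prod x_i, sum x_i).
for alpha in [0.7, 1.5, 3.0]:
    for beta in [0.4, 1.0, 2.5]:
        # scale = 1/beta matches the rate parameterization beta^alpha/Gamma(alpha) x^(alpha-1) e^(-beta x)
        Lx = np.prod(gamma.pdf(x, a=alpha, scale=1.0 / beta))
        Ly = np.prod(gamma.pdf(y, a=alpha, scale=1.0 / beta))
        assert np.isclose(Lx, Ly)
print("samples with equal (product, sum) are equally likely under every Gamma(alpha, beta)")
```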

Exercise 8.5 (Casella and Berger 6.9) For each of the following distributions let \(X_1,\cdots,X_n\) be a random sample. Find a minimal sufficient statistic for \(\theta\).

  1. Normal: \(f(x|\theta)=\frac{1}{\sqrt{2\pi}}e^{-(x-\theta)^2/2}\), \(-\infty<x<\infty\), \(-\infty<\theta<\infty\).

  2. Location exponential: \(f(x|\theta)=e^{-(x-\theta)}\), \(\theta<x<\infty\), \(-\infty<\theta<\infty\).

  3. Logistic: \(f(x|\theta)=\frac{e^{-(x-\theta)}}{(1+e^{-(x-\theta)})^2}\), \(-\infty<x<\infty\), \(-\infty<\theta<\infty\).

  4. Cauchy: \(f(x|\theta)=\frac{1}{\pi[1+(x-\theta)^2]}\), \(-\infty<x<\infty\), \(-\infty<\theta<\infty\).

  5. Double exponential: \(f(x|\theta)=\frac{1}{2}e^{-|x-\theta|}\), \(-\infty<x<\infty\), \(-\infty<\theta<\infty\).

Proof. (a) In this case, the sample mean \(\bar{X}\) is the minimal sufficient statistic for \(\theta\), because \[\begin{equation} \begin{split} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}&= \frac{exp\{-\frac{1}{2}[\sum_{i=1}^n(x_i-\bar{x})^2+n(\bar{x}-\theta)^2]\}} {exp\{-\frac{1}{2}[\sum_{i=1}^n(y_i-\bar{y})^2+n(\bar{y}-\theta)^2]\}}\\ &=exp\{-\frac{1}{2}[(n-1)(S_x^2-S_y^2)+n(\bar{x}^2-\bar{y}^2)-2n\theta(\bar{x}-\bar{y})]\} \end{split} \tag{8.7} \end{equation}\] The ratio is constant w.r.t. \(\theta\) if and only if \(\bar{x}=\bar{y}\).

(b) In this case, \(X_{(1)}=\min_i X_i\) is the minimal sufficient statistic, because \[\begin{equation} \begin{split} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}&= \frac{exp\{-n(\bar{x}-\theta)\}\prod_{i=1}^nI_{(\theta,\infty)}(x_i)} {exp\{-n(\bar{y}-\theta)\}\prod_{i=1}^nI_{(\theta,\infty)}(y_i)}\\ &=exp\{-n(\bar{x}-\bar{y})\}\frac{I_{(\theta,\infty)}(\min_i(x_i))}{I_{(\theta,\infty)}(\min_i(y_i))} \end{split} \tag{8.8} \end{equation}\] which is constant w.r.t. \(\theta\) only when \(\min_i(x_i)=\min_i(y_i)\).

(c) In this case, the order statistics are the minimal sufficient statistic. Because \[\begin{equation} \begin{split} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}&= \frac{\prod_{i=1}^ne^{-x_i}}{\prod_{i=1}^ne^{-y_i}} (\frac{\prod_{i=1}^n(1+e^{-(y_i-\theta)})}{\prod_{i=1}^n(1+e^{-(x_i-\theta)})})^2\\ &=\frac{\prod_{i=1}^ne^{-x_i}}{\prod_{i=1}^ne^{-y_i}} (\prod_{i=1}^n\frac{e^{-\theta}+e^{-y_i}}{e^{-\theta}+e^{-x_i}})^2 \end{split} \tag{8.9} \end{equation}\] Viewed as functions of \(e^{-\theta}\), the numerator and denominator of the last factor are polynomials with root sets \(\{-e^{-y_i}\}\) and \(\{-e^{-x_i}\}\), so the ratio is constant w.r.t. \(\theta\) if and only if these root sets coincide, i.e. if and only if the order statistics of the sample points \(\mathbf{x}\) and \(\mathbf{y}\) are equal.

(d) In this case, the order statistics are the minimal sufficient statistic. Because \[\begin{equation} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}=\frac{\prod_{i=1}^n(1+(y_i-\theta)^2)}{\prod_{i=1}^n(1+(x_i-\theta)^2)} \tag{8.10} \end{equation}\] For instance, if two sample points \(\mathbf{x}\) and \(\mathbf{y}\) differ only at \(x_j\) and \(y_j\), then (8.10) reduces to \[\begin{equation} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}=\frac{1+(y_j-\theta)^2}{1+(x_j-\theta)^2} \tag{8.11} \end{equation}\] which is not constant w.r.t. \(\theta\) unless \(x_j=y_j\). More generally, the numerator and denominator of (8.10) are monic polynomials in \(\theta\) of degree \(2n\), so the ratio is constant w.r.t. \(\theta\) if and only if the two polynomials are identical, i.e. if and only if \(\mathbf{y}\) is a permutation of \(\mathbf{x}\). Hence the order statistics are the minimal sufficient statistic.

(e) In this case, the order statistics are the minimal sufficient statistic. Because \[\begin{equation} \begin{split} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}&=exp(\sum_{i=1}^n|y_i-\theta|-\sum_{i=1}^n|x_i-\theta|)\\ &=exp(\sum_{j\in\{j:y_{(j)}<\theta\}}(\theta-y_{(j)})+\sum_{j\in\{j:y_{(j)}\geq\theta\}}(y_{(j)}-\theta)\\ &-\sum_{j\in\{j:x_{(j)}<\theta\}}(\theta-x_{(j)})-\sum_{j\in\{j:x_{(j)}\geq\theta\}}(x_{(j)}-\theta)) \end{split} \tag{8.12} \end{equation}\] From (8.12) we see that if the order statistics of \(\mathbf{x}\) and \(\mathbf{y}\) differ in at least one position, then (8.12) depends on \(\theta\) (the two sums of absolute deviations have kinks at different points). Thus, the order statistics are the minimal sufficient statistic. (A numerical check of this order-statistic argument, using the Cauchy case, is sketched below.)
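
A hedged numerical check of the order-statistic argument for the Cauchy case (arbitrary sample values and \(\theta\) grid): the log-likelihood difference between two samples is constant in \(\theta\) when one is a permutation of the other, and varies with \(\theta\) otherwise.

```python
import numpy as np
from scipy.stats import cauchy

def loglik(sample, theta):
    return cauchy.logpdf(sample, loc=theta).sum()

x = np.array([-0.3, 1.2, 2.7, 4.0])
y_perm = x[::-1].copy()                    # same order statistics as x
y_other = np.array([-0.3, 1.2, 2.7, 4.1])  # different order statistics

thetas = np.linspace(-2, 2, 9)
diff_perm = [loglik(x, t) - loglik(y_perm, t) for t in thetas]
diff_other = [loglik(x, t) - loglik(y_other, t) for t in thetas]

print(np.ptp(diff_perm))   # ~0: log-ratio is constant in theta
print(np.ptp(diff_other))  # > 0: log-ratio depends on theta
```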

Exercise 8.6 (Casella and Berger 6.12) A natural ancillary statistic in most problems is the sample size. For example, let \(N\) be a random variable taking values \(1,2,\cdots\) with known probabilities \(p_1,p_2,\cdots\), where \(\sum_{i=1}^{\infty}p_i=1\). Having observed \(N=n\), perform \(n\) Bernoulli trials with success probability \(\theta\), getting \(X\) successes.

  1. Prove that the pair \((X,N)\) is minimal sufficient and N is ancillary for \(\theta\).

  2. Prove that the estimator \(\frac{X}{N}\) is unbiased for \(\theta\) and has variance \(\theta(1-\theta)E(1/N)\).

Proof. (a) For minimal sufficiency, consider two sample points \(\mathbf{x}\) and \(\mathbf{y}\) with corresponding pairs of statistics \((x,n_1)\) and \((y,n_2)\). Then \[\begin{equation} \begin{split} \frac{p(x,n_1|\theta)}{p(y,n_2|\theta)}&=\frac{{n_1 \choose x}\theta^x(1-\theta)^{n_1-x}p_{n_1}}{{n_2 \choose y}\theta^y(1-\theta)^{n_2-y}p_{n_2}}\\ &\propto\theta^{x-y}(1-\theta)^{(n_1-n_2)-(x-y)} \end{split} \tag{8.13} \end{equation}\] This is constant w.r.t. \(\theta\) only when \(x=y\) and \(n_1=n_2\). Thus, the minimal sufficiency is proved.

For ancillarity, consider the marginal distribution of \(N\): \[\begin{equation} f_N(n)=\sum_{x=0}^n{n \choose x}\theta^x(1-\theta)^{n-x}p_{n}=p_n \tag{8.14} \end{equation}\] Since the distribution of \(N\) does not depend on \(\theta\), \(N\) is an ancillary statistic for \(\theta\).

(b) Computing the expectation directly, we have \[\begin{equation} \begin{split} E(\frac{X}{N})&=\sum_{n=1}^{\infty}\sum_{x=0}^{n}\frac{x}{n}{n \choose x}\theta^x(1-\theta)^{n-x}p_{n}\\ &=\sum_{n=1}^{\infty}\frac{p_n}{n}\cdot n\theta\\ &=\theta\sum_{n=1}^{\infty} p_n=\theta \end{split} \tag{8.15} \end{equation}\] Hence, this estimator is unbiased. For the variance, we have \[\begin{equation} \begin{split} E((\frac{X}{N})^2)&=\sum_{n=1}^{\infty}\sum_{x=0}^{n}(\frac{x}{n})^2{n \choose x}\theta^x(1-\theta)^{n-x}p_{n}\\ &=\sum_{n=1}^{\infty}\frac{p_n}{n^2}\cdot (n\theta(1-\theta)+(n\theta)^2)\\ &=\theta(1-\theta)\sum_{n=1}^{\infty}\frac{p_n}{n}+\theta^2=\theta(1-\theta)E(\frac{1}{N})+\theta^2 \end{split} \tag{8.16} \end{equation}\] Hence, \[\begin{equation} Var(\frac{X}{N})=\theta(1-\theta)E(\frac{1}{N})+\theta^2-\theta^2=\theta(1-\theta)E(\frac{1}{N}) \tag{8.17} \end{equation}\] as desired. (A Monte Carlo check of both facts is sketched below.)
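
A hedged Monte Carlo check of part (b), with an arbitrary choice of the known sample-size distribution \(p_1,p_2,\cdots\):

```python
import numpy as np

rng = np.random.default_rng(0)
theta = 0.3
sizes = np.array([1, 2, 3, 4, 5])
probs = np.array([0.1, 0.2, 0.3, 0.25, 0.15])  # known p_n, chosen arbitrarily here

reps = 200_000
N = rng.choice(sizes, size=reps, p=probs)      # random sample size
X = rng.binomial(N, theta)                     # successes in N Bernoulli(theta) trials
est = X / N

print(est.mean(), theta)                                       # E(X/N) = theta
print(est.var(), theta * (1 - theta) * np.sum(probs / sizes))  # Var(X/N) = theta(1-theta)E(1/N)
```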

Exercise 8.7 (Casella and Berger 6.15) Let \(X_1,\cdots,X_n\) be i.i.d. \(N(\theta,a\theta^2)\) where a is a known constant and \(\theta>0\).

  1. Show that the parameter space does not contain a two-dimensional open set.

  2. Show that the statistic \(T=(\bar{X},S^2)\) is a sufficient statistic for \(\theta\), but the family of distributions is not complete.

Proof. (a) Once the mean parameter \(\theta\) is fixed, the variance is determined as \(a\theta^2\). Hence the parameter space is the curve \(\{(\theta,a\theta^2):\theta>0\}\) in the plane, which does not contain any two-dimensional open set.

(b) Consider the joint pdf of \(X_1,\cdots,X_n\): \[\begin{equation} \begin{split} f(x_1,\cdots,x_n|\theta)&=(2\pi)^{-n/2}(a\theta^2)^{-n/2}exp(-\frac{1}{2a\theta^2}\sum_{i=1}^n(x_i-\theta)^2)\\ &=(2\pi)^{-n/2}(a\theta^2)^{-n/2}exp(-\frac{1}{2a\theta^2}[(n-1)S^2+n(\bar{x}-\theta)^2]) \end{split} \tag{8.18} \end{equation}\] By the Factorization Theorem, \(T=(\bar{X},S^2)\) is a sufficient statistic for \(\theta\).

Now \(E(S^2)=a\theta^2\) and \(E(\bar{X}^2)=\frac{a\theta^2}{n}+\theta^2\), so \(E(\frac{an}{n+a}\bar{X}^2-S^2)=0\) for every \(\theta\). Hence, if we choose \(g(\mathbf{T})=\frac{an}{n+a}\bar{X}^2-S^2\), then \(E_\theta g(\mathbf{T})=0\) for all \(\theta>0\), yet \(g(\mathbf{T})\) is not \(0\) almost surely (it is a nondegenerate random variable). Thus, the family of distributions is not complete.
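
A hedged Monte Carlo illustration of the non-completeness argument (arbitrary \(a\), \(n\) and \(\theta\) values): the statistic \(g(\mathbf{T})=\frac{an}{n+a}\bar{X}^2-S^2\) has mean approximately zero for every \(\theta\), yet it is clearly not degenerate at zero.

```python
import numpy as np

rng = np.random.default_rng(1)
a, n, reps = 2.0, 10, 200_000

for theta in [0.5, 1.0, 3.0]:
    x = rng.normal(theta, np.sqrt(a) * theta, size=(reps, n))  # N(theta, a*theta^2) samples
    xbar = x.mean(axis=1)
    s2 = x.var(axis=1, ddof=1)
    g = (a * n / (n + a)) * xbar**2 - s2
    # Mean ~0 for every theta, yet the standard deviation is clearly positive.
    print(theta, g.mean(), g.std())
```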

Method to prove that a family of distributions is not complete: construct a nonzero function of the statistic whose expectation is zero for every value of the parameter.
Exercise 8.8 (Casella and Berger 6.17) Let \(X_1,\cdots,X_n\) be i.i.d. with geometric distribution \(P(X=x)=\theta(1-\theta)^{x-1}\) for \(x=1,2,\cdots\) and \(0<\theta<1\). Show that \(\sum_{i=1}^nX_i\) is sufficient for \(\theta\) and find the family of distributions of \(\sum_{i=1}^nX_i\). Is the family complete?

Proof. Consider the joint pmf of \(X_1,\cdots,X_n\): \[\begin{equation} f(x_1,\cdots,x_n|\theta)=(\frac{\theta}{1-\theta})^n(1-\theta)^{\sum_{i=1}^nx_i} \tag{8.19} \end{equation}\] By the Factorization Theorem with \(h(\mathbf{x})=1\), \(\sum_{i=1}^nX_i\) is a sufficient statistic.

The sum of \(n\) i.i.d. geometrically distributed random variables has a negative binomial distribution: \[\begin{equation} P(\sum_{i=1}^nX_i=m)={m-1 \choose n-1}\theta^n(1-\theta)^{m-n},\quad m=n,n+1,\cdots \tag{8.20} \end{equation}\]

For completeness, consider the expectation of \(g(T)\) with \(T=\sum_{i=1}^nX_i\): \[\begin{equation} Eg(T)=\sum_{t=n}^{\infty}g(t){t-1 \choose n-1}(\frac{\theta}{1-\theta})^n(1-\theta)^t \tag{8.21} \end{equation}\] If \(Eg(T)=0\) for all \(0<\theta<1\), then the power series \(\sum_{t=n}^{\infty}g(t){t-1 \choose n-1}r^t\) in \(r=1-\theta\) vanishes on \((0,1)\), so every coefficient \(g(t){t-1 \choose n-1}\) must be \(0\). Since \({t-1 \choose n-1}>0\) for all integers \(t\geq n\), this forces \(g(t)=0\) for all \(t\geq n\). Hence, \(\sum_{i=1}^nX_i\) is a complete statistic.
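
A hedged simulation check that \(\sum_{i=1}^nX_i\) follows the negative binomial pmf (8.20), with arbitrary \(\theta\), \(n\) and evaluation point:

```python
import numpy as np
from scipy.special import comb

rng = np.random.default_rng(2)
theta, n, reps = 0.35, 4, 200_000

# numpy's geometric sampler is supported on {1, 2, ...}, matching P(X = x) = theta(1-theta)^(x-1).
total = rng.geometric(theta, size=(reps, n)).sum(axis=1)

m = 10
empirical = np.mean(total == m)
exact = comb(m - 1, n - 1) * theta**n * (1 - theta)**(m - n)  # equation (8.20)
print(empirical, exact)
```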

Exercise 8.9 (Casella and Berger 6.22) Let \(X_1,\cdots,X_n\) be a random sample from a population with pdf \[\begin{equation} f(x|\theta)=\theta x^{\theta-1},0<x<1,\theta>0 \tag{8.22} \end{equation}\]

  1. Is \(\sum_{i=1}^n X_i\) sufficient for \(\theta\)?

  2. Find a complete sufficient statistic for \(\theta\).

Proof. (a) Consider the joint pdf of \(X_1,\cdots,X_n\): \[\begin{equation} f(x_1,\cdots,x_n|\theta)=\theta^n(\prod_{i=1}^nx_i)^{\theta-1} \tag{8.23} \end{equation}\] By the Factorization Theorem (with \(h(\mathbf{x})=1\)), \(\prod_{i=1}^nX_i\) is sufficient for \(\theta\). However, \(\sum_{i=1}^n X_i\) is not sufficient: two samples with the same sum but different products have likelihood ratio \((\prod x_i/\prod y_i)^{\theta-1}\), which depends on \(\theta\), whereas sufficiency of the sum would force this ratio to be free of \(\theta\).

(b) Notice that \(f(x|\theta)=\theta e^{(\theta-1)\log(x)}\) belongs to an exponential family whose natural parameter space \(\{\theta-1:\theta>0\}=(-1,\infty)\) contains an open set in \(\mathbb{R}\). Thus, by Theorem 6.2, \(\sum_{i=1}^n\log(X_i)=\log(\prod_{i=1}^nX_i)\) is a complete sufficient statistic. Since \(\log\) is a one-to-one transformation, \(\prod_{i=1}^nX_i\) is also a complete sufficient statistic for \(\theta\). (A simulation check of the distribution of this statistic is sketched below.)
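
A hedged simulation check of part (b), with arbitrary \(\theta\) and \(n\): since \(-\log X_i\sim Exp(\theta)\) (rate \(\theta\)) when \(X_i\) has pdf \(\theta x^{\theta-1}\) on \((0,1)\), the statistic \(-\log\prod_{i=1}^nX_i=-\sum_{i=1}^n\log X_i\) should be \(Gamma\) distributed with shape \(n\) and scale \(1/\theta\).

```python
import numpy as np
from scipy.stats import gamma, kstest

rng = np.random.default_rng(3)
theta, n, reps = 2.5, 6, 50_000

u = rng.random(size=(reps, n))
x = u ** (1.0 / theta)          # inverse-cdf sampling: F(x) = x^theta on (0, 1)
t = -np.log(x).sum(axis=1)      # -log(prod X_i), a one-to-one function of prod X_i

# Kolmogorov-Smirnov comparison with Gamma(shape = n, scale = 1/theta).
print(kstest(t, gamma(a=n, scale=1.0 / theta).cdf))
```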

Exercise 8.10 (Casella and Berger 6.26) Use the Minimal Sufficient Statistics Theorem (Theorem 8.1) to establish that, given a sample \(X_1,\cdots,X_n\), the following statistics are minimal sufficient.

  1. Distribution: \(N(\theta,1)\), Statistic: \(\bar{X}\).

  2. Distribution: \(Gamma(\alpha,\beta)\) with \(\alpha\) known, Statistic: \(\sum_{i=1}^nX_i\).

  3. Distribution: \(Unif(0,\theta)\), Statistic: \(\max_iX_i\).

  4. Distribution: \(Cauchy(\theta,1)\), Statistic: \(X_{(1)},\cdots,X_{(n)}\).

  5. Distribution: \(logistic(\mu,\beta)\), Statistic: \(X_{(1)},\cdots,X_{(n)}\).

Theorem 8.1 (Minimal Sufficient Statistics) Suppose that the family of densities \(\{f_0(\mathbf{x}),\cdots,f_k(\mathbf{x})\}\) all have common support. Then

  1. The statistic \(T(\mathbf{X})=(\frac{f_1(\mathbf{X})}{f_0(\mathbf{X})},\frac{f_2(\mathbf{X})}{f_0(\mathbf{X})},\cdots,\frac{f_k(\mathbf{X})}{f_0(\mathbf{X})})\) is minimal sufficient for the family \(\{f_0(\mathbf{x}),\cdots,f_k(\mathbf{x})\}\).

  2. If \(\mathcal{F}\) is a family of densities with common support such that (i) \(f_i(\mathbf{x})\in\mathcal{F},\ i=0,1,\cdots,k\), and (ii) \(T(\mathbf{x})\) is sufficient for \(\mathcal{F}\), then \(T(\mathbf{x})\) is minimal sufficient for \(\mathcal{F}\).

Proof. We have already established (directly or via the Factorization Theorem) that each of these statistics is sufficient for the corresponding family. In each case one can choose a finite subfamily \(\{f_0,\cdots,f_k\}\) for which the statistic is a one-to-one function of the ratio statistic in part (a) of Theorem 8.1, hence minimal sufficient for that subfamily; part (b) of Theorem 8.1 then gives minimal sufficiency for the full family.
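
For instance, for the \(N(\theta,1)\) case, take the two-element subfamily with \(\theta_0=0\) and \(\theta_1=1\). Then \[\frac{f_1(\mathbf{x})}{f_0(\mathbf{x})}=\frac{\prod_{i=1}^ne^{-(x_i-1)^2/2}}{\prod_{i=1}^ne^{-x_i^2/2}}=exp(n\bar{x}-\frac{n}{2})\] is a one-to-one function of \(\bar{x}\), so \(\bar{X}\) is minimal sufficient for this subfamily by part (a); since \(\bar{X}\) is sufficient for the full \(N(\theta,1)\) family, part (b) applies. The remaining cases follow the same pattern with suitably chosen finite subfamilies.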

Exercise 8.11 (Casella and Berger 6.30) Let \(X_1,\cdots,X_n\) be a random sample from the pdf \(f(x|\mu)=e^{-(x-\mu)}\), where \(-\infty<\mu<x<\infty\).

  1. Show that \(X_{(1)}=\min_{i}X_i\) is a complete sufficient statistic.

  2. Use Basu's Theorem to show that \(X_{(1)}\) and \(S^2\) are independent.

Proof. (a) Consider the joint pdf of \(X_1,\cdots,X_n\), \[\begin{equation} \begin{split} f(x_1,\cdots,x_n)&=\prod_{i=1}^nexp(-(x_i-\mu))I_{x_i>\mu}(x_i)\\ &=exp(-(\sum_{i=1}^nx_i-n\mu))I_{\min_ix_i>\mu}(\mathbf{x})\\ &=exp(-\sum_{i=1}^nx_i)exp(n\mu)I_{\min_ix_i>\mu}(\mathbf{x}) \end{split} \tag{8.24} \end{equation}\] Thus, by the Factorization Theorem, \(X_{(1)}=\min_{i}X_i\) is a sufficient statistic.

As for completeness, note that \(X_{(1)}\) has pdf \(f_{X_{(1)}}(y|\mu)=ne^{-n(y-\mu)}\) for \(y>\mu\). For any function \(g\), \[\begin{equation} Eg(X_{(1)})=\int_{\mu}^{\infty}g(y)ne^{-n(y-\mu)}dy \tag{8.25} \end{equation}\] If \(Eg(X_{(1)})=0\) for all \(\mu\), then \(\int_{\mu}^{\infty}g(y)e^{-ny}dy=0\) for all \(\mu\). Taking the derivative w.r.t. \(\mu\) gives \[\begin{equation} -g(\mu)e^{-n\mu}=0 \tag{8.26} \end{equation}\] for all \(\mu\). Hence \(g(\mu)=0\) for all \(\mu\), and \(X_{(1)}\) is complete, as desired.

(b) By Basu's Theorem, we only need to show that \(S^2\) is an ancillary statistic. Because \(f(x|\mu)\) is a location family, we can write \(X_i=Z_i+\mu\), where the \(Z_i\) are a sample from \(f(x|0)\). Then \(S^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2=\frac{1}{n-1}\sum_{i=1}^n(Z_i-\bar{Z})^2\), whose distribution does not depend on \(\mu\). Hence, \(S^2\) is ancillary, and since \(X_{(1)}\) is a complete sufficient statistic, \(X_{(1)}\) and \(S^2\) are independent. (A Monte Carlo illustration is sketched below.)
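
A hedged Monte Carlo illustration of part (b), with arbitrary \(\mu\) values and \(n\): the mean of \(S^2\) does not change with \(\mu\) (ancillarity), and the sample correlation between \(X_{(1)}\) and \(S^2\) is approximately zero, consistent with (though of course not a proof of) the independence given by Basu's Theorem.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 8, 100_000

for mu in [0.0, 3.0, -2.0]:
    x = mu + rng.exponential(1.0, size=(reps, n))  # shifted exponential, f(x|mu) = e^{-(x-mu)}
    xmin = x.min(axis=1)                           # complete sufficient statistic X_(1)
    s2 = x.var(axis=1, ddof=1)                     # ancillary statistic S^2
    print(mu, s2.mean(), np.corrcoef(xmin, s2)[0, 1])
```
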
Method to check completeness: write the model as an exponential family and apply Theorem 6.2. Or argue from the definition: find the pdf of the statistic itself (not the joint pdf of the sample), write out the expectation of \(g\), use the fact that this expectation is identically zero as a function of \(\theta\), and conclude that its derivative w.r.t. \(\theta\) is also \(0\).
Exercise 8.12 (Casella and Berger 6.32) Prove the Likelihood Principle Corollary.
Proof. For \(\mathbf{x}^*\in\mathcal{X}\), define \[\begin{equation} Y=\left\{\begin{aligned} & 1 & \mathbf{x}=\mathbf{x}^*\\& 0 & \mathbf{x}\neq\mathbf{x}^*\end{aligned}\right. \tag{8.27} \end{equation}\] Then \(Y\) has distribution \[\begin{equation} P(Y=1|\theta)=f(\mathbf{x}^*|\theta)=1-P(Y=0|\theta) \tag{8.28} \end{equation}\] Let \(E^*\) be the experiment of observing \(Y\). Observing \(\mathbf{x}^*\) in \(E\) and observing \(Y=1\) in \(E^*\) give equal likelihood functions, so by the Formal Likelihood Principle \[\begin{equation} Ev(E,\mathbf{x}^*)=Ev(E^*,1) \tag{8.29} \end{equation}\] Hence, using (8.28), \(Ev(E^*,1)\) depends only on \(f(\mathbf{x}^*|\theta)=L(\theta|\mathbf{x}^*)\), which proves the corollary.
Exercise 8.13 (Casella and Berger 6.35) A risky experimental treatment is to be given to at most three patients. The treatment will be given to one patient. If it is a success, then it will be given to a second. If it is a success, it will be given to a third patient. Model the outcomes for the patients as independent \(Bernoulli(p)\) random variables. Identify the four sample points in this model and show that, according to the Formal Likelihood Principle, the inference about \(p\) should not depend on the fact that the sample size was determined by the data.
Proof. In this model, the four sample points are: the first patient is a failure; the first is a success and the second a failure; the first two are successes and the third a failure; all three patients are successes. The corresponding likelihood functions are \(1-p\), \(p(1-p)\), \(p^2(1-p)\) and \(p^3\). Each is of the form \(p^s(1-p)^f\), where \(s\) and \(f\) are the observed numbers of successes and failures, which is exactly the likelihood that would arise if the number of trials had been fixed in advance at \(s+f\). By the Formal Likelihood Principle, the inference about \(p\) should therefore not depend on the fact that the sample size was determined by the data.

Exercise 8.14 (Casella and Berger 6.36) One advantage of using a minimal sufficient statistic is that unbiased estimators will have smaller variance, as the following exercise will show. Suppose \(T_1\) is sufficient and \(T_2\) is minimal sufficient, \(U\) is an unbiased estimator of \(\theta\), and define \(U_1=E(U|T_1)\) and \(U_2=E(U|T_2)\).

  1. Show that \(U_2=E(U_1|T_2)\).

  2. Now use the conditional variance formula to show that \(Var(U_2)\leq Var(U_1)\).

Proof. (a) We only consider the discrete case here. First note the following simple fact: if \(\mathbf{x}\) is a vector of random variables with joint pmf \(f(\mathbf{x})\), then the joint pmf of \((\mathbf{x},g(\mathbf{x}))\) assigns the same probabilities as \(f(\mathbf{x})\), because \(g(\mathbf{x})\) is a deterministic function of \(\mathbf{x}\). This is straightforward for discrete random variables.

With this in mind, suppose the sample \(\mathbf{X}=(X_1,\cdots,X_n)\) has joint pmf \(f(\mathbf{x}|\theta)\), and let the statistics \(T_1\) and \(T_2\) have pmfs \(g_1(\cdot|\theta)\) and \(g_2(\cdot|\theta)\), respectively. Since \(T_2\) is minimal sufficient, it is a function of \(T_1\); write \(T_2=r(T_1)\), and note that by the fact above the joint distribution of \((T_1,T_2)=(T_1,r(T_1))\) is determined by \(g_1(\cdot|\theta)\). Define \(A(t_1):=\{\mathbf{x}\in\mathcal{X}:T_1(\mathbf{x})=t_1\}\), \(B(t_2):=\{\mathbf{x}\in\mathcal{X}:T_2(\mathbf{x})=t_2\}\) and \(C(t_2):=\{t_1:r(t_1)=t_2\}\). First, we have \[\begin{equation} E(U|T_1=t_1)=E(U(\mathbf{X})|T_1(\mathbf{X})=t_1)=\sum_{\mathbf{x}\in\mathcal{X}}U(\mathbf{x})f_1(\mathbf{x}|T_1=t_1) \tag{8.30} \end{equation}\] where \(f_1(\mathbf{x}|T_1)\) denotes the conditional pmf of \(\mathbf{X}\) given \(T_1\). Then (8.30) can be written as \[\begin{equation} E(U|T_1=t_1)=\sum_{\mathbf{x}\in A(t_1)}U(\mathbf{x})\frac{f(\mathbf{x}|\theta)}{g_1(t_1|\theta)} \tag{8.31} \end{equation}\] and similarly \[\begin{equation} E(U|T_2=t_2)=\sum_{\mathbf{x}\in B(t_2)}U(\mathbf{x})\frac{f(\mathbf{x}|\theta)}{g_2(t_2|\theta)} \tag{8.32} \end{equation}\] (by sufficiency, neither expression actually depends on \(\theta\)). Furthermore, since \(P(T_1=t_1|T_2=t_2)=g_1(t_1|\theta)/g_2(t_2|\theta)\) for \(t_1\in C(t_2)\), we have \[\begin{equation} \begin{split} E(U_1|T_2=t_2)&=\sum_{t_1\in C(t_2)}\sum_{\mathbf{x}\in A(t_1)}U(\mathbf{x})\frac{f(\mathbf{x}|\theta)}{g_1(t_1|\theta)}\frac{g_1(t_1|\theta)}{g_2(t_2|\theta)}\\ &=\sum_{t_1\in C(t_2)}\sum_{\mathbf{x}\in A(t_1)}U(\mathbf{x})\frac{f(\mathbf{x}|\theta)}{g_2(t_2|\theta)} \end{split} \tag{8.33} \end{equation}\] Hence it only remains to show that \(\bigcup_{t_1\in C(t_2)}A(t_1)=B(t_2)\), and we are done.

For any \(\mathbf{x}\in\bigcup_{t_1\in C(t_2)}A(t_1)\), we have \(T_1(\mathbf{x})=t_1\) for some \(t_1\) with \(r(t_1)=t_2\), hence \(T_2(\mathbf{x})=r(T_1(\mathbf{x}))=t_2\) and \(\mathbf{x}\in B(t_2)\). Therefore \(\bigcup_{t_1\in C(t_2)}A(t_1)\subseteq B(t_2)\). Conversely, for any \(\mathbf{x}\in B(t_2)\), \(T_2(\mathbf{x})=t_2\), which implies \(r(T_1(\mathbf{x}))=t_2\). Writing \(t_1^*=T_1(\mathbf{x})\), we have \(t_1^*\in C(t_2)\) and \(\mathbf{x}\in A(t_1^*)\subseteq\bigcup_{t_1\in C(t_2)}A(t_1)\). Hence \(B(t_2)\subseteq\bigcup_{t_1\in C(t_2)}A(t_1)\), and therefore \(\bigcup_{t_1\in C(t_2)}A(t_1)=B(t_2)\).

Thus (8.32) and (8.33) give the same result, so \(U_2=E(U_1|T_2)\), as desired.

(b) Since \(U_2=E(U_1|T_2)\), the conditional variance formula (law of total variance) gives \[\begin{equation} Var(U_1)=Var(E(U_1|T_2))+E(Var(U_1|T_2))\geq Var(E(U_1|T_2))=Var(U_2) \tag{8.34} \end{equation}\] (A Monte Carlo illustration in a Bernoulli model is sketched after the remark below.)
Sketch of the proof for the continuous case: use the so-called “standard machine”. Any continuous pdf can be approximated by simple (stepwise) functions; for these, the same argument as in the discrete case applies, and a limiting argument completes the proof.
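
A hedged Monte Carlo illustration of parts (a) and (b) in a concrete model chosen for illustration: for an i.i.d. \(Bernoulli(p)\) sample, take \(U=X_1\) (unbiased for \(p\)), \(T_1\) the pair of half-sample sums (sufficient) and \(T_2=\sum_iX_i\) (minimal sufficient); then \(U_1=E(U|T_1)\) is the mean of the first half, \(U_2=E(U|T_2)=\bar{X}\), and \(Var(U_2)\leq Var(U_1)\leq Var(U)\).

```python
import numpy as np

rng = np.random.default_rng(5)
p, n, reps = 0.4, 10, 200_000
half = n // 2

x = rng.binomial(1, p, size=(reps, n))
U = x[:, 0]                    # unbiased estimator of p
U1 = x[:, :half].mean(axis=1)  # E(U | T1), T1 = (first-half sum, second-half sum)
U2 = x.mean(axis=1)            # E(U | T2), T2 = total sum (minimal sufficient)

print(U.mean(), U1.mean(), U2.mean())  # all approximately p (unbiasedness is preserved)
print(U.var(), U1.var(), U2.var())     # p(1-p) >= p(1-p)/(n/2) >= p(1-p)/n
```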

Exercise 8.15 (Casella and Berger 6.37) Joshi and Nabar examine properties of linear estimators for the parameter in the so-called “Problem of the Nile”, where \((X,Y)\) has the joint density \[\begin{equation} f(x,y|\theta)=exp\{-(\theta x+y/\theta)\},\quad x>0,y>0 \tag{8.35} \end{equation}\]

  1. For an i.i.d. sample of size n, show that the Fisher information is \(I(\theta)=2n/\theta^2\).

  2. For the estimators \(T=\sqrt{\sum Y_i/\sum X_i}\) and \(U=\sqrt{\sum X_i\sum Y_i}\), show that

    1. the information in \(T\) alone is \([2n/(2n+1)]I(\theta)\).

    2. the information in \((T,U)\) is \(I(\theta)\).

    3. \((T,U)\) is jointly sufficient but not complete.

Proof. (a) For one observation \((X,Y)\) we have \[\begin{equation} I(\theta)=-E(\frac{\partial^2}{\partial\theta^2}\log f(X,Y|\theta))=\theta^{-3}E(2Y) \tag{8.36} \end{equation}\] The marginal density of \(Y\) is \(\theta^{-1}e^{-y/\theta}\), so \(Y\sim Exp(\theta)\) and \(E(Y)=\theta\). Hence, for one observation \(I(\theta)=2/\theta^2\), and for an i.i.d. sample of size \(n\), \(I(\theta)=2n/\theta^2\).

(b) (i) The cdf of \(T\) is \[\begin{equation} \begin{split} P(T\leq t)&=P(\frac{\sum Y_i}{\sum X_i}\leq t^2)=P(\frac{2\sum Y_i/\theta}{2\theta\sum X_i}\leq t^2/\theta^2)\\ &=P(F_{2n,2n}\leq t^2/\theta^2) \end{split} \tag{8.37} \end{equation}\] The last equality holds because the \(2Y_i/\theta\) and the \(2\theta X_i\) are independent \(\chi_2^2\) random variables, so the ratio of their (equally scaled) sums is \(F_{2n,2n}\) distributed. Differentiating, the density of \(T\) is \[\begin{equation} f_T(t)=\frac{\Gamma(2n)}{\Gamma(n)^2}\frac{2}{t}(\frac{t^2}{t^2+\theta^2})^n(\frac{\theta^2}{t^2+\theta^2})^n \tag{8.38} \end{equation}\] Writing \(u=\theta^2/(t^2+\theta^2)\), the negative second derivative of the log density w.r.t. \(\theta\) is \(-\frac{\partial^2}{\partial\theta^2}\log f_T(t)=\frac{2n}{\theta^2}(1+2u-4u^2)\). Since \(U=\theta^2/(T^2+\theta^2)=1/(F_{2n,2n}+1)\sim Beta(n,n)\), we have \(E(U)=1/2\) and \[\begin{equation} E(U^2)=\frac{\Gamma(2n)}{\Gamma(n)^2}\int^{\infty}_0\frac{1}{(1+\omega)^2} \frac{\omega^{n-1}}{(1+\omega)^{2n}}d\omega=\frac{n+1}{2(2n+1)} \tag{8.39} \end{equation}\] Thus the information in \(T\) is \(\frac{2n}{\theta^2}[1+2E(U)-4E(U^2)]=\frac{2n}{\theta^2}\cdot\frac{2n}{2n+1}=[2n/(2n+1)]I(\theta)\). (A numerical check of these moments is sketched below.)
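
A hedged numerical check of the moments used above (arbitrary \(n\)): with \(U=1/(F_{2n,2n}+1)\sim Beta(n,n)\), the sampled values of \(E(U)\), \(E(U^2)\) and of the information ratio should match \(1/2\), \((n+1)/(2(2n+1))\) and \(2n/(2n+1)\).

```python
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(6)
n, reps = 5, 500_000

w = f_dist(dfn=2 * n, dfd=2 * n).rvs(size=reps, random_state=rng)
u = 1.0 / (1.0 + w)   # theta^2 / (T^2 + theta^2), which is Beta(n, n) distributed

print(u.mean(), 0.5)                               # E(U) = 1/2
print((u**2).mean(), (n + 1) / (2 * (2 * n + 1)))  # E(U^2), cf. (8.39)
ratio = 1 + 2 * u.mean() - 4 * (u**2).mean()       # information in T divided by I(theta)
print(ratio, 2 * n / (2 * n + 1))
```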

For part (ii), let \(W=\sum X_i\) and \(V=\sum Y_i\). \(W\) and \(V\) are independent, with \(W\sim Gamma(n,\frac{1}{\theta})\) and \(V\sim Gamma(n,\theta)\). Using this, the joint density of \((T,U)\) is \[\begin{equation} f(t,u|\theta)=\frac{2}{\Gamma(n)^2t}u^{2n-1}exp(-\frac{u\theta}{t}-\frac{ut}{\theta}),\quad u>0,t>0 \tag{8.40} \end{equation}\] Since \(UT=V\), the information in \((T,U)\) is \[\begin{equation} -E(-\frac{2UT}{\theta^3})=E(\frac{2V}{\theta^3})=\frac{2n\theta}{\theta^3}=I(\theta) \tag{8.41} \end{equation}\] as desired.

For part (iii), the joint pdf of the sample is \[\begin{equation} f(\mathbf{x},\mathbf{y}|\theta)=exp(-\theta(\sum x_i)-(\sum y_i)/\theta) \tag{8.42} \end{equation}\] Hence, by the Factorization Theorem, \((W,V)\) is sufficient, and so is \((T,U)\), since it is a one-to-one function of \((W,V)\).

However, \(E(U^2)=E(WV)=E(W)E(V)=(n/\theta)(n\theta)=n^2\) for every \(\theta\). Taking \(g(t,u)=u^2-n^2\), we have \(E_\theta g(T,U)=0\) for all \(\theta\), while \(g(T,U)\) is not \(0\) almost surely. Thus, \((T,U)\) is not complete.
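
A hedged simulation of part (iii) (arbitrary \(\theta\) values and \(n\)): \(E(U^2)=E(WV)=n^2\) for every \(\theta\), yet \(U^2-n^2\) is not degenerate at zero, so \((T,U)\) cannot be complete.

```python
import numpy as np

rng = np.random.default_rng(7)
n, reps = 5, 200_000

for theta in [0.5, 1.0, 2.0]:
    x = rng.exponential(scale=1.0 / theta, size=(reps, n))  # X_i with density theta*exp(-theta*x)
    y = rng.exponential(scale=theta, size=(reps, n))        # Y_i with density (1/theta)*exp(-y/theta)
    g = x.sum(axis=1) * y.sum(axis=1) - n**2                # U^2 - n^2 = (sum X_i)(sum Y_i) - n^2
    print(theta, g.mean(), g.std())  # mean ~0 for every theta, standard deviation > 0
```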