Chapter 8 Homework 2: Principles of Data Reduction: Problems and Solutions

Exercise 8.1 (Casella and Berger 6.1) Let X be one observation from a N(0,\sigma^2) population. Is |X| a sufficient statistic?

Proof. Consider the pdf given by \begin{equation} f(x|\sigma^2)=(2\pi\sigma^2)^{-1/2}exp(-\frac{x^2}{2\sigma^2})=(2\pi\sigma^2)^{-1/2}exp(-\frac{|x|^2}{2\sigma^2}) \tag{8.1} \end{equation} Define g(t|\sigma^2)=(2\pi\sigma^2)^{-1/2}exp(-\frac{t^2}{2\sigma^2}) and h(x)=1. Then f(x|\sigma^2)=g(|x|\,|\sigma^2)h(x), so by the Factorization Theorem, |X| is a sufficient statistic.
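The factorization can be checked numerically. Below is a small sketch (assuming NumPy and SciPy are available; the grid and variance values are arbitrary) that evaluates the N(0,\sigma^2) density and confirms it coincides with g(|x|\,|\sigma^2)h(x) with h\equiv 1.

```python
import numpy as np
from scipy.stats import norm

def g(t, sigma2):
    # g(t | sigma^2) = (2*pi*sigma^2)^(-1/2) * exp(-t^2 / (2*sigma^2))
    return np.exp(-t**2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

x = np.linspace(-5, 5, 201)
for sigma2 in (0.5, 1.0, 4.0):
    f = norm.pdf(x, loc=0, scale=np.sqrt(sigma2))   # f(x | sigma^2)
    # h(x) = 1, so the factorization says f(x | sigma^2) = g(|x| | sigma^2)
    assert np.allclose(f, g(np.abs(x), sigma2))
print("f(x | sigma^2) = g(|x| | sigma^2) h(x) on the whole grid, as claimed.")
```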

Exercise 8.2 (Casella and Berger 6.2) Let X_1,\cdots,X_n be independent random variables with densities \begin{equation} f_{X_i}(x|\theta)=\left\{\begin{aligned} & e^{i\theta-x} & \quad x\geq i\theta \\ & 0 & \quad x<i\theta \end{aligned} \right. \tag{8.2} \end{equation}

Prove that T=\min_i(X_i/i) is a sufficient statistic for \theta.
Proof. Consider the joint pdf \begin{equation} \begin{split} f(\mathbf{x}|\theta)&=\prod_{i=1}^n e^{i\theta-x_i} I_{x_i\geq i\theta}(x_i)\\ &=exp(\theta\sum_{i=1}^ni-\sum_{i=1}^nx_i)I_{\min_i(x_i/i)\geq\theta}(\mathbf{x})\\ &=exp(\theta\sum_{i=1}^ni)I_{\min_i(x_i/i)\geq\theta}(\mathbf{x})\cdot exp(-\sum_{i=1}^nx_i)\\ &=g(T(\mathbf{x})|\theta)h(\mathbf{x}) \end{split} \tag{8.3} \end{equation} Hence, by the Factorization Theorem, T=\min_i(X_i/i) is a sufficient statistic for \theta.
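A quick numerical check of (8.3), assuming NumPy (the sample size, \theta, and seed are arbitrary): X_i=i\theta+E_i with E_i\sim Exp(1) has exactly the density above, and the joint log-density should equal \log g(T(\mathbf{x})|\theta')+\log h(\mathbf{x}) for every \theta' at which the indicator equals 1.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta = 5, 0.7
i = np.arange(1, n + 1)
x = i * theta + rng.exponential(size=n)        # X_i has density e^{i*theta - x}, x >= i*theta

T = np.min(x / i)                              # T = min_i (x_i / i)
log_h = -np.sum(x)                             # h(x) = exp(-sum x_i)

for theta_p in np.linspace(0.1, T, 20):        # only theta' <= T, where the indicator equals 1
    log_joint = np.sum(i * theta_p - x)        # log of prod_i e^{i*theta' - x_i}
    log_g = theta_p * np.sum(i)                # log g(T | theta'), indicator = 1
    assert np.isclose(log_joint, log_g + log_h)
print("Factorization (8.3) verified on a grid of theta values.")
```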
Exercise 8.3 (Casella and Berger 6.5) Let X_1,\cdots,X_n be independent random variables with pdfs \begin{equation} f(x_i|\theta)=\left\{\begin{aligned} & \frac{1}{2i\theta} & \quad -i(\theta-1)< x_i<i(\theta+1) \\ & 0 & \quad otherwise \end{aligned} \right. \tag{8.4} \end{equation} where \theta>0. Find a two-dimensional sufficient statistic for \theta.
Proof. Consider the joint pdf \begin{equation} \begin{split} f(\mathbf{x}|\theta)&=\prod_{i=1}^n \frac{1}{2i\theta} I_{-i(\theta-1)< x_i<i(\theta+1)}(x_i)\\ &=(2\theta)^{-n}I_{\min_i\frac{x_i}{i}>1-\theta}(\mathbf{x})I_{\max_i\frac{x_i}{i}<1+\theta}(\mathbf{x}) \prod_{i=1}^ni^{-1}\\ &=g(T(\mathbf{x})|\theta)h(\mathbf{x}) \end{split} \tag{8.5} \end{equation} Hence T(\mathbf{X})=(\min_{i}(\frac{X_i}{i}),\max_{i}(\frac{X_i}{i})) is a two-dimensional sufficient statistic for \theta.
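As a sanity check of the indicator rewriting used in (8.5), here is a small sketch assuming NumPy (parameter values arbitrary): points are drawn from a box slightly larger than the support, and the original support constraints are compared with the min/max form.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta = 6, 2.5
i = np.arange(1, n + 1)

for _ in range(10_000):
    # draw from a slightly wider box so both cases (inside/outside the support) occur
    x = rng.uniform(-i * (theta - 1) - 2.0, i * (theta + 1) + 2.0)
    in_support = np.all((-i * (theta - 1) < x) & (x < i * (theta + 1)))
    via_minmax = (np.min(x / i) > 1 - theta) and (np.max(x / i) < 1 + theta)
    assert in_support == via_minmax
print("Indicator form in (8.5) matches the original support constraints.")
```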
Exercise 8.4 (Casella and Berger 6.6) Let X_1,\cdots,X_n be a random sample from a Gamma(\alpha,\beta) population. Find a two-dimensional sufficient statistic for (\alpha,\beta).
Proof. Consider the joint pdf: \begin{equation} \begin{split} f(\mathbf{x}|\alpha,\beta)&=\prod_{i=1}^n \frac{\beta^\alpha}{\Gamma(\alpha)} x_i^{\alpha-1}e^{-\beta x_i}\\ &=(\frac{\beta^\alpha}{\Gamma(\alpha)})^n (\prod_{i=1}^n x_i)^{\alpha-1}exp(-\beta(\sum_{i=1}^n x_i))\\ &=g(T(\mathbf{x})|\alpha,\beta)h(\mathbf{x}) \end{split} \tag{8.6} \end{equation} Hence T(\mathbf{X})=(\prod_{i=1}^n X_i,\sum_{i=1}^n X_i) is a two-dimensional sufficient statistic for (\alpha,\beta).
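The factorization in (8.6) can also be verified numerically. The sketch below assumes NumPy/SciPy and uses the rate parameterization of the Gamma(\alpha,\beta) density above, so SciPy's scale argument is 1/\beta; the parameter values are arbitrary.

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gammaln

rng = np.random.default_rng(2)
n, alpha, beta = 8, 2.3, 1.7
x = rng.gamma(shape=alpha, scale=1 / beta, size=n)       # rate-beta parameterization

log_joint = np.sum(gamma.logpdf(x, a=alpha, scale=1 / beta))
prod_x, sum_x = np.prod(x), np.sum(x)                    # T(x) = (prod x_i, sum x_i)
log_g = n * (alpha * np.log(beta) - gammaln(alpha)) \
        + (alpha - 1) * np.log(prod_x) - beta * sum_x    # log g(T(x) | alpha, beta), h(x) = 1
assert np.isclose(log_joint, log_g)
print("Factorization (8.6) verified: the joint density depends on x only through T(x).")
```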

Exercise 8.5 (Casella and Berger 6.9) For each of the following distributions let X_1,\cdots,X_n be a random sample. Find a minimal sufficient statistic for \theta.

  1. Normal: f(x|\theta)=\frac{1}{\sqrt{2\pi}}e^{-(x-\theta)^2/2}, -\infty<x<\infty, -\infty<\theta<\infty.

  2. Location exponential: f(x|\theta)=e^{-(x-\theta)}, \theta<x<\infty, -\infty<\theta<\infty.

  3. Logistic: f(x|\theta)=\frac{e^{-(x-\theta)}}{(1+e^{-(x-\theta)})^2}, -\infty<x<\infty, -\infty<\theta<\infty.

  4. Cauchy: f(x|\theta)=\frac{1}{\pi[1+(x-\theta)^2]}, -\infty<x<\infty, -\infty<\theta<\infty.

  5. Double exponential: f(x|\theta)=\frac{1}{2}e^{-|x-\theta|}, -\infty<x<\infty, -\infty<\theta<\infty.

Proof. (a) In this case, the sample mean \bar{X} is a minimal sufficient statistic for \theta, because \begin{equation} \begin{split} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}&= \frac{exp\{-\frac{1}{2}[\sum_{i=1}^n(x_i-\bar{x})^2+n(\bar{x}-\theta)^2]\}} {exp\{-\frac{1}{2}[\sum_{i=1}^n(y_i-\bar{y})^2+n(\bar{y}-\theta)^2]\}}\\ &=exp\{-\frac{1}{2}[(n-1)(S_x^2-S_y^2)+n(\bar{x}^2-\bar{y}^2)-2n\theta(\bar{x}-\bar{y})]\} \end{split} \tag{8.7} \end{equation} The ratio is constant w.r.t. \theta if and only if \bar{x}=\bar{y}. (A numerical sketch of this ratio criterion is given after this list.)

(b) In this case, X_{(1)}=\min\{X_1,\cdots,X_n\} is the minimal sufficient statistic, because \begin{equation} \begin{split} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}&= \frac{exp\{-n(\bar{x}-\theta)\}\prod_{i=1}^nI_{(\theta,\infty)}(x_i)} {exp\{-n(\bar{y}-\theta)\}\prod_{i=1}^nI_{(\theta,\infty)}(y_i)}\\ &=exp\{-n(\bar{x}-\bar{y})\}\frac{I_{(\theta,\infty)}(\min_i(x_i))}{I_{(\theta,\infty)}(\min_i(y_i))} \end{split} \tag{8.8} \end{equation} which is a constant w.r.t. \theta only when \min_i(x_i)=\min_i(y_i).

(c) In this case, the order statistics are the minimal sufficient statistic, because \begin{equation} \begin{split} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}&= \frac{\prod_{i=1}^ne^{-x_i}}{\prod_{i=1}^ne^{-y_i}} (\frac{\prod_{i=1}^n(1+e^{-(y_i-\theta)})}{\prod_{i=1}^n(1+e^{-(x_i-\theta)})})^2\\ &=\frac{\prod_{i=1}^ne^{-x_i}}{\prod_{i=1}^ne^{-y_i}} (\prod_{i=1}^n\frac{e^{-\theta}+e^{-y_i}}{e^{-\theta}+e^{-x_i}})^2 \end{split} \tag{8.9} \end{equation} which is a constant w.r.t. \theta only when the order statistics of the sample points \mathbf{x} and \mathbf{y} are equal (the two products are polynomials in e^{-\theta} with roots -e^{-y_i} and -e^{-x_i}, so their ratio is free of \theta only if these multisets coincide).

(d) In this case, the order statistics are the minimal sufficient statistic, because \begin{equation} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}=\frac{\prod_{i=1}^n(1+(y_i-\theta)^2)}{\prod_{i=1}^n(1+(x_i-\theta)^2)} \tag{8.10} \end{equation} For instance, if the two sample points \mathbf{x} and \mathbf{y} differ only at x_j and y_j, then (8.10) reduces to \begin{equation} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}=\frac{1+(y_j-\theta)^2}{1+(x_j-\theta)^2} \tag{8.11} \end{equation} which is not a constant w.r.t. \theta unless x_j=y_j. In general, the two polynomials in \theta in (8.10) must be proportional for the ratio to be free of \theta, which forces the multisets \{x_i\} and \{y_i\} to coincide. Hence the order statistics are the minimal sufficient statistic.

(e) In this case, the order statistics are the minimal sufficient statistic, because \begin{equation} \begin{split} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}&=exp(\sum_{i=1}^n|y_i-\theta|-\sum_{i=1}^n|x_i-\theta|)\\ &=exp(\sum_{j\in\{j:y_{(j)}<\theta\}}(\theta-y_{(j)})+\sum_{j\in\{j:y_{(j)}\geq\theta\}}(y_{(j)}-\theta)\\ &\quad-\sum_{j\in\{j:x_{(j)}<\theta\}}(\theta-x_{(j)})-\sum_{j\in\{j:x_{(j)}\geq\theta\}}(x_{(j)}-\theta)) \end{split} \tag{8.12} \end{equation} From (8.12) we can see that if the samples \mathbf{x} and \mathbf{y} differ in at least one order statistic, then (8.12) depends on \theta. Thus, the order statistics are the minimal sufficient statistic.
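The ratio criterion used throughout this exercise can be illustrated numerically for the normal case (a); the sketch below (assuming NumPy, with arbitrary samples) shows that the log-ratio is flat in \theta when the two samples share the same mean and varies with \theta otherwise.

```python
import numpy as np

def log_ratio(x, y, theta):
    # log [ f(x | theta) / f(y | theta) ] for the N(theta, 1) model
    return -0.5 * np.sum((x - theta) ** 2) + 0.5 * np.sum((y - theta) ** 2)

rng = np.random.default_rng(3)
x = rng.normal(size=6)
d = rng.normal(size=6)
y_same = x.mean() + (d - d.mean())      # different sample, same sample mean as x
y_diff = x + 0.3                        # sample mean shifted by 0.3

thetas = np.linspace(-2, 2, 5)
r_same = [log_ratio(x, y_same, t) for t in thetas]
r_diff = [log_ratio(x, y_diff, t) for t in thetas]
print("same mean:", np.round(r_same, 6))   # constant in theta
print("diff mean:", np.round(r_diff, 6))   # varies with theta
```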

Exercise 8.6 (Casella and Berger 6.12) A natural ancillary statistic in most problems is the sample size. For example, let N be a random variable taking values 1,2,\cdots with known probabilities p_1,p_2,\cdots, where \sum_{i=1}^{\infty}p_i=1. Having observed N=n, perform n Bernoulli trials with success probability \theta, getting X successes.

  1. Prove that the pair (X,N) is minimal sufficient and N is ancillary for \theta.

  2. Prove that the estimator \frac{X}{N} is unbiased for \theta and has variance \theta(1-\theta)E(1/N).

Proof. (a) For minimal sufficiency, consider two sample points (x,n_1) and (y,n_2). Then \begin{equation} \begin{split} \frac{p(x,n_1|\theta)}{p(y,n_2|\theta)}&=\frac{{n_1 \choose x}\theta^x(1-\theta)^{n_1-x}p_{n_1}}{{n_2 \choose y}\theta^y(1-\theta)^{n_2-y}p_{n_2}}\\ &\propto\theta^{x-y}(1-\theta)^{(n_1-n_2)-(x-y)} \end{split} \tag{8.13} \end{equation} This is a constant w.r.t. \theta only when x=y and n_1=n_2. Thus, (X,N) is minimal sufficient.

For ancillarity, consider the distribution of N: \begin{equation} f_N(n)=\sum_{x=0}^n{n \choose x}\theta^x(1-\theta)^{n-x}p_{n}=p_n \tag{8.14} \end{equation} Since the pmf of N does not depend on \theta, N is an ancillary statistic for \theta.

(b) Directly calculating the expectation, we have \begin{equation} \begin{split} E(\frac{X}{N})&=\sum_{n=1}^{\infty}\sum_{x=0}^{n}\frac{x}{n}{n \choose x}\theta^x(1-\theta)^{n-x}p_{n}\\ &=\sum_{n=1}^{\infty}\frac{p_n}{n}\cdot n\theta\\ &=\theta\sum_{n=1}^{\infty} p_n=\theta \end{split} \tag{8.15} \end{equation} Hence, this estimator is unbiased. For the variance, we have \begin{equation} \begin{split} E((\frac{X}{N})^2)&=\sum_{n=1}^{\infty}\sum_{x=0}^{n}(\frac{x}{n})^2{n \choose x}\theta^x(1-\theta)^{n-x}p_{n}\\ &=\sum_{n=1}^{\infty}\frac{p_n}{n^2}\cdot (n\theta(1-\theta)+(n\theta)^2)\\ &=\theta(1-\theta)\sum_{n=1}^{\infty}\frac{p_n}{n}+\theta^2=\theta(1-\theta)E(\frac{1}{N})+\theta^2 \end{split} \tag{8.16} \end{equation} Hence, \begin{equation} Var(\frac{X}{N})=\theta(1-\theta)E(\frac{1}{N})+\theta^2-\theta^2=\theta(1-\theta)E(\frac{1}{N}) \tag{8.17} \end{equation} as desired.
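A Monte Carlo check of part (b), assuming NumPy; the distribution p_1,\cdots,p_5 chosen for N is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
theta = 0.3
support = np.array([1, 2, 3, 4, 5])
p = np.array([0.1, 0.2, 0.3, 0.25, 0.15])        # known probabilities p_1, ..., p_5 (illustrative)

reps = 200_000
N = rng.choice(support, size=reps, p=p)          # random sample size N
X = rng.binomial(N, theta)                       # X | N = n  ~  Binomial(n, theta)
est = X / N

print("E(X/N)   ~", est.mean(), "  vs theta =", theta)
print("Var(X/N) ~", est.var(), "  vs theta(1-theta)E(1/N) =",
      theta * (1 - theta) * np.sum(p / support))
```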

Exercise 8.7 (Casella and Berger 6.15) Let X_1,\cdots,X_n be i.i.d. N(\theta,a\theta^2) where a is a known constant and \theta>0.

  1. Show that the parameter space does not contain a two-dimensional open set.

  2. Show that the statistic T=(\bar{X},S^2) is a sufficient statistic for \theta, but the family of distributions is not complete.

Proof. (a) When the mean parameter \theta is fixed, the variance parameter is also fixed at a\theta^2. Hence, the parameter space \{(\theta,a\theta^2):\theta>0\} is a one-dimensional quadratic curve in the plane, truncated at \theta=0, and it does not contain any two-dimensional open set.

(b) Consider the joint pdf of X_1,\cdots,X_n: \begin{equation} \begin{split} f(x_1,\cdots,x_n)&=(2\pi)^{-n/2}(a\theta^2)^{-n/2}exp(-\frac{1}{2a\theta^2}\sum_{i=1}^n(x_i-\theta)^2)\\ &=(2\pi)^{-n/2}(a\theta^2)^{-n/2}exp(-\frac{1}{2a\theta^2}[(n-1)S^2+n(\bar{x}-\theta)^2]) \end{split} \tag{8.18} \end{equation} By the Factorization Theorem, T=(\bar{X},S^2) is a sufficient statistic for \theta.

Since E(S^2)=a\theta^2 and E(\bar{X}^2)=\frac{a\theta^2}{n}+\theta^2, we have E(\frac{an}{n+a}\bar{X}^2-S^2)=0. Hence, if we choose g(\mathbf{T})=\frac{an}{n+a}\bar{X}^2-S^2, then Eg(\mathbf{T})=0 for all \theta, but g(\mathbf{T}) is not 0 almost surely. Thus, the family is not complete.

Method to prove that a family of distributions is not complete: construct a nonzero function of the statistic that has zero expectation for all parameter values.
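As a numerical illustration of this method for Exercise 8.7(b), the sketch below (assuming NumPy, with an arbitrary choice of a, n and \theta values) shows that g(\mathbf{T})=\frac{an}{n+a}\bar{X}^2-S^2 has Monte Carlo mean near 0 for every \theta tried while clearly not being degenerate at 0.

```python
import numpy as np

rng = np.random.default_rng(5)
a, n, reps = 2.0, 10, 100_000

for theta in (0.5, 1.0, 3.0):
    x = rng.normal(loc=theta, scale=np.sqrt(a) * theta, size=(reps, n))  # N(theta, a*theta^2)
    xbar = x.mean(axis=1)
    s2 = x.var(axis=1, ddof=1)                   # sample variance S^2
    g = (a * n / (n + a)) * xbar**2 - s2         # E g(T) = 0 for all theta
    print(f"theta={theta}: mean(g) ~ {g.mean():+.4f}, sd(g) ~ {g.std():.3f} (not degenerate)")
```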
Exercise 8.8 (Casella and Berger 6.17) Let X_1,\cdots,X_n be i.i.d. with geometric distribution P(X=x)=\theta(1-\theta)^{x-1} for x=1,2,\cdots and 0<\theta<1. Show that \sum_{i=1}^nX_i is sufficient for \theta and find the family of distributions of \sum_{i=1}^nX_i. Is the family complete?

Proof. Consider the joint pmf of X_1,\cdots,X_n: \begin{equation} f(x_1,\cdots,x_n)=(\frac{\theta}{1-\theta})^n(1-\theta)^{\sum_{i=1}^nx_i} \tag{8.19} \end{equation} By the Factorization Theorem with h(\mathbf{x})=1, \sum_{i=1}^nX_i is a sufficient statistic.

The sum of n independent geometrically distributed random variables has a negative binomial distribution: \begin{equation} P(\sum_{i=1}^nX_i=m)={m-1 \choose n-1}\theta^n(1-\theta)^{m-n},\quad m=n,n+1,\cdots \tag{8.20} \end{equation}

For completeness, consider the expectation of a function g of T=\sum_{i=1}^nX_i: \begin{equation} Eg(T)=\sum_{t=n}^{\infty}g(t){t-1 \choose n-1}(\frac{\theta}{1-\theta})^n(1-\theta)^t \tag{8.21} \end{equation} If Eg(T)=0 for all 0<\theta<1, then the power series \sum_{t=n}^{\infty}g(t){t-1 \choose n-1}(1-\theta)^t vanishes identically for 0<1-\theta<1, so every coefficient g(t){t-1 \choose n-1} must be 0, i.e. g(t)=0 for all t\geq n. Hence, \sum_{i=1}^nX_i is a complete statistic.
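A quick simulation (assuming NumPy/SciPy) is consistent with the negative binomial form (8.20): NumPy's geometric sampler returns the trial count on \{1,2,\cdots\}, so T-n should follow SciPy's nbinom(n,\theta).

```python
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(6)
n, theta, reps = 4, 0.35, 200_000
T = rng.geometric(theta, size=(reps, n)).sum(axis=1)   # T = sum of n geometrics on {1, 2, ...}

for m in range(n, n + 6):
    empirical = np.mean(T == m)
    exact = nbinom.pmf(m - n, n, theta)                # C(m-1, n-1) theta^n (1-theta)^(m-n)
    print(f"P(T = {m}): empirical {empirical:.4f}  vs  (8.20) {exact:.4f}")
```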

Exercise 8.9 (Casella and Berger 6.22) Let X_1,\cdots,X_n be a random sample from a population with pdf \begin{equation} f(x|\theta)=\theta x^{\theta-1},0<x<1,\theta>0 \tag{8.22} \end{equation}

  1. Is \sum_{i=1}^n X_i sufficient for \theta?

  2. Find a complete sufficient statistic for \theta.

Proof. (a) Consider the joint pdf of X_1,\cdots,X_n: \begin{equation} f(x_1,\cdots,x_n)=\theta^n(\prod_{i=1}^nx_i)^{\theta-1} \tag{8.23} \end{equation} By the Factorization Theorem, \prod_{i=1}^nX_i is a sufficient statistic. However, \sum_{i=1}^n X_i is not sufficient: two samples with the same sum but different products have likelihood ratio (\prod_{i=1}^nx_i/\prod_{i=1}^ny_i)^{\theta-1}, which still depends on \theta, whereas the Factorization Theorem would force this ratio to be free of \theta if \sum_{i=1}^n X_i were sufficient.

(b) Notice that f(x|\theta)=\theta e^{(\theta-1)\log(x)} belongs to an exponential family and the natural parameter space \{\theta-1:\theta>0\} contains an open set in \mathbb{R}. Thus, by Theorem 6.2, \sum_{i=1}^n\log(X_i)=\log(\prod_{i=1}^nX_i) is a complete statistic. Since \log is a one-to-one transformation, \prod_{i=1}^nX_i is a complete sufficient statistic.
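To back up the claim in part (a) that \sum_{i=1}^nX_i is not sufficient, here is a tiny numerical sketch (assuming NumPy; the two samples are hand-picked): they have equal sums but different products, and their likelihood ratio (\prod x_i/\prod y_i)^{\theta-1} visibly changes with \theta.

```python
import numpy as np

x = np.array([0.2, 0.8])            # sum = 1.0, product = 0.16
y = np.array([0.5, 0.5])            # sum = 1.0, product = 0.25

def log_ratio(theta):
    # log [ f(x | theta) / f(y | theta) ]  with f(x | theta) = theta^n (prod x_i)^(theta - 1)
    return (theta - 1) * (np.sum(np.log(x)) - np.sum(np.log(y)))

for theta in (0.5, 1.0, 2.0, 4.0):
    print(f"theta = {theta}: log ratio = {log_ratio(theta):+.4f}")   # not constant in theta
```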

Exercise 8.10 (Casella and Berger 6.26) Use the Minimal Sufficient Statistics Theorem (Theorem 8.1) to establish that, given a sample X_1,\cdots,X_n, the following statistics are minimal sufficient.

  1. Distribution: N(\theta,1), Statistic: \bar{X}.

  2. Distribution: Gamma(\alpha,\beta) with \alpha known, Statistic: \sum_{i=1}^nX_i.

  3. Distribution: Unif(0,\theta), Statistic: \max_iX_i.

  4. Distribution: Cauchy(\theta,1), Statistic: X_{(1)},\cdots,X_{(n)}.

  5. Distribution: logistic(\mu,\beta), Statistic: X_{(1)},\cdots,X_{(n)}.

Theorem 8.1 (Minimal Sufficient Statistics) Suppose that the family of densities \{f_0(\mathbf{x}),\cdots,f_k(\mathbf{x})\} all have common support. Then

  1. The statistic T(\mathbf{X})=(\frac{f_1(\mathbf{X})}{f_0(\mathbf{X})},\frac{f_2(\mathbf{X})}{f_0(\mathbf{X})},\cdots,\frac{f_k(\mathbf{X})}{f_0(\mathbf{X})}) is minimal sufficient for the family \{f_0(\mathbf{x}),\cdots,f_k(\mathbf{x})\}.

  2. If \mathcal{F} is a family of densities with common support such that f_i(\mathbf{x})\in\mathcal{F} for i=0,1,\cdots,k, and the statistic T(\mathbf{x}) from part 1 is sufficient for \mathcal{F}, then T(\mathbf{x}) is minimal sufficient for \mathcal{F}.
Proof. We have already established that each of these statistics is sufficient for the corresponding family. In each case one can choose finitely many parameter values so that the ratio statistic of Theorem 8.1 part 1 is a one-to-one function of the listed statistic; Theorem 8.1 part 2 then gives the minimal sufficiency.

Exercise 8.11 (Casella and Berger 6.30) Let X_1,\cdots,X_n be a random sample from the pdf f(x|\mu)=e^{-(x-\mu)}, where -\infty<\mu<x<\infty.

  1. Show that X_{(1)}=\min_{i}X_i is a complete sufficient statistic.

  2. Use Basu Theorem to show that X_{(1)} and S^2 are independent.

Proof. Consider the joint pdf of X_1,\cdots,X_n, \begin{equation} \begin{split} f(x_1,\cdots,x_n)&=\prod_{i=1}^nexp(-(x_i-\mu))I_{x_i>\mu}(x_i)\\ &=exp(-(\sum_{i=1}^nx_i-n\mu))I_{\min_ix_i>\mu}(\mathbf{x})\\ &=exp(-\sum_{i=1}^nx_i)exp(n\mu)I_{\min_ix_i>\mu}(\mathbf{x}) \end{split} \tag{8.24} \end{equation} Thus, by Factorization Theorem, X_{(1)}=\min_{i}X_i is a sufficient statistic.

As for completeness, note that X_{(1)} has density f_{X_{(1)}}(y)=ne^{-n(y-\mu)} for y>\mu. For a function g, \begin{equation} Eg(X_{(1)})=\int_{\mu}^{\infty}g(y)ne^{-n(y-\mu)}dy \tag{8.25} \end{equation} If Eg(X_{(1)})=0 for all \mu, then \int_{\mu}^{\infty}g(y)e^{-ny}dy=0 for all \mu. Taking the derivative w.r.t. \mu, we have \begin{equation} -g(\mu)e^{-n\mu}=0 \tag{8.26} \end{equation} for all \mu, hence g(\mu)=0 for all \mu. Therefore X_{(1)} is also complete, as desired.

(b) By Basu's Theorem, since X_{(1)} is a complete sufficient statistic, we only need to show that S^2 is an ancillary statistic. Because f(x|\mu) is a location family, we can write X_i=Z_i+\mu, where Z_1,\cdots,Z_n are a sample from f(x|0). Then S^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2=\frac{1}{n-1}\sum_{i=1}^n(Z_i-\bar{Z})^2, whose distribution does not depend on \mu. Hence S^2 is ancillary, and by Basu's Theorem X_{(1)} and S^2 are independent.
Method to check completeness: write the model as an exponential family and use Theorem 6.2. Or argue from the definition: find the pdf of the statistic itself (not the joint pdf of the sample), write down the expectation of a generic function of it, and use the fact that this expectation is identically zero as a function of \theta, so its derivative w.r.t. \theta is also 0.
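To illustrate part (b) of Exercise 8.11 numerically, the sketch below (assuming NumPy, with arbitrary \mu and n) estimates the correlation between X_{(1)} and S^2; a value near 0 is of course only consistent with independence, not a proof of it.

```python
import numpy as np

rng = np.random.default_rng(7)
mu, n, reps = 1.5, 10, 100_000
x = mu + rng.exponential(size=(reps, n))     # X_i = mu + Exp(1), density e^{-(x - mu)}, x > mu

x_min = x.min(axis=1)                        # X_(1)
s2 = x.var(axis=1, ddof=1)                   # S^2

print("corr(X_(1), S^2) ~", np.corrcoef(x_min, s2)[0, 1])   # ~ 0, consistent with Basu
```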
Exercise 8.12 (Casella and Berger 6.32) Prove the Likelihood Principle Corollary.
Proof. For \mathbf{x}^*\in\mathcal{X}, define \begin{equation} Y=\left\{\begin{aligned} & 1 & \mathbf{x}=\mathbf{x}^*\\& 0 & \mathbf{x}\neq\mathbf{x}^*\end{aligned}\right. \tag{8.27} \end{equation} Then Y has distribution given by \begin{equation} P(Y=1|\theta)=f(\mathbf{x}^*|\theta)=1-P(Y=0|\theta) \tag{8.28} \end{equation} Let E^* be the experiment of observing Y. Since the likelihoods of (E,\mathbf{x}^*) and (E^*,1) are proportional, the Likelihood Principle gives \begin{equation} Ev(E,\mathbf{x}^*)=Ev(E^*,1) \tag{8.29} \end{equation} By (8.28), the experiment E^* is determined entirely by f(\mathbf{x}^*|\theta), so Ev(E^*,1), and hence Ev(E,\mathbf{x}^*), depends only on f(\mathbf{x}^*|\theta)=L(\theta|\mathbf{x}^*).
Exercise 8.13 (Casella and Berger 6.35) A risky experimental treatment is to be given to at most three patients. The treatment will be given to one patient. If it is a success, then it will be given to a second. If it is a success, it will be given to a third patient. Model the outcomes for the patients as independent Bernoulli(p) random variables. Identify the four sample points in this model and show that, according to the Formal Likelihood Principle, the inference about p should not depend on the fact that the sample size was determined by the data.
Proof. In this model, the four sample points are: patient one fails; patient one succeeds and patient two fails; patients one and two succeed and patient three fails; all three patients succeed. The corresponding likelihood functions are 1-p, p(1-p), p^2(1-p) and p^3, respectively. Each of these likelihoods is exactly what would be obtained from the same sequence of successes and failures if the number of patients had been fixed in advance; hence, by the Formal Likelihood Principle, the inference about p should not depend on the fact that the sample size was determined by the data.

Exercise 8.14 (Casella and Berger 6.36) One advantage of using a minimal sufficient statistic is that unbiased estimators will have smaller variance, as the following exercise will show. Suppose T_1 is sufficient and T_2 is minimal sufficient, U is an unbiased estimator of \theta, and define U_1=E(U|T_1) and U_2=E(U|T_2).

  1. Show that U_2=E(U_1|T_2).

  2. Now use the conditional variance formula to show that Var(U_2)\leq Var(U_1).

Proof. (a) We only consider the discrete case here. Let us first state the following lemma: if \mathbf{X} is a vector of random variables with joint pmf f(\mathbf{x}), then the pair (\mathbf{X},g(\mathbf{X})) has joint pmf P(\mathbf{X}=\mathbf{x},g(\mathbf{X})=g(\mathbf{x}))=f(\mathbf{x}); that is, adjoining a function of \mathbf{X} does not change the probabilities. This is straightforward for discrete random variables.

With this lemma in mind, assume the sample \mathbf{X}=(X_1,\cdots,X_n) has joint pmf f(\mathbf{x}|\theta) and that the statistics T_1 and T_2 have pmfs g_1(\cdot|\theta) and g_2(\cdot|\theta), respectively. Since T_2 is minimal sufficient, it is a function of T_1; write T_2=r(T_1). First, \begin{equation} E(U|T_1=t)=E(U(\mathbf{X})|T_1(\mathbf{X})=t)=\sum_{\mathbf{x}\in\mathcal{X}}U(\mathbf{x})f_1(\mathbf{x}|T_1=t) \tag{8.30} \end{equation} where f_1(\mathbf{x}|T_1) denotes the conditional pmf of \mathbf{X} given T_1. Define A(t):=\{\mathbf{x}\in\mathcal{X}:T_1(\mathbf{x})=t\}. Then (8.30) can be written as \begin{equation} E(U|T_1=t_1)=\sum_{\mathbf{x}\in A(t_1)}U(\mathbf{x})\frac{f(\mathbf{x}|\theta)}{g_1(t_1|\theta)} \tag{8.31} \end{equation} Similarly, with B(t_2):=\{\mathbf{x}\in\mathcal{X}:T_2(\mathbf{x})=t_2\}, \begin{equation} E(U|T_2=t_2)=\sum_{\mathbf{x}\in B(t_2)}U(\mathbf{x})\frac{f(\mathbf{x}|\theta)}{g_2(t_2|\theta)} \tag{8.32} \end{equation} Furthermore, since for t_1\in C(t_2):=\{t_1:r(t_1)=t_2\} the conditional pmf of T_1 given T_2=t_2 is g_1(t_1|\theta)/g_2(t_2|\theta), we have \begin{equation} \begin{split} E(U_1|T_2=t_2)&=\sum_{t_1\in C(t_2)}\sum_{\mathbf{x}\in A(t_1)}U(\mathbf{x})\frac{f(\mathbf{x}|\theta)}{g_1(t_1|\theta)}\frac{g_1(t_1|\theta)}{g_2(t_2|\theta)}\\ &=\sum_{t_1\in C(t_2)}\sum_{\mathbf{x}\in A(t_1)}U(\mathbf{x})\frac{f(\mathbf{x}|\theta)}{g_2(t_2|\theta)} \end{split} \tag{8.33} \end{equation} Hence it only remains to show that \bigcup_{t_1\in C(t_2)}A(t_1)=B(t_2), and we are done.

Consider a sample point \mathbf{x}\in\mathcal{X}. If \mathbf{x}\in\bigcup_{t_1\in C(t_2)}A(t_1), then T_1(\mathbf{x})=t_1 for some t_1 with r(t_1)=t_2, hence T_2(\mathbf{x})=r(T_1(\mathbf{x}))=t_2 and \mathbf{x}\in B(t_2). Therefore \bigcup_{t_1\in C(t_2)}A(t_1)\subseteq B(t_2). On the other hand, for any \mathbf{x}\in B(t_2) we have T_2(\mathbf{x})=t_2, which implies r(T_1(\mathbf{x}))=t_2. Denote T_1(\mathbf{x})=t_1^*. Then t_1^*\in C(t_2) and \mathbf{x}\in A(t_1^*), so \mathbf{x}\in\bigcup_{t_1\in C(t_2)}A(t_1). Hence B(t_2)\subseteq\bigcup_{t_1\in C(t_2)}A(t_1), and finally \bigcup_{t_1\in C(t_2)}A(t_1)=B(t_2).

Thus, (8.33) and (8.32) give the same result, which proves U_2=E(U_1|T_2).

(b) Since U_2=E(U_1|T_2), the conditional variance formula gives \begin{equation} Var(U_1)=Var(E(U_1|T_2))+E(Var(U_1|T_2))\geq Var(E(U_1|T_2))=Var(U_2) \tag{8.34} \end{equation}
Sketch of the proof for the continuous case: use the so-called “Standard Machine”. First, any integrable function has a simple (stepwise) function approximation; for such stepwise functions one can use the same argument as in the discrete case, and then pass to the limit.
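Here is a Monte Carlo sketch of the variance comparison in part (b) for one concrete, entirely illustrative construction (assuming NumPy): X_1,\cdots,X_n i.i.d. Bernoulli(\theta), U=X_1, T_1 the pair of half-sample sums (sufficient), and T_2=\sum_{i=1}^nX_i (minimal sufficient), so that U_1=E(U|T_1) is the first half-sample mean and U_2=E(U|T_2)=\bar{X}.

```python
import numpy as np

rng = np.random.default_rng(8)
theta, n, reps = 0.4, 10, 200_000
x = rng.binomial(1, theta, size=(reps, n))

u1 = x[:, : n // 2].mean(axis=1)     # U_1 = E(U | T_1): mean of the first half-sample
u2 = x.mean(axis=1)                  # U_2 = E(U | T_2) = sample mean

print("Var(U_1) ~", u1.var(), " (theory:", theta * (1 - theta) / (n // 2), ")")
print("Var(U_2) ~", u2.var(), " (theory:", theta * (1 - theta) / n, ")")
```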

Exercise 8.15 (Casella and Berger 6.37) Joshi and Nabar examine properties of linear estimators for the parameter in the so-called “Problem of the Nile”, where (X,Y) has the joint density \begin{equation} f(x,y|\theta)=exp\{-(\theta x+y/\theta)\},\quad x>0,y>0 \tag{8.35} \end{equation}

  1. For an i.i.d. sample of size n, show that the Fisher information is I(\theta)=2n/\theta^2.

  2. For the estimators T=\sqrt{\sum Y_i/\sum X_i} and U=\sqrt{\sum X_i\sum Y_i}, show that

    1. the information in T alone is [2n/(2n+1)]I(\theta).

    2. the information in (T,U) is I(\theta).

    3. (T,U) is jointly sufficient but not complete.

Proof. (a) For one observation (X,Y) we have \begin{equation} I(\theta)=-E(\frac{\partial^2}{\partial\theta^2}\log f(X,Y|\theta))=\theta^{-3}E(2Y) \tag{8.36} \end{equation} Since Y has an exponential distribution with mean \theta, E(Y)=\theta. Hence, for one observation I(\theta)=2/\theta^2, and for n i.i.d. observations I(\theta)=2n/\theta^2.

(b) The cdf of T is \begin{equation} \begin{split} P(T\leq t)&=P(\frac{\sum Y_i}{\sum X_i}\leq t^2)=P(\frac{2\sum Y_i/\theta}{2\theta\sum X_i}\leq t^2/\theta^2)\\ &=P(F_{2n,2n}\leq t^2/\theta^2) \end{split} \tag{8.37} \end{equation} The last equality holds because 2Y_i/\theta and 2\theta X_i are all independent \chi_2^2 random variables, so the ratio above is an F_{2n,2n} random variable. Thus, the density of T is \begin{equation} f_T(t)=\frac{\Gamma(2n)}{\Gamma(n)^2}\frac{2}{t}(\frac{t^2}{t^2+\theta^2})^n(\frac{\theta^2}{t^2+\theta^2})^n \tag{8.38} \end{equation} The second derivative w.r.t. \theta of the log density is -\frac{2n}{\theta^2}-\frac{4n(t^2-\theta^2)}{(t^2+\theta^2)^2}, so, writing F=T^2/\theta^2\sim F_{2n,2n}, the information in T is \frac{2n}{\theta^2}[1+2E(\frac{1}{F+1})-4E(\frac{1}{(F+1)^2})]. The needed expectations are \begin{equation} E(\frac{1}{F+1})=\frac{\Gamma(2n)}{\Gamma(n)^2}\int^{\infty}_0 \frac{\omega^{n-1}}{(1+\omega)^{2n+1}}d\omega=\frac{1}{2},\quad E(\frac{1}{(F+1)^2})=\frac{\Gamma(2n)}{\Gamma(n)^2}\int^{\infty}_0 \frac{\omega^{n-1}}{(1+\omega)^{2n+2}}d\omega=\frac{n+1}{2(2n+1)} \tag{8.39} \end{equation} Thus, the information in T is \frac{2n}{\theta^2}[2-\frac{2(n+1)}{2n+1}]=\frac{2n}{2n+1}\cdot\frac{2n}{\theta^2}=[2n/(2n+1)]I(\theta).
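Before moving on to part (ii), the value [2n/(2n+1)]I(\theta) can be double-checked by simulation (a sketch assuming NumPy; n, \theta and the number of replications are arbitrary), since the information in T equals the variance of the score \frac{\partial}{\partial\theta}\log f_T(T)=\frac{2n}{\theta}-\frac{4n\theta}{\theta^2+T^2}.

```python
import numpy as np

rng = np.random.default_rng(9)
theta, n, reps = 2.0, 5, 400_000

x = rng.exponential(scale=1 / theta, size=(reps, n))   # X ~ Exp(rate theta)
y = rng.exponential(scale=theta, size=(reps, n))       # Y ~ Exp(rate 1/theta), independent of X
T = np.sqrt(y.sum(axis=1) / x.sum(axis=1))             # T = sqrt(sum Y_i / sum X_i)

score = 2 * n / theta - 4 * n * theta / (theta**2 + T**2)   # d/dtheta log f_T(T | theta)
print("Var(score) ~", score.var())
print("[2n/(2n+1)] I(theta) =", (2 * n / (2 * n + 1)) * (2 * n / theta**2))
```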

For part (ii), let W=\sum X_i and V=\sum Y_i. Then W and V are independent, with W\sim Gamma(n,\frac{1}{\theta}) and V\sim Gamma(n,\theta) (scale parameters \frac{1}{\theta} and \theta, respectively). Using this we can find the joint distribution of (T,U): \begin{equation} f(t,u|\theta)=\frac{2}{\Gamma(n)^2t}u^{2n-1}exp(-\frac{u\theta}{t}-\frac{ut}{\theta}),\quad u>0,t>0 \tag{8.40} \end{equation} Since \frac{\partial^2}{\partial\theta^2}\log f(t,u|\theta)=-\frac{2ut}{\theta^3} and UT=V, the information in (T,U) is \begin{equation} -E(-\frac{2UT}{\theta^3})=E(\frac{2V}{\theta^3})=\frac{2n\theta}{\theta^3}=\frac{2n}{\theta^2}=I(\theta) \tag{8.41} \end{equation} as desired.

For part (iii), the joint pdf of the samples is \begin{equation} f(\mathbf{x},\mathbf{y}|\theta)=exp(-\theta(\sum x_i)-(\sum y_i)/\theta) \tag{8.42} \end{equation} Hence (W,V) is sufficient by the Factorization Theorem, and so is (T,U), since (T,U) is a one-to-one function of (W,V).

However, since E(U^2)=E(WV)=E(W)E(V)=(n/\theta)(n\theta)=n^2, the function g(t,u)=u^2-n^2 satisfies Eg(T,U)=0 for all \theta while g(T,U) is not 0 almost surely. Thus, (T,U) is not complete.
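Finally, a small simulation (assuming NumPy, with arbitrary n and \theta values) of the incompleteness argument: g(T,U)=U^2-n^2 has Monte Carlo mean near 0 for every \theta tried, yet it is far from being identically 0.

```python
import numpy as np

rng = np.random.default_rng(10)
n, reps = 5, 200_000

for theta in (0.5, 1.0, 2.0):
    x = rng.exponential(scale=1 / theta, size=(reps, n))   # X ~ Exp(rate theta)
    y = rng.exponential(scale=theta, size=(reps, n))       # Y ~ Exp(rate 1/theta)
    u2 = x.sum(axis=1) * y.sum(axis=1)                     # U^2 = (sum X_i)(sum Y_i)
    g = u2 - n**2                                          # E g(T, U) = 0 for all theta
    print(f"theta={theta}: mean(g) ~ {g.mean():+.3f}, sd(g) ~ {g.std():.1f} (not degenerate)")
```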