Chapter 8 Homework 2: Principles of Data Reduction: Problems and Solutions
Exercise 8.2 (Casella and Berger 6.2) Let X_1,\cdots,X_n be independent random variables with densities f_{X_i}(x|\theta)=e^{i\theta-x} for x\geq i\theta, and f_{X_i}(x|\theta)=0 for x<i\theta.
Prove that T=\min_i(X_i/i) is a sufficient statistic for \theta.

Exercise 8.5 (Casella and Berger 6.9) For each of the following distributions let X_1,\cdots,X_n be a random sample. Find a minimal sufficient statistic for \theta.
- Normal: f(x|\theta)=\frac{1}{\sqrt{2\pi}}e^{-(x-\theta)^2/2}, -\infty<x<\infty, -\infty<\theta<\infty.
- Location exponential: f(x|\theta)=e^{-(x-\theta)}, \theta<x<\infty, -\infty<\theta<\infty.
- Logistic: f(x|\theta)=\frac{e^{-(x-\theta)}}{(1+e^{-(x-\theta)})^2}, -\infty<x<\infty, -\infty<\theta<\infty.
- Cauchy: f(x|\theta)=\frac{1}{\pi[1+(x-\theta)^2]}, -\infty<x<\infty, -\infty<\theta<\infty.
- Double exponential: f(x|\theta)=\frac{1}{2}e^{-|x-\theta|}, -\infty<x<\infty, -\infty<\theta<\infty.
Proof. (a) In this case the sample mean \bar{X} is the minimal sufficient statistic for \theta, because \begin{equation} \begin{split} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}&= \frac{exp\{-\frac{1}{2}[\sum_{i=1}^n(x_i-\bar{x})^2+n(\bar{x}-\theta)^2]\}} {exp\{-\frac{1}{2}[\sum_{i=1}^n(y_i-\bar{y})^2+n(\bar{y}-\theta)^2]\}}\\ &=exp\{-\frac{1}{2}[(n-1)(S_x^2-S_y^2)+n(\bar{x}^2-\bar{y}^2)-2n\theta(\bar{x}-\bar{y})]\} \end{split} \tag{8.7} \end{equation} The ratio is constant as a function of \theta if and only if \bar{x}=\bar{y}.
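As a quick numerical sanity check of this criterion (not part of the proof; the sample values, \theta grid, and seed below are arbitrary), one can evaluate the log of the ratio in (8.7) on a grid of \theta values: it is flat exactly when the two samples share the same mean.

```python
import numpy as np

# Sanity check for (8.7): the N(theta,1) log-likelihood ratio between two
# samples is constant in theta exactly when the sample means agree.
def log_ratio(x, y, theta):
    return -0.5 * np.sum((x - theta) ** 2) + 0.5 * np.sum((y - theta) ** 2)

rng = np.random.default_rng(0)
x = rng.normal(size=5)
y = rng.normal(size=5)
y_same_mean = y - y.mean() + x.mean()          # force equal sample means

thetas = np.linspace(-3, 3, 7)
print([round(log_ratio(x, y, t), 3) for t in thetas])            # varies with theta
print([round(log_ratio(x, y_same_mean, t), 3) for t in thetas])  # constant in theta
```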
(b) In this case X_{(1)}=\min\{X_1,\cdots,X_n\} is the minimal sufficient statistic, because \begin{equation} \begin{split} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}&= \frac{exp\{-n(\bar{x}-\theta)\}\prod_{i=1}^nI_{(\theta,\infty)}(x_i)} {exp\{-n(\bar{y}-\theta)\}\prod_{i=1}^nI_{(\theta,\infty)}(y_i)}\\ &=exp\{-n(\bar{x}-\bar{y})\}\frac{I_{(\theta,\infty)}(\min_ix_i)}{I_{(\theta,\infty)}(\min_iy_i)} \end{split} \tag{8.8} \end{equation} which is constant as a function of \theta only when \min_ix_i=\min_iy_i.
(c) In this case the order statistics are the minimal sufficient statistic, because \begin{equation} \begin{split} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}&= \frac{\prod_{i=1}^ne^{-x_i}}{\prod_{i=1}^ne^{-y_i}} (\frac{\prod_{i=1}^n(1+e^{-(y_i-\theta)})}{\prod_{i=1}^n(1+e^{-(x_i-\theta)})})^2\\ &=\frac{\prod_{i=1}^ne^{-x_i}}{\prod_{i=1}^ne^{-y_i}} (\prod_{i=1}^n\frac{e^{-\theta}+e^{-y_i}}{e^{-\theta}+e^{-x_i}})^2 \end{split} \tag{8.9} \end{equation} Viewing the last factor as a ratio of the two monic degree-n polynomials \prod_{i=1}^n(u+e^{-y_i}) and \prod_{i=1}^n(u+e^{-x_i}) in u=e^{-\theta}, the ratio is constant in \theta only when the two polynomials have the same roots, i.e., only when the order statistics of \mathbf{x} and \mathbf{y} are equal.
(d) In this case the order statistics are the minimal sufficient statistic, because \begin{equation} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}=\frac{\prod_{i=1}^n(1+(y_i-\theta)^2)}{\prod_{i=1}^n(1+(x_i-\theta)^2)} \tag{8.10} \end{equation} Both the numerator and the denominator of (8.10) are polynomials in \theta of degree 2n with leading coefficient 1, so the ratio is constant in \theta only if the two polynomials are identical, i.e., only if they have the same (complex) roots y_j\pm i and x_j\pm i, which holds exactly when \mathbf{x} and \mathbf{y} have the same order statistics. For instance, if two sample points \mathbf{x} and \mathbf{y} differ only at x_j\neq y_j, then (8.10) reduces to \begin{equation} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}=\frac{1+(y_j-\theta)^2}{1+(x_j-\theta)^2} \tag{8.11} \end{equation} which is not constant in \theta.
(e) In this case the order statistics are the minimal sufficient statistic, because \begin{equation} \begin{split} \frac{f(\mathbf{x}|\theta)}{f(\mathbf{y}|\theta)}&=exp(\sum_{i=1}^n|y_i-\theta|-\sum_{i=1}^n|x_i-\theta|)\\ &=exp(\sum_{j\in\{j:y_{(j)}<\theta\}}(\theta-y_{(j)})+\sum_{j\in\{j:y_{(j)}\geq\theta\}}(y_{(j)}-\theta)\\ &-\sum_{j\in\{j:x_{(j)}<\theta\}}(\theta-x_{(j)})-\sum_{j\in\{j:x_{(j)}\geq\theta\}}(x_{(j)}-\theta)) \end{split} \tag{8.12} \end{equation} The exponent in (8.12) is a piecewise linear function of \theta whose slope changes exactly at the order statistics of \mathbf{y} (by +2 each) and of \mathbf{x} (by -2 each). For (8.12) to be constant in \theta these slope changes must cancel, which forces the order statistics of \mathbf{x} and \mathbf{y} to coincide. Thus, the order statistics are the minimal sufficient statistic.
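A similar numerical sanity check works for the double exponential case (again purely illustrative, with arbitrary sample values and \theta grid): permuting a sample leaves the log-ratio in (8.12) identically zero, while changing a single value makes it vary with \theta.

```python
import numpy as np

# Sanity check for (8.12): the double-exponential log-likelihood ratio is
# log f(x|theta) - log f(y|theta) = sum|y_i - theta| - sum|x_i - theta|.
def log_ratio(x, y, theta):
    return np.sum(np.abs(y - theta)) - np.sum(np.abs(x - theta))

rng = np.random.default_rng(1)
x = rng.laplace(size=6)
y_perm = rng.permutation(x)          # same order statistics as x
y_diff = x.copy()
y_diff[0] += 1.0                     # different order statistics

thetas = np.linspace(-2, 2, 5)
print([round(log_ratio(x, y_perm, t), 3) for t in thetas])  # all zeros
print([round(log_ratio(x, y_diff, t), 3) for t in thetas])  # depends on theta
```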
Exercise 8.6 (Casella and Berger 6.12) A natural ancillary statistic in most problems is the sample size. For example, let N be a random variable taking values 1,2,\cdots with known probabilities p_1,p_2,\cdots, where \sum_{i=1}^{\infty}p_i=1. Having observed N=n, perform n Bernoulli trials with success probability \theta, getting X successes.
- Prove that the pair (X,N) is minimal sufficient and N is ancillary for \theta.
- Prove that the estimator \frac{X}{N} is unbiased for \theta and has variance \theta(1-\theta)E(1/N).
Proof. (a) For minimal sufficiency, consider two pairs (x,n_1) and (y,n_2) corresponding to two sample points. Then \begin{equation} \begin{split} \frac{p(x,n_1|\theta)}{p(y,n_2|\theta)}&=\frac{{n_1 \choose x}\theta^x(1-\theta)^{n_1-x}p_{n_1}}{{n_2 \choose y}\theta^y(1-\theta)^{n_2-y}p_{n_2}}\\ &\propto\theta^{x-y}(1-\theta)^{(n_1-n_2)-(x-y)} \end{split} \tag{8.13} \end{equation} This is constant as a function of \theta only when x=y and n_1=n_2. Thus, (X,N) is minimal sufficient.
For ancillarity, consider the marginal distribution of N: \begin{equation} f_N(n)=\sum_{x=0}^n{n \choose x}\theta^x(1-\theta)^{n-x}p_{n}=p_n \tag{8.14} \end{equation} Since the pmf of N does not depend on \theta, N is an ancillary statistic for \theta.
(b) Calculating the expectation directly, we have \begin{equation} \begin{split} E(\frac{X}{N})&=\sum_{n=1}^{\infty}\sum_{x=0}^{n}\frac{x}{n}{n \choose x}\theta^x(1-\theta)^{n-x}p_{n}\\ &=\sum_{n=1}^{\infty}\frac{p_n}{n}\cdot n\theta\\ &=\theta\sum_{n=1}^{\infty} p_n=\theta \end{split} \tag{8.15} \end{equation} Hence the estimator is unbiased. For the variance, we have \begin{equation} \begin{split} E((\frac{X}{N})^2)&=\sum_{n=1}^{\infty}\sum_{x=0}^{n}(\frac{x}{n})^2{n \choose x}\theta^x(1-\theta)^{n-x}p_{n}\\ &=\sum_{n=1}^{\infty}\frac{p_n}{n^2}\cdot (n\theta(1-\theta)+(n\theta)^2)\\ &=\theta(1-\theta)\sum_{n=1}^{\infty}\frac{p_n}{n}+\theta^2=\theta(1-\theta)E(\frac{1}{N})+\theta^2 \end{split} \tag{8.16} \end{equation} Hence, \begin{equation} Var(\frac{X}{N})=\theta(1-\theta)E(\frac{1}{N})+\theta^2-\theta^2=\theta(1-\theta)E(\frac{1}{N}) \tag{8.17} \end{equation} as desired.
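A small Monte Carlo check of (8.15)–(8.17); the value of \theta and the distribution of N (uniform on \{1,\cdots,5\}) are arbitrary choices made only for the illustration.

```python
import numpy as np

# Monte Carlo check of Exercise 8.6(b): X/N is unbiased for theta and
# Var(X/N) = theta*(1-theta)*E(1/N).
rng = np.random.default_rng(2023)
theta, reps = 0.3, 200_000
support = np.array([1, 2, 3, 4, 5])            # p_n = 1/5 for n = 1,...,5

N = rng.choice(support, size=reps)
X = rng.binomial(N, theta)
est = X / N

print(est.mean(), theta)                                      # both ~ 0.3
print(est.var(), theta * (1 - theta) * np.mean(1 / support))  # both ~ 0.096
```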
Exercise 8.7 (Casella and Berger 6.15) Let X_1,\cdots,X_n be i.i.d. N(\theta,a\theta^2) where a is a known constant and \theta>0.
- Show that the parameter space does not contain a two-dimensional open set.
- Show that the statistic T=(\bar{X},S^2) is a sufficient statistic for \theta, but the family of distributions is not complete.
Proof. (a) Once the mean parameter \theta is fixed, the variance parameter is also fixed at a\theta^2. Hence the parameter space is the one-dimensional curve \{(\theta,a\theta^2):\theta>0\} in the plane, which does not contain any two-dimensional open set.
(b) Consider the joint pdf of X_1,\cdots,X_n: \begin{equation} \begin{split} f(x_1,\cdots,x_n)&=(2\pi)^{-n/2}(a\theta^2)^{-n/2}exp(-\frac{1}{2a\theta^2}\sum_{i=1}^n(x_i-\theta)^2)\\ &=(2\pi)^{-n/2}(a\theta^2)^{-n/2}exp(-\frac{1}{2a\theta^2}[(n-1)S^2+n(\bar{x}-\theta)^2]) \end{split} \tag{8.18} \end{equation} By the Factorization Theorem, T=(\bar{X},S^2) is a sufficient statistic for \theta.
Since E(S^2)=a\theta^2 and E(\bar{X}^2)=\frac{a\theta^2}{n}+\theta^2, we have E(\frac{an}{n+a}\bar{X}^2-S^2)=0 for all \theta>0. Hence, if we choose g(\mathbf{T})=\frac{an}{n+a}\bar{X}^2-S^2, then Eg(\mathbf{T})=0 for all \theta, but g(\mathbf{T}) is not 0 almost surely. Thus, the family is not complete.
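A Monte Carlo illustration of this non-completeness argument (the values of \theta, a, and n are arbitrary): the statistic g(\mathbf{T})=\frac{an}{n+a}\bar{X}^2-S^2 has mean approximately zero but is far from being identically zero.

```python
import numpy as np

# Check for Exercise 8.7(b): with X_i ~ N(theta, a*theta^2),
# g(T) = a*n/(n+a) * Xbar^2 - S^2 has expectation 0 for every theta,
# yet g(T) is clearly not 0 almost surely.
rng = np.random.default_rng(7)
theta, a, n, reps = 2.0, 1.5, 10, 200_000

x = rng.normal(theta, np.sqrt(a) * theta, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)
g = a * n / (n + a) * xbar**2 - s2

print(g.mean())   # close to 0
print(g.std())    # far from 0, so g(T) is not 0 a.s.
```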
Exercise 8.8 Let X_1,\cdots,X_n be a random sample from a Geometric(\theta) distribution, f(x|\theta)=\theta(1-\theta)^{x-1}, x=1,2,\cdots, 0<\theta<1. Show that \sum_{i=1}^nX_i is a complete sufficient statistic for \theta.

Proof. Consider the joint pmf of X_1,\cdots,X_n: \begin{equation} f(x_1,\cdots,x_n)=\prod_{i=1}^n\theta(1-\theta)^{x_i-1}=(\frac{\theta}{1-\theta})^n(1-\theta)^{\sum_{i=1}^nx_i} \tag{8.19} \end{equation} By the Factorization Theorem, taking h(\mathbf{x})=1, we see that \sum_{i=1}^nX_i is a sufficient statistic.
Recall that the sum of independent geometrically distributed random variables has a negative binomial distribution: \begin{equation} P(\sum_{i=1}^nX_i=m)={m-1 \choose n-1}\theta^n(1-\theta)^{m-n},\quad m=n,n+1,\cdots \tag{8.20} \end{equation}
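The negative binomial pmf in (8.20) can be confirmed by a quick simulation (\theta and n are arbitrary; numpy's geometric generator uses the same support \{1,2,\cdots\} as here).

```python
import numpy as np
from math import comb

# Monte Carlo check of (8.20): the sum of n i.i.d. Geometric(theta) variables
# (support 1,2,...) has pmf C(m-1, n-1) * theta^n * (1-theta)^(m-n), m >= n.
rng = np.random.default_rng(11)
theta, n, reps = 0.4, 3, 200_000

sums = rng.geometric(theta, size=(reps, n)).sum(axis=1)
for m in range(n, n + 5):
    empirical = np.mean(sums == m)
    exact = comb(m - 1, n - 1) * theta**n * (1 - theta) ** (m - n)
    print(m, round(empirical, 4), round(exact, 4))
```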
For completeness, consider the expectation of g(\mathbf{T}) for an arbitrary function g: \begin{equation} Eg(\mathbf{T})=\sum_{t=n}^{\infty}g(t){t-1 \choose n-1}(\frac{\theta}{1-\theta})^n(1-\theta)^t \tag{8.21} \end{equation} If Eg(\mathbf{T})=0 for all 0<\theta<1, then, after dropping the positive factor (\frac{\theta}{1-\theta})^n, the power series \sum_{t=n}^{\infty}g(t){t-1 \choose n-1}r^t in r=1-\theta vanishes for all 0<r<1. A power series that vanishes on an interval has all coefficients equal to zero, so g(t){t-1 \choose n-1}=0, and hence g(t)=0 for all t\geq n. Therefore \sum_{i=1}^nX_i is a complete statistic.

Exercise 8.9 (Casella and Berger 6.22) Let X_1,\cdots,X_n be a random sample from a population with pdf \begin{equation} f(x|\theta)=\theta x^{\theta-1},\quad 0<x<1,\theta>0 \tag{8.22} \end{equation}
- Is \sum_{i=1}^n X_i sufficient for \theta?
- Find a complete sufficient statistic for \theta.
Proof. (a) Consider the joint pdf of X_1,\cdots,X_n: \begin{equation} f(x_1,\cdots,x_n)=\theta^n(\prod_{i=1}^nx_i)^{\theta-1} \tag{8.23} \end{equation} By the Factorization Theorem, \prod_{i=1}^nX_i is a sufficient statistic; the \theta-dependent part of the joint pdf depends on the data only through \prod_{i=1}^nx_i, not through \sum_{i=1}^nx_i, so \sum_{i=1}^n X_i is not sufficient for \theta.
(b) Notice that f(x|\theta)=\theta e^{(\theta-1)\log(x)} belongs to an exponential family whose natural parameter space \{\theta-1:\theta>0\} contains an open set in \mathbb{R}. Thus, by Theorem 6.2, \sum_{i=1}^n\log(X_i)=\log(\prod_{i=1}^nX_i) is a complete (and sufficient) statistic. Since \log is a one-to-one transformation, \prod_{i=1}^nX_i is also a complete sufficient statistic.
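As a numerical aside (using only the model f(x|\theta)=\theta x^{\theta-1} and inverse-cdf sampling; the values of \theta and n are arbitrary): -\log X_i is exponential with rate \theta, so the complete sufficient statistic T=-\sum_{i=1}^n\log X_i has a Gamma(n,1/\theta) distribution, and its first two moments are easy to check by simulation.

```python
import numpy as np

# With f(x|theta) = theta * x^(theta-1) on (0,1), -log(X_i) ~ Exp(rate theta),
# so T = -sum(log X_i) ~ Gamma(n, rate theta): E[T] = n/theta, Var[T] = n/theta^2.
rng = np.random.default_rng(5)
theta, n, reps = 2.5, 8, 200_000

u = rng.random(size=(reps, n))
x = u ** (1 / theta)                 # inverse-cdf sampling: F(x) = x^theta
t = -np.log(x).sum(axis=1)

print(t.mean(), n / theta)           # both ~ 3.2
print(t.var(), n / theta**2)         # both ~ 1.28
```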
Exercise 8.10 (Casella and Berger 6.26) Use the Minimal Sufficient Statistics Theorem (Theorem 8.1) to establish that, given a sample X_1,\cdots,X_n, the following statistics are minimal sufficient.
- Distribution: N(\theta,1), Statistic: \bar{X}.
- Distribution: Gamma(\alpha,\beta) with \alpha known, Statistic: \sum_{i=1}^nX_i.
- Distribution: Unif(0,\theta), Statistic: \max_iX_i.
- Distribution: Cauchy(\theta,1), Statistic: X_{(1)},\cdots,X_{(n)}.
- Distribution: logistic(\mu,\beta), Statistic: X_{(1)},\cdots,X_{(n)}.
Theorem 8.1 (Minimal Sufficient Statistics) Suppose that the densities \{f_0(\mathbf{x}),\cdots,f_k(\mathbf{x})\} all have common support. Then

- The statistic T(\mathbf{X})=(\frac{f_1(\mathbf{X})}{f_0(\mathbf{X})},\frac{f_2(\mathbf{X})}{f_0(\mathbf{X})},\cdots,\frac{f_k(\mathbf{X})}{f_0(\mathbf{X})}) is minimal sufficient for the family \{f_0(\mathbf{x}),\cdots,f_k(\mathbf{x})\}.
- If \mathcal{F} is a family of densities with common support, f_i(\mathbf{x})\in\mathcal{F} for i=0,1,\cdots,k, and T(\mathbf{x}) is sufficient for \mathcal{F}, then T(\mathbf{x}) is minimal sufficient for \mathcal{F}.
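For example, here is a quick sketch of how the second part applies to the first entry of Exercise 8.10 (the remaining entries follow the same pattern with a convenient finite subfamily). Within the N(\theta,1) family take the subfamily \{f_0,f_1\} corresponding to \theta_0=0 and \theta_1=1. Then T(\mathbf{X})=f_1(\mathbf{X})/f_0(\mathbf{X})=exp\{\sum_{i=1}^nX_i-\frac{n}{2}\}, which is a one-to-one function of \bar{X}. By the first part of the theorem, \bar{X} is minimal sufficient for \{f_0,f_1\}; since \bar{X} is sufficient for the whole N(\theta,1) family, the second part gives that \bar{X} is minimal sufficient for the whole family.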
Exercise 8.11 (Casella and Berger 6.30) Let X_1,\cdots,X_n be a random sample from the pdf f(x|\mu)=e^{-(x-\mu)}, where -\infty<\mu<x<\infty.
- Show that X_{(1)}=\min_{i}X_i is a complete sufficient statistic.
- Use Basu's Theorem to show that X_{(1)} and S^2 are independent.
Proof. (a) Consider the joint pdf of X_1,\cdots,X_n: \begin{equation} \begin{split} f(x_1,\cdots,x_n)&=\prod_{i=1}^nexp(-(x_i-\mu))I_{(\mu,\infty)}(x_i)\\ &=exp(-(\sum_{i=1}^nx_i-n\mu))I_{(\mu,\infty)}(\min_ix_i)\\ &=exp(-\sum_{i=1}^nx_i)exp(n\mu)I_{(\mu,\infty)}(\min_ix_i) \end{split} \tag{8.24} \end{equation} Thus, by the Factorization Theorem, X_{(1)}=\min_{i}X_i is a sufficient statistic.
As for completeness, note that the pdf of T=X_{(1)} is f_T(y)=ne^{-n(y-\mu)}, y>\mu, so for any function g, \begin{equation} Eg(\mathbf{T})=\int_{\mu}^{\infty}g(y)ne^{-n(y-\mu)}dy \tag{8.25} \end{equation} If Eg(\mathbf{T})=0 for all \mu, then \int_{\mu}^{\infty}g(y)e^{-ny}dy=0 for all \mu. Taking the derivative w.r.t. \mu, we have \begin{equation} -g(\mu)e^{-n\mu}=0 \tag{8.26} \end{equation} for all \mu. Hence g(\mu)=0 for all \mu, and X_{(1)} is complete, as desired.
(b) By Basu's Theorem, since X_{(1)} is a complete sufficient statistic by part (a), we only need to show that S^2 is an ancillary statistic. Because f(x|\mu) is a location family, we can write X_i=Z_i+\mu, where the Z_i are a sample from f(x|0). Then S^2=\frac{1}{n-1}\sum_{i=1}^n(X_i-\bar{X})^2=\frac{1}{n-1}\sum_{i=1}^n(Z_i-\bar{Z})^2, whose distribution does not depend on \mu. Hence S^2 is ancillary, and X_{(1)} and S^2 are independent.
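A crude simulation-based sanity check of the independence claim (independence cannot be fully verified this way, but the sample correlation between X_{(1)} and S^2, and between simple transforms of them, should be near zero; the values of \mu and n are arbitrary):

```python
import numpy as np

# Sanity check for Exercise 8.11(b): X_(1) and S^2 should be independent
# when X_i = mu + Exp(1), so their sample correlations should be ~0.
rng = np.random.default_rng(42)
mu, n, reps = 1.7, 10, 200_000

x = mu + rng.exponential(size=(reps, n))
xmin = x.min(axis=1)
s2 = x.var(axis=1, ddof=1)

print(np.corrcoef(xmin, s2)[0, 1])            # ~ 0
print(np.corrcoef(xmin, np.log(s2))[0, 1])    # ~ 0
```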
Exercise 8.14 (Casella and Berger 6.36) One advantage of using a minimal sufficient statistic is that unbiased estimators will have smaller variance, as the following exercise will show. Suppose T_1 is sufficient and T_2 is minimal sufficient, U is an unbiased estimator of \theta, and define U_1=E(U|T_1) and U_2=E(U|T_2).
- Show that U_2=E(U_1|T_2).
- Now use the conditional variance formula to show that Var(U_2)\leq Var(U_1).
Proof. (a) We only consider the discrete case here. First note the following simple fact: if \mathbf{X} is a vector of random variables with joint pmf f(\mathbf{x}), then for any function g we have P(\mathbf{X}=\mathbf{x},g(\mathbf{X})=g(\mathbf{x}))=P(\mathbf{X}=\mathbf{x})=f(\mathbf{x}); that is, adjoining a function of \mathbf{X} does not change the joint pmf. This is straightforward for discrete random variables.
Using this fact, assume the sample \mathbf{X}=(X_1,\cdots,X_n) has joint pmf f(\mathbf{x}|\theta), and let the statistics T_1 and T_2 have pmfs g_1(\cdot|\theta) and g_2(\cdot|\theta), respectively. Since T_2 is minimal sufficient and T_1 is sufficient, T_2 is a function of T_1, say T_2=r(T_1). Note also that, by sufficiency, the conditional distributions of \mathbf{X} given T_1 and given T_2 do not depend on \theta, so the conditional expectations below are genuine statistics. First, \begin{equation} E(U|T_1=t)=E(U(\mathbf{X})|T_1(\mathbf{X})=t)=\sum_{\mathbf{x}\in\mathcal{X}}U(\mathbf{x})f_1(\mathbf{x}|T_1=t) \tag{8.30} \end{equation} where f_1(\mathbf{x}|T_1) denotes the conditional pmf of \mathbf{X} given T_1. Define A(t):=\{\mathbf{x}\in\mathcal{X}:T_1(\mathbf{x})=t\}. Then (8.30) can be written as \begin{equation} E(U|T_1=t_1)=\sum_{\mathbf{x}\in A(t_1)}U(\mathbf{x})\frac{f(\mathbf{x}|\theta)}{g_1(t_1|\theta)} \tag{8.31} \end{equation} Similarly, with B(t_2):=\{\mathbf{x}\in\mathcal{X}:T_2(\mathbf{x})=t_2\}, we have \begin{equation} E(U|T_2=t_2)=\sum_{\mathbf{x}\in B(t_2)}U(\mathbf{x})\frac{f(\mathbf{x}|\theta)}{g_2(t_2|\theta)} \tag{8.32} \end{equation} Furthermore, with C(t_2):=\{t_1:r(t_1)=t_2\}, consider E(U_1|T_2): \begin{equation} \begin{split} E(U_1|T_2=t_2)&=\sum_{t_1\in C(t_2)}\sum_{\mathbf{x}\in A(t_1)}U(\mathbf{x})\frac{f(\mathbf{x}|\theta)}{g_1(t_1|\theta)}\frac{g_1(t_1|\theta)}{g_2(t_2|\theta)}\\ &=\sum_{t_1\in C(t_2)}\sum_{\mathbf{x}\in A(t_1)}U(\mathbf{x})\frac{f(\mathbf{x}|\theta)}{g_2(t_2|\theta)} \end{split} \tag{8.33} \end{equation} Hence it only remains to show that \bigcup_{t_1\in C(t_2)}A(t_1)=B(t_2).
For any \mathbf{x}\in\bigcup_{t_1\in C(t_2)}A(t_1), we have T_1(\mathbf{x})=t_1 for some t_1 with r(t_1)=t_2, hence T_2(\mathbf{x})=r(T_1(\mathbf{x}))=t_2, i.e., \mathbf{x}\in B(t_2). Therefore \bigcup_{t_1\in C(t_2)}A(t_1)\subseteq B(t_2). On the other hand, for any \mathbf{x}\in B(t_2), T_2(\mathbf{x})=t_2 implies r(T_1(\mathbf{x}))=t_2. Denote T_1(\mathbf{x})=t_1^*. Then t_1^*\in C(t_2) and \mathbf{x}\in A(t_1^*), so \mathbf{x}\in\bigcup_{t_1\in C(t_2)}A(t_1). Hence B(t_2)\subseteq\bigcup_{t_1\in C(t_2)}A(t_1), and finally \bigcup_{t_1\in C(t_2)}A(t_1)=B(t_2).
Thus (8.33) and (8.32) give the same value, i.e., E(U_1|T_2)=E(U|T_2)=U_2, as desired.
(b) Since U_2=E(U_1|T_2), the conditional variance formula gives \begin{equation} Var(U_1)=Var(E(U_1|T_2))+E(Var(U_1|T_2))\geq Var(E(U_1|T_2))=Var(U_2) \tag{8.34} \end{equation}
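A concrete, hypothetical illustration of the variance ordering: with X_1,\cdots,X_n i.i.d. Bernoulli(\theta), take U=X_1, T_1=(X_1,\sum_{i=2}^nX_i) (sufficient) and T_2=\sum_{i=1}^nX_i (minimal sufficient). Then U_1=E(U|T_1)=X_1 while U_2=E(U|T_2)=\bar{X}, so Var(U_1)=\theta(1-\theta)\geq\theta(1-\theta)/n=Var(U_2). A short simulation (arbitrary \theta and n) confirms this:

```python
import numpy as np

# Illustration of Exercise 8.14(b) with Bernoulli(theta) data:
# U_1 = E(U | T_1) = X_1 and U_2 = E(U | T_2) = Xbar are both unbiased,
# but conditioning on the minimal sufficient statistic gives smaller variance.
rng = np.random.default_rng(3)
theta, n, reps = 0.3, 10, 200_000

x = rng.binomial(1, theta, size=(reps, n))
u1 = x[:, 0]           # = E(U | T_1)
u2 = x.mean(axis=1)    # = E(U | T_2)

print(u1.mean(), u2.mean(), theta)          # both unbiased
print(u1.var(), theta * (1 - theta))        # Var(U_1)
print(u2.var(), theta * (1 - theta) / n)    # Var(U_2) <= Var(U_1)
```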
Exercise 8.15 (Casella and Berger 6.37) Joshi and Nabar examine properties of linear estimators for the parameter in the so-called “Problem of the Nile”, where (X,Y) has the joint density \begin{equation} f(x,y|\theta)=exp\{-(\theta x+y/\theta)\},\quad x>0,y>0 \tag{8.35} \end{equation}
- For an i.i.d. sample of size n, show that the Fisher information is I(\theta)=2n/\theta^2.
- For the estimators T=\sqrt{\sum Y_i/\sum X_i} and U=\sqrt{\sum X_i\sum Y_i}, show that (i) the information in T alone is [2n/(2n+1)]I(\theta); (ii) the information in (T,U) is I(\theta); (iii) (T,U) is jointly sufficient but not complete.
Proof. (a) For one observation (X,Y) we have \begin{equation} I(\theta)=-E(\frac{\partial^2}{\partial\theta^2}\log f(X,Y|\theta))=\theta^{-3}E(2Y) \tag{8.36} \end{equation} Since the marginal distribution of Y is exponential with mean \theta, E(Y)=\theta. Hence, for one observation, I(\theta)=2/\theta^2, and for an i.i.d. sample of size n, I(\theta)=2n/\theta^2.
(b) (i) The cdf of T is \begin{equation} \begin{split} P(T\leq t)&=P(\frac{\sum Y_i}{\sum X_i}\leq t^2)=P(\frac{2\sum Y_i/\theta}{2\theta\sum X_i}\leq t^2/\theta^2)\\ &=P(F_{2n,2n}\leq t^2/\theta^2) \end{split} \tag{8.37} \end{equation} The last equality holds because the 2Y_i/\theta and the 2\theta X_i are independent Exp(1), i.e. \chi_2^2, random variables, so the ratio of their sums is an F_{2n,2n} random variable. Differentiating the cdf, the density of T is \begin{equation} f_T(t)=\frac{\Gamma(2n)}{\Gamma(n)^2}\frac{2}{t}(\frac{t^2}{t^2+\theta^2})^n(\frac{\theta^2}{t^2+\theta^2})^n \tag{8.38} \end{equation} Writing w=t^2/\theta^2, the second derivative of the log density satisfies -\frac{\partial^2}{\partial\theta^2}\log f_T(t)=\frac{2n}{\theta^2}[1+\frac{2}{1+w}-\frac{4}{(1+w)^2}], so the information in T is \frac{2n}{\theta^2}[1+2E(\frac{1}{1+W})-4E(\frac{1}{(1+W)^2})] with W=T^2/\theta^2\sim F_{2n,2n}. The two expected values are \begin{equation} E(\frac{1}{1+W})=\frac{\Gamma(2n)}{\Gamma(n)^2}\int^{\infty}_0\frac{\omega^{n-1}}{(1+\omega)^{2n+1}}d\omega=\frac{1}{2},\qquad E(\frac{1}{(1+W)^2})=\frac{\Gamma(2n)}{\Gamma(n)^2}\int^{\infty}_0\frac{\omega^{n-1}}{(1+\omega)^{2n+2}}d\omega=\frac{n+1}{2(2n+1)} \tag{8.39} \end{equation} Thus the information in T is \frac{2n}{\theta^2}[1+1-\frac{2(n+1)}{2n+1}]=\frac{2n}{\theta^2}\cdot\frac{2n}{2n+1}=[2n/(2n+1)]I(\theta).
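The two expectations in (8.39) are easy to confirm numerically, simulating W as a ratio of independent \chi^2_{2n} variables (the value of n is arbitrary):

```python
import numpy as np

# Numerical check of (8.39): for W ~ F_{2n,2n},
# E[1/(1+W)] = 1/2 and E[1/(1+W)^2] = (n+1)/(2*(2n+1)).
rng = np.random.default_rng(9)
n, reps = 4, 500_000

w = rng.chisquare(2 * n, reps) / rng.chisquare(2 * n, reps)  # F_{2n,2n}
print(np.mean(1 / (1 + w)), 0.5)
print(np.mean(1 / (1 + w) ** 2), (n + 1) / (2 * (2 * n + 1)))
```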
For part (ii), let W=\sum X_i and V=\sum Y_i. W and V are independent, with W\sim Gamma(n,\frac{1}{\theta}) and V\sim Gamma(n,\theta). Using the transformation (W,V)\mapsto(T,U), the joint density of (T,U) is \begin{equation} f(t,u|\theta)=\frac{2}{\Gamma(n)^2t}u^{2n-1}exp(-\frac{u\theta}{t}-\frac{ut}{\theta}),\quad u>0,t>0 \tag{8.40} \end{equation} Thus the information is \begin{equation} -E(-\frac{2UT}{\theta^3})=E(\frac{2V}{\theta^3})=\frac{2n\theta}{\theta^3}=I(\theta) \tag{8.41} \end{equation} as desired.
For part (iii), the joint pdf of the sample is \begin{equation} f(\mathbf{x},\mathbf{y})=exp(-\theta\sum x_i-\sum y_i/\theta) \tag{8.42} \end{equation} Hence (W,V)=(\sum X_i,\sum Y_i) is sufficient, and so is (T,U), since it is a one-to-one function of (W,V).
However, since EU^2=E(WV)=(n/\theta)(n\theta)=n^2, we can define g(t,u)=u^2-n^2; then E(g(T,U))=0 for all \theta, but g(T,U) is not 0 almost surely. Thus, (T,U) is not complete.
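A quick simulation of this last computation, E(U^2)=E(\sum X_i\sum Y_i)=n^2 for every \theta (the values of \theta and n are arbitrary):

```python
import numpy as np

# Check for Exercise 8.15(iii): E[U^2] = E[(sum X_i)(sum Y_i)] = n^2 for all theta,
# so g(t,u) = u^2 - n^2 has mean zero without being identically zero.
rng = np.random.default_rng(17)
theta, n, reps = 1.8, 5, 300_000

x = rng.exponential(scale=1 / theta, size=(reps, n))   # density theta*exp(-theta*x)
y = rng.exponential(scale=theta, size=(reps, n))       # density exp(-y/theta)/theta
u2 = x.sum(axis=1) * y.sum(axis=1)

print(u2.mean(), n**2)      # both ~ 25
print(u2.std())             # large: u^2 - n^2 is not 0 a.s.
```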