Chapter 10 Midterm 1: Chapters 5 and 6 of Casella and Berger (2002): Problems and Solutions

Exercise 10.1 Specify which of the following statements are true and which are false.

  1. If a statistic is minimal sufficient, then it is unique. (2 pts)

  2. If T(X) is a minimal sufficient statistic, then T(X) is independent of any ancillary statistic. (2 pts)

  3. For a random sample X_1,\cdots,X_n from the distribution f(x|\theta), define the statistic T(\mathbf{X})=n. Then T is an ancillary statistic for \theta. (2 pts)

  4. For a sample X_1,\cdots,X_n from a normal distribution with mean \theta and variance \sigma^2, define \begin{equation} \tilde{S}^2=\frac{\sum_{i=1}^n(X_i-\bar{X})^2}{n} \tag{10.1} \end{equation} Then n\tilde{S}^2/\sigma^2\sim\chi^2_{n-1}. (2 pts)

Proof. (1) FALSE. Any one-to-one function of a minimal sufficient statistic is also a minimal sufficient statistic, so it is not unique.

(2) FALSE. By Basu's Theorem, a minimal sufficient statistic also needs to be complete in order to guarantee independence from every ancillary statistic.

(3) TRUE. Exercise 6.12 from Casella and Berger (2002) notes that "a natural ancillary statistic in most problems is the sample size." Also see Exercise 8.6.

(4) TRUE. Since (n-1)S^2/\sigma^2\sim\chi^2_{n-1} and \begin{equation} \frac{(n-1)S^2}{\sigma^2}=\frac{\sum_{i=1}^n(X_i-\bar{X})^2}{\sigma^2}=\frac{n\tilde{S}^2}{\sigma^2}, \tag{10.2} \end{equation} it follows that n\tilde{S}^2/\sigma^2\sim\chi^2_{n-1}, so the statement is true. A quick Monte Carlo sanity check is sketched below.
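The following Python sketch (a minimal Monte Carlo check, assuming numpy and scipy are available; the values n=5, \theta=2, \sigma=3 and the seed are arbitrary illustrative choices, not part of the exam) compares empirical quantiles of n\tilde{S}^2/\sigma^2 with \chi^2_{n-1} quantiles.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, theta, sigma = 5, 2.0, 3.0      # illustrative values, not from the exam
reps = 100_000

x = rng.normal(theta, sigma, size=(reps, n))
s2_tilde = x.var(axis=1)           # ddof=0: sum_i (x_i - xbar)^2 / n
stat = n * s2_tilde / sigma**2     # claimed to follow chi^2_{n-1}

# compare empirical quantiles with chi^2_{n-1} quantiles
qs = [0.1, 0.25, 0.5, 0.75, 0.9]
print(np.round(np.quantile(stat, qs), 3))
print(np.round(stats.chi2.ppf(qs, df=n - 1), 3))
```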

Exercise 10.2 Let X_1,\cdots,X_n be a random sample from a gamma distribution with parameters \alpha>0 and \beta>0. Thus, \begin{equation} f_X(x|\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x},\quad x>0 \tag{10.3} \end{equation}

  1. Show that the gamma distribution is a member of the exponential family and find a two-dimensional sufficient statistic for the parameters \alpha and \beta. (3 pts)

  2. Is the statistic that you found above minimal sufficient? (2 pts)

  3. Is it complete? (2 pts)

Proof. (1) The definition of the exponential family is f(x|\boldsymbol{\theta})=h(x)c(\boldsymbol{\theta})\exp(\sum_{j=1}^k\omega_j(\boldsymbol{\theta})t_j(x)). The p.d.f. of the gamma distribution can be written as f(x|\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\exp((\alpha-1)\log(x)-\beta x). Hence, by choosing h(x)=1, c(\boldsymbol{\theta})=\frac{\beta^{\alpha}}{\Gamma(\alpha)}, \omega_1(\boldsymbol{\theta})=\alpha-1, t_1(x)=\log x, \omega_2(\boldsymbol{\theta})=-\beta, and t_2(x)=x, we see that the gamma distribution belongs to the exponential family.
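As an informal numerical check of this factorization (not part of the original solution), the sketch below evaluates h(x)c(\alpha,\beta)\exp(\omega_1 t_1(x)+\omega_2 t_2(x)) on a grid and compares it with scipy's gamma density; note that scipy uses a shape/scale parameterization, so scale = 1/\beta. The parameter values and the grid are arbitrary choices.

```python
import numpy as np
from scipy import stats
from scipy.special import gammaln

alpha, beta = 2.5, 1.7                          # illustrative parameter values
x = np.linspace(0.1, 5.0, 50)

# exponential-family factorization h(x) * c(alpha, beta) * exp(w1*t1(x) + w2*t2(x))
log_c = alpha * np.log(beta) - gammaln(alpha)   # log of beta^alpha / Gamma(alpha)
w1, t1 = alpha - 1.0, np.log(x)
w2, t2 = -beta, x
pdf_factored = 1.0 * np.exp(log_c + w1 * t1 + w2 * t2)   # h(x) = 1 for x > 0

# scipy parameterizes the gamma distribution by shape a and scale = 1/beta
pdf_scipy = stats.gamma.pdf(x, a=alpha, scale=1.0 / beta)
print(np.allclose(pdf_factored, pdf_scipy))     # True
```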

For a two-dimensional sufficient statistic, consider the joint p.d.f. of X_1,\cdots,X_n: \begin{equation} f(x_1,\cdots,x_n|\alpha,\beta)=(\frac{\beta^{\alpha}}{\Gamma(\alpha)})^n(\prod_{i=1}^nx_i)^{\alpha-1}e^{-\beta\sum_{i=1}^nx_i} \tag{10.4} \end{equation} By the factorization theorem, T=(\prod_{i=1}^nX_i,\sum_{i=1}^nX_i) is a sufficient statistic for (\alpha,\beta).

(2) Consider two sample points \mathbf{x}=(x_1,x_2,\cdots,x_n) and \mathbf{y}=(y_1,y_2,\cdots,y_n); then \begin{equation} \frac{f(\mathbf{x}|\alpha,\beta)}{f(\mathbf{y}|\alpha,\beta)}=\frac{(\frac{\beta^{\alpha}}{\Gamma(\alpha)})^n(\prod_{i=1}^nx_i)^{\alpha-1}e^{-\beta\sum_{i=1}^nx_i}}{(\frac{\beta^{\alpha}}{\Gamma(\alpha)})^n(\prod_{i=1}^ny_i)^{\alpha-1}e^{-\beta\sum_{i=1}^ny_i}}=(\frac{\prod_{i=1}^nx_i}{\prod_{i=1}^ny_i})^{\alpha-1}e^{-\beta(\sum_{i=1}^nx_i-\sum_{i=1}^ny_i)} \tag{10.5} \end{equation} This ratio is constant as a function of \alpha and \beta if and only if \prod_{i=1}^nx_i=\prod_{i=1}^ny_i and \sum_{i=1}^nx_i=\sum_{i=1}^ny_i. Hence, T=(\prod_{i=1}^nX_i,\sum_{i=1}^nX_i) is minimal sufficient (a numerical illustration is sketched after part (3) below).

(3) By Theorem 6.2.25 in Casella and Berger (2002) (also see 6.2), for an exponential family, T(\mathbf{X})=(\sum_{i=1}^nt_1(X_i),\cdots,\sum_{i=1}^nt_k(X_i)) is a complete statistic when the parameter space contains an open set in \mathbb{R}^k. For the gamma distribution, the parameter space \alpha>0, \beta>0 contains an open set in \mathbb{R}^2, so applying this theorem, T(\mathbf{X})=(\sum_{i=1}^n\log(X_i),\sum_{i=1}^nX_i)=(\log(\prod_{i=1}^nX_i),\sum_{i=1}^nX_i) is a complete statistic.

Since any one-to-one function of a complete statistic is still a complete statistic, and f(x)=e^x is one-to-one, the map g(x,y)=(e^x,y) is also one-to-one. Hence T=(\prod_{i=1}^nX_i,\sum_{i=1}^nX_i) is a complete statistic.
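The numerical illustration promised in part (2) is sketched below (assuming numpy and scipy; the helper log_lik and the two sample points are hypothetical choices constructed so that both samples share T=(\prod x_i,\sum x_i)=(40,14)). The likelihood ratio (10.5) then equals 1 for every (\alpha,\beta), which is the invariance used in the minimal sufficiency argument.

```python
import numpy as np
from scipy.special import gammaln

def log_lik(x, alpha, beta):
    """Joint gamma log-likelihood (rate parameterization), matching (10.4)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    return (n * (alpha * np.log(beta) - gammaln(alpha))
            + (alpha - 1.0) * np.log(x).sum()
            - beta * x.sum())

# two distinct samples sharing T = (prod x_i, sum x_i) = (40, 14)
x = [1.0, 5.0, 8.0]
y = [2.0, 2.0, 10.0]

for alpha, beta in [(0.5, 1.0), (2.0, 0.7), (3.5, 2.0)]:
    ratio = np.exp(log_lik(x, alpha, beta) - log_lik(y, alpha, beta))
    print(alpha, beta, round(ratio, 10))   # the ratio is 1 for every (alpha, beta)
```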

Do not forget to check every condition when applying any theorem!
Exercise 10.3 Let U_i, i=1,2,\cdots, be independent Unif(0,1) random variables, and, independently of the U_i, let X have probability function \begin{equation} Pr(X=x)=\frac{1}{(e-1)x!},\quad x=1,2,\cdots \tag{10.6} \end{equation} Find the distribution of Z=\min\{U_1,\cdots,U_X\}. (7 pts)

Proof. Consider \begin{equation} \begin{split} Pr(Z\leq z|X=x)&=Pr(\min\{U_1,\cdots,U_x\}\leq z|X=x)\\ &=1-Pr(U_i>z \text{ for all } i=1,\cdots,x)\\ &=1-(1-z)^x \quad (0<z<1) \end{split} \tag{10.7} \end{equation}

Therefore, the conditional p.d.f. of Z given X=x is obtained by differentiating (10.7) with respect to z, which gives f(z|x)=x(1-z)^{x-1}. Thus, by the law of total probability, \begin{equation} \begin{split} f(z)&=\sum_{x=1}^{\infty}f(z|x)f(x)\\ &=\sum_{x=1}^{\infty}x(1-z)^{x-1}\frac{1}{(e-1)x!}\\ &=\frac{1}{e-1}\sum_{x=1}^{\infty}\frac{(1-z)^{x-1}}{(x-1)!}\\ &=\frac{1}{e-1}\sum_{x=0}^{\infty}\frac{(1-z)^x}{x!}\\ &=\frac{e^{1-z}}{e-1} \quad (0<z<1) \end{split} \tag{10.8} \end{equation}

Thus, the p.d.f. of Z is \begin{equation} f(z)=\left\{\begin{aligned} &\frac{e^{1-z}}{e-1} & 0<z<1 \\ & 0 & o.w. \end{aligned} \right. \tag{10.9} \end{equation}
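As an informal check of (10.9) (not part of the original solution), the sketch below simulates Z in Python: X is drawn as a Poisson(1) variable conditioned on X\geq 1, which has exactly the probability function (10.6), and the minimum of x independent uniforms is sampled via the inverse CDF. Empirical CDF values are compared with the CDF implied by the derived density. The sample size, seed, and evaluation points are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 200_000

# X ~ Poisson(1) conditioned on X >= 1 has exactly the pmf 1 / ((e-1) x!), x >= 1
x = rng.poisson(1.0, size=3 * reps)
x = x[x >= 1][:reps]

# the minimum of k iid Uniform(0,1) draws can be sampled as 1 - U^(1/k)
u = rng.uniform(size=reps)
z = 1.0 - u ** (1.0 / x)

def cdf_z(t):
    """CDF obtained by integrating the derived density e^{1-z} / (e-1) over (0, t)."""
    return (np.e - np.exp(1.0 - t)) / (np.e - 1.0)

for t in [0.1, 0.3, 0.5, 0.9]:
    print(t, round((z <= t).mean(), 4), round(cdf_z(t), 4))   # empirical vs. theoretical
```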

Exercise 10.4 Let X be one observation from the p.d.f. \begin{equation} f(x|\theta)=(\frac{\theta}{2})^{|x|}(1-\theta)^{1-|x|},x=-1,0,1,\theta\in[0,1] \tag{10.10} \end{equation}

  1. Is X a complete sufficient statistic? (3 pts)

  2. Is |X| a complete sufficient statistic? (3 pts)

  3. Does f(x|\theta) belong to the exponential family? (2 pts)

Proof. (1) X is sufficient because, taking T(X)=X, \frac{f(x|\theta)}{f(T(x)|\theta)}=\frac{f(x|\theta)}{f(x|\theta)}=1, which is constant w.r.t. \theta.

For completeness, consider a function g(\cdot) and compute \begin{equation} \begin{split} Eg(X)&=(\frac{\theta}{2})^{|-1|}(1-\theta)^{1-|-1|}g(-1)+(\frac{\theta}{2})^{|0|}(1-\theta)^{1-|0|}g(0)+(\frac{\theta}{2})^{|1|}(1-\theta)^{1-|1|}g(1)\\ &=\frac{\theta}{2}(g(-1)+g(1))+(1-\theta)g(0) \end{split} \tag{10.11} \end{equation} We can define, for example, \begin{equation} g(x)=\left\{\begin{aligned} x & \quad x=-1,0,1 \\ 0 & \quad o.w. \end{aligned} \right. \tag{10.12} \end{equation} Then Eg(X)=0 for every \theta, but P_{\theta}(g(X)=0)=1-\theta\neq 1. Hence X is not a complete statistic.
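A small numerical illustration of this counterexample (a sketch, not part of the original solution; the \theta grid is arbitrary): with g(x)=x as in (10.12), E_{\theta}g(X)=0 for every \theta even though P_{\theta}(g(X)=0)=1-\theta<1.

```python
def pmf(x, theta):
    """f(x | theta) = (theta / 2)^{|x|} (1 - theta)^{1 - |x|} for x in {-1, 0, 1}."""
    return (theta / 2.0) ** abs(x) * (1.0 - theta) ** (1 - abs(x))

g = {-1: -1.0, 0: 0.0, 1: 1.0}   # the counterexample g(x) = x from (10.12)

for theta in [0.1, 0.4, 0.9]:
    e_g = sum(g[x] * pmf(x, theta) for x in (-1, 0, 1))
    p_zero = pmf(0, theta)        # P(g(X) = 0) = P(X = 0) = 1 - theta
    print(theta, e_g, p_zero)     # E g(X) = 0 for every theta, yet P(g(X) = 0) < 1
```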

(2) In terms of indicator functions, we can write the p.m.f. of X as \begin{equation} f(x|\theta)=\frac{\theta}{2}I_{\{-1,1\}}(x)+(1-\theta)I_{\{0\}}(x) \tag{10.13} \end{equation} which depends on x only through |x|. Defining \begin{equation} g(t|\theta)=\frac{\theta}{2}I_{\{1\}}(t)+(1-\theta)I_{\{0\}}(t) \tag{10.14} \end{equation} we have f(x|\theta)=g(|x||\theta)\cdot 1, so by the factorization theorem |X| is sufficient.

For completeness, consider any function g(\cdot) such that \begin{equation} E(g(|X|))=\frac{\theta}{2}(g(1)+g(1))+(1-\theta)g(0)=\theta g(1)+(1-\theta)g(0)=0 \tag{10.15} \end{equation}

For (10.15) to hold for every \theta, the left-hand side must be identically zero as a function of \theta, so its derivative with respect to \theta must be 0; this gives g(1)-g(0)=0, i.e., g(1)=g(0). Substituting back into (10.15) yields g(1)=g(0)=0. Thus g(|x|)=0 for all x, and |X| is complete.
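The same linear-in-\theta argument can be verified symbolically (a sketch assuming the sympy package is available; not part of the original solution): treating \theta g(1)+(1-\theta)g(0) as a polynomial in \theta and requiring every coefficient to vanish forces g(0)=g(1)=0.

```python
import sympy as sp

theta, g0, g1 = sp.symbols('theta g0 g1')

# E g(|X|) = theta * g(1) + (1 - theta) * g(0); require it to vanish for all theta
expectation = sp.expand(theta * g1 + (1 - theta) * g0)

# viewed as a polynomial in theta, every coefficient must be zero
coeffs = sp.Poly(expectation, theta).all_coeffs()
print(sp.solve(coeffs, [g0, g1]))   # {g0: 0, g1: 0}
```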

(3) For 0<\theta<1, rewrite the p.m.f. of X as \begin{equation} f(x|\theta)=I_{\{-1,0,1\}}(x)\,(1-\theta)\exp(|x|\log\frac{\theta}{2(1-\theta)}) \tag{10.16} \end{equation} The definition of the exponential family is f(x|\boldsymbol{\theta})=h(x)c(\boldsymbol{\theta})\exp(\sum_{j=1}^k\omega_j(\boldsymbol{\theta})t_j(x)). If we take h(x)=I_{\{-1,0,1\}}(x), c(\theta)=1-\theta, \omega_1(\theta)=\log\frac{\theta}{2(1-\theta)}, and t_1(x)=|x|, we see that this distribution belongs to the exponential family (for \theta in the open interval (0,1)).
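As a quick numerical check of (10.16) (not part of the original solution; the \theta values are arbitrary and lie strictly inside (0,1), where the representation is valid), the factored form reproduces the original p.m.f.:

```python
import numpy as np

def pmf(x, theta):
    """Original form (theta / 2)^{|x|} (1 - theta)^{1 - |x|} on x in {-1, 0, 1}."""
    return (theta / 2.0) ** abs(x) * (1.0 - theta) ** (1 - abs(x))

def pmf_expfam(x, theta):
    """Exponential-family form c(theta) * exp(w1(theta) * t1(x)) with h(x) = 1 on {-1, 0, 1},
    c(theta) = 1 - theta, w1(theta) = log(theta / (2 (1 - theta))), t1(x) = |x|."""
    return (1.0 - theta) * np.exp(abs(x) * np.log(theta / (2.0 * (1.0 - theta))))

for theta in [0.2, 0.5, 0.8]:
    for x in (-1, 0, 1):
        assert np.isclose(pmf(x, theta), pmf_expfam(x, theta))
print("the factored form matches the original pmf on (0, 1)")
```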

References

Casella, George, and Roger Berger. 2002. Statistical Inference. 2nd ed. Belmont, CA: Duxbury Resource Center.