Chapter 10 Midterm 1: Chapters 5 and 6 of Casella and Berger (2002): Problems and Solutions
Exercise 10.1 Specify which of the following statements are true and which are false.
- If a statistic is minimal sufficient, then it is unique. (2 pts)
- If T(X) is a minimal sufficient statistic, then T(X) is independent of any ancillary statistic. (2 pts)
- For a random sample X_1,\cdots,X_n from the distribution f(x|\theta), define the statistic T(\mathbf{X})=n. Then T is an ancillary statistic for \theta. (2 pts)
- For a sample X_1,\cdots,X_n from a normal distribution with mean \theta and variance \sigma^2, define \tilde{S}^2=\frac{\sum_{i=1}^n(X_i-\bar{X})^2}{n}. Then n\tilde{S}^2/\sigma^2\sim\chi^2_{n-1}. (2 pts)
Proof. (1) FALSE. A minimal sufficient statistic is not unique: any one-to-one function of a minimal sufficient statistic is also minimal sufficient.
(2) FALSE. By Basu's Theorem, a complete sufficient statistic is independent of every ancillary statistic, but minimal sufficiency alone is not enough. For example, for a sample from the uniform(\theta,\theta+1) distribution, the minimal sufficient statistic (X_{(1)},X_{(n)}) is not complete and is not independent of the ancillary range X_{(n)}-X_{(1)}.
(3) TRUE. Exercise 6.12 in Casella and Berger (2002) notes that a natural ancillary statistic in most problems is the sample size. Also see Exercise 8.6.
(4) TRUE. Since \frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1} and (n-1)S^2=\sum_{i=1}^n(X_i-\bar{X})^2=n\tilde{S}^2, we have \frac{n\tilde{S}^2}{\sigma^2}=\frac{(n-1)S^2}{\sigma^2}\sim\chi^2_{n-1}. Thus, the statement is true.
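As a quick sanity check (not part of the required solution), a small Monte Carlo simulation in Python can confirm that n\tilde{S}^2/\sigma^2 behaves like a \chi^2_{n-1} random variable; the sample size, mean, and standard deviation below are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(0)
n, theta, sigma = 10, 2.0, 3.0            # arbitrary illustrative values
reps = 100_000

x = rng.normal(theta, sigma, size=(reps, n))
s2_tilde = x.var(axis=1)                  # ddof=0 divides by n, i.e. tilde{S}^2
stat = n * s2_tilde / sigma**2

# A chi^2_{n-1} random variable has mean n-1 and variance 2(n-1).
print(stat.mean(), stat.var())            # close to 9 and 18 for n = 10
```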
Exercise 10.2 Let X_1,\cdots,X_n be a random sample from a gamma distribution with parameters \alpha>0 and \beta>0. Thus, f_X(x|\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x},\ x>0.
- Show that the gamma distribution is a member of the exponential family and find a two-dimensional sufficient statistic for the parameters \alpha and \beta. (3 pts)
- Is the statistic that you found above minimal sufficient? (2 pts)
- Is it complete? (2 pts)
Proof. (1) The definition of the exponential family is f(x|\boldsymbol{\theta})=h(x)c(\boldsymbol{\theta})\exp(\sum_{j=1}^k\omega_j(\boldsymbol{\theta})t_j(x)). The p.d.f. of the gamma distribution can be written as f(x|\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\exp((\alpha-1)\log(x)-\beta x) for x>0. Hence, choosing h(x)=I_{(0,\infty)}(x), c(\boldsymbol{\theta})=\frac{\beta^{\alpha}}{\Gamma(\alpha)}, \omega_1(\boldsymbol{\theta})=\alpha-1, t_1(x)=\log x, \omega_2(\boldsymbol{\theta})=-\beta and t_2(x)=x, we see that the gamma distribution belongs to the exponential family.
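As an optional numerical check (a sketch, not part of the solution), the exponential-family rewriting above can be compared against the gamma density in the same shape/rate parameterization using scipy; the grid of x values and the (\alpha,\beta) pairs are arbitrary.

```python
import numpy as np
from scipy.stats import gamma
from scipy.special import gammaln

def expfam_logpdf(x, alpha, beta):
    # log of (beta^alpha / Gamma(alpha)) * exp((alpha - 1) * log(x) - beta * x)
    return alpha * np.log(beta) - gammaln(alpha) + (alpha - 1) * np.log(x) - beta * x

x = np.linspace(0.1, 5.0, 50)
for alpha, beta in [(0.5, 1.0), (2.0, 3.0)]:
    # scipy's gamma uses shape a and scale = 1/beta (the rate parameterization above)
    assert np.allclose(expfam_logpdf(x, alpha, beta),
                       gamma.logpdf(x, a=alpha, scale=1.0 / beta))
print("exponential-family form matches the gamma p.d.f.")
```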
For a two-dimensional sufficient statistic, consider the joint p.d.f. of X_1,\cdots,X_n: \begin{equation} f(x_1,\cdots,x_n|\alpha,\beta)=(\frac{\beta^{\alpha}}{\Gamma(\alpha)})^n(\prod_{i=1}^nx_i)^{\alpha-1}e^{-\beta\sum_{i=1}^nx_i} \tag{10.4} \end{equation} By the Factorization Theorem, T=(\prod_{i=1}^nX_i,\sum_{i=1}^nX_i) is a sufficient statistic for (\alpha,\beta).
(2) Consider two sample points \mathbf{x}=(x_1,x_2,\cdots,x_n) and \mathbf{y}=(y_1,y_2,\cdots,y_n). Then \begin{equation} \frac{f(\mathbf{x}|\alpha,\beta)}{f(\mathbf{y}|\alpha,\beta)}=\frac{(\frac{\beta^{\alpha}}{\Gamma(\alpha)})^n(\prod_{i=1}^nx_i)^{\alpha-1}e^{-\beta\sum_{i=1}^nx_i}}{(\frac{\beta^{\alpha}}{\Gamma(\alpha)})^n(\prod_{i=1}^ny_i)^{\alpha-1}e^{-\beta\sum_{i=1}^ny_i}}=(\frac{\prod_{i=1}^nx_i}{\prod_{i=1}^ny_i})^{\alpha-1}e^{-\beta(\sum_{i=1}^nx_i-\sum_{i=1}^ny_i)} \tag{10.5} \end{equation} This ratio is constant as a function of \alpha and \beta if and only if \prod_{i=1}^nx_i=\prod_{i=1}^ny_i and \sum_{i=1}^nx_i=\sum_{i=1}^ny_i. Hence, T=(\prod_{i=1}^nX_i,\sum_{i=1}^nX_i) is minimal sufficient.
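The minimal-sufficiency claim can also be illustrated numerically: two samples that are not permutations of one another but share the same product and sum give a likelihood ratio that does not change with (\alpha,\beta). The following Python sketch (using scipy, with arbitrarily chosen samples and parameter values) shows that the log-likelihood difference is the same constant (here zero) for every (\alpha,\beta).

```python
import numpy as np
from scipy.stats import gamma

# Two different samples of size 3 with the same sum (6) and the same product (6):
x = np.array([1.0, 2.0, 3.0])
r = np.sqrt(3.5**2 - 4 * 2.4)                       # roots of t^2 - 3.5 t + 2.4 = 0
y = np.array([2.5, (3.5 + r) / 2, (3.5 - r) / 2])   # sum(y) = 6 and prod(y) = 6

def loglik(data, alpha, beta):
    # gamma with shape alpha and rate beta, i.e. scale = 1/beta, matching the p.d.f. above
    return gamma.logpdf(data, a=alpha, scale=1.0 / beta).sum()

# The log-likelihood difference does not depend on (alpha, beta):
for alpha, beta in [(0.5, 1.0), (2.0, 3.0), (7.5, 0.2)]:
    print(loglik(x, alpha, beta) - loglik(y, alpha, beta))   # ~ 0 each time
```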
(3) By Theorem 6.2.25 in Casella and Berger (2002) (also see 6.2), for an exponential family the statistic T(\mathbf{X})=(\sum_{i=1}^nt_1(X_i),\cdots,\sum_{i=1}^nt_k(X_i)) is complete as long as the parameter space contains an open set in \mathbb{R}^k. For the gamma distribution, the parameter space \{(\alpha,\beta):\alpha>0,\beta>0\} contains an open set in \mathbb{R}^2, so applying this theorem with t_1(x)=\log x and t_2(x)=x shows that T(\mathbf{X})=(\sum_{i=1}^n\log(X_i),\sum_{i=1}^nX_i)=(\log(\prod_{i=1}^nX_i),\sum_{i=1}^nX_i) is a complete statistic.
Since any one-to-one function of a complete statistic is still a complete statistic, and g(x,y)=(e^x,y) is one-to-one, T=(\prod_{i=1}^nX_i,\sum_{i=1}^nX_i) is also a complete statistic.
Exercise 10.3 Let X be a random variable with p.m.f. \begin{equation} f(x)=\frac{1}{(e-1)x!},\quad x=1,2,\cdots \tag{10.6} \end{equation} and, conditional on X=x, let Z=\min\{U_1,\cdots,U_x\}, where U_1,\cdots,U_x are i.i.d. uniform(0,1) random variables independent of X. Find the p.d.f. of Z.

Proof. Consider \begin{equation} \begin{split} Pr(Z\leq z|x)&=Pr(\min\{U_1,\cdots,U_x\}\leq z|x)\\ &=1-Pr(U_1>z,\cdots,U_x>z|x)\\ &=1-(1-z)^x \quad (0<z<1) \end{split} \tag{10.7} \end{equation}
Therefore, the p.d.f. of Z|X is obtained by differentiating (10.7) with respect to z, which gives f(z|x)=x(1-z)^{x-1}. Thus, by the law of total probability, \begin{equation} \begin{split} f(z)&=\sum_{x=1}^{\infty}f(z|x)f(x)\\ &=\sum_{x=1}^{\infty}x(1-z)^{x-1}\frac{1}{(e-1)x!}\\ &=\frac{1}{e-1}\sum_{x=1}^{\infty}\frac{(1-z)^{x-1}}{(x-1)!}\\ &=\frac{1}{e-1}(\sum_{x=0}^{\infty}\frac{(1-z)^x}{x!})\\ &=\frac{e^{1-z}}{e-1} \quad (0<z<1) \end{split} \tag{10.8} \end{equation}
Thus, the p.d.f. of Z is \begin{equation} f(z)=\left\{\begin{aligned} &\frac{e^{1-z}}{e-1} & 0<z<1 \\ & 0 & o.w. \end{aligned} \right. \tag{10.9} \end{equation}
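As an optional Monte Carlo check (not required by the exercise), the following Python sketch simulates Z as described above and compares the empirical mean with the mean of the density (10.9), E(Z)=\int_0^1 z\,\frac{e^{1-z}}{e-1}dz=\frac{e-2}{e-1}\approx 0.418; the truncation of the support of X at 30 and the number of replications are arbitrary.

```python
import numpy as np
from math import e, factorial

rng = np.random.default_rng(1)
reps = 100_000

# Sample X from the p.m.f. f(x) = 1 / ((e - 1) x!), x = 1, 2, ... (tail beyond 30 is negligible)
x_vals = np.arange(1, 31)
probs = np.array([1.0 / ((e - 1) * factorial(k)) for k in x_vals])
probs /= probs.sum()
x = rng.choice(x_vals, size=reps, p=probs)

# Given X = x, Z = min{U_1, ..., U_x} with U_i i.i.d. uniform(0, 1)
z = np.array([rng.uniform(size=k).min() for k in x])

# Compare the empirical mean with E(Z) = (e - 2) / (e - 1)
print(z.mean(), (e - 2) / (e - 1))
```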
Exercise 10.4 Let X be one observation from the p.d.f. \begin{equation} f(x|\theta)=(\frac{\theta}{2})^{|x|}(1-\theta)^{1-|x|},\quad x=-1,0,1,\ \theta\in[0,1] \tag{10.10} \end{equation}

- Is X a complete sufficient statistic? (3 pts)
- Is |X| a complete sufficient statistic? (3 pts)
- Does f(x|\theta) belong to the exponential family? (2 pts)
Proof. (1) With T(X)=X, X is sufficient because \frac{f(x|\theta)}{f(T(x)|\theta)}=\frac{f(x|\theta)}{f(x|\theta)}=1, which is constant with respect to \theta.
For completeness, consider any function g(\cdot) such that Eg(X)=\sum_{x\in\{-1,0,1\}}(\frac{\theta}{2})^{|x|}(1-\theta)^{1-|x|}g(x)=0, that is, \begin{equation} \begin{split} &(\frac{\theta}{2})^{|-1|}(1-\theta)^{1-|-1|}g(-1)+(\frac{\theta}{2})^{|0|}(1-\theta)^{1-|0|}g(0)+(\frac{\theta}{2})^{|1|}(1-\theta)^{1-|1|}g(1)\\ &=\frac{\theta}{2}(g(-1)+g(1))+(1-\theta)g(0)=0 \end{split} \tag{10.11} \end{equation} Now define, for example, \begin{equation} g(x)=\left\{\begin{aligned} x & \quad x=-1,0,1 \\ 0 & \quad o.w. \end{aligned} \right. \tag{10.12} \end{equation} Then Eg(X)=0 for every \theta, but P_{\theta}(g(X)=0)=P_{\theta}(X=0)=1-\theta\neq 1. Hence X is not a complete statistic.
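The counterexample can be checked numerically; the following Python sketch (with a few arbitrary values of \theta) evaluates Eg(X) and P_{\theta}(g(X)=0) for g(x)=x.

```python
import numpy as np

def pmf(x, theta):
    # f(x | theta) = (theta/2)^{|x|} (1 - theta)^{1 - |x|}, x in {-1, 0, 1}
    return (theta / 2) ** abs(x) * (1 - theta) ** (1 - abs(x))

support = np.array([-1, 0, 1])
g = support.astype(float)                # the counterexample g(x) = x

for theta in [0.1, 0.5, 0.9]:
    p = np.array([pmf(x, theta) for x in support])
    print(theta, (p * g).sum(), p[support == 0].sum())
    # E g(X) = 0 for every theta, yet P(g(X) = 0) = P(X = 0) = 1 - theta < 1
```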
(2) In terms of indicator functions, the p.m.f. of X can be written as \begin{equation} f(x|\theta)=\frac{\theta}{2}I_{\{-1,1\}}(x)+(1-\theta)I_{\{0\}}(x) \tag{10.13} \end{equation} which depends on x only through |x|: \begin{equation} f(x|\theta)=(\frac{\theta}{2})^{|x|}(1-\theta)^{1-|x|}=g(|x|\,|\,\theta)\cdot 1, \quad\text{where } g(t|\theta)=(\frac{\theta}{2})^{t}(1-\theta)^{1-t} \tag{10.14} \end{equation} By the Factorization Theorem (with h(x)=1), |X| is sufficient. Note that the p.m.f. of |X| itself is P_{\theta}(|X|=1)=\theta and P_{\theta}(|X|=0)=1-\theta.
For completeness, consider any function g(\cdot) such that \begin{equation} E\,g(|X|)=\frac{\theta}{2}(g(|-1|)+g(|1|))+(1-\theta)g(|0|)=\theta g(1)+(1-\theta)g(0)=0 \tag{10.15} \end{equation}
For (10.15) to hold for every \theta\in[0,1], the left-hand side must be identically zero as a function of \theta. Differentiating with respect to \theta gives g(1)-g(0)=0, i.e. g(1)=g(0), and substituting back into (10.15) gives g(1)=g(0)=0. (Equivalently, evaluating (10.15) at \theta=0 and \theta=1 gives g(0)=0 and g(1)=0 directly.) Thus P_{\theta}(g(|X|)=0)=1 for all \theta, and |X| is complete. Combined with sufficiency, |X| is a complete sufficient statistic.
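The step from (10.15) to g(0)=g(1)=0 can also be viewed as a tiny linear-algebra fact: imposing (10.15) at any two distinct values of \theta gives a nonsingular 2\times 2 system whose only solution is the zero function. A minimal sketch (with two arbitrary \theta values):

```python
import numpy as np

# Impose theta * g(1) + (1 - theta) * g(0) = 0 at two distinct theta values.
thetas = np.array([0.25, 0.75])               # arbitrary distinct values in (0, 1)
A = np.column_stack([1 - thetas, thetas])     # rows [(1 - theta), theta], unknowns [g(0), g(1)]
print(np.linalg.solve(A, np.zeros(2)))        # -> [0. 0.]: the only solution is g(0) = g(1) = 0
```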
(3) Yes, for \theta\in(0,1). Rewrite the p.m.f. of X as \begin{equation} f(x|\theta)=(\frac{\theta}{2})^{|x|}(1-\theta)^{1-|x|}=I_{\{-1,0,1\}}(x)\,(1-\theta)\exp(|x|\log\frac{\theta}{2(1-\theta)}),\quad \theta\in(0,1) \tag{10.16} \end{equation} The definition of the exponential family is f(x|\boldsymbol{\theta})=h(x)c(\boldsymbol{\theta})\exp(\sum_{j=1}^k\omega_j(\boldsymbol{\theta})t_j(x)). Taking h(x)=I_{\{-1,0,1\}}(x), c(\theta)=1-\theta, \omega_1(\theta)=\log\frac{\theta}{2(1-\theta)} and t_1(x)=|x|, we see that this distribution belongs to the exponential family.
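A short numerical check (an optional sketch, not part of the solution) confirms that the exponential-family form in (10.16) reproduces the original p.m.f. on the support \{-1,0,1\} for a few \theta values strictly between 0 and 1.

```python
import numpy as np

def pmf(x, theta):
    # original form: (theta/2)^{|x|} (1 - theta)^{1 - |x|}
    return (theta / 2) ** abs(x) * (1 - theta) ** (1 - abs(x))

def expfam(x, theta):
    # h(x) c(theta) exp(w1(theta) t1(x)) with h = 1 on {-1, 0, 1},
    # c(theta) = 1 - theta, w1(theta) = log(theta / (2(1 - theta))), t1(x) = |x|
    return (1 - theta) * np.exp(abs(x) * np.log(theta / (2 * (1 - theta))))

for theta in [0.2, 0.5, 0.8]:
    for x in (-1, 0, 1):
        assert np.isclose(pmf(x, theta), expfam(x, theta))
print("exponential-family form matches the p.m.f. on {-1, 0, 1}")
```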
References
Casella, George, and Roger Berger. 2002. Statistical Inference. 2nd ed. Belmont, CA: Duxbury Resource Center.