Chapter 10 Midterm 1: Chapters 5 and 6 of Casella and Berger (2002): Problems and Solutions

Exercise 10.1 Specify which of the following statements are true and which are false.

  1. If a statistic is minimal sufficient, then it is unique. (2 pts)

  2. If \(T(\mathbf{X})\) is a minimal sufficient statistic, then \(T(\mathbf{X})\) is independent of any ancillary statistic. (2 pts)

  3. For a random sample \(X_1,\cdots,X_n\) from the distribution \(f(x|\theta)\), define the statistic \(T(\mathbf{X})=n\). Then \(T\) is an ancillary statistic for \(\theta\). (2 pts)

  4. For a sample \(X_1,\cdots,X_n\) from a normal distribution with mean \(\theta\) and variance \(\sigma^2\), define \[\begin{equation} \tilde{S}^2=\frac{\sum_{i=1}^n(X_i-\bar{X})^2}{n} \tag{10.1} \end{equation}\] then \(n\tilde{S}^2/\sigma^2\sim\chi_{n-1}^2\). (2 pts)

Proof. (1) FALSE. Any one-to-one function of a minimal sufficient statistic is also a minimal sufficient statistic, so it is not unique.

(2) FALSE. By Basu's Theorem, the statistic must also be complete to guarantee independence from every ancillary statistic; a minimal sufficient statistic need not be complete.

(3) TRUE. Exercise 6.12 of Casella and Berger (2002) notes that a natural ancillary statistic in most problems is the sample size; here \(T(\mathbf{X})=n\) is fixed, so its distribution does not depend on \(\theta\). Also see Exercise 8.6.

(4) TRUE. Since \((n-1)S^2=\sum_{i=1}^n(X_i-\bar{X})^2=n\tilde{S}^2\) and \(\frac{(n-1)S^2}{\sigma^2}\sim\chi_{n-1}^2\), we have \[\begin{equation} \frac{n\tilde{S}^2}{\sigma^2}=\frac{\sum_{i=1}^n(X_i-\bar{X})^2}{\sigma^2}=\frac{(n-1)S^2}{\sigma^2}\sim\chi_{n-1}^2 \tag{10.2} \end{equation}\] Thus, the statement is true. (A small simulation check appears after this proof.)
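The claim in (10.2) is easy to corroborate by simulation. Below is a minimal Monte Carlo sketch in Python; the sample size, mean, standard deviation, seed, and number of replications are arbitrary values chosen only for illustration.

```python
import numpy as np
from scipy import stats

# Monte Carlo check that n * S~^2 / sigma^2 follows a chi-square distribution
# with n - 1 degrees of freedom (all numerical settings are arbitrary choices).
rng = np.random.default_rng(1)
n, theta, sigma = 10, 3.0, 2.0
reps = 100_000

x = rng.normal(theta, sigma, size=(reps, n))
stat = n * x.var(axis=1) / sigma**2  # np.var divides by n, so x.var(axis=1) is S~^2

# Compare a few empirical quantiles with the corresponding chi^2_{n-1} quantiles.
for q in (0.25, 0.5, 0.9):
    print(round(float(np.quantile(stat, q)), 2), round(float(stats.chi2.ppf(q, df=n - 1)), 2))
```

The empirical and theoretical quantiles agree up to Monte Carlo error.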

Exercise 10.2 Let \(X_1,\cdots,X_n\) be a random sample from a gamma distribution with parameters \(\alpha>0\) and \(\beta>0\). Thus, \[\begin{equation} f_X(x|\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x},x>0 \tag{10.3} \end{equation}\]

  1. Show that the gamma distribution is a member of the exponential family and find a two-dimensional sufficient statistic for the parameters \(\alpha\) and \(\beta\). (3 pts)

  2. Is the statistic that you found above minimal sufficient? (2 pts)

  3. Is it complete? (2 pts)

Proof. (1) The exponential family has the form \(f(x|\boldsymbol{\theta})=h(x)c(\boldsymbol{\theta})\exp(\sum_{j=1}^k\omega_j(\boldsymbol{\theta})t_j(x))\). The gamma p.d.f. can be written as \(f(x|\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\exp((\alpha-1)\log x-\beta x)\) for \(x>0\). Hence, by choosing \(h(x)=I_{(0,\infty)}(x)\), \(c(\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\), \(\omega_1(\alpha,\beta)=\alpha-1\), \(t_1(x)=\log x\), \(\omega_2(\alpha,\beta)=-\beta\), and \(t_2(x)=x\), we see that the gamma distribution belongs to the exponential family. (A quick numerical check of this factorization is sketched below.)
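As a sanity check, the following sketch (with arbitrary test values of \(\alpha\) and \(\beta\)) verifies numerically that \(h(x)c(\alpha,\beta)\exp(\omega_1(\alpha,\beta)t_1(x)+\omega_2(\alpha,\beta)t_2(x))\) reproduces the density (10.3); SciPy parametrizes the gamma density by shape and scale, so the rate \(\beta\) enters as scale \(=1/\beta\).

```python
import numpy as np
from scipy import special, stats

# Arbitrary test values of the parameters and an arbitrary grid of x > 0.
alpha, beta = 2.5, 1.7
x = np.linspace(0.1, 10.0, 50)

# h(x) * c(alpha, beta) * exp(w1 * t1(x) + w2 * t2(x)) with t1(x) = log x, t2(x) = x
exp_family_form = (beta**alpha / special.gamma(alpha)) * np.exp((alpha - 1) * np.log(x) - beta * x)

print(np.allclose(exp_family_form, stats.gamma.pdf(x, a=alpha, scale=1 / beta)))  # True
```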

For a two-dimensional sufficient statistic, consider the joint p.d.f. of \(X_1,\cdots,X_n\): \[\begin{equation} f(x_1,\cdots,x_n|\alpha,\beta)=\left(\frac{\beta^{\alpha}}{\Gamma(\alpha)}\right)^n\left(\prod_{i=1}^nx_i\right)^{\alpha-1}e^{-\beta\sum_{i=1}^nx_i} \tag{10.4} \end{equation}\] By the Factorization Theorem, \(T=(\prod_{i=1}^nX_i,\sum_{i=1}^nX_i)\) is a sufficient statistic for \((\alpha,\beta)\).

(2) Consider two sample points \(\mathbf{x}=(x_1,x_2,\cdots,x_n)\) and \(\mathbf{y}=(y_1,y_2,\cdots,y_n)\); then \[\begin{equation} \frac{f(\mathbf{x}|\alpha,\beta)}{f(\mathbf{y}|\alpha,\beta)}=\frac{(\frac{\beta^{\alpha}}{\Gamma(\alpha)})^n(\prod_{i=1}^nx_i)^{\alpha-1}e^{-\beta\sum_{i=1}^nx_i}}{(\frac{\beta^{\alpha}}{\Gamma(\alpha)})^n(\prod_{i=1}^ny_i)^{\alpha-1}e^{-\beta\sum_{i=1}^ny_i}}=\left(\frac{\prod_{i=1}^nx_i}{\prod_{i=1}^ny_i}\right)^{\alpha-1}e^{-\beta(\sum_{i=1}^nx_i-\sum_{i=1}^ny_i)} \tag{10.5} \end{equation}\] This ratio is constant in \(\alpha\) and \(\beta\) if and only if \(\prod_{i=1}^nx_i=\prod_{i=1}^ny_i\) and \(\sum_{i=1}^nx_i=\sum_{i=1}^ny_i\). Hence, \(T=(\prod_{i=1}^nX_i,\sum_{i=1}^nX_i)\) is minimal sufficient. (A numerical check of this ratio appears after the proof.)

(3) By Theorem 6.2.25 in Casella and Berger (2002) (also see 6.2), for an exponential family \(T(\mathbf{X})=(\sum_{i=1}^nt_1(X_i),\cdots,\sum_{i=1}^nt_k(X_i))\) is a complete statistic provided the range of \((\omega_1(\boldsymbol{\theta}),\cdots,\omega_k(\boldsymbol{\theta}))\) contains an open set in \(\mathbb{R}^k\). For the gamma distribution, \((\omega_1,\omega_2)=(\alpha-1,-\beta)\) ranges over \((-1,\infty)\times(-\infty,0)\) as \(\alpha>0\) and \(\beta>0\), which contains an open set in \(\mathbb{R}^2\). Applying the theorem, \(T(\mathbf{X})=(\sum_{i=1}^n\log X_i,\sum_{i=1}^nX_i)=(\log\prod_{i=1}^nX_i,\sum_{i=1}^nX_i)\) is a complete statistic.

Since any one-to-one function of a complete statistic is again complete, and \(g(u,v)=(e^u,v)\) is one-to-one, \(T=(\prod_{i=1}^nX_i,\sum_{i=1}^nX_i)\) is also a complete statistic.
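To illustrate the minimal sufficiency argument numerically, the sketch below (with arbitrary parameter values) constructs two distinct samples that share the same sum and the same product, using Vieta's formulas for a monic cubic, and confirms that the log of the ratio (10.5) does not change with \((\alpha,\beta)\).

```python
import numpy as np
from scipy import stats

def log_lik(sample, alpha, beta):
    # Joint gamma log-likelihood in the rate parametrization (scale = 1 / beta).
    return np.sum(stats.gamma.logpdf(sample, a=alpha, scale=1 / beta))

# Two different samples with the same sum (= 7) and the same product (= 8):
x = np.array([1.0, 2.0, 4.0])
y = np.sort(np.roots([1.0, -7.0, 14.5, -8.0]).real)  # roots of t^3 - 7 t^2 + 14.5 t - 8

for alpha, beta in [(0.5, 1.0), (2.0, 0.3), (5.0, 4.0)]:
    print(round(float(log_lik(x, alpha, beta) - log_lik(y, alpha, beta)), 8))
# All differences are numerically zero: the ratio is free of (alpha, beta)
# exactly when prod(x) = prod(y) and sum(x) = sum(y).
```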

Do not forget to check every condition when applying any theorem!
Exercise 10.3 Let \(U_i\), \(i=1,2,\cdots\), be independent \(Unif(0,1)\) random variables, and let \(X\), independent of the \(U_i\), have probability mass function \[\begin{equation} Pr(X=x)=\frac{1}{(e-1)x!},x=1,2,\cdots \tag{10.6} \end{equation}\] Find the distribution of \(Z=\min\{U_1,\cdots,U_X\}\). (7 pts)

Proof. Consider \[\begin{equation} \begin{split} Pr(Z\leq z|X=x)&=Pr(\min\{U_1,\cdots,U_x\}\leq z|X=x)\\ &=1-Pr(U_i>z,\ i=1,\cdots,x)\\ &=1-(1-z)^x \quad (0<z<1) \end{split} \tag{10.7} \end{equation}\]

Therefore, differentiating (10.7) with respect to \(z\) gives the conditional p.d.f. \(f(z|x)=x(1-z)^{x-1}\). Thus, by the law of total probability, \[\begin{equation} \begin{split} f(z)&=\sum_{x=1}^{\infty}f(z|x)Pr(X=x)\\ &=\sum_{x=1}^{\infty}x(1-z)^{x-1}\frac{1}{(e-1)x!}\\ &=\frac{1}{e-1}\sum_{x=1}^{\infty}\frac{(1-z)^{x-1}}{(x-1)!}\\ &=\frac{1}{e-1}\sum_{x=0}^{\infty}\frac{(1-z)^x}{x!}\\ &=\frac{e^{1-z}}{e-1} \quad (0<z<1) \end{split} \tag{10.8} \end{equation}\]

Thus, the p.d.f. of \(Z\) is \[\begin{equation} f(z)=\left\{\begin{aligned} &\frac{e^{1-z}}{e-1} & 0<z<1 \\ & 0 & o.w. \end{aligned} \right. \tag{10.9} \end{equation}\]
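The density (10.9) can be corroborated with a short Monte Carlo sketch (the seed and replication count are arbitrary). It exploits the fact that \(Pr(X=x)=\frac{1}{(e-1)x!}\), \(x\geq 1\), is exactly the distribution of a Poisson(1) random variable conditioned on being at least 1, and compares the empirical c.d.f. of \(Z\) at one point with \(F(z)=\frac{e-e^{1-z}}{e-1}\), obtained by integrating (10.9).

```python
import numpy as np

rng = np.random.default_rng(2)
reps = 100_000

# Simulate X: a Poisson(1) draw conditioned on being >= 1 has pmf 1 / ((e - 1) x!).
x = rng.poisson(1.0, size=4 * reps)
x = x[x >= 1][:reps]

# Z = min of X independent Uniform(0, 1) draws.
z = np.array([rng.uniform(size=k).min() for k in x])

# Compare the empirical cdf with F(z) = (e - e^{1 - z}) / (e - 1) at z = 0.5.
z0 = 0.5
print(float(np.mean(z <= z0)), (np.e - np.exp(1 - z0)) / (np.e - 1))
```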

Exercise 10.4 Let \(X\) be one observation from the p.d.f. \[\begin{equation} f(x|\theta)=(\frac{\theta}{2})^{|x|}(1-\theta)^{1-|x|},x=-1,0,1,\theta\in[0,1] \tag{10.10} \end{equation}\]

  1. Is \(X\) a complete sufficient statistic? (3 pts)

  2. Is \(|X|\) a complete sufficient statistic? (3 pts)

  3. Does \(f(x|\theta)\) belong to the exponential family? (2 pts)

Proof. (1) With a single observation, take \(T(X)=X\). Then \(\frac{f(x|\theta)}{f(T(x)|\theta)}=\frac{f(x|\theta)}{f(x|\theta)}=1\), which does not depend on \(\theta\), so \(X\) is sufficient.

For completeness, consider any function \(g(\cdot)\) such that \(E_{\theta}g(X)=\sum_{x\in\{-1,0,1\}}(\frac{\theta}{2})^{|x|}(1-\theta)^{1-|x|}g(x)=0\) for all \(\theta\); that is, \[\begin{equation} \begin{split} &\left(\frac{\theta}{2}\right)^{|-1|}(1-\theta)^{1-|-1|}g(-1)+\left(\frac{\theta}{2}\right)^{|0|}(1-\theta)^{1-|0|}g(0)+\left(\frac{\theta}{2}\right)^{|1|}(1-\theta)^{1-|1|}g(1)\\ &=\frac{\theta}{2}(g(-1)+g(1))+(1-\theta)g(0)=0 \end{split} \tag{10.11} \end{equation}\] This can hold for every \(\theta\) with a nonzero \(g\); for example, define \[\begin{equation} g(x)=\left\{\begin{aligned} x & \quad x=-1,0,1 \\ 0 & \quad o.w. \end{aligned} \right. \tag{10.12} \end{equation}\] Then \(E_{\theta}g(X)=0\) for every \(\theta\), but \(P_{\theta}(g(X)=0)=1-\theta\neq 1\). Hence \(X\) is not a complete statistic.
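The counterexample in (10.12) can also be checked numerically; the grid of \(\theta\) values below is an arbitrary choice.

```python
import numpy as np

# With g(x) = x on the support, E_theta[g(X)] = 0 for every theta, while
# P_theta(g(X) = 0) = P_theta(X = 0) = 1 - theta < 1, so X is not complete.
support = np.array([-1, 0, 1])
g = support.astype(float)  # g(x) = x

for theta in np.linspace(0.1, 0.9, 5):
    pmf = (theta / 2.0) ** np.abs(support) * (1.0 - theta) ** (1 - np.abs(support))
    print(round(float(np.sum(pmf * g)), 12), round(float(pmf[1]), 3))  # E[g(X)], P(X = 0)
```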

(2) In terms of indicator functions, we can write the p.m.f. of \(X\) as \[\begin{equation} f(x|\theta)=\frac{\theta}{2}I_{\{-1,1\}}(x)+(1-\theta)I_{\{0\}}(x) \tag{10.13} \end{equation}\] Since \(I_{\{-1,1\}}(x)=I_{\{1\}}(|x|)\) and \(I_{\{0\}}(x)=I_{\{0\}}(|x|)\), this depends on \(x\) only through \(|x|\): \[\begin{equation} f(x|\theta)=q(|x|\,|\,\theta)\cdot 1,\qquad\text{where } q(t|\theta)=\frac{\theta}{2}I_{\{1\}}(t)+(1-\theta)I_{\{0\}}(t) \tag{10.14} \end{equation}\] Hence, by the Factorization Theorem (with \(h(x)=1\)), \(|X|\) is sufficient. (Note that \(q(\cdot|\theta)\) is not the p.m.f. of \(|X|\); the latter is \(P(|X|=1)=\theta\), \(P(|X|=0)=1-\theta\).)

For completeness, consider any function \(g(\cdot)\) such that \[\begin{equation} E_{\theta}\,g(|X|)=\frac{\theta}{2}(g(1)+g(1))+(1-\theta)g(0)=\theta g(1)+(1-\theta)g(0)=0\quad\text{for all }\theta \tag{10.15} \end{equation}\]

For (10.15) to hold for every \(\theta\), the left-hand side must be constant (equal to zero) in \(\theta\), so its derivative with respect to \(\theta\) must be zero; this gives \(g(1)-g(0)=0\), i.e. \(g(1)=g(0)\). Substituting back into (10.15) gives \(g(1)=g(0)=0\). Thus \(g(|x|)=0\) for all \(x\), and \(|X|\) is complete.

(3) Rewrite the p.m.f. of \(X\) as \[\begin{equation} f(x|\theta)=\left(\frac{\theta}{2}\right)^{|x|}(1-\theta)^{1-|x|}=I_{\{-1,0,1\}}(x)\,(1-\theta)\exp\left(|x|\log\frac{\theta}{2(1-\theta)}\right),\quad\theta\in(0,1) \tag{10.16} \end{equation}\] The exponential family has the form \(f(x|\boldsymbol{\theta})=h(x)c(\boldsymbol{\theta})\exp(\sum_{j=1}^k\omega_j(\boldsymbol{\theta})t_j(x))\). Taking \(h(x)=I_{\{-1,0,1\}}(x)\), \(c(\theta)=1-\theta\), \(\omega_1(\theta)=\log\frac{\theta}{2(1-\theta)}\) and \(t_1(x)=|x|\), we see that this distribution belongs to the exponential family for \(\theta\in(0,1)\) (at the endpoints \(\theta=0\) and \(\theta=1\) the representation is not defined). (A small numerical check of this rewriting follows.)
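A quick numerical check (with arbitrary interior values of \(\theta\)) confirms that the rewritten form in (10.16) agrees with the original p.m.f. on the support.

```python
import numpy as np

support = np.array([-1, 0, 1])
for theta in (0.2, 0.5, 0.8):  # arbitrary values in (0, 1)
    original = (theta / 2.0) ** np.abs(support) * (1.0 - theta) ** (1 - np.abs(support))
    exp_family = (1.0 - theta) * np.exp(np.abs(support) * np.log(theta / (2.0 * (1.0 - theta))))
    print(np.allclose(original, exp_family))  # True for each theta
```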

References

Casella, George, and Roger Berger. 2002. Statistical Inference. 2nd ed. Belmont, CA: Duxbury Resource Center.