Chapter 10 Midterm 1: Chapter 5 and Chapter 6 of Casella and Berger (2002): Problems and Solutions
Exercise 10.1 Specify which of the following statements are true and which are false.
- If a statistic is minimal sufficient, then it is unique. (2 pts)
- If \(T(\mathbf{X})\) is a minimal sufficient statistic, then \(T(\mathbf{X})\) is independent of any ancillary statistic. (2 pts)
- For a random sample \(X_1,\cdots,X_n\) from the distribution \(f(x|\theta)\), define the statistic \(T(\mathbf{X})=n\). Then \(T\) is an ancillary statistic for \(\theta\). (2 pts)
- For a sample \(X_1,\cdots,X_n\) from a normal distribution with mean \(\theta\) and variance \(\sigma^2\), define \[\begin{equation} \tilde{S}^2=\frac{\sum_{i=1}^n(X_i-\bar{X})^2}{n} \tag{10.1} \end{equation}\] then \(n\tilde{S}^2/\sigma^2\sim\chi_{n-1}^2\). (2 pts)
Proof. (1) FALSE. Any one-to-one function of a minimal sufficient statistic is also minimal sufficient, so a minimal sufficient statistic is unique only up to one-to-one transformations.
(2) FALSE. By Basu's Theorem, the statistic must also be complete for independence from every ancillary statistic to follow; minimal sufficiency alone does not guarantee it.
(3) TRUE. Exercise 6.12 from Casella and Berger (2002) notes that a natural ancillary statistic in most problems is the sample size. Also see Exercise 8.6.
(4) TRUE. Since \(\frac{(n-1)S^2}{\sigma^2}\sim\chi_{n-1}^2\) and \[\begin{equation} \frac{(n-1)S^2}{\sigma^2}=\frac{\sum_{i=1}^n(X_i-\bar{X})^2}{\sigma^2}=\frac{n\tilde{S}^2}{\sigma^2}, \tag{10.2} \end{equation}\] the statement is true.
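This distributional claim is easy to spot-check by simulation. Below is a minimal Monte Carlo sketch (the values of \(n\), \(\theta\) and \(\sigma\) are arbitrary illustrative choices), comparing \(n\tilde{S}^2/\sigma^2\) with a \(\chi_{n-1}^2\) reference via a Kolmogorov-Smirnov test:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
n, theta, sigma = 10, 1.0, 2.0   # arbitrary illustrative values
reps = 100_000

# Each row is one sample X_1, ..., X_n from N(theta, sigma^2).
x = rng.normal(theta, sigma, size=(reps, n))
s2_tilde = x.var(axis=1)          # ddof=0 divides by n, matching (10.1)
stat = n * s2_tilde / sigma**2

# A large p-value indicates no evidence against chi-squared(n-1).
print(stats.kstest(stat, stats.chi2(df=n - 1).cdf))
```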
Exercise 10.2 Let \(X_1,\cdots,X_n\) be a random sample from a gamma distribution with parameters \(\alpha>0\) and \(\beta>0\). Thus, \[\begin{equation} f_X(x|\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}x^{\alpha-1}e^{-\beta x},\quad x>0 \tag{10.3} \end{equation}\]
- Show that the gamma distribution is a member of the exponential family and find a two-dimensional sufficient statistic for the parameters \(\alpha\) and \(\beta\). (3 pts)
- Is the statistic you found above minimal sufficient? (2 pts)
- Is it complete? (2 pts)
Proof. (1) A family of p.d.f.s belongs to the exponential family if it can be written as \(f(x|\boldsymbol{\theta})=h(x)c(\boldsymbol{\theta})\exp(\sum_{j=1}^k\omega_j(\boldsymbol{\theta})t_j(x))\). The p.d.f. of the gamma distribution can be written as \(f(x|\alpha,\beta)=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\exp((\alpha-1)\log x-\beta x)\). Hence, choosing \(h(x)=I_{(0,\infty)}(x)\), \(c(\boldsymbol{\theta})=\frac{\beta^{\alpha}}{\Gamma(\alpha)}\), \(\omega_1(\boldsymbol{\theta})=\alpha-1\), \(t_1(x)=\log x\), \(\omega_2(\boldsymbol{\theta})=-\beta\) and \(t_2(x)=x\), we see that the gamma distribution belongs to the exponential family.
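As a numerical sanity check (a sketch with arbitrary example values of \(\alpha\) and \(\beta\)), the decomposition above reproduces the gamma density computed by scipy:

```python
import numpy as np
from scipy import stats
from scipy.special import gamma as Gamma

alpha, beta = 2.5, 1.7                     # arbitrary example values
x = np.linspace(0.1, 10.0, 50)

# h(x) c(theta) exp(w1 t1(x) + w2 t2(x)) with the choices made above;
# h(x) = 1 on this grid since x > 0 throughout.
c = beta**alpha / Gamma(alpha)
expo_form = c * np.exp((alpha - 1) * np.log(x) - beta * x)

# scipy uses a shape/scale parameterization; rate beta means scale = 1/beta.
print(np.allclose(expo_form, stats.gamma.pdf(x, a=alpha, scale=1 / beta)))  # True
```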
For a two-dimensional sufficient statistic, consider the joint p.d.f. of \(X_1,\cdots,X_n\): \[\begin{equation} f(x_1,\cdots,x_n|\alpha,\beta)=\left(\frac{\beta^{\alpha}}{\Gamma(\alpha)}\right)^n\left(\prod_{i=1}^nx_i\right)^{\alpha-1}e^{-\beta\sum_{i=1}^nx_i} \tag{10.4} \end{equation}\] By the Factorization Theorem, \(T=(\prod_{i=1}^nX_i,\sum_{i=1}^nX_i)\) is a sufficient statistic for \((\alpha,\beta)\).
(2) Consider two sample points \(\mathbf{X}=(X_1,X_2,\cdots,X_n)\) and \(\mathbf{Y}=(Y_1,Y_2,\cdots,Y_n)\), then \[\begin{equation} \frac{f(\mathbf{x}|\alpha,\beta)}{f(\mathbf{y}|\alpha,\beta)}=\frac{\left(\frac{\beta^{\alpha}}{\Gamma(\alpha)}\right)^n\left(\prod_{i=1}^nx_i\right)^{\alpha-1}e^{-\beta\sum_{i=1}^nx_i}}{\left(\frac{\beta^{\alpha}}{\Gamma(\alpha)}\right)^n\left(\prod_{i=1}^ny_i\right)^{\alpha-1}e^{-\beta\sum_{i=1}^ny_i}}=\left(\frac{\prod_{i=1}^nx_i}{\prod_{i=1}^ny_i}\right)^{\alpha-1}e^{-\beta(\sum_{i=1}^nx_i-\sum_{i=1}^ny_i)} \tag{10.5} \end{equation}\] This ratio is constant in \(\alpha\) and \(\beta\) if and only if \(\prod_{i=1}^nx_i=\prod_{i=1}^ny_i\) and \(\sum_{i=1}^nx_i=\sum_{i=1}^ny_i\). Hence, \(T=(\prod_{i=1}^nX_i,\sum_{i=1}^nX_i)\) is minimal sufficient.
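The characterization in (10.5) can be illustrated numerically. In the sketch below (assuming numpy), \(\mathbf{y}\) is a permutation of \(\mathbf{x}\), so the two samples share both \(\prod x_i\) and \(\sum x_i\), and the log-ratio vanishes for every \((\alpha,\beta)\):

```python
import numpy as np

rng = np.random.default_rng(0)

def log_ratio(x, y, alpha, beta):
    # log of (10.5); the factors (beta^alpha / Gamma(alpha))^n cancel
    return ((alpha - 1) * (np.sum(np.log(x)) - np.sum(np.log(y)))
            - beta * (np.sum(x) - np.sum(y)))

x = rng.gamma(shape=2.0, scale=1.0, size=5)
y = rng.permutation(x)   # same product and same sum as x

for alpha, beta in [(0.5, 1.0), (2.0, 3.0), (7.0, 0.2)]:
    print(log_ratio(x, y, alpha, beta))   # 0.0 (up to rounding) every time
```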
(3) By Theorem 6.2.25 in Casella and Berger (2002) (also see 6.2), for an exponential family, \(T(\mathbf{X})=(\sum_{i=1}^nt_1(X_i),\cdots,\sum_{i=1}^nt_k(X_i))\) is a complete statistic when the parameter space contains an open set in \(\mathbb{R}^k\). For the gamma distribution, the parameter space \(\{(\alpha,\beta):\alpha>0,\beta>0\}\) contains an open set in \(\mathbb{R}^2\), so applying this theorem, \(T(\mathbf{X})=(\sum_{i=1}^n\log X_i,\sum_{i=1}^nX_i)=(\log\prod_{i=1}^nX_i,\sum_{i=1}^nX_i)\) is a complete statistic.
Since any one-to-one function of a complete statistic is also complete, and \(g(x,y)=(e^x,y)\) is one-to-one, \(T=(\prod_{i=1}^nX_i,\sum_{i=1}^nX_i)\) is a complete statistic.
Exercise 10.3 Let \(X\) be a discrete random variable with p.m.f. \[\begin{equation} f(x)=\frac{1}{(e-1)x!},\quad x=1,2,\cdots \tag{10.6} \end{equation}\] Given \(X=x\), let \(U_1,\cdots,U_x\) be i.i.d. uniform\((0,1)\) random variables, independent of \(X\), and define \(Z=\min\{U_1,\cdots,U_x\}\). Find the p.d.f. of \(Z\).

Proof. Consider \[\begin{equation} \begin{split} Pr(Z\leq z|x)&=Pr(\min\{U_1,\cdots,U_x\}\leq z|x)\\ &=1-Pr(U_i>z,\ i=1,\cdots,x|x)\\ &=1-(1-z)^x \quad (0<z<1) \end{split} \tag{10.7} \end{equation}\]
Differentiating (10.7) with respect to \(z\) gives the conditional p.d.f. \(f(z|x)=x(1-z)^{x-1}\). Thus, by the law of total probability, \[\begin{equation} \begin{split} f(z)&=\sum_{x=1}^{\infty}f(z|x)f(x)\\ &=\sum_{x=1}^{\infty}x(1-z)^{x-1}\frac{1}{(e-1)x!}\\ &=\frac{1}{e-1}\sum_{x=1}^{\infty}\frac{(1-z)^{x-1}}{(x-1)!}\\ &=\frac{1}{e-1}\sum_{x=0}^{\infty}\frac{(1-z)^x}{x!}\\ &=\frac{e^{1-z}}{e-1} \quad (0<z<1) \end{split} \tag{10.8} \end{equation}\]
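A short simulation supports this result. The sketch below uses the fact that (10.6) is the Poisson(1) p.m.f. conditioned on \(X\geq 1\), and compares a histogram of simulated \(Z\) values with \(e^{1-z}/(e-1)\):

```python
import numpy as np

rng = np.random.default_rng(1)
reps = 100_000

# X ~ Poisson(1) conditioned on X >= 1 has p.m.f. 1 / ((e - 1) x!).
x = rng.poisson(1.0, size=3 * reps)
x = x[x >= 1][:reps]

# Given X = x, Z is the minimum of x iid uniform(0, 1) variables.
z = np.array([rng.uniform(size=k).min() for k in x])

hist, edges = np.histogram(z, bins=20, range=(0, 1), density=True)
centers = (edges[:-1] + edges[1:]) / 2
print(np.max(np.abs(hist - np.exp(1 - centers) / (np.e - 1))))  # near 0
```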
Thus, the p.d.f. of \(Z\) is \[\begin{equation} f(z)=\left\{\begin{aligned} &\frac{e^{1-z}}{e-1} & 0<z<1 \\ & 0 & o.w. \end{aligned} \right. \tag{10.9} \end{equation}\]

Exercise 10.4 Let \(X\) be one observation from the p.d.f. \[\begin{equation} f(x|\theta)=\left(\frac{\theta}{2}\right)^{|x|}(1-\theta)^{1-|x|},\quad x=-1,0,1,\ \theta\in[0,1] \tag{10.10} \end{equation}\]
- Is \(X\) a complete sufficient statistic? (3 pts)
- Is \(|X|\) a complete sufficient statistic? (3 pts)
- Does \(f(x|\theta)\) belong to the exponential family? (2 pts)
Proof. (1) \(X\) is sufficient because, taking \(T(X)=X\), \(\frac{f(x|\theta)}{f(T(x)|\theta)}=\frac{f(x|\theta)}{f(x|\theta)}=1\), which is constant in \(\theta\).
For completeness, consider any function \(g(\cdot)\) such that \(Eg(X)=\sum_{x\in\{-1,0,1\}}(\frac{\theta}{2})^{|x|}(1-\theta)^{1-|x|}g(x)=0\), that is, \[\begin{equation} \begin{split} &\left(\frac{\theta}{2}\right)^{|-1|}(1-\theta)^{1-|-1|}g(-1)+\left(\frac{\theta}{2}\right)^{|0|}(1-\theta)^{1-|0|}g(0)+\left(\frac{\theta}{2}\right)^{|1|}(1-\theta)^{1-|1|}g(1)\\ &=\frac{\theta}{2}(g(-1)+g(1))+(1-\theta)g(0)=0 \end{split} \tag{10.11} \end{equation}\] Define, for example, \[\begin{equation} g(x)=\left\{\begin{aligned} x & \quad x=-1,0,1 \\ 0 & \quad o.w. \end{aligned} \right. \tag{10.12} \end{equation}\] Then \(Eg(X)=0\) for every \(\theta\), but \(P_{\theta}(g(X)=0)=P_{\theta}(X=0)=1-\theta\neq 1\). Hence \(X\) is not a complete statistic.
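This counterexample can be checked exactly with a few lines of code; the sketch below evaluates \(E_{\theta}g(X)\) over a grid of \(\theta\) values:

```python
import numpy as np

def E_g(theta, g):
    # exact expectation of g(X) under the p.m.f. in (10.10)
    probs = {-1: theta / 2, 0: 1 - theta, 1: theta / 2}
    return sum(p * g(x) for x, p in probs.items())

for theta in np.linspace(0.1, 0.9, 5):
    # E g(X) = 0 for every theta, yet P(g(X) = 0) = 1 - theta < 1
    print(theta, E_g(theta, lambda x: x))
```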
(2) In terms of indicator functions, the p.m.f. of \(X\) can be written as \[\begin{equation} f(x|\theta)=\frac{\theta}{2}I_{\{-1,1\}}(x)+(1-\theta)I_{\{0\}}(x) \tag{10.13} \end{equation}\] which depends on \(x\) only through \(|x|\). Hence \(f(x|\theta)=g(|x|,\theta)\cdot h(x)\) with \(h(x)=1\) and \(g(t,\theta)=(\frac{\theta}{2})^{t}(1-\theta)^{1-t}\), so by the Factorization Theorem \(|X|\) is sufficient. The p.m.f. of \(|X|\) itself is \[\begin{equation} f(t|\theta)=\theta I_{\{1\}}(t)+(1-\theta)I_{\{0\}}(t),\quad t=0,1 \tag{10.14} \end{equation}\] i.e., \(|X|\sim\) Bernoulli\((\theta)\), since \(P(|X|=1)=P(X=-1)+P(X=1)=\theta\).
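A quick simulation check (with an arbitrary \(\theta\)) that \(|X|\) is indeed Bernoulli\((\theta)\):

```python
import numpy as np

rng = np.random.default_rng(3)
theta, reps = 0.3, 100_000   # arbitrary illustrative value

# Draw X from the p.m.f. (10.10) and look at the distribution of |X|.
x = rng.choice([-1, 0, 1], size=reps, p=[theta / 2, 1 - theta, theta / 2])
print(np.abs(x).mean())   # close to theta = 0.3, i.e., |X| ~ Bernoulli(theta)
```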
For completeness, consider any function \(g(\cdot)\) such that \[\begin{equation} Eg(|X|)=\frac{\theta}{2}(g(1)+g(1))+(1-\theta)g(0)=\theta g(1)+(1-\theta)g(0)=0 \tag{10.15} \end{equation}\]
For (10.15) to hold for every \(\theta\), the left-hand side must be constant in \(\theta\), so its derivative with respect to \(\theta\) must be 0; this gives \(g(1)-g(0)=0\), i.e., \(g(1)=g(0)\). Substituting back into (10.15) yields \(g(1)=g(0)=0\). Thus \(g(|x|)=0\) for all \(x\), and \(|X|\) is complete.
(3) Yes. For \(\theta\in(0,1)\), the p.m.f. of \(X\) can be rewritten as \[\begin{equation} f(x|\theta)=I_{\{-1,0,1\}}(x)\,(1-\theta)\exp\left(|x|\log\frac{\theta}{2(1-\theta)}\right) \tag{10.16} \end{equation}\] Recall that the exponential family has the form \(f(x|\boldsymbol{\theta})=h(x)c(\boldsymbol{\theta})\exp(\sum_{j=1}^k\omega_j(\boldsymbol{\theta})t_j(x))\). Taking \(h(x)=I_{\{-1,0,1\}}(x)\), \(c(\theta)=1-\theta\), \(t_1(x)=|x|\) and \(\omega_1(\theta)=\log\frac{\theta}{2(1-\theta)}\), we see that this distribution belongs to the exponential family.
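As a final check (a sketch over a few interior values of \(\theta\)), the exponential-family form in (10.16) matches the original p.m.f.:

```python
import numpy as np

for theta in (0.2, 0.5, 0.8):          # interior points of [0, 1]
    for x in (-1, 0, 1):
        pmf = (theta / 2) ** abs(x) * (1 - theta) ** (1 - abs(x))
        expo = (1 - theta) * np.exp(abs(x) * np.log(theta / (2 * (1 - theta))))
        assert np.isclose(pmf, expo)
print("exponential-family form matches the p.m.f.")
```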
References
Casella, George, and Roger Berger. 2002. *Statistical Inference*. 2nd ed. Belmont, CA: Duxbury Resource Center.