Chapter 14 Homework 3: Point Estimation: Problems and Solutions
Exercise 14.2 (Casella and Berger 7.6) Let X_1,\cdots,X_n be a random sample from the p.d.f. f(x|\theta)=\theta x^{-2}, 0<\theta\leq x<\infty.
What is a sufficient statistic for \theta?
Find the MLE of \theta.
- Find the method of moments estimator of \theta.
Proof. (a) The joint distribution of X_1,\cdots,X_n is f(\mathbf{x}|\theta)=\theta^n(\prod_{i=1}^nx_i)^{-2}I_{[\theta,\infty)}(\min_i(x_i)). Hence, by the factorization theorem, \min_{i}(X_i) is a sufficient statistic.
For the MLE of \theta, we first get the log-likelihood, which is \begin{equation} \ell(\theta)=n\log(\theta)-2\sum_{i=1}^n\log(x_i),\quad \theta\leq\min_i(X_i) \tag{14.4} \end{equation}
It is obvious that (14.4) is a monotonically increasing function of \theta, so the MLE of \theta is obtained at the upper bound, i.e. \hat{\theta}=\min_i(X_i).
- For the method of moments estimator, computing the first moment we have \begin{equation} E_{\theta}(X)=\int_{\theta}^{\infty}x\cdot\theta x^{-2}dx=\infty \tag{14.5} \end{equation} Thus, the method of moments estimator does not exist.
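As an optional numerical illustration, the R sketch below draws from f(x|\theta)=\theta x^{-2} by inverse-CDF sampling and shows that \min_i(X_i) sits just above the true \theta, while the sample mean is erratic because the first moment is infinite; the seed and parameter values are arbitrary choices.
set.seed(1)                  # arbitrary seed, for reproducibility only
theta <- 2; n <- 1000        # assumed true parameter and sample size
x <- theta / runif(n)        # inverse-CDF draw: F(x) = 1 - theta/x for x >= theta
min(x)                       # the MLE; sits just above theta = 2
mean(x)                      # erratic across runs, reflecting the infinite first moment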
Proof. Under the method of moments, computing the expectation we have \begin{equation} E_{\theta}(X)=\int_0^{\theta}x\frac{1}{\theta}dx=\frac{\theta}{2} \tag{14.10} \end{equation} Equating this with the sample mean, we have the method of moments estimator of \theta as \hat{\theta}_{MM}=2\bar{X}.
As for the MLE, the likelihood function can be written as \begin{equation} f(\mathbf{x}|\theta)=\theta^{-n}I_{x\leq\theta}(\max_i(x_i))I_{x\geq 0}(\min_{i}(x_i)) \tag{14.11} \end{equation} Thus, the sufficient statistic is \max_i(x_i), and the log-likelihood function for \theta is \begin{equation} \ell(\theta)=-n\log(\theta),\quad \theta\geq\max_i(x_i) \tag{14.12} \end{equation} This is a monotonically decreasing function of \theta, so the maximum is attained at the lower bound of the range, hence \hat{\theta}_{MLE}=\max_i(x_i).
To compare the two estimators, we first compute their expectations:
\begin{equation} \begin{split} &E(\hat{\theta}_{MM})=E(2\bar{X})=\theta\\ &E(\hat{\theta}_{MLE})=E(\max_i(x_i))\\ &=\int_0^{\theta}x\frac{nx^{n-1}}{\theta^n}dx=\frac{n}{n+1}\theta \end{split} \tag{14.13} \end{equation}
As for the variance, we have \begin{equation} \begin{split} &Var(\hat{\theta}_{MM})=Var(2\bar{X})=\frac{4\theta^2}{12n}=\frac{\theta^2}{3n}\\ &Var(\hat{\theta}_{MLE})=Var(\max_i(x_i))\\ &=\int_0^{\theta}x^2\frac{nx^{n-1}}{\theta^n}dx-(E(\max_i(x_i)))^2\\ &=\frac{n\theta^2}{(n+2)(n+1)^2} \end{split} \tag{14.14} \end{equation} Thus, when n is small we prefer \hat{\theta}_{MM} because it is unbiased, while as n gets larger the bias of \hat{\theta}_{MLE} decreases and its variance is much smaller, in which case we would prefer \hat{\theta}_{MLE}.
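A quick Monte Carlo comparison of the two estimators can make this concrete; \theta, n, the seed and the number of replicates below are arbitrary choices.
set.seed(1)                     # arbitrary seed
theta <- 5; n <- 10; B <- 10000 # assumed values
sims <- replicate(B, {
  x <- runif(n, 0, theta)
  c(mm = 2 * mean(x), mle = max(x))
})
rowMeans(sims) - theta          # biases: MM near 0, MLE near -theta/(n + 1)
apply(sims, 1, var)             # variances: theta^2/(3n) versus n*theta^2/((n+2)(n+1)^2)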
Exercise 14.5 (Casella and Berger 7.10) The independent random variables X_1,\cdots,X_n have common distribution \begin{equation} P(X_i\leq x|\alpha,\beta)=\left\{\begin{aligned} 0 &\quad x<0\\ (x/\beta)^{\alpha} &\quad 0\leq x\leq\beta \\ 1 &\quad x>\beta \end{aligned} \right. \tag{14.15} \end{equation} (a) Find a two-dimensional sufficient statistic for (\alpha,\beta).
Find the MLEs of \alpha and \beta.
Suppose the following data can be modeled with this distribution.
22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7, 23.8, 22.8, 23.1, 23.1, 23.5, 23.0, 23.0.
Find the MLEs of \alpha and \beta.
Proof. (a) The p.d.f. of a single X_i is
\begin{equation}
f(x|\alpha,\beta)=\left\{\begin{aligned} &\frac{\alpha x^{\alpha-1}}{\beta^\alpha} &\quad 0\leq x\leq\beta\\
&0 & \quad o.w. \end{aligned}\right.
\tag{14.16}
\end{equation}
Therefore, the joint distribution is
\begin{equation}
f(\mathbf{x}|\alpha,\beta)=\frac{\alpha^n(\prod_{i=1}^nx_i)^{\alpha-1}}{\beta^{n\alpha}}I_{x\leq\beta}(\max_i(x_i))I_{x\geq 0}(\min_i(x_i))
\tag{14.17}
\end{equation}
Thus, by the factorization theorem, the two-dimensional sufficient statistic for (\alpha,\beta) can be taken to be (\prod_{i=1}^nX_i,\max_i(X_i)).
By the sufficiency principle, the log-likelihood function of (\alpha,\beta) is \begin{equation} \ell(\alpha,\beta)=n\log(\alpha)+(\alpha-1)(\sum_{i=1}^n\log(x_i))-n\alpha\log\beta, \quad \max_i(x_i)\leq\beta \tag{14.18} \end{equation} Taking the derivative w.r.t. \alpha we have \begin{equation} \frac{\partial\ell}{\partial\alpha}=\frac{n}{\alpha}+(\sum_{i=1}^n\log(x_i))-n\log(\beta) \tag{14.19} \end{equation} This is a monotonically decreasing function of \alpha, so setting it to 0 gives the maximum in the \alpha direction. On the other hand, (14.18) is monotonically decreasing in \beta, which means the maximum in the \beta direction is attained at the smallest admissible \beta. Hence, we have the MLE as \begin{equation} \left\{\begin{aligned} &\hat{\alpha}=\frac{1}{\log(\hat{\beta})-\frac{1}{n}\sum_{i=1}^n\log(X_i)}\\ & \hat{\beta}=\max_i(X_i) \end{aligned} \right. \tag{14.20} \end{equation}
- Using (14.20), we have \hat{\alpha}=12.59 and \hat{\beta}=25.0.
x = c(22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7, 23.8, 22.8, 23.1, 23.1, 23.5, 23.0, 23.0)  # observed data
hat.alpha = 1/(log(max(x)) - sum(log(x))/length(x))  # MLE of alpha from (14.20)
hat.beta = max(x)                                    # MLE of beta from (14.20)
hat.beta
## [1] 25
Exercise 14.6 (Casella and Berger 7.11) Let X_1,\cdots,X_n be i.i.d. with p.d.f. \begin{equation} f(x|\theta)=\theta x^{\theta-1},\quad 0\leq x\leq 1,0<\theta<\infty \tag{14.21} \end{equation}
Find the MLE of \theta, and show that its variance \to 0 as n\to\infty.
- Find the method of moments estimator of \theta.
Proof. (a) The joint likelihood function of \theta is \begin{equation} f(\mathbf{x}|\theta)=\theta^n(\prod_{i=1}^nx_i)^{\theta-1} \tag{14.22} \end{equation} Thus, the log-likelihood function of \theta is \begin{equation} \ell(\theta)=n\log(\theta)+(\theta-1)\sum_{i=1}^n\log(x_i) \tag{14.23} \end{equation} Taking the derivative w.r.t. \theta we have \begin{equation} \ell^{\prime}=\frac{n}{\theta}+\sum_{i=1}^n\log(x_i) \tag{14.24} \end{equation} From (14.24) it is clear that the second derivative is negative. Thus, setting the first derivative equal to 0, we have \hat{\theta}=-\frac{n}{\sum_{i=1}^n\log(X_i)}.
To get the variance, notice that by variable transformation, Y_i=-\log(X_i) has an exponential distribution with rate \theta. Thus, -\sum_{i=1}^n\log(X_i)\sim Gamma(n,\theta). Hence, denoting T=-\sum_{i=1}^n\log(X_i), we have \hat{\theta}=\frac{n}{T} and
\begin{equation}
\begin{split}
&E(\frac{1}{T})=\frac{\theta^n}{\Gamma(n)}\int_{0}^{\infty}\frac{1}{t}t^{n-1}e^{-\theta t}dt=\frac{\theta}{n-1}\\
&E(\frac{1}{T^2})=\frac{\theta^n}{\Gamma(n)}\int_{0}^{\infty}\frac{1}{t^2}t^{n-1}e^{-\theta t}dt=\frac{\theta^2}{(n-1)(n-2)}\\
\end{split}
\tag{14.25}
\end{equation}
and thus
\begin{equation}
\begin{split}
&E(\hat{\theta})=\frac{n\theta}{n-1}\\
&Var(\hat{\theta})=\frac{n^2\theta^2}{(n-1)^2(n-2)}\to 0 \text{ as } n\to\infty
\end{split}
\tag{14.26}
\end{equation}
- The expectation of X is
\begin{equation}
E(X)=\int_0^1x\cdot\theta x^{\theta-1}dx=\frac{\theta}{\theta+1}
\tag{14.27}
\end{equation}
Equating this to \bar{X}, we have the method of moments estimator \hat{\theta}_{MM}=\frac{\bar{X}}{1-\bar{X}}.
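As an optional simulation sketch: since f(x|\theta)=\theta x^{\theta-1} on (0,1) is the Beta(\theta,1) density, we can draw samples with rbeta() and compare the two estimators; \theta, the sample sizes and the seed below are arbitrary choices.
set.seed(1)                  # arbitrary seed
theta <- 3; B <- 5000        # assumed values
est <- function(n) {
  x <- rbeta(n, theta, 1)    # f(x|theta) = theta * x^(theta - 1) is Beta(theta, 1)
  c(mle = -n / sum(log(x)), mm = mean(x) / (1 - mean(x)))
}
apply(replicate(B, est(10)),  1, var)   # variances at n = 10
apply(replicate(B, est(100)), 1, var)   # noticeably smaller at n = 100, as in (14.26)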
Exercise 14.7 (Casella and Berger 7.15) Let X_1,\cdots,X_n be a sample from the inverse Gaussian p.d.f. \begin{equation} f(x|\mu,\lambda)=(\frac{\lambda}{2\pi x^3})^{1/2}exp\{-\lambda(x-\mu)^2/(2\mu^2x)\},\quad x>0 \tag{14.28} \end{equation}
Show that the MLEs of \mu and \lambda are \begin{equation} \left\{\begin{aligned} & \hat{\mu}_n=\bar{X} \\ & \hat{\lambda}_n=\frac{n}{\sum_{i=1}^n(\frac{1}{X_i}-\frac{1}{\bar{X}})} \end{aligned}\right. \tag{14.29} \end{equation}
It has been shown that \hat{\mu}_n and \hat{\lambda}_n are independent, with \hat{\mu}_n having an inverse Gaussian distribution with parameters \mu and n\lambda, and n\lambda/\hat{\lambda}_n having a \chi_{n-1}^2 distribution. This can be shown by induction.
- Show that \hat{\mu}_2 has an inverse Gaussian distribution with parameters \mu and 2\lambda, 2\lambda/\hat{\lambda}_2 has a \chi_1^2 distribution and they are independent.
- Assume the result is true for n=k and that we get a new, independent observation x. Establish the induction step by transforming the p.d.f. f(x,\hat{\mu}_k,\hat{\lambda}_k) to f(x,\hat{\mu}_{k+1},\hat{\lambda}_{k+1}). Show that this density factors in the appropriate way.
Proof. (a) The joint distribution is given by \begin{equation} f(\mathbf{x}|\mu,\lambda)=(\frac{\lambda}{2\pi})^{n/2}(\prod_{i=1}^nx_i)^{-3/2}exp\{-\frac{\lambda}{2}\sum_{i=1}^n\frac{(x_i-\mu)^2}{\mu^2x_i}\} \tag{14.30} \end{equation} For fixed \lambda, maximizing w.r.t. \mu is equivalent to minimizing the sum inside the exponential term. Thus \begin{equation} \frac{d}{d\mu}\sum_{i=1}^n\frac{(x_i-\mu)^2}{\mu^2x_i}=-\sum_{i=1}^n\frac{2(\frac{x_i}{\mu}-1)x_i}{\mu^2x_i} \tag{14.31} \end{equation} Setting this to 0 we have \sum_{i=1}^n(\frac{x_i}{\mu}-1)=0, which implies \hat{\mu}=\bar{X}. Plugging this back into (14.30), the likelihood as a function of \lambda has the form \lambda^{n/2}e^{-\lambda b}, where b=\sum_{i=1}^n\frac{(x_i-\bar{x})^2}{2\bar{x}^2x_i}; maximizing over \lambda gives \hat{\lambda}=\frac{n}{2b}, where \begin{equation} 2b=\sum_{i=1}^n\frac{x_i}{\bar{x}^2}-2\sum_{i=1}^n\frac{1}{\bar{x}}+\sum_{i=1}^n\frac{1}{x_i}=\sum_{i=1}^n(\frac{1}{x_i}-\frac{1}{\bar{x}}) \tag{14.32} \end{equation}
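As a small illustration, the R lines below simply evaluate the closed forms in (14.29); the positive data vector is made up purely for this sketch.
x <- c(1.2, 0.8, 2.1, 1.5, 0.9, 1.7)     # made-up positive observations
n <- length(x)
mu.hat     <- mean(x)                    # MLE of mu from (14.29)
lambda.hat <- n / sum(1/x - 1/mean(x))   # MLE of lambda from (14.29)
c(mu.hat = mu.hat, lambda.hat = lambda.hat)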
- For the induction steps in parts (b) and (c), see the paper cited in the exercise.
Exercise 14.8 (Casella and Berger 7.19) Suppose that the random variables Y_1,\cdots,Y_n satisfy \begin{equation} Y_i=\beta x_i+\epsilon_i,\quad i=1,\cdots,n \tag{14.33} \end{equation} where x_1,\cdots,x_n are fixed constants, and \epsilon_1,\cdots,\epsilon_n are i.i.d. N(0,\sigma^2) where \sigma^2 unknown.
Find a two-dimensional sufficient statistic for (\beta,\sigma^2).
Find the MLE of \beta, show that it is an unbiased estimator of \beta.
- Find the distribution of the MLE of \beta.
Proof. (a) The likelihood function can be written as \begin{equation} \begin{split} L(\beta,\sigma^2|\mathbf{x},\mathbf{y})&=(2\pi\sigma^2)^{-n/2}exp\{-\frac{1}{2\sigma^2}\sum_{i=1}^n(y_i-\beta x_i)^2\}\\ &=(2\pi\sigma^2)^{-n/2}exp(-\frac{\beta^2\sum_{i=1}^nx_i^2}{2\sigma^2})exp(-\frac{\sum_{i=1}^ny_i^2}{2\sigma^2}+\frac{\beta}{\sigma^2}\sum_{i=1}^nx_iy_i) \end{split} \tag{14.34} \end{equation} Thus, the sufficient statistic is (\sum_{i=1}^nY_i^2,\sum_{i=1}^nx_iY_i) (because the x_i are fixed constants).
The log-likelihood function is \begin{equation} \ell(\beta,\sigma^2)=-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^ny_i^2+\frac{\beta}{\sigma^2}\sum_{i=1}^nx_iy_i-\frac{\beta^2}{2\sigma^2}\sum_{i=1}^nx_i^2 \tag{14.35} \end{equation} For a fixed value of \sigma^2, \begin{equation} \frac{\partial\ell}{\partial\beta}=\frac{1}{\sigma^2}\sum_{i=1}^nx_iy_i-\frac{\beta}{\sigma^2}\sum_{i=1}^nx_i^2 \tag{14.36} \end{equation} Setting this to 0 we have \hat{\beta}=\frac{\sum_{i=1}^nx_iY_i}{\sum_{i=1}^nx_i^2}, and the second derivative w.r.t. \beta is \begin{equation} \frac{\partial^2\ell}{\partial\beta^2}=-\frac{1}{\sigma^2}\sum_{i=1}^nx_i^2<0 \tag{14.37} \end{equation} Thus, \ell attains its maximum at \hat{\beta}; since \hat{\beta} does not depend on \sigma^2, it is the MLE. Moreover, \begin{equation} E(\hat{\beta})=\frac{\sum_{i=1}^nx_iE(Y_i)}{\sum_{i=1}^nx_i^2}=\frac{\sum_{i=1}^nx_i\cdot\beta x_i}{\sum_{i=1}^nx_i^2}=\beta \tag{14.38} \end{equation}
- Since \hat{\beta} is a weighted sum of normally distributed random variables, i.e. \begin{equation} \hat{\beta}=\frac{\sum_{i=1}^nx_iY_i}{\sum_{i=1}^nx_i^2}=\sum_{i=1}^n(\frac{x_i}{\sum_{i=1}^nx_i^2})Y_i \tag{14.39} \end{equation} \hat{\beta} is normally distributed with mean \beta and variance \begin{equation} Var(\hat{\beta})=\sum_{i=1}^n(\frac{x_i}{\sum_{i=1}^nx_i^2})^2\sigma^2=\frac{\sigma^2}{\sum_{i=1}^nx_i^2} \tag{14.40} \end{equation}
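As an optional numerical check, the sketch below simulates data with assumed values of \beta and \sigma and confirms that the closed form in (14.39) agrees with R's built-in no-intercept least squares fit, which coincides with the MLE here.
set.seed(1)                              # arbitrary seed
n <- 50; beta <- 2; sigma <- 1           # assumed values
x <- runif(n, 1, 10)                     # fixed covariates
y <- beta * x + rnorm(n, sd = sigma)
beta.hat <- sum(x * y) / sum(x^2)        # closed form from (14.39)
c(closed_form = beta.hat, lm_fit = unname(coef(lm(y ~ x - 1))))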
Proof. The joint p.d.f. of X_1,\cdots,X_n is \begin{equation} f(\mathbf{x}|\theta)=(2\theta)^{-n}I_{x<\theta}(\max_{i}|x_i|) \tag{14.42} \end{equation} By the factorization theorem, \max_i|X_i| is a sufficient statistic.
Denote \max_i|x_i|=Y, then firstly the c.d.f. of Y is
\begin{equation}
\begin{split}
F_Y(y)&=Pr(Y\leq y)=Pr(\max_i|X_i|\leq y)\\
&=(Pr(|X_i|\leq y))^n=(\frac{y}{\theta})^n
\end{split}
\tag{14.43}
\end{equation}
Thus, the p.d.f. of Y is
\begin{equation}
f_Y(y)=\frac{dF_Y(y)}{dy}=\frac{ny^{n-1}}{\theta^n}
\tag{14.44}
\end{equation}
Hence, to check completeness, suppose that for some function g(\cdot) we have E(g(Y))=0 for all \theta; then \begin{equation} E(g(Y))=\int_{0}^{\theta}g(y)\frac{ny^{n-1}}{\theta^n}dy=\frac{n}{\theta^n}\int_{0}^{\theta}g(y)y^{n-1}dy=0 \tag{14.45} \end{equation} Since (14.45) holds for every \theta, the integral \int_{0}^{\theta}g(y)y^{n-1}dy must be 0 regardless of the value of \theta; differentiating w.r.t. \theta then gives \begin{equation} g(\theta)\theta^{n-1}=0,\quad\forall\theta \tag{14.46} \end{equation} Therefore, g(Y)=0 almost surely and Y is a complete statistic.
Finally, the expected value of Y is \begin{equation} E(Y)=\int_{0}^{\theta}y\frac{ny^{n-1}}{\theta^n}dy=\frac{n}{n+1}\theta \tag{14.47} \end{equation} Thus, \frac{n+1}{n}\max_i|X_i| is an unbiased estimator of \theta. Also, because it is a function of the complete sufficient statistic, it is the best unbiased estimator.
Proof. Firstly, we have \begin{equation} \frac{d}{dp}E_{p}(\bar{X})=\frac{d}{dp}p=1 \tag{14.48} \end{equation} and the likelihood as \begin{equation} f(\mathbf{x}|p)=p^{\sum_{i=1}^nx_i}(1-p)^{n-\sum_{i=1}^nx_i} \tag{14.49} \end{equation} and hence the log-likelihood as \begin{equation} \ell(p)=(\sum_{i=1}^nx_i)\log(p)+(n-\sum_{i=1}^nx_i)\log(1-p) \tag{14.50} \end{equation} Therefore \begin{equation} \ell^{\prime}(p)=\frac{n(\bar{x}-p)}{p(1-p)} \tag{14.51} \end{equation} and finally \begin{equation} E_{p}((\ell^{\prime}(p))^2)=\frac{n^2}{(p(1-p))^2}E_{p}((\bar{x}-p)^2)=\frac{n}{p(1-p)} \tag{14.52} \end{equation} Thus, from (14.48) and (14.52) we have the Cramer-Rao lower bound as \frac{p(1-p)}{n}, which is exactly the variance of \bar{X}. Hence, \bar{X} is the UMVUE of p.
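As a quick Monte Carlo illustration of the uniform result above, namely that \frac{n+1}{n}\max_i|X_i| is unbiased for \theta while \max_i|X_i| is biased low; \theta, n, the seed and the number of replicates are arbitrary choices.
set.seed(1)                     # arbitrary seed
theta <- 4; n <- 8; B <- 20000  # assumed values
y <- replicate(B, max(abs(runif(n, -theta, theta))))
c(mean_max = mean(y), mean_adjusted = mean((n + 1)/n * y))   # compare with theta = 4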
Exercise 14.11 (Casella and Berger 7.48) Suppose that X_i, i=1,\cdots,n are i.i.d. Bernoulli(p).
Show that the variance of the MLE of p attains the Cramer-Rao Lower Bound.
- For n\geq4, show that the product X_1X_2X_3X_4 is an unbiased estimator of p^4, and use this fact to find the best unbiased estimator of p^4.
Proof. (a) The log-likelihood is \begin{equation} \ell(p)=(\sum_{i=1}^nx_i)\log(p)+(n-\sum_{i=1}^nx_i)\log(1-p) \tag{14.53} \end{equation} with derivative \begin{equation} \ell^{\prime}(p)=\frac{n(\bar{x}-p)}{p(1-p)} \tag{14.54} \end{equation} Setting this to 0 we have \hat{p}=\bar{X}. The first derivative can also be written as \begin{equation} \ell^{\prime}(p)=\frac{\sum_{i=1}^nx_i}{p}-\frac{n-\sum_{i=1}^nx_i}{1-p} \tag{14.55} \end{equation} As a function of p, the first term of (14.55) decreases as p increases and the subtracted term increases as p increases, so \ell^{\prime}(p) is monotonically decreasing in p, which confirms that \hat{p}=\bar{X} is the MLE of p. In Exercise 14.10 we have already shown that Var(\bar{X}) attains the Cramer-Rao lower bound.
- By independence and the identical distribution of the X_i, we have \begin{equation} E_{p}(\prod_{i=1}^4X_i)=(E_p(X_1))^4=p^4 \tag{14.56} \end{equation} and hence X_1X_2X_3X_4 is an unbiased estimator of p^4. Since \sum_{i=1}^nX_i is a complete sufficient statistic, the best unbiased estimator is obtained by conditioning on it: \begin{equation} E(X_1X_2X_3X_4|\sum_{i=1}^nX_i=t)=P(X_1=X_2=X_3=X_4=1|\sum_{i=1}^nX_i=t)=\frac{\binom{n-4}{t-4}}{\binom{n}{t}}=\frac{t(t-1)(t-2)(t-3)}{n(n-1)(n-2)(n-3)} \tag{14.57} \end{equation}
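A small simulation sketch confirming that the estimator in (14.57) is unbiased for p^4; p, n, the seed and the number of replicates are arbitrary choices.
set.seed(1)                   # arbitrary seed
p <- 0.6; n <- 10; B <- 50000 # assumed values
est <- replicate(B, {
  t <- sum(rbinom(n, 1, p))   # the complete sufficient statistic
  t * (t - 1) * (t - 2) * (t - 3) / (n * (n - 1) * (n - 2) * (n - 3))
})
c(mc_mean = mean(est), p4 = p^4)   # the Monte Carlo mean is close to p^4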
Exercise 14.12 (Casella and Berger 7.49) Let X_1,\cdots,X_n be i.i.d. Exp(\lambda), i.e. exponential with mean \lambda.
Find an unbiased estimator of \lambda based only on Y=\min\{X_1,\cdots,X_n\}.
Find a better estimator than the one in part (a). Prove that it is better.
For the following data, estimate the \lambda using the estimators from parts (a) and (b).
Proof. (a) The c.d.f. of Y is \begin{equation} F_Y(y)=1-(1-F_X(y))^n \tag{14.59} \end{equation} thus, the p.d.f. of Y is \begin{equation} f_Y(y)=n(1-F_X(y))^{n-1}f_X(y)=\frac{n}{\lambda}e^{-ny/\lambda} \tag{14.60} \end{equation} It follows that Y\sim Exp(\lambda/n), an exponential with mean \lambda/n, and hence E_{\lambda}(Y)=\frac{\lambda}{n}, which means that nY is an unbiased estimator of \lambda.
- Since the distribution of X belongs to the exponential family, \sum_{i=1}^nX_i is a complete sufficient statistic. Therefore, E(nY|\sum_{i=1}^nX_i) is the best unbiased estimator. By completeness, any function g(\cdot) of \sum_{i=1}^nX_i with E(g(\sum_{i=1}^nX_i))=0 equals 0 almost surely. Considering g(\sum_{i=1}^nX_i)=\sum_{i=1}^nX_i-nE(nY|\sum_{i=1}^nX_i), we have \begin{equation} \begin{split} E(g(\sum_{i=1}^nX_i))&=E(\sum_{i=1}^nX_i)-E(nE(nY|\sum_{i=1}^nX_i))\\ &=n\lambda-nE(nY)=0 \end{split} \tag{14.61} \end{equation} Hence, \sum_{i=1}^nX_i-nE(nY|\sum_{i=1}^nX_i)=0 almost surely, which means E(nY|\sum_{i=1}^nX_i)=\frac{\sum_{i=1}^nX_i}{n}=\bar{X} is the best unbiased estimator of \lambda.
Actually, we have \begin{equation} Var(nY)=\lambda^2\geq\frac{\lambda^2}{n}=Var(\bar{X}) \tag{14.62} \end{equation} which means \bar{X} does better than nY; a simulation check of (14.62) appears after the R code below.
- From (a) and (b) we have two unbiased estimators, \hat{\lambda}_1=nY=601.2 and \hat{\lambda}_2=\bar{X}=124.825. The large difference between them suggests the exponential model is not a good fit for the data.
x = c(50.1, 70.1, 137.0, 166.9, 170.5, 152.8, 80.5, 123.5, 112.6, 148.5, 160.0, 125.4)  # observed data
hat.lambda1 = length(x) * min(x)  # estimator from part (a): n * min(X_i)
hat.lambda2 = mean(x)             # estimator from part (b): sample mean
print(c(hat.lambda1, hat.lambda2))
## [1] 601.200 124.825
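The promised Monte Carlo check of the variance comparison in (14.62); \lambda, n, the seed and the number of replicates are arbitrary choices.
set.seed(1)                          # arbitrary seed
lambda <- 100; n <- 12; B <- 20000   # assumed values
sims <- replicate(B, {
  x <- rexp(n, rate = 1/lambda)      # exponential with mean lambda
  c(nY = n * min(x), xbar = mean(x))
})
apply(sims, 1, var)                  # roughly lambda^2 versus lambda^2/n, as in (14.62)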
Exercise 14.13 (Casella and Berger 7.50) Let X_1,\cdots,X_n be i.i.d. N(\theta,\theta^2), \theta>0. For this model both \bar{X} and cS are unbiased estimators of \theta, where c=\frac{\sqrt{n-1}\Gamma((n-1)/2)}{\sqrt{2}\Gamma(n/2)}.
Prove that for any number a the estimator a\bar{X}+(1-a)(cS) is an unbiased estimator of \theta.
Find the value of a that produces the estimator with minimum variance.
- Show that (\bar{X},S^2) is a sufficient statistic for \theta but it is not a complete sufficient statistic.
Proof. (a) Since we have E(\bar{X})=E(cS)=\theta, \begin{equation} E(a\bar{X}+(1-a)(cS))=aE(\bar{X})+(1-a)E(cS)=\theta \tag{14.63} \end{equation} which suggests that a\bar{X}+(1-a)(cS) is an unbiased estimator of \theta.
The variance of a\bar{X}+(1-a)(cS) is \begin{equation} Var(a\bar{X}+(1-a)(cS))=a^2\frac{\theta^2}{n}+(1-a)^2c^2[E(S^2)-(E(S))^2]=\theta^2[(c^2-1+\frac{1}{n})a^2-2(c^2-1)a+(c^2-1)] \tag{14.64} \end{equation} using the independence of \bar{X} and S for normal samples together with E(S^2)=\theta^2 and E(S)=\theta/c. As a function of a, (14.64) attains its minimum at a=\frac{c^2-1}{c^2-1+(1/n)}; a numerical check appears after the proof of part (c).
- The likelihood function of \theta is \begin{equation} f(\mathbf{x}|\theta)=(2\pi\theta^2)^{-n/2}exp(-\frac{(n-1)s^2+n(\bar{x}-\theta)^2}{2\theta^2}) \tag{14.65} \end{equation} using \sum_{i=1}^n(x_i-\theta)^2=(n-1)s^2+n(\bar{x}-\theta)^2. Thus, by the factorization theorem, (\bar{X},S^2) is sufficient. But since g(\bar{X},S^2)=\bar{X}-c\sqrt{S^2} has expectation 0 without being equal to 0 almost surely, (\bar{X},S^2) is not complete.
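The numerical check mentioned in part (b): the sketch below verifies that cS is unbiased for \theta and that the combination with the a from (14.64) has the smallest variance; \theta, n, the seed and the number of replicates are arbitrary choices.
set.seed(1)                                                     # arbitrary seed
theta <- 2; n <- 15; B <- 20000                                 # assumed values
cc <- sqrt(n - 1) * gamma((n - 1)/2) / (sqrt(2) * gamma(n/2))   # the constant c
sims <- replicate(B, {
  x <- rnorm(n, mean = theta, sd = theta)
  c(xbar = mean(x), cS = cc * sd(x))
})
rowMeans(sims)                                 # both estimators are close to theta = 2
a.opt <- (cc^2 - 1) / (cc^2 - 1 + 1/n)         # minimizer from (14.64)
sapply(c(0, a.opt, 1), function(a) var(a * sims["xbar", ] + (1 - a) * sims["cS", ]))  # smallest at a.opt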
Exercise 14.14 (Casella and Berger 7.52) Let X_1,\cdots,X_n be i.i.d. Pois(\lambda), and let \bar{X} and S^2 denote the sample mean and variance, respectively.
Prove that \bar{X} is the best unbiased estimator of \lambda without using the Cramer-Rao Theorem.
Prove the rather remarkable identity E(S^2|\bar{X})=\bar{X}, and use it to explicitly demonstrate that Var(S^2)>Var(\bar{X}).
- Using completeness, can a general theorem be formulated for which the identity in part (b) is a special case?
Proof. (a) Since the Poisson distribution belongs to the exponential family, \bar{X} is a complete sufficient statistic. Furthermore, E(\bar{X})=\lambda. Hence, \bar{X} is the best unbiased estimator of \lambda.
- Since \bar{X} is a complete sufficient statistic, E(S^2|\bar{X}) is a function of \bar{X}. Furthermore, by completeness, any function g(\cdot) of \bar{X} with E(g(\bar{X}))=0 equals 0 almost surely. Hence, considering g(\bar{X})=\bar{X}-E(S^2|\bar{X}), we have \begin{equation} E(\bar{X}-E(S^2|\bar{X}))=E(\bar{X})-E(S^2)=\lambda-\lambda=0 \tag{14.66} \end{equation} Thus, E(S^2|\bar{X})=\bar{X} almost surely.
By the law of total variance, we immediately have
\begin{equation}
Var(S^2)=Var(E(S^2|\bar{X}))+E(Var(S^2|\bar{X}))=Var(\bar{X})+E(Var(S^2|\bar{X}))>Var(\bar{X})
\tag{14.67}
\end{equation}
By the uniqueness of the best unbiased estimator (S^2 is not equal to \bar{X} almost surely), the inequality is strict; a simulation illustration appears after part (c) below.
- Let T(\mathbf{X}) be a complete sufficient statistic and let T^*(\mathbf{X}) be any other statistic with E(T^*(\mathbf{X}))=E(T(\mathbf{X})). Then E(T^*(\mathbf{X})|T(\mathbf{X}))=T(\mathbf{X}) and Var(T^*(\mathbf{X}))\geq Var(T(\mathbf{X})), with strict inequality unless T^*(\mathbf{X})=T(\mathbf{X}) almost surely.
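The promised illustration of (14.67): a small simulation comparing the variances of S^2 and \bar{X} for Poisson data; \lambda, n, the seed and the number of replicates are arbitrary choices.
set.seed(1)                       # arbitrary seed
lambda <- 3; n <- 20; B <- 20000  # assumed values
sims <- replicate(B, {
  x <- rpois(n, lambda)
  c(xbar = mean(x), s2 = var(x))
})
apply(sims, 1, var)               # Var(S^2) exceeds Var(Xbar), as in (14.67)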
Exercise 14.15 (Casella and Berger 7.55) For each of the following p.d.f., let X_1,\cdots,X_n be a sample from that distribution. In each case, find the best unbiased estimator of \theta^r.
f(x|\theta)=\frac{1}{\theta}, 0<x<\theta,r<n.
f(x|\theta)=e^{-(x-\theta)}, x>\theta.
- f(x|\theta)=\frac{e^{-x}}{e^{-\theta}-e^{-b}}, \theta<x<b, b known.
Proof. Consider the general case. Suppose X\sim f(x|\theta)=c(\theta)m(x), a<x<\theta. Then \frac{1}{c(\theta)}=\int_{a}^{\theta}m(x)dx; taking derivatives w.r.t. \theta on both sides gives -\frac{c^{\prime}(\theta)}{c(\theta)}=m(\theta)c(\theta). The c.d.f. of X is F_X(x)=\frac{c(\theta)}{c(x)}, a<x<\theta. Let Y=\max_i(X_i); then, using the same technique as in Example 6.6, Y is a complete sufficient statistic. Thus, any function T(Y) that is an unbiased estimator of h(\theta) is the best unbiased estimator of h(\theta). The p.d.f. of Y is \begin{equation} f_Y(y|\theta)=\frac{nm(y)c(\theta)^n}{c(y)^{n-1}},\quad a<y<\theta \tag{14.68} \end{equation} Consider the equations \int_a^{\theta}f(x|\theta)dx=1 and \int_a^{\theta}T(y)f_Y(y|\theta)dy=h(\theta), which are equivalent to \begin{equation} \int_a^{\theta}m(x)dx=\frac{1}{c(\theta)} \tag{14.69} \end{equation} and \begin{equation} \int_a^{\theta}\frac{T(y)nm(y)}{c(y)^{n-1}}dy=\frac{h(\theta)}{c(\theta)^n} \tag{14.70} \end{equation} Differentiating (14.69) and (14.70) on both sides w.r.t. \theta we have \begin{equation} \left\{\begin{aligned} & m(\theta)=-\frac{c^{\prime}(\theta)}{c(\theta)^2} \\ & m(\theta)=\frac{c(\theta)h^{\prime}(\theta)-h(\theta)nc^{\prime}(\theta)}{c(\theta)^2T(\theta)n}\end{aligned}\right. \tag{14.71} \end{equation}
Solving the equations in (14.71) for T and replacing \theta with y, we have \begin{equation} T(y)=h(y)+\frac{h^{\prime}(y)}{nm(y)c(y)} \tag{14.72} \end{equation} which is the expression of the best unbiased estimator of h(\theta).
Note also that (14.72) extends to any p.d.f. whose support has upper bound \theta, since (14.71) holds in all of those cases; here we only write the support as a<x<\theta to simplify the notation. Also, if the lower bound of the support is \theta, then \min_i(X_i) is the complete sufficient statistic, and by the same technique we have
\begin{equation} T(y)=h(y)-\frac{h^{\prime}(y)}{nm(y)c(y)} \tag{14.73} \end{equation} as the best unbiased estimator.
In this problem, we apply these formulas with h(\theta)=\theta^r and h^{\prime}(\theta)=r\theta^{r-1}.
For this p.d.f. m(x)=1, c(\theta)=\frac{1}{\theta}, hence \begin{equation} T(Y)=y^r+\frac{ry^{r-1}}{n(1/y)}=\frac{n+r}{n}y^r \tag{14.74} \end{equation} is the best unbiased estimator, where Y=\max_i(X_i); a quick simulation check of (14.74) follows the three cases below.
For this p.d.f. m(x)=e^{-x} and c(\theta)=e^{\theta}, \theta is the lower bound so Y=\min_i(X_i) and \begin{equation} T(Y)=y^r-\frac{ry^{r-1}}{ne^{-y}e^{y}}=y^r-\frac{ry^{r-1}}{n} \tag{14.75} \end{equation} is the best unbiased estimator, where Y=\min_i(X_i).
- For this p.d.f. m(x)=e^{-x} and c(\theta)=1/(e^{-\theta}-e^{-b}), \theta is the lower bound so Y=\min_i(X_i) and \begin{equation} T(Y)=y^r-\frac{ry^{r-1}}{ne^{-y}}(e^{-y}-e^{-b}) \tag{14.76} \end{equation} is the best unbiased estimator, where Y=\min_i(X_i).
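The promised simulation check of (14.74) for the uniform case; \theta, n, r, the seed and the number of replicates are arbitrary choices.
set.seed(1)                                 # arbitrary seed
theta <- 2; n <- 10; r <- 3; B <- 50000     # assumed values (with r < n)
est <- replicate(B, (n + r)/n * max(runif(n, 0, theta))^r)
c(mc_mean = mean(est), theta_r = theta^r)   # the Monte Carlo mean is close to theta^r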
Exercise 14.16 (Casella and Berger 7.58) Let X be an observation from the p.d.f. \begin{equation} f(x|\theta)=(\frac{\theta}{2})^{|x|}(1-\theta)^{1-|x|},\quad x=-1,0,1;\, 0\leq\theta\leq 1. \tag{14.77} \end{equation}
Find the MLE of \theta.
Define the estimator T(X) by \begin{equation} T(X)=\left\{\begin{aligned} 2 & \quad x=1\\ 0 &\quad o.w. \end{aligned}\right. \tag{14.78} \end{equation} Show that T(X) is an unbiased estimator of \theta.
- Find a better estimator than T(X) and prove that it is better.
Proof. (a) The log-likelihood function is \begin{equation} \ell(\theta)=\log(f(x|\theta))=|x|\log(\theta)+(1-|x|)\log(1-\theta)-|x|\log(2) \tag{14.79} \end{equation} Taking the derivative w.r.t. \theta we have \begin{equation} \ell^{\prime}(\theta)=\frac{|x|}{\theta}-\frac{1-|x|}{1-\theta} \tag{14.80} \end{equation} Setting this to 0 gives \theta=|x|. More directly, since |x|\in\{0,1\}: if |x|=1 then \ell(\theta)=\log(\theta/2) is increasing in \theta, so \hat{\theta}=1; if |x|=0 then \ell(\theta)=\log(1-\theta) is decreasing, so \hat{\theta}=0. In both cases the MLE of \theta is \hat{\theta}=|X|.
Computing the expectation of T(X), we have \begin{equation} E(T(X))=2\cdot(\frac{\theta}{2})^{|1|}(1-\theta)^{1-|1|}=2\cdot\frac{\theta}{2}=\theta \tag{14.81} \end{equation} Thus, T(X) is an unbiased estimator of \theta.
- In Exercise 10.4 we have proved that |X| is a complete sufficient statistic for \theta. Hence E(T(X)||X|) is the best unbiased estimator. Since E(|X|)=\theta, by the general result in Exercise 14.14, E(T(X)||X|)=|X| is the best unbiased estimator. Actually,
\begin{equation}
\begin{split}
&Var(T(X))=E((T(X))^2)-(E(T(X)))^2=2\theta-\theta^2\\
&>\theta-\theta^2=Var(|X|)
\end{split}
\tag{14.82}
\end{equation}
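A small simulation check of (14.81) and (14.82): both T(X) and |X| are unbiased for \theta, but |X| has the smaller variance; \theta, the seed and the number of draws are arbitrary choices.
set.seed(1)                     # arbitrary seed
theta <- 0.4; B <- 50000        # assumed value of theta
x  <- sample(c(-1, 0, 1), B, replace = TRUE, prob = c(theta/2, 1 - theta, theta/2))
tx <- ifelse(x == 1, 2, 0)      # the estimator T(X) from (14.78)
c(mean_T = mean(tx), mean_absX = mean(abs(x)))   # both close to theta = 0.4
c(var_T = var(tx), var_absX = var(abs(x)))       # Var(T(X)) > Var(|X|), as in (14.82)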