Chapter 14 Homework 3: Point Estimation: Problems and Solutions

Exercise 14.1 (Casella and Berger 7.1) One observation is taken on a discrete random variable \(X\) with p.m.f. \(f(x|\theta)\), where \(\theta\in\{1,2,3\}\). Find the MLE of \(\theta\). The p.m.f. is shown below. \[\begin{equation} \begin{array}{c|ccc} x & f(x|1) & f(x|2) & f(x|3)\\ \hline 0 & 1/3 & 1/4 & 0\\ 1 & 1/3 & 1/4 & 0\\ 2 & 0 & 1/4 & 1/4\\ 3 & 1/6 & 1/4 & 1/2\\ 4 & 1/6 & 0 & 1/4 \end{array} \end{equation}\]
Proof. For each observed value of \(X\) we choose the value of \(\theta\) that maximizes \(f(x|\theta)\); in particular, for \(X=4\) we have \(f(4|3)=1/4>1/6=f(4|1)\), so \(\hat{\theta}=3\). The MLE of \(\theta\) is therefore \[\begin{equation} \hat{\theta}=\left\{\begin{aligned} 1 &\quad X=0\\ 1 &\quad X=1\\ 2\text{ or }3 & \quad X=2\\ 3 &\quad X=3\\ 3 &\quad X=4 \end{aligned} \right. \tag{14.1} \end{equation}\]
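
Because the parameter space has only three points, the table can be checked by brute force. A minimal R sketch; the object `pmf` and its layout are purely illustrative.

```r
# Check (14.1): for each observed x, pick the theta that maximizes f(x|theta).
pmf <- rbind("0" = c(1/3, 1/4, 0),
             "1" = c(1/3, 1/4, 0),
             "2" = c(0,   1/4, 1/4),
             "3" = c(1/6, 1/4, 1/2),
             "4" = c(1/6, 0,   1/4))
colnames(pmf) <- paste0("theta=", 1:3)
# which.max reports the first maximizer, so the tie at x = 2 shows theta = 2;
# theta = 3 attains the same likelihood there.
apply(pmf, 1, which.max)
```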

Exercise 14.2 (Casella and Berger 7.6) Let \(X_1,\cdots,X_n\) be a random sample from the p.d.f. \[\begin{equation} f(x|\theta)=\theta x^{-2},\quad 0<\theta\leq x<\infty \tag{14.2} \end{equation}\]

  1. What is a sufficient statistic for \(\theta\)?

  2. Find the MLE of \(\theta\).

  3. Find the method of moments estimator of \(\theta\).

Proof. (a) The joint distribution of \(X_1,\cdots,X_n\) is \[\begin{equation} f(\mathbf{x}|\theta)=\theta^n(\prod_{i=1}^nx_i)^{-2}I_{[\theta,\infty)}(\min_{i}(x_i)) \tag{14.3} \end{equation}\] Hence, by the Factorization Theorem, \(\min_{i}(X_i)\) is a sufficient statistic for \(\theta\).

  (b) For the MLE of \(\theta\), we first write down the log-likelihood, which is \[\begin{equation} \ell(\theta)=n\log(\theta)-2\sum_{i=1}^n\log(x_i),\quad \theta\leq\min_i(x_i) \tag{14.4} \end{equation}\]
    It is obvious that (14.4) is a monotonically increasing function of \(\theta\), so the MLE of \(\theta\) is obtained at the upper bound, i.e. \(\hat{\theta}=\min_i(X_i)\).

  (c) For the method of moments estimator, computing the first moment we have \[\begin{equation} E_{\theta}(X)=\int_{\theta}^{\infty}x\,\theta x^{-2}dx=\theta\int_{\theta}^{\infty}x^{-1}dx=\infty \tag{14.5} \end{equation}\] Since the first moment does not exist, the method of moments estimator does not exist.
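
As a sanity check, one can simulate from this density by inverse-CDF sampling, since \(F(x)=1-\theta/x\) for \(x\geq\theta\). A minimal sketch; the values of `theta` and `n` are arbitrary illustration choices.

```r
# Simulation sketch for Exercise 14.2: f(x|theta) = theta * x^(-2), x >= theta.
# Inverse-CDF sampling: X = theta / U with U ~ Unif(0, 1).
set.seed(1)
theta <- 2; n <- 500
x <- theta / runif(n)
min(x)   # the MLE; always >= theta and typically close to theta = 2
mean(x)  # the sample mean is unstable because E(X) is infinite
```
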
Exercise 14.3 (Casella and Berger 7.7) Let \(X_1,\cdots,X_n\) be i.i.d. with one of two p.d.f.s. If \(\theta=0\), then \[\begin{equation} f(x|\theta)=\left\{\begin{aligned} 1 &\quad 0<x<1\\ 0 &\quad o.w. \end{aligned} \right. \tag{14.6} \end{equation}\] while if \(\theta=1\), then \[\begin{equation} f(x|\theta)=\left\{\begin{aligned} &1/(2\sqrt{x}) &\quad 0<x<1\\ &0 &\quad o.w. \end{aligned} \right. \tag{14.7} \end{equation}\] Find the MLE of \(\theta\).
Proof. The joint likelihood of \(\theta\) is \[\begin{equation} \left\{\begin{aligned} &L(0|\mathbf{x})=1 & \quad 0<x_i<1 \\ &L(1|\mathbf{x})=\frac{1}{2^n\sqrt{\prod_{i=1}^nx_i}} & \quad 0<x_i<1 \end{aligned} \right. \tag{14.8} \end{equation}\] Hence the MLE of \(\theta\) is 0 if \(\frac{1}{2^n\sqrt{\prod_{i=1}^nx_i}}\leq 1\) and 1 if \(\frac{1}{2^n\sqrt{\prod_{i=1}^nx_i}}>1\), for \(x_i\in(0,1),i=1,\cdots,n\).
Exercise 14.4 (Casella and Berger 7.9) Let \(X_1,\cdots,X_n\) be i.i.d. with p.d.f. \[\begin{equation} f(x|\theta)=\frac{1}{\theta},\quad 0\leq x\leq\theta,\theta>0 \tag{14.9} \end{equation}\] Estimate \(\theta\) using both the method of moments and maximum likelihood. Calculate the means and variances of the two estimators. Which one should be preferred and why?

Proof. Under the method of moments, computing the expectation we have \[\begin{equation} E_{\theta}(X)=\int_0^{\theta}x\frac{1}{\theta}dx=\frac{\theta}{2} \tag{14.10} \end{equation}\] Equating this with the sample mean, we have the method of moments estimator of \(\theta\) as \(\hat{\theta}_{MM}=2\bar{X}\).

As for the MLE, the likelihood function can be written as \[\begin{equation} f(\mathbf{x}|\theta)=\theta^{-n}I_{[0,\theta]}(\max_i(x_i))I_{[0,\infty)}(\min_{i}(x_i)) \tag{14.11} \end{equation}\] Thus, the sufficient statistic is \(\max_i(x_i)\) and the log-likelihood function for \(\theta\) is \[\begin{equation} \ell(\theta)=-n\log(\theta),\quad \theta\geq\max_i(x_i) \tag{14.12} \end{equation}\] This is a monotonically decreasing function of \(\theta\), hence the maximum is attained at the lower bound of the feasible region, giving \(\hat{\theta}_{MLE}=\max_i(x_i)\).

To compare the two estimators, we first compute their expectations,

\[\begin{equation} \begin{split} &E(\hat{\theta}_{MM})=E(2\bar{X})=\theta\\ &E(\hat{\theta}_{MLE})=E(\max_i(x_i))\\ &=\int_0^{\theta}x\frac{nx^{n-1}}{\theta^n}dx=\frac{n}{n+1}\theta \end{split} \tag{14.13} \end{equation}\]

As for variance, we have \[\begin{equation} \begin{split} &Var(\hat{\theta}_{MM})=Var(2\bar{X})=\frac{4\theta^2}{12n}=\frac{\theta^2}{3n}\\ &Var(\hat{\theta}_{MLE})=Var(\max_i(x_i))\\ &=\int_0^{\theta}x^2\frac{nx^{n-1}}{\theta^n}dx-(E(\max_i(x_i)))^2\\ &=\frac{n\theta^2}{(n+2)(n+1)^2} \end{split} \tag{14.14} \end{equation}\]
Thus, when \(n\) is small, we may prefer \(\hat{\theta}_{MM}\) because it is unbiased. As \(n\) grows, however, the bias of \(\hat{\theta}_{MLE}\) vanishes and its variance (of order \(n^{-2}\)) becomes much smaller than that of \(\hat{\theta}_{MM}\) (of order \(n^{-1}\)); in that case we would prefer \(\hat{\theta}_{MLE}\).
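
The comparison can be illustrated by simulation. A minimal sketch; the values of `theta`, `n`, and the number of replications are arbitrary.

```r
# Simulation sketch comparing the two estimators for Uniform(0, theta).
set.seed(1)
theta <- 5; n <- 20; B <- 10000
est <- replicate(B, {
  x <- runif(n, 0, theta)
  c(mm = 2 * mean(x), mle = max(x))
})
rowMeans(est)              # means: MM is unbiased, MLE is biased low by n/(n+1)
apply(est, 1, var)         # variances: the MLE's is much smaller
rowMeans((est - theta)^2)  # mean squared errors
```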

Exercise 14.5 (Casella and Berger 7.10) The independent random variables \(X_1,\cdots,X_n\) have common distribution \[\begin{equation} P(X_i\leq x|\alpha,\beta)=\left\{\begin{aligned} 0 &\quad x<0\\ (x/\beta)^{\alpha} &\quad 0\leq x\leq\beta \\ 1 &\quad x>\beta \end{aligned} \right. \tag{14.15} \end{equation}\] (a) Find a two-dimensional sufficient statistic for \((\alpha,\beta)\).

  (b) Find the MLEs of \(\alpha\) and \(\beta\).

  (c) Suppose the following data can be modeled with this distribution.

22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7, 23.8, 22.8, 23.1, 23.1, 23.5, 23.0, 23.0.

Find the MLEs of \(\alpha\) and \(\beta\).

Proof. (a) The p.d.f. of a single \(X_i\) is \[\begin{equation} f(x|\alpha,\beta)=\left\{\begin{aligned} &\frac{\alpha x^{\alpha-1}}{\beta^\alpha} &\quad 0\leq x\leq\beta\\ &0 & \quad o.w. \end{aligned}\right. \tag{14.16} \end{equation}\] Therefore, the joint distribution is \[\begin{equation} f(\mathbf{x}|\alpha,\beta)=\frac{\alpha^n(\prod_{i=1}^nx_i)^{\alpha-1}}{\beta^{n\alpha}}I_{[0,\beta]}(\max_i(x_i))I_{[0,\infty)}(\min_i(x_i)) \tag{14.17} \end{equation}\]
Thus, by the Factorization Theorem, a two-dimensional sufficient statistic for \((\alpha,\beta)\) is \((\prod_{i=1}^nX_i,\max_i(X_i))\).

  (b) Based on the sufficiency principle, the log-likelihood function of \((\alpha,\beta)\) is \[\begin{equation} \ell(\alpha,\beta)=n\log(\alpha)+(\alpha-1)(\sum_{i=1}^n\log(x_i))-n\alpha\log\beta, \quad \max_i(x_i)\leq\beta \tag{14.18} \end{equation}\] Taking derivatives w.r.t. \(\alpha\) we have \[\begin{equation} \frac{\partial\ell}{\partial\alpha}=\frac{n}{\alpha}+(\sum_{i=1}^n\log(x_i))-n\log(\beta) \tag{14.19} \end{equation}\] This is a monotonically decreasing function of \(\alpha\), so setting it to 0 gives the maximum in the \(\alpha\) direction. On the other hand, (14.18) is a monotonically decreasing function of \(\beta\), so the maximum in the \(\beta\) direction is attained at the smallest feasible value of \(\beta\), namely \(\max_i(x_i)\). Hence, we have the MLEs \[\begin{equation} \left\{\begin{aligned} &\hat{\alpha}=\frac{1}{\log(\hat{\beta})-\frac{1}{n}\sum_{i=1}^n\log(X_i)}\\ & \hat{\beta}=\max_i(X_i) \end{aligned} \right. \tag{14.20} \end{equation}\]

  (c) Using (14.20) with the data above, we have \(\hat{\alpha}\approx12.59\) and \(\hat{\beta}=25.0\).
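
For reference, part (c) can be reproduced with a short R computation; the variable names below are just illustrative.

```r
# Numerical check of part (c); `x` holds the data from the exercise.
x <- c(22.0, 23.9, 20.9, 23.8, 25.0, 24.0, 21.7, 23.8, 22.8, 23.1,
       23.1, 23.5, 23.0, 23.0)
beta_hat  <- max(x)                              # MLE of beta
alpha_hat <- 1 / (log(beta_hat) - mean(log(x)))  # MLE of alpha, from (14.20)
c(alpha = alpha_hat, beta = beta_hat)            # approx. 12.59 and 25.0
```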

Exercise 14.6 (Casella and Berger 7.11) Let \(X_1,\cdots,X_n\) be i.i.d. with p.d.f. \[\begin{equation} f(x|\theta)=\theta x^{\theta-1},\quad 0\leq x\leq 1,0<\theta<\infty \tag{14.21} \end{equation}\]

  1. Find the MLE of \(\theta\), and show that its variance \(\to 0\) as \(n\to\infty\).

  2. Find the method of moments estimator of \(\theta\).

Proof. (a) The joint likelihood function of \(\theta\) is \[\begin{equation} f(\mathbf{x}|\theta)=\theta^n(\prod_{i=1}^nx_i)^{\theta-1} \tag{14.22} \end{equation}\] Thus, the log-likelihood function of \(\theta\) is \[\begin{equation} \ell(\theta)=n\log(\theta)+(\theta-1)\sum_{i=1}^n\log(x_i) \tag{14.23} \end{equation}\] Taking derivatives w.r.t. \(\theta\) we have \[\begin{equation} \ell^{\prime}=\frac{n}{\theta}+\sum_{i=1}^n\log(x_i) \tag{14.24} \end{equation}\] From (14.24) it is clear that the second derivative, \(-n/\theta^2\), is negative. Thus, setting the first derivative equal to 0 we obtain \(\hat{\theta}=-\frac{n}{\sum_{i=1}^n\log(X_i)}\).

To get the variance, notice that under the transformation \(Y_i=-\log(X_i)\), the \(Y_i\) have an exponential distribution with rate \(\theta\). Thus, \(-\sum_{i=1}^n\log(X_i)\sim Gamma(n,\theta)\) (shape \(n\), rate \(\theta\)). Hence, denoting \(T=-\sum_{i=1}^n\log(X_i)\), we have \(\hat{\theta}=\frac{n}{T}\) and \[\begin{equation} \begin{split} &E(\frac{1}{T})=\frac{\theta^n}{\Gamma(n)}\int_{0}^{\infty}\frac{1}{t}t^{n-1}e^{-\theta t}dt=\frac{\theta}{n-1}\\ &E(\frac{1}{T^2})=\frac{\theta^n}{\Gamma(n)}\int_{0}^{\infty}\frac{1}{t^2}t^{n-1}e^{-\theta t}dt=\frac{\theta^2}{(n-1)(n-2)}\\ \end{split} \tag{14.25} \end{equation}\]
and thus \[\begin{equation} \begin{split} &E(\hat{\theta})=\frac{n\theta}{n-1}\\ &Var(\hat{\theta})=\frac{n^2\theta^2}{(n-1)^2(n-2)}\to 0\quad\text{as } n\to\infty \end{split} \tag{14.26} \end{equation}\] so the variance of the MLE tends to 0 as \(n\to\infty\), as required (the simulation sketch at the end of this solution illustrates this).

  (b) The expectation of \(X\) is \[\begin{equation} E_{\theta}(X)=\int_0^1x\theta x^{\theta-1}dx=\frac{\theta}{\theta+1} \tag{14.27} \end{equation}\]
    Equating this to \(\bar{X}\) gives the method of moments estimator \(\hat{\theta}_{MM}=\frac{\bar{X}}{1-\bar{X}}\).
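
A small simulation sketch of part (a): since \(f(x|\theta)=\theta x^{\theta-1}\) on \((0,1)\) is the \(Beta(\theta,1)\) density, we can simulate with `rbeta` and watch the MLE's variance shrink toward the theoretical value. All parameter values below are arbitrary.

```r
# Simulation sketch for Exercise 14.6: the MLE's variance shrinks as n grows.
set.seed(1)
theta <- 3; B <- 5000
for (n in c(10, 100, 1000)) {
  mle <- replicate(B, {
    x <- rbeta(n, theta, 1)        # density theta * x^(theta - 1) on (0, 1)
    -n / sum(log(x))               # MLE from the solution above
  })
  cat("n =", n, " var(MLE) =", var(mle),
      " theory =", n^2 * theta^2 / ((n - 1)^2 * (n - 2)), "\n")
}
```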

Exercise 14.7 (Casella and Berger 7.15) Let \(X_1,\cdots,X_n\) be a sample from the inverse Gaussian p.d.f. \[\begin{equation} f(x|\mu,\lambda)=(\frac{\lambda}{2\pi x^3})^{1/2}exp\{-\lambda(x-\mu)^2/(2\mu^2x)\},\quad x>0 \tag{14.28} \end{equation}\]

  (a) Show that the MLEs of \(\mu\) and \(\lambda\) are \[\begin{equation} \left\{\begin{aligned} & \hat{\mu}_n=\bar{X} \\ & \hat{\lambda}_n=\frac{n}{\sum_{i=1}^n(\frac{1}{X_i}-\frac{1}{\bar{X}})} \end{aligned}\right. \tag{14.29} \end{equation}\]

  (b) It can be shown that \(\hat{\mu}_n\) and \(\hat{\lambda}_n\) are independent, that \(\hat{\mu}_n\) has an inverse Gaussian distribution with parameters \(\mu\) and \(n\lambda\), and that \(n\lambda/\hat{\lambda}_n\) has a \(\chi_{n-1}^2\) distribution. This can be shown by induction.

    (i) Show that \(\hat{\mu}_2\) has an inverse Gaussian distribution with parameters \(\mu\) and \(2\lambda\), that \(2\lambda/\hat{\lambda}_2\) has a \(\chi_1^2\) distribution, and that they are independent.
    (ii) Assume the result is true for \(n=k\) and that we get a new, independent observation \(x\). Establish the induction step by transforming the p.d.f. \(f(x,\hat{\mu}_k,\hat{\lambda}_k)\) to \(f(x,\hat{\mu}_{k+1},\hat{\lambda}_{k+1})\), and show that this density factors in the appropriate way.

Proof. (a) The joint distribution is given by \[\begin{equation} f(\mathbf{x}|\mu,\lambda)=(\frac{\lambda}{2\pi})^{n/2}(\prod_{i=1}^nx_i)^{-3/2}exp\{-\frac{\lambda}{2}\sum_{i=1}^n\frac{(x_i-\mu)^2}{\mu^2x_i}\} \tag{14.30} \end{equation}\] For fixed \(\lambda\), maximizing w.r.t. \(\mu\) is equivalent to minimizing the sum inside the exponential term. Thus \[\begin{equation} \frac{d}{d\mu}\sum_{i=1}^n\frac{(x_i-\mu)^2}{\mu^2x_i}=-\sum_{i=1}^n\frac{2(\frac{x_i}{\mu}-1)x_i}{\mu^2x_i} \tag{14.31} \end{equation}\] Setting this to 0 we have \(\sum_{i=1}^n(\frac{x_i}{\mu}-1)=0\), which implies \(\hat{\mu}=\bar{X}\). Plugging this back into (14.30), the \(\lambda\)-dependent part has the form \(\lambda^{n/2}e^{-\lambda b}\) with \(b=\sum_{i=1}^n\frac{(x_i-\bar{x})^2}{2\bar{x}^2x_i}\); maximizing over \(\lambda\) gives \(\hat{\lambda}=\frac{n}{2b}\), where (using \(\sum_{i=1}^nx_i=n\bar{x}\)) \[\begin{equation} 2b=\sum_{i=1}^n\frac{x_i}{\bar{x}^2}-2\sum_{i=1}^n\frac{1}{\bar{x}}+\sum_{i=1}^n\frac{1}{x_i}=\frac{n}{\bar{x}}-\frac{2n}{\bar{x}}+\sum_{i=1}^n\frac{1}{x_i}=\sum_{i=1}^n(\frac{1}{x_i}-\frac{1}{\bar{x}}) \tag{14.32} \end{equation}\] so \(\hat{\lambda}\) agrees with (14.29).

  (b) See the paper referenced in the exercise.
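
As a small illustration of part (a), the closed-form MLEs in (14.29) are easy to evaluate in R; the helper function and the data vector below are hypothetical.

```r
# Apply the MLE formulas in (14.29) to a numeric data vector x.
invgauss_mle <- function(x) {
  n <- length(x)
  mu_hat     <- mean(x)
  lambda_hat <- n / sum(1 / x - 1 / mean(x))
  c(mu = mu_hat, lambda = lambda_hat)
}
# Example with arbitrary positive data:
invgauss_mle(c(1.2, 0.8, 2.5, 1.1, 0.9, 1.7))
```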

Exercise 14.8 (Casella and Berger 7.19) Suppose that the random variables \(Y_1,\cdots,Y_n\) satisfy \[\begin{equation} Y_i=\beta x_i+\epsilon_i,\quad i=1,\cdots,n \tag{14.33} \end{equation}\] where \(x_1,\cdots,x_n\) are fixed constants, and \(\epsilon_1,\cdots,\epsilon_n\) are i.i.d. \(N(0,\sigma^2)\), where \(\sigma^2\) is unknown.

  1. Find a two-dimensional sufficient statistic for \((\beta,\sigma^2)\).

  2. Find the MLE of \(\beta\) and show that it is an unbiased estimator of \(\beta\).

  3. Find the distribution of the MLE of \(\beta\).

Proof. (a) The likelihood function can be written as \[\begin{equation} \begin{split} L(\beta,\sigma^2|\mathbf{x},\mathbf{Y})&=(2\pi\sigma^2)^{-n/2}exp\{-\frac{1}{2\sigma^2}\sum_{i=1}^n(y_i-\beta x_i)^2\}\\ &=(2\pi\sigma^2)^{-n/2}exp(-\frac{\beta^2\sum_{i=1}^nx_i^2}{2\sigma^2})exp(-\frac{\sum_{i=1}^ny_i^2}{2\sigma^2}+\frac{\beta}{\sigma^2}\sum_{i=1}^nx_iy_i) \end{split} \tag{14.34} \end{equation}\] Thus, by the Factorization Theorem, \((\sum_{i=1}^nY_i^2,\sum_{i=1}^nx_iY_i)\) is a sufficient statistic for \((\beta,\sigma^2)\) (the \(x_i\), and hence \(\sum_{i=1}^nx_i^2\), are fixed constants).

  (b) The log-likelihood function is \[\begin{equation} \ell(\beta,\sigma^2)=-\frac{n}{2}\log(2\pi)-\frac{n}{2}\log(\sigma^2)-\frac{1}{2\sigma^2}\sum_{i=1}^ny_i^2+\frac{\beta}{\sigma^2}\sum_{i=1}^nx_iy_i-\frac{\beta^2}{2\sigma^2}\sum_{i=1}^nx_i^2 \tag{14.35} \end{equation}\] For a fixed value of \(\sigma^2\), \[\begin{equation} \frac{\partial\ell}{\partial\beta}=\frac{1}{\sigma^2}\sum_{i=1}^nx_iy_i-\frac{\beta}{\sigma^2}\sum_{i=1}^nx_i^2 \tag{14.36} \end{equation}\] Setting this to 0 gives \(\hat{\beta}=\frac{\sum_{i=1}^nx_iY_i}{\sum_{i=1}^nx_i^2}\). The second derivative w.r.t. \(\beta\) is \[\begin{equation} \frac{\partial^2\ell}{\partial\beta^2}=-\frac{1}{\sigma^2}\sum_{i=1}^nx_i^2<0 \tag{14.37} \end{equation}\] so \(\ell\) attains its maximum at \(\hat{\beta}\); since \(\hat{\beta}\) does not depend on \(\sigma^2\), it is the MLE. Moreover, \[\begin{equation} E(\hat{\beta})=\frac{\sum_{i=1}^nx_iE(y_i)}{\sum_{i=1}^nx_i^2}=\frac{\sum_{i=1}^nx_i\cdot\beta x_i}{\sum_{i=1}^nx_i^2}=\beta \tag{14.38} \end{equation}\] so it is unbiased.

  (c) Since \(\hat{\beta}\) is a linear combination of independent normal random variables, i.e. \[\begin{equation} \hat{\beta}=\frac{\sum_{i=1}^nx_iY_i}{\sum_{i=1}^nx_i^2}=\sum_{i=1}^n(\frac{x_i}{\sum_{i=1}^nx_i^2})Y_i \tag{14.39} \end{equation}\] \(\hat{\beta}\) is normally distributed with mean \(\beta\) and variance \[\begin{equation} Var(\hat{\beta})=\sum_{i=1}^n(\frac{x_i}{\sum_{i=1}^nx_i^2})^2\sigma^2=\frac{\sigma^2}{\sum_{i=1}^nx_i^2} \tag{14.40} \end{equation}\]
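
Note that the MLE in (b) is exactly least squares through the origin, so it can be cross-checked against `lm` with the intercept removed. The simulated data below are purely illustrative.

```r
# Sketch: closed-form MLE of beta vs. no-intercept least squares.
set.seed(1)
n <- 50; beta <- 2; sigma <- 1
x <- runif(n, 1, 10)
y <- beta * x + rnorm(n, 0, sigma)
beta_hat <- sum(x * y) / sum(x^2)
c(closed_form = beta_hat, lm_no_intercept = unname(coef(lm(y ~ x - 1))))
```
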
Exercise 14.9 (Casella and Berger 7.37) Let \(X_1,\cdots,X_n\) be a random sample from a population with p.d.f. \[\begin{equation} f(x|\theta)=\frac{1}{2\theta},\quad -\theta<x<\theta,\theta>0 \tag{14.41} \end{equation}\] Find, if one exists, a best unbiased estimator of \(\theta\).

Proof. The joint p.d.f. of \(X_1,\cdots,X_n\) is \[\begin{equation} f(\mathbf{x}|\theta)=(2\theta)^{-n}I_{[0,\theta)}(\max_{i}|x_i|) \tag{14.42} \end{equation}\] By the Factorization Theorem, \(\max_i|X_i|\) is a sufficient statistic.

Denote \(Y=\max_i|X_i|\). Since \(|X_i|\sim Uniform(0,\theta)\), the c.d.f. of \(Y\) is \[\begin{equation} \begin{split} F_Y(y)&=Pr(Y\leq y)=Pr(\max_i|X_i|\leq y)\\ &=(Pr(|X_i|\leq y))^n=(\frac{y}{\theta})^n,\quad 0\leq y\leq\theta \end{split} \tag{14.43} \end{equation}\]
Thus, the p.d.f. of \(Y\) is \[\begin{equation} f_Y(y)=\frac{dF_Y(y)}{dy}=\frac{ny^{n-1}}{\theta^n},\quad 0\leq y\leq\theta \tag{14.44} \end{equation}\]

Now, to check completeness, suppose that for some function \(g(\cdot)\) we have \(E_\theta(g(Y))=0\) for all \(\theta>0\), i.e. \[\begin{equation} E_\theta(g(Y))=\int_{0}^{\theta}g(y)\frac{ny^{n-1}}{\theta^n}dy=\frac{n}{\theta^n}\int_{0}^{\theta}g(y)y^{n-1}dy=0 \tag{14.45} \end{equation}\] Since (14.45) is identically 0 as a function of \(\theta\), the integral \(\int_{0}^{\theta}g(y)y^{n-1}dy\) must be 0 for every \(\theta>0\), and its derivative w.r.t. \(\theta\) must also be 0. Thus, \[\begin{equation} g(\theta)\theta^{n-1}=0,\quad\forall\theta>0 \tag{14.46} \end{equation}\] Therefore, \(g(Y)=0\) almost surely and \(Y\) is a complete statistic.

Finally, the expected value of \(Y\) is \[\begin{equation} E(Y)=\int_{0}^{\theta}y\frac{ny^{n-1}}{\theta^n}dy=\frac{n}{n+1}\theta \tag{14.47} \end{equation}\] Thus, \(\frac{n+1}{n}\max_i|X_i|\) is an unbiased estimator of \(\theta\). Because it is an unbiased function of a complete sufficient statistic, it is the best unbiased estimator.
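
A quick simulation sketch of the unbiasedness claim; the values of `theta`, `n`, and the number of replications are arbitrary.

```r
# Simulation sketch for Exercise 14.9: (n+1)/n * max|X_i| is unbiased for theta.
set.seed(1)
theta <- 2; n <- 10; B <- 20000
est <- replicate(B, {
  x <- runif(n, -theta, theta)
  (n + 1) / n * max(abs(x))
})
mean(est)  # should be close to theta = 2
```
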
Exercise 14.10 (Casella and Berger 7.40) Let \(X_1,\cdots,X_n\) be i.i.d. \(Bernoulli(p)\). Show that the variance of \(\bar{X}\) attains the Cramer-Rao Lower Bound, and hence \(\bar{X}\) is the best unbiased estimator of \(p\).

Proof. Firstly, we have \[\begin{equation} \frac{d}{dp}E_{p}(\bar{X})=\frac{d}{dp}p=1 \tag{14.48} \end{equation}\] and the likelihood is \[\begin{equation} f(\mathbf{x}|p)=p^{\sum_{i=1}^nx_i}(1-p)^{n-\sum_{i=1}^nx_i} \tag{14.49} \end{equation}\] hence the log-likelihood is \[\begin{equation} \ell(p)=(\sum_{i=1}^nx_i)\log(p)+(n-\sum_{i=1}^nx_i)\log(1-p) \tag{14.50} \end{equation}\] Therefore \[\begin{equation} \ell^{\prime}(p)=\frac{n(\bar{x}-p)}{p(1-p)} \tag{14.51} \end{equation}\] and finally \[\begin{equation} E_{p}((\ell^{\prime}(p))^2)=\frac{n^2}{(p(1-p))^2}E_{p}((\bar{x}-p)^2)=\frac{n}{p(1-p)} \tag{14.52} \end{equation}\] Thus, from (14.48) and (14.52) we have the Cramer-Rao lower bound \(\frac{p(1-p)}{n}\), which is exactly the variance of \(\bar{X}\). Hence, \(\bar{X}\) is the UMVUE of \(p\).

Exercise 14.11 (Casella and Berger 7.48) Suppose that \(X_i\), \(i=1,\cdots,n\) are i.i.d. \(Bernoulli(p)\).

  1. Show that the variance of the MLE of \(p\) attains the Cramer-Rao Lower Bound.

  2. For \(n\geq4\), show that the product \(X_1X_2X_3X_4\) is an unbiased estimator of \(p^4\), and use this fact to find the best unbiased estimator of \(p^4\).

Proof. (a) The log-likelihood is \[\begin{equation} \ell(p)=(\sum_{i=1}^nx_i)\log(p)+(n-\sum_{i=1}^nx_i)\log(1-p) \tag{14.53} \end{equation}\] with derivative \[\begin{equation} \ell^{\prime}(p)=\frac{n(\bar{x}-p)}{p(1-p)} \tag{14.54} \end{equation}\] Setting this to 0 we have \(\hat{p}=\bar{X}\). The first derivative can also be written as \[\begin{equation} \ell^{\prime}(p)=\frac{\sum_{i=1}^nx_i}{p}-\frac{n-\sum_{i=1}^nx_i}{1-p} \tag{14.55} \end{equation}\] As a function of \(p\), the first term of (14.55) is decreasing and the subtracted term is increasing, so \(\ell^{\prime}(p)\) is monotonically decreasing in \(p\); hence \(\hat{p}=\bar{X}\) is the global maximizer and therefore the MLE of \(p\). From Exercise 14.10 we have already shown that \(Var(\bar{X})\) attains the Cramer-Rao lower bound.

  (b) Using independence and the identical distributions, we have \[\begin{equation} E_{p}(\prod_{i=1}^4X_i)=(E_p(X_1))^4=p^4 \tag{14.56} \end{equation}\] and hence \(X_1X_2X_3X_4\) is an unbiased estimator of \(p^4\).
In this case, \(\sum_{i=1}^nX_i\) is a complete sufficient statistic. Hence, by the corollary in Chapter 13 (conditioning an unbiased estimator on a complete sufficient statistic yields the best unbiased estimator), \(E(X_1X_2X_3X_4|\sum_{i=1}^nX_i)\) is the best unbiased estimator. If \(\sum_{i=1}^nX_i<4\), then at least one of \(X_i\), \(i=1,2,3,4\), must be 0, and in that case \(E(X_1X_2X_3X_4|\sum_{i=1}^nX_i)=0\). Otherwise, if \(\sum_{i=1}^nX_i=t\geq4\), \[\begin{equation} \begin{split} E(X_1X_2X_3X_4|\sum_{i=1}^nX_i=t)&=1\cdot\frac{Pr(x_1=x_2=x_3=x_4=1,\sum_{i=5}^nx_i=t-4)}{Pr(\sum_{i=1}^nx_i=t)}\\ &=\frac{p^4{{n-4} \choose {t-4}}p^{t-4}(1-p)^{n-t}}{{n \choose t}p^t(1-p)^{n-t}}\\ &=\frac{{{n-4} \choose {t-4}}}{{n \choose t}} \end{split} \tag{14.57} \end{equation}\] Thus, the best unbiased estimator of \(p^4\) is \[\begin{equation} \left\{\begin{aligned} &\frac{{{n-4} \choose {\sum_{i=1}^nX_i-4}}}{{n \choose {\sum_{i=1}^nX_i}}} &\quad \sum_{i=1}^nX_i\geq 4\\ & 0 & \quad \sum_{i=1}^nX_i<4 \end{aligned} \right. \tag{14.58} \end{equation}\]
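A short simulation sketch of part (b)'s estimator; the values of `n`, `p`, and the number of replications are arbitrary.

```r
# Simulation sketch for Exercise 14.11(b): the conditional estimator is unbiased for p^4.
set.seed(1)
n <- 10; p <- 0.6; B <- 20000
est <- replicate(B, {
  t <- sum(rbinom(n, 1, p))
  if (t >= 4) choose(n - 4, t - 4) / choose(n, t) else 0
})
c(estimate = mean(est), true = p^4)
```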

Exercise 14.12 (Casella and Berger 7.49) Let \(X_1,\cdots,X_n\) be i.i.d. \(Exp(\lambda)\) (exponential with mean \(\lambda\)).

  1. Find an unbiased estimator of \(\lambda\) based only on \(Y=\min\{X_1,\cdots,X_n\}\).

  2. Find a better estimator than the one in part (a). Prove that it is better.

  3. For the following data, estimate \(\lambda\) using the estimators from parts (a) and (b).

50.1, 70.1, 137.0, 166.9, 170.5, 152.8, 80.5, 123.5, 112.6, 148.5, 160.0, 125.4

Proof. (a) With the mean parameterization, \(f_X(x)=\frac{1}{\lambda}e^{-x/\lambda}\), \(x>0\). The c.d.f. of \(Y\) is \[\begin{equation} F_Y(y)=1-(1-F_X(y))^n \tag{14.59} \end{equation}\] thus, the p.d.f. of \(Y\) is \[\begin{equation} f_Y(y)=n(1-F_X(y))^{n-1}f_X(y)=\frac{n}{\lambda}e^{-ny/\lambda} \tag{14.60} \end{equation}\] This shows that \(Y\sim Exp(\lambda/n)\), and hence \(E_{\lambda}(Y)=\frac{\lambda}{n}\), which means that \(nY\) is an unbiased estimator of \(\lambda\).

  (b) Since the distribution of \(X\) belongs to the exponential family, \(\sum_{i=1}^nX_i\) is a complete sufficient statistic. Therefore, \(E(nY|\sum_{i=1}^nX_i)\) is the best unbiased estimator of \(\lambda\). By completeness, any function \(g(\cdot)\) of \(\sum_{i=1}^nX_i\) with \(E(g(\sum_{i=1}^nX_i))=0\) equals 0 almost surely. Considering \(g(\sum_{i=1}^nX_i)=\sum_{i=1}^nX_i-nE(nY|\sum_{i=1}^nX_i)\), we have \[\begin{equation} \begin{split} E(g(\sum_{i=1}^nX_i))&=E(\sum_{i=1}^nX_i)-E(nE(nY|\sum_{i=1}^nX_i))\\ &=n\lambda-nE(nY)=0 \end{split} \tag{14.61} \end{equation}\] Hence, \(\sum_{i=1}^nX_i-nE(nY|\sum_{i=1}^nX_i)=0\) almost surely, which means \(E(nY|\sum_{i=1}^nX_i)=\frac{\sum_{i=1}^nX_i}{n}=\bar{X}\) is the best unbiased estimator of \(\lambda\).

Actually, we have \[\begin{equation} Var(nY)=\lambda^2\geq\frac{\lambda^2}{n}=Var(\bar{X}) \tag{14.62} \end{equation}\] with strict inequality for \(n>1\), which means \(\bar{X}\) does better than \(nY\).

  (c) From (a) and (b) we have the two unbiased estimates \(\hat{\lambda}_1=nY=601.2\) and \(\hat{\lambda}_2=\bar{X}=124.825\). The large difference between them suggests that the exponential model is not a good fit for these data.
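
For reference, part (c) can be reproduced directly from the data; the variable names below are illustrative.

```r
# Numerical check of part (c); `x` holds the data from the exercise.
x <- c(50.1, 70.1, 137.0, 166.9, 170.5, 152.8,
       80.5, 123.5, 112.6, 148.5, 160.0, 125.4)
n <- length(x)
c(part_a = n * min(x), part_b = mean(x))  # 601.2 and 124.825
```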

Exercise 14.13 (Casella and Berger 7.50) Let \(X_1,\cdots,X_n\) be i.i.d. \(N(\theta,\theta^2)\), \(\theta>0\). For this model both \(\bar{X}\) and \(cS\) are unbiased estimators of \(\theta\), where \(c=\frac{\sqrt{n-1}\Gamma((n-1)/2)}{\sqrt{2}\Gamma(n/2)}\).

  1. Prove that for any number \(a\) the estimator \(a\bar{X}+(1-a)(cS)\) is an unbiased estimator of \(\theta\).

  2. Find the value of \(a\) that produces the estimator with minimum variance.

  3. Show that \((\bar{X},S^2)\) is a sufficient statistic for \(\theta\) but it is not a complete sufficient statistic.

Proof. (a) Since we have \(E(\bar{X})=E(cS)=\theta\), \[\begin{equation} E(a\bar{X}+(1-a)(cS))=aE(\bar{X})+(1-a)E(cS)=\theta \tag{14.63} \end{equation}\] which suggests that \(a\bar{X}+(1-a)(cS)\) is an unbiased estimator of \(\theta\).

  (b) The variance of \(a\bar{X}+(1-a)(cS)\) is (using \(E(S^2)=\theta^2\) and \(E(S)=\theta/c\)) \[\begin{equation} Var(a\bar{X}+(1-a)(cS))=a^2\frac{\theta^2}{n}+(1-a)^2c^2[E(S^2)-(E(S))^2]=\theta^2[(c^2-1+\frac{1}{n})a^2-2(c^2-1)a+(c^2-1)] \tag{14.64} \end{equation}\] As a function of \(a\), (14.64) attains its minimum at \(a=\frac{c^2-1}{c^2-1+(1/n)}\).

  (c) The likelihood function of \(\theta\) is \[\begin{equation} f(\mathbf{x}|\theta)=(2\pi\theta^2)^{-n/2}exp\Big(-\frac{\sum_{i=1}^n(x_i-\bar{x})^2+n(\bar{x}-\theta)^2}{2\theta^2}\Big)=(2\pi\theta^2)^{-n/2}exp\Big(-\frac{(n-1)S^2+n(\bar{x}-\theta)^2}{2\theta^2}\Big) \tag{14.65} \end{equation}\] which depends on the sample only through \((\bar{x},S^2)\); thus \((\bar{X},S^2)\) is a sufficient statistic by the Factorization Theorem. However, \(g(\bar{X},S^2)=\bar{X}-cS\) has expectation 0 without being 0 almost surely, so \((\bar{X},S^2)\) is not complete.
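
For part (b), the constant \(c\) and the optimal weight \(a\) are easy to tabulate as functions of \(n\); the sketch below uses `lgamma` to evaluate the Gamma-function ratio stably. Function names are illustrative.

```r
# c from the exercise statement and the variance-minimizing weight a, as functions of n.
c_const <- function(n) sqrt(n - 1) * exp(lgamma((n - 1) / 2) - lgamma(n / 2)) / sqrt(2)
a_opt   <- function(n) { cc <- c_const(n); (cc^2 - 1) / (cc^2 - 1 + 1 / n) }
sapply(c(5, 10, 50), function(n) c(n = n, c = c_const(n), a = a_opt(n)))
```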

Exercise 14.14 (Casella and Berger 7.52) Let \(X_1,\cdots,X_n\) be i.i.d. \(Pois(\lambda)\), and let \(\bar{X}\) and \(S^2\) denote the sample mean and variance, respectively.

  1. Prove that \(\bar{X}\) is the best unbiased estimator of \(\lambda\) without using the Cramer-Rao Theorem.

  2. Prove the rather remarkable identity \(E(S^2|\bar{X})=\bar{X}\), and use it to explicitly demonstrate that \(Var(S^2)>Var(\bar{X})\).

  3. Using completeness, can a general theorem be formulated for which the identity in part (b) is a special case?

Proof. (a) Since the Poisson distribution belongs to the exponential family, \(\bar{X}\) is a complete sufficient statistic. Furthermore, \(E(\bar{X})=\lambda\). Hence, \(\bar{X}\) is the best unbiased estimator of \(\lambda\).

  (b) Since \(\bar{X}\) is a complete sufficient statistic, \(E(S^2|\bar{X})\) is a function of \(\bar{X}\). Furthermore, by completeness, any function \(g(\cdot)\) of \(\bar{X}\) with \(E(g(\bar{X}))=0\) equals 0 almost surely. Hence, considering \(g(\bar{X})=\bar{X}-E(S^2|\bar{X})\), we have \[\begin{equation} E(\bar{X}-E(S^2|\bar{X}))=E(\bar{X})-E(S^2)=\lambda-\lambda=0 \tag{14.66} \end{equation}\] Thus, \(E(S^2|\bar{X})=\bar{X}\) almost surely.

By the law of total variance, we immediately have \[\begin{equation} Var(S^2)=Var(E(S^2|\bar{X}))+E(Var(S^2|\bar{X}))=Var(\bar{X})+E(Var(S^2|\bar{X}))>Var(\bar{X}) \tag{14.67} \end{equation}\]
The inequality is strict: otherwise \(S^2\) would also be a best unbiased estimator of \(\lambda\), and by the uniqueness of the best unbiased estimator we would have \(S^2=\bar{X}\) almost surely, which is not the case.

  (c) Let \(T(\mathbf{X})\) be a complete sufficient statistic and let \(T^*(\mathbf{X})\) be any other statistic with \(E(T^*(\mathbf{X}))=E(T(\mathbf{X}))\); then \(E(T^*(\mathbf{X})|T(\mathbf{X}))=T(\mathbf{X})\) and \(Var(T^*(\mathbf{X}))\geq Var(T(\mathbf{X}))\), with strict inequality unless \(T^*(\mathbf{X})=T(\mathbf{X})\) almost surely.
In general, if \(T(\mathbf{X})\) is a complete sufficient statistic, then for any other statistic \(T^*(\mathbf{X})\) we can compute \(E(T^*(\mathbf{X})|T(\mathbf{X}))\) as follows. Firstly, by sufficiency, \(E(T^*(\mathbf{X})|T(\mathbf{X}))\) is a function of \(T(\mathbf{X})\). Then, using the definition of completeness, construct a function of \(T(\mathbf{X})\) of the form \(T(\mathbf{X})-CE(T^*(\mathbf{X})|T(\mathbf{X}))\), where \(C\) is a constant such that \(E(T(\mathbf{X})-CE(T^*(\mathbf{X})|T(\mathbf{X})))=0\). By completeness we have \(T(\mathbf{X})-CE(T^*(\mathbf{X})|T(\mathbf{X}))=0\) almost surely, and thus \(E(T^*(\mathbf{X})|T(\mathbf{X}))=T(\mathbf{X})/C\).

Exercise 14.15 (Casella and Berger 7.55) For each of the following p.d.f.s, let \(X_1,\cdots,X_n\) be a sample from that distribution. In each case, find the best unbiased estimator of \(\theta^r\).

  1. \(f(x|\theta)=\frac{1}{\theta}\), \(0<x<\theta,r<n\).

  2. \(f(x|\theta)=e^{-(x-\theta)}\), \(x>\theta\).

  3. \(f(x|\theta)=\frac{e^{-x}}{e^{-\theta}-e^{-b}}\), \(\theta<x<b\), \(b\) known.

Proof. Consider the general case. Suppose \(X\sim f(x|\theta)=c(\theta)m(x)\), \(a<x<\theta\). Then \(\frac{1}{c(\theta)}=\int_{a}^{\theta}m(x)dx\), and taking derivatives w.r.t. \(\theta\) on both sides gives \(-\frac{c^{\prime}(\theta)}{c(\theta)}=m(\theta)c(\theta)\). The c.d.f. of \(X\) is \(F_X(x)=\frac{c(\theta)}{c(x)}\), \(a<x<\theta\). Let \(Y=\max_i(X_i)\); then, using the same technique as in Example 6.6, \(Y\) is a complete sufficient statistic, so any function \(T(Y)\) that is an unbiased estimator of \(h(\theta)\) is the best unbiased estimator of \(h(\theta)\). The p.d.f. of \(Y\) is \[\begin{equation} f_Y(y|\theta)=\frac{nm(y)c(\theta)^n}{c(y)^{n-1}},\quad a<y<\theta \tag{14.68} \end{equation}\] Consider the equations \(\int_a^{\theta}f(x|\theta)dx=1\) and \(\int_a^{\theta}T(y)f_Y(y|\theta)dy=h(\theta)\), which are equivalent to \[\begin{equation} \int_a^{\theta}m(x)dx=\frac{1}{c(\theta)} \tag{14.69} \end{equation}\] and \[\begin{equation} \int_a^{\theta}\frac{T(y)nm(y)}{c(y)^{n-1}}dy=\frac{h(\theta)}{c(\theta)^n} \tag{14.70} \end{equation}\] Differentiating (14.69) and (14.70) on both sides w.r.t. \(\theta\) we have \[\begin{equation} \left\{\begin{aligned} & m(\theta)=-\frac{c^{\prime}(\theta)}{c(\theta)^2} \\ & m(\theta)=\frac{c(\theta)h^{\prime}(\theta)-h(\theta)nc^{\prime}(\theta)}{c(\theta)^2T(\theta)n}\end{aligned}\right. \tag{14.71} \end{equation}\]

Solving the equations in (14.71) for \(T\) and then replacing \(\theta\) by \(y\), we have \[\begin{equation} T(y)=h(y)+\frac{h^{\prime}(y)}{nm(y)c(y)} \tag{14.72} \end{equation}\] which is the expression of the best unbiased estimator of \(h(\theta)\).

Note also that (14.72) extends to any p.d.f. whose support has \(\theta\) as its upper bound, because (14.71) holds in all of those cases; we only wrote the support as \(a<x<\theta\) to simplify the notation. Similarly, if the lower bound of the support is \(\theta\), then \(\min_i(X_i)\) is the complete sufficient statistic, and by the same technique we have

\[\begin{equation} T(y)=h(y)-\frac{h^{\prime}(y)}{nm(y)c(y)} \tag{14.73} \end{equation}\] as the best unbiased estimator.

In this problem, \(h(\theta)=\theta^r\) and \(h^{\prime}(\theta)=r\theta^{r-1}\).

  1. For this p.d.f. \(m(x)=1\), \(c(\theta)=\frac{1}{\theta}\), hence \[\begin{equation} T(Y)=y^r+\frac{ry^{r-1}}{n(1/y)}=\frac{n+r}{n}y^r \tag{14.74} \end{equation}\] is the best unbiased estimator, where \(Y=\max_i(X_i)\).

  2. For this p.d.f. \(m(x)=e^{-x}\) and \(c(\theta)=e^{\theta}\), \(\theta\) is the lower bound so \(Y=\min_i(X_i)\) and \[\begin{equation} T(Y)=y^r-\frac{ry^{r-1}}{ne^{-y}e^{y}}=y^r-\frac{ry^{r-1}}{n} \tag{14.75} \end{equation}\] is the best unbiased estimator, where \(Y=\min_i(X_i)\).

  3. For this p.d.f. \(m(x)=e^{-x}\) and \(c(\theta)=1/(e^{-\theta}-e^{-b})\), \(\theta\) is the lower bound so \(Y=\min_i(X_i)\) and \[\begin{equation} T(Y)=y^r-\frac{ry^{r-1}}{ne^{-y}}(e^{-y}-e^{-b}) \tag{14.76} \end{equation}\] is the best unbiased estimator, where \(Y=\min_i(X_i)\).
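
As a sanity check of case 1, the estimator \(\frac{n+r}{n}(\max_i X_i)^r\) can be verified to be (approximately) unbiased for \(\theta^r\) by simulation; all parameter values below are arbitrary.

```r
# Simulation sketch for case 1 (Uniform(0, theta)): (n + r)/n * max(x)^r vs. theta^r.
set.seed(1)
theta <- 2; n <- 15; r <- 3; B <- 20000
est <- replicate(B, {
  x <- runif(n, 0, theta)
  (n + r) / n * max(x)^r
})
c(estimate = mean(est), true = theta^r)
```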

Exercise 14.16 (Casella and Berger 7.58) Let \(X\) be an observation from the p.d.f. \[\begin{equation} f(x|\theta)=(\frac{\theta}{2})^{|x|}(1-\theta)^{1-|x|},\quad x=-1,0,1;\, 0\leq\theta\leq 1. \tag{14.77} \end{equation}\]

  1. Find the MLE of \(\theta\).

  2. Define the estimator \(T(X)\) by \[\begin{equation} T(X)=\left\{\begin{aligned} 2 & \quad x=1\\ 0 &\quad o.w. \end{aligned}\right. \tag{14.78} \end{equation}\] Show that \(T(X)\) is an unbiased estimator of \(\theta\).

  3. Find a better estimator than \(T(X)\) and prove that it is better.

Proof. (a) The log-likelihood function is \[\begin{equation} \ell(\theta)=\log(f(x|\theta))=|x|\log(\theta)+(1-|x|)\log(1-\theta)-|x|\log(2) \tag{14.79} \end{equation}\] Taking derivatives w.r.t. \(\theta\) we have \[\begin{equation} \ell^{\prime}(\theta)=\frac{|x|}{\theta}-\frac{1-|x|}{1-\theta} \tag{14.80} \end{equation}\] Setting this to 0 gives \(\theta=|x|\), and since (14.80) is monotonically decreasing in \(\theta\), this is the global maximum. More concretely, if \(|x|=1\) the likelihood \(\theta/2\) is increasing in \(\theta\) and is maximized at \(\theta=1=|x|\), while if \(x=0\) the likelihood \(1-\theta\) is decreasing in \(\theta\) and is maximized at \(\theta=0=|x|\). Therefore the MLE of \(\theta\) is \(\hat{\theta}=|X|\).

  (b) Computing the expectation of \(T(X)\), we have \[\begin{equation} E(T(X))=2\cdot(\frac{\theta}{2})^{|1|}(1-\theta)^{1-|1|}=\theta \tag{14.81} \end{equation}\] Thus, \(T(X)\) is an unbiased estimator of \(\theta\).

  (c) In Exercise 10.4 we have proved that \(|X|\) is a complete sufficient statistic for \(\theta\). Hence \(E(T(X)\mid|X|)\) is the best unbiased estimator. Since \(E(|X|)=\theta\), by the general result in Exercise 14.14, \(E(T(X)\mid|X|)=|X|\) is the best unbiased estimator. Indeed, \[\begin{equation} \begin{split} &Var(T(X))=E((T(X))^2)-(E(T(X)))^2=2\theta-\theta^2\\ &>\theta-\theta^2=Var(|X|) \end{split} \tag{14.82} \end{equation}\]
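
Since \(X\) takes only three values, the variance comparison in (14.82) can be verified by direct enumeration; the value of `theta` below is arbitrary.

```r
# Exact check for Exercise 14.16: Var(T(X)) vs. Var(|X|) at a chosen theta.
theta <- 0.4
px   <- c(theta / 2, 1 - theta, theta / 2)        # P(X = -1), P(X = 0), P(X = 1)
Tx   <- c(0, 0, 2)                                # T(X) for x = -1, 0, 1
absx <- c(1, 0, 1)                                # |X|
c(var_T   = sum(Tx^2 * px)   - sum(Tx * px)^2,    # equals 2*theta - theta^2
  var_abs = sum(absx^2 * px) - sum(absx * px)^2)  # equals theta - theta^2
```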