# Chapter 5 Comprehensive Exercises

## 5.1 Fall 2018 Exam

Part I: Each problem is worth 3 points.

1. Let $$X_1,X_2,\dots,X_6$$ be a simple random sample taken from $$N(0,2^2)$$. Denote $Y = (X_1+X_2)^2+(X_3+X_4)^2+(X_5+X_6)^2.$ If $$kY\sim \chi^2(3)$$, then $$k=$$___?

2. Let $$X_1,X_2,X_3$$ be a simple random sample taken from $$N(\mu,\sigma^2)$$. If $$\hat\mu = \frac{1}{2} X_1+cX_2+\frac{1}{6}X_3$$ is an unbiased estimate of $$\mu$$, then $$c=$$___?

3. Let $$X_1,X_2,X_3$$ be a simple random sample taken from $$B(1,p)$$. For testing the hypothesis $$H_0:p=1/2\ vs.\ H_1:p=3/4$$, we use a rejection region: $W=\{(x_1,x_2,x_3):x_1+x_2+x_3\ge 2\}.$ The power of the test is ___?

4. Let $$X_1,\dots,X_n$$ be a simple random sample taken from $$N(\mu,1)$$, and let $$S_n^2=\frac 1n\sum_{i=1}^n(X_i-\bar X)^2$$ be the sample variance. Then $$Var[S_n^2]=$$___?

5. If the usual $$95\%$$ confidence interval for the mean of a normal population is $$[0.12,0.22]$$, the method of moments estimate of the mean is ___?

Part II: Multiple Choice Problems (one or more items may be true). Each problem is worth 3 points.

1. The parameters $$\theta,\lambda,\alpha,\beta$$ are unknown in the following densities. Which of the following probability distributions belong to the exponential family? ( )

A. $$f(x;\theta,\lambda) = \frac \theta\lambda\left(\frac{x}{\lambda}\right)^{\theta-1}e^{-(x/\lambda)^\theta}1\{x> 0\}$$

B. $$f(x;\alpha,\beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}x^{\alpha-1}(1-x)^{\beta-1}1\{0<x<1\}$$, where $$\Gamma(\cdot)$$ is the gamma function.

C. $$f(x;\lambda) = \frac{\lambda^\alpha}{\Gamma(\alpha)}x^{\alpha-1}e^{-\lambda x}1\{x> 0\}$$

D. $$f(x;\theta) = \frac{2}{\sqrt{2\pi}}e^{-\frac{(x-\theta)^2}{2}}1\{x\ge \theta\}$$

2. Let $$X_1,\dots,X_n$$ be a simple random sample taken from the normal distribution $$N(\mu,\sigma^2)$$, where $$\mu,\sigma^2$$ are unknown parameters. Which of the following are sufficient statistics for $$\theta=(\mu,\sigma^2)$$? ( )

A. $$T_1 = (X_1,\dots,X_n)$$

B. $$T_2 = (\sum_{i=1}^n X_i,\sum_{i=1}^n X_i^2)$$

C. $$T_3 = (\sum_{i=1}^n |X_i|,\sum_{i=1}^n X_i^2)$$

D. $$T_4 = \frac{1}{n}\sum_{i=1}^n X_i$$

3. Which of the following statements are true? ( )

A. If the $$p$$-value is 0.05, the null hypothesis will be rejected at the significance level 0.03.

B. If a test rejects at significance level 0.05, then the $$p$$-value is less than or equal to 0.05.

C. If the significance level of a test is decreased, the power of the test would be expected to decrease.

D. A type II error occurs when the test statistic falls in the rejection region of the test and the null is true.

4. Let $$\hat\beta_0,\hat\beta_1$$ be the least squares estimators for the simple linear model $$y_i = \beta_0+\beta_1x_i+\epsilon_i,\ i=1,\dots,n$$, where $$\epsilon_i\stackrel{iid}{\sim} N(0,\sigma^2)$$. Which of the following statements are true? ( )

A. $$\hat\beta_0$$ and $$\hat\beta_1$$ are independent.

B. $$\hat\beta_0-\hat\beta_1$$ is normally distributed.

C. The more spread out the $$x_i$$ are, the better we can estimate the slope $$\beta_1$$.

D. $$\bar y = \hat\beta_0+\hat\beta_1 \bar x$$, where $$\bar x = \frac 1 n\sum_{i=1}^n x_i,\ \bar y = \frac 1 n\sum_{i=1}^n y_i$$.

5. Let $$X_1,\dots,X_n$$ be a simple random sample taken from $$N(2,3^2)$$, and let $$\bar X$$ be the sample mean. Which of the following are true? ( )

A. $$\frac{\bar X -2}{3/\sqrt{n}}\sim t(n)$$

B. $$\frac 1 9\sum_{i=1}^n (X_i-2)^2\sim F(n,1)$$

C. $$\frac{\bar X-2}{\sqrt{3}/\sqrt{n}}\sim N(0,1)$$

D. $$\frac 1 9\sum_{i=1}^n(X_i-2)^2\sim \chi^2(n)$$

Part III. (15 points)

Let $$X_1,\dots,X_n$$ be a simple random sample taken from the density

$f(x;\theta)=\frac{2x}{\theta^2},\quad 0\le x\le \theta.$

1. Find an expression for $$\hat\theta_L$$, the maximum likelihood estimator (MLE) for $$\theta$$.

2. Find an expression for $$\hat\theta_M$$, the method of moments estimator for $$\theta$$.

3. For the two estimators $$\hat\theta_L$$ and $$\hat\theta_M$$, which one is more efficient in terms of mean squared error (MSE)?
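A Monte Carlo comparison can serve as a sanity check on the answer to part 3. The sketch below is only illustrative: it assumes, without deriving them, that the two estimators take the forms $$\max_i X_i$$ and $$\frac{3}{2}\bar X$$, and the values `theta=1.0`, `n=20`, `reps=20000` and the seed are arbitrary choices that are not part of the problem.

```python
import random

def simulate_mse(theta=1.0, n=20, reps=20000, seed=0):
    """Monte Carlo MSE of two estimators for f(x) = 2x / theta^2 on [0, theta]."""
    rng = random.Random(seed)
    se_mle = 0.0
    se_mom = 0.0
    for _ in range(reps):
        # Inverse-CDF sampling: F(x) = (x / theta)^2, so X = theta * sqrt(U).
        xs = [theta * rng.random() ** 0.5 for _ in range(n)]
        mle = max(xs)             # assumed form of the MLE: the sample maximum
        mom = 1.5 * sum(xs) / n   # assumed form of the MoM: solves E[X] = 2*theta/3
        se_mle += (mle - theta) ** 2
        se_mom += (mom - theta) ** 2
    return se_mle / reps, se_mom / reps

mse_mle, mse_mom = simulate_mse()
print(mse_mle, mse_mom)
```

A simulation at one sample size does not replace the exact MSE calculation; the ranking may differ for very small $$n$$.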

Part IV. (10 points)

Let $$X_1,\dots,X_n$$ be a simple random sample taken from an exponential distribution $$Exp(\lambda)$$, whose density is given by $f(x;\lambda) = \lambda e^{-\lambda x}1\{x\ge 0\},\ \lambda>0.$ Derive a likelihood ratio test of the hypothesis $H_0:\lambda=1\ vs.\ H_1:\lambda=2.$ What is the definition of uniformly most powerful (UMP)? Is the test UMP against the alternative $$H_1:\lambda>1$$?

Part V. (10 points)

A medical researcher believes that women typically have lower serum cholesterol than men. To test this hypothesis, he took a sample of 476 men between the ages of nineteen and forty-four and found their mean serum cholesterol to be 189.0 mg/dl with a sample standard deviation of 34.2. A group of 592 women in the same age range averaged 177.2 mg/dl and had a sample standard deviation of 33.3. Is the lower average for the women statistically significant? Set the significance level $$\alpha=0.05$$. What assumptions are made when conducting the test? ($$u_{0.95}=1.644854$$, $$t_{0.95}(1066)=1.646284$$, $$t_{0.95}(1068)=1.646282$$, $$u_{0.975}=1.959964$$, $$t_{0.975}(1066)=1.962192$$, $$t_{0.975}(1068)=1.962188$$)
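The arithmetic for one possible approach can be sketched as follows. This assumes two independent samples and a large-sample normal approximation with an unpooled standard error; the problem itself leaves the choice of test (and the justification of these assumptions) to the reader.

```python
import math

# Summary statistics quoted in the problem statement.
n_men, mean_men, sd_men = 476, 189.0, 34.2
n_women, mean_women, sd_women = 592, 177.2, 33.3

# Unpooled standard error of the difference in sample means (an assumed choice).
se = math.sqrt(sd_men**2 / n_men + sd_women**2 / n_women)
z = (mean_men - mean_women) / se

u_95 = 1.644854  # one-sided critical value given in the problem
reject = z > u_95
print(round(z, 3), reject)
```

A pooled-variance $$t$$ statistic with 1066 degrees of freedom is an alternative under an equal-variance assumption; with samples this large the two critical values are nearly identical.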

Part VI. (10 points)

Let $$X_1,\dots,X_n$$ be a simple random sample taken from the uniform distribution $$U(\theta,0)$$, where $$\theta<0$$.

(a). Derive a $$100(1-\alpha)\%$$ confidence interval for $$\theta$$.

(b). There is a duality between confidence intervals and hypothesis tests. Use the result in part (a) to derive a test at significance level $$\alpha$$ of the hypothesis $H_0: \theta = \theta_0\ vs.\ H_1:\theta \neq \theta_0,$ where $$\theta_0<0$$ is fixed.

Part VII. (10 points)

Consider the linear model $y_i=\beta_0+\beta_1x_i+\epsilon_i,\ \epsilon_i\stackrel{iid}{\sim} N(0,\sigma^2),\ i=1,\dots,n.$ Suppose that all the fixed $$x_i$$ are not equal and $$n\ge 3$$.

(a). Derive a maximum likelihood estimator (MLE) $$\hat\sigma_L^2$$ for $$\sigma^2$$.

(b). Let $$T_k=k\hat\sigma_L^2$$ be an estimate of $$\sigma^2$$. Find a $$k\in \mathbb{R}$$ such that $$T_k$$ is an unbiased estimate of $$\sigma^2$$. Show that the unbiased estimate is not the optimal choice by taking account of mean squared error (MSE), and the most efficient $$T_k$$ takes place at $$k=1$$, i.e., the MLE $$\hat\sigma_L^2$$.

Part VIII. (10 points)

Consider the multiple linear regression model $y_i = \beta_0+\beta_1 x_{i1}+\beta_2x_{i2}+\dots+\beta_{p-1}x_{i,p-1} +\epsilon_i,$ where $$i=1,\dots,n$$ and $$n>p\ge 2$$.

(a). Find the least squares estimates (LSE) of $$\beta_0,\dots,\beta_{p-1}$$ via the matrix formalism. What assumptions are required for ensuring a unique solution of the LSE?

(b). Show that the residuals sum to zero. Is the standard assumption $$E[\epsilon_i]=0$$ for $$i=1,\dots,n$$ required to establish the statement?

(c). Suppose that $$\epsilon_i\stackrel{iid}{\sim} N(0,\sigma^2)$$, where $$\sigma>0$$ is an unknown parameter. Define $$\alpha = \sum_{i=1}^{p-1} \beta_i^2$$. Use the generalized likelihood ratio method to test the hypothesis

$H_0: \alpha = 0\ vs.\ H_1:\alpha>0.$ If the coefficient of determination $$R^2=0.95$$, $$p = 3$$ and $$n=13$$, is the null rejected at the significance level $$0.05$$? ($$F_{0.95}(2,10)=4.10,F_{0.95}(3,10)=3.71,t_{0.95}(10)=1.81$$)
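For the numerical part of (c), a short sketch of the arithmetic, assuming the test reduces to the usual overall $$F$$ test for a regression with intercept, whose statistic can be written as $$F=\frac{R^2/(p-1)}{(1-R^2)/(n-p)}$$ with $$F(p-1,n-p)$$ as null distribution. Showing that the generalized likelihood ratio leads to this statistic is the derivation the problem asks for and is not reproduced here.

```python
# Values given in the problem statement.
r2, p, n = 0.95, 3, 13

# Overall F statistic expressed through the coefficient of determination.
f_stat = (r2 / (p - 1)) / ((1 - r2) / (n - p))

f_crit = 4.10  # F_{0.95}(2, 10), given in the problem
reject = f_stat > f_crit
print(f_stat, reject)
```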

## 5.2 Spring 2019 Exam

Part I: Each problem is worth 3 points.

1. Let $$X_1,\dots,X_{10}$$ be an i.i.d. sample of $$X\sim \text{Exp}(1)$$. If $$2\sum_{i=1}^{10} X_i\sim \chi^2(k)$$, then the value of $$k$$ is __________

2. What is the definition of Type I error?_______________________________

3. Let $$T\sim t(10)$$. It is known that $$P(T\le 1.8)=0.95$$. Then $$F_{0.9}(1,10)=$$___________

4. If the $$95\%$$ confidence interval for the mean of a normally distributed population with known variance is $$[1.2, 1.4]$$ based on a sample of size $$100$$, how much larger a sample would you need to halve the length of the confidence interval? ____________

5. State one advantage of the maximum likelihood method compared to the method of moments. ___________________________

Part II: Multiple-Choice Problems (ONLY one of the items is true). Each problem is worth 3 points.

1. Let $$X_1,\dots,X_n$$ be an i.i.d. sample of $$X\sim N(\mu,\sigma^2)$$, where $$\mu,\sigma$$ are unknown parameters. Which one of the following is NOT a statistic? ( )

A. $$X_1+X_2+\dots+X_n$$

B. $$X_{(1)} = \min\{X_1,X_2,\dots,X_n\}$$

C. $$\frac{\bar X-\mu}{\sigma/\sqrt{n}}$$

D. $$g(\bar X)$$, where $$g(\cdot)$$ is a given function over $$\mathbb{R}$$.

2. Consider the problem of testing

$H_0:\mu=0\ vs.\ H_1:\mu>0.$

The power functions of four rejection regions are plotted below. Which one might be the uniformly most powerful (UMP) rejection region at the significance level $$\alpha = 0.05$$? ( )

3. Let $$X_1,\dots,X_n$$ $$(n\ge 3)$$ be a sample from a Weibull population with density

$f(x;k,\lambda)=\frac k\lambda\left(\frac{x}{\lambda}\right)^{k-1}e^{-(x/\lambda)^k}1\{x> 0\},$

where $$k>0,\lambda>0$$ are unknown parameters. Which of the following is a sufficient statistic for $$\theta=(k,\lambda)$$? ( )

A. $$T_1 = (X_1,\dots, X_n)$$

B. $$T_2 = \prod_{i=1}^n X_i$$

C. $$T_3 = \sum_{i=1}^n X_i^k$$

D. $$T_4 = (\sum_{i=1}^n X_i^k, \prod_{i=1}^n X_i)$$

4. Let $$\hat\beta_0,\hat\beta_1,\hat\beta_2$$ be the least squares estimators for the multiple linear model

$y_i = \beta_0+\beta_1x_{i1}+\beta_2x_{i2}+\epsilon_i,$

where $$\epsilon_i\stackrel{iid}{\sim}N(0,\sigma^2)$$, $$i=1,\dots,n$$. Which of the following statements is NOT true? ( )

A. The estimators $$\hat\beta_0,\hat\beta_1,\hat\beta_2$$ are normally distributed.

B. The estimators $$\hat\beta_0,\hat\beta_1,\hat\beta_2$$ are independent.

C. $$\mathbb{E}[\hat\beta_i] = \beta_i,\ i=0,1,2$$.

D. $$\hat\beta_0-\hat\beta_1$$ is normally distributed.

5. Consider the multiple linear model $$Y = X\beta +\epsilon$$, where $$X$$ is the $$n\times p$$ design matrix, $$\beta$$ is a vector of $$p$$ parameters, and the error $$\epsilon\sim N(0,\sigma^2 I_n)$$. Let $$\hat\beta$$ be the least squares estimate of $$\beta$$, $$\hat Y = X\hat\beta$$, $$\hat\epsilon = Y-\hat Y$$, and $$S_e^2 = ||\hat\epsilon||^2=\sum_{i=1}^n \hat\epsilon_i^2$$. Which of the following statements is true? ( )

A. $$\frac{S_e^2}{\sigma^2}\sim \chi^2(n)$$

B. $$S_e^2$$ is independent of the length of $$\hat\beta$$, i.e., $$||\hat\beta||$$.

C. $$\hat\beta\sim N(\beta,\sigma^2 X^\top X)$$

D. $$\sqrt{S_e^2/(n-p)}$$ is an unbiased estimate of $$\sigma$$.

Part III. (15 points)

Let $$X_1,\dots,X_n$$ be an i.i.d. sample of $$X\sim N(\mu,\sigma^2)$$, where $$\mu\in\mathbb{R}$$ and $$\sigma>0$$.

1. If $$\sigma$$ is known, find a $$1-\alpha$$ confidence interval (CI) for $$\mu$$.

2. If $$\sigma$$ is unknown, find a $$1-\alpha$$ CI for $$\mu$$.

3. Would you use the CI established in Part 2 if you were able to get the value of $$\sigma$$? Why?

Part IV. (10 points)

For a random sample of size $$n$$ from a population $$X$$, consider the following as an estimate of $$\theta=\mathbb{E}[X]$$:

$\hat\theta = \sum_{i=1}^n c_i X_i,$

where the $$c_i$$ are fixed numbers and $$X_1,\dots,X_n$$ is an i.i.d. sample.

1. Find a condition on the $$c_i$$ such that the estimate is unbiased.

2. Show that the choice of $$c_i$$ that minimizes the mean squared error (MSE) of the estimate subject to the condition in Part 1 is $$c_i = 1/n$$, where $$i=1,\dots,n$$.

Part V. (10 points)

Suppose that $$X$$ is a discrete random variable with $P(X=1) = (1-\theta)^2,\ P(X=2) = 2\theta(1-\theta),\ P(X=3)=\theta^2,$ where $$\theta\in(0,1)$$. Now a total of $$100$$ independent observations of $$X$$ are made with the following frequencies:

| Case | $$X=1$$ | $$X=2$$ | $$X=3$$ |
|---|---|---|---|
| Frequency | 70 | 10 | 20 |

What is the maximum likelihood estimate of $$\theta$$?
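One way to check a hand-derived answer is to maximize the log-likelihood numerically. The sketch below is illustrative only: the grid resolution is an arbitrary choice, and the closed form $$(n_2+2n_3)/(2n)$$ used for comparison is what setting the derivative of the log-likelihood to zero is assumed to give, where $$n_k$$ is the frequency of $$X=k$$ and $$n=100$$.

```python
import math

# Observed frequencies from the table.
counts = {1: 70, 2: 10, 3: 20}

def log_likelihood(theta):
    """Multinomial log-likelihood (up to a constant) for the given cell probabilities."""
    probs = {1: (1 - theta) ** 2, 2: 2 * theta * (1 - theta), 3: theta ** 2}
    return sum(c * math.log(probs[k]) for k, c in counts.items())

# Crude grid search over the open interval (0, 1).
grid = [i / 10000 for i in range(1, 10000)]
theta_grid = max(grid, key=log_likelihood)

# Closed form assumed from the first-order condition.
n = sum(counts.values())
theta_closed = (counts[2] + 2 * counts[3]) / (2 * n)
print(theta_grid, theta_closed)
```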

Part VI. (10 points)

Write down the Neyman-Pearson (N-P) Lemma and prove it.

Part VII. (10 points)

There are 37 blood alcohol determinations made by Analyzer GTE-10, a three-year-old unit that may be in need of recalibration. All 37 measurements were made using a test sample on which a properly adjusted machine would give a reading of $$12.6\%$$. Based on the data, the sample mean is $$\bar x = 12.7\%$$ and the sample standard deviation is $$s = 0.6\%$$. ($$t_{0.975}(36)=2.028, t_{0.975}(37)=2.026, t_{0.95}(36)=1.688, t_{0.95}(37)=1.687, u_{0.975}=1.960, u_{0.95}=1.645$$)

1. Would you recommend that the machine be readjusted at the level of significance $$\alpha = 0.05$$?

2. What is the p-value of your test? (Suppose that the CDFs of the standard normal, t, $$\chi^2$$, F distributions are known. You can use them whenever you need.)

3. What assumptions are made when conducting the test?
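The test statistic for part 1 can be computed as in the sketch below, which assumes a two-sided one-sample $$t$$ test of $$H_0:\mu=12.6$$ on approximately normal measurements. The $$p$$-value in part 2 would require the CDF of $$t(36)$$, which the Python standard library does not provide, so it is left out.

```python
import math

# Summary statistics from the problem statement (in percent).
n, xbar, mu0, s = 37, 12.7, 12.6, 0.6

# One-sample t statistic with n - 1 = 36 degrees of freedom.
t_stat = (xbar - mu0) / (s / math.sqrt(n))

t_crit = 2.028  # t_{0.975}(36), given in the problem
reject = abs(t_stat) > t_crit
print(round(t_stat, 3), reject)
```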

Part VIII. (15 points)

Suppose that in the model

$y_i= \beta_0+\beta_1x_i+\epsilon_i,\ i=1,\dots,n,$

the errors $$\epsilon_i$$ have mean zero and are uncorrelated, but $$\mathrm{Var}(\epsilon_i) = \rho_i^2\sigma^2$$, where the $$\rho_i>0$$ are known constants, so the errors do not have equal variance. Because the variances are not equal, the theory developed in our class does not apply.

1. Try to transform suitably the model such that the basic assumptions (i.e., the errors have zero mean and equal variance, and are uncorrelated) of the standard statistical model are satisfied.

2. Find the least squares estimates of $$\beta_0$$ and $$\beta_1$$ for the transformed model.

3. Find the variances of the estimates of Part 2.
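A numerical sketch of one possible transformation: dividing each equation by $$\rho_i$$ gives errors $$\epsilon_i/\rho_i$$ with common variance $$\sigma^2$$, and ordinary least squares on the transformed model is then weighted least squares with weights $$1/\rho_i^2$$. The function name `wls_fit` and the data below are hypothetical and chosen only so that the output is easy to verify; the derivation and the variances in part 3 are not covered.

```python
def wls_fit(x, y, rho):
    """Least squares on y_i/rho_i = b0*(1/rho_i) + b1*(x_i/rho_i) + e_i/rho_i."""
    w = [1.0 / r ** 2 for r in rho]  # weights implied by dividing by rho_i
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx ** 2       # nonzero when the x_i are not all equal
    b1 = (sw * swxy - swx * swy) / det
    b0 = (swy - b1 * swx) / sw
    return b0, b1

# Hypothetical noiseless data on the line y = 1 + 2x, with made-up rho values.
x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
rho = [1.0, 2.0, 1.0, 3.0]
b0_hat, b1_hat = wls_fit(x, y, rho)
print(b0_hat, b1_hat)
```

With noiseless data any positive weights recover the true line, so this check only confirms that the normal equations are solved correctly, not that the weighting is optimal.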