Exercises
Exercise 2.1 (Normal additive property) Prove the additive property of the normal distribution, that is, prove that if \(X_i,\) \(i=1,\ldots,n\) are independent rv’s with respective distributions \(\mathcal{N}(\mu_i,\sigma_i^2),\) \(i=1,\ldots,n,\) then
\[\begin{align*} \sum_{i=1}^n X_i\sim \mathcal{N}\left(\sum_{i=1}^n\mu_i,\sum_{i=1}^n\sigma_i^2\right). \end{align*}\]
You may use the fact that the moment generating function of \(\mathcal{N}(\mu,\sigma^2)\) is \(M_{\mathcal{N}(\mu,\sigma^2)}(s)=e^{s \mu+\frac{1}{2} \sigma^2 s^2}.\)
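Before writing the proof, a quick Monte Carlo check can make the claim concrete. The sketch below assumes three summands with arbitrarily chosen \(\mu_i\) and \(\sigma_i\). It compares the simulated mean and variance of \(\sum_i X_i\) with \(\sum_i\mu_i\) and \(\sum_i\sigma_i^2\), and measures the Kolmogorov-Smirnov distance to the claimed normal. This is evidence, not a proof.

```r
# Monte Carlo sanity check of the normal additive property (illustrative, not a proof)
# The values of mu and sigma are arbitrary choices made for this sketch
set.seed(42)
mu <- c(1, -2, 0.5)
sigma <- c(1, 2, 0.5)
M <- 1e5
# Column i holds M draws of X_i ~ N(mu_i, sigma_i^2); row sums are draws of sum(X_i)
X <- sapply(seq_along(mu), function(i) rnorm(M, mean = mu[i], sd = sigma[i]))
S <- rowSums(X)
rbind(simulated = c(mean(S), var(S)), theory = c(sum(mu), sum(sigma^2)))
# Kolmogorov-Smirnov distance to the claimed N(sum(mu), sum(sigma^2)) distribution
ks_D <- unname(ks.test(S, "pnorm", mean = sum(mu), sd = sqrt(sum(sigma^2)))$statistic)
ks_D
```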
Exercise 2.2 (Sampling distribution of the maximum) Let \((X_1,\ldots,X_n)\) be a srs of a rv \(X\) with cdf \(F_X.\) Prove that the cdf of the sampling distribution of the statistic \(T(X_1,\ldots,X_n)=X_{(n)}\) is \(F_T(t)=\left[F_X(t)\right]^n.\)
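The identity \(F_{X_{(n)}}(t)=[F_X(t)]^n\) can be checked empirically. The sketch below takes \(X\sim\mathcal{U}(0,1)\) and \(n=5\) as illustrative choices, which are not part of the exercise. It compares the empirical cdf of simulated maxima with \([F_X(t)]^n\) at a few points.

```r
# Empirical check of F_{X_(n)}(t) = F_X(t)^n, assuming X ~ U(0, 1) for illustration
set.seed(1)
n <- 5
M <- 1e5
# Each column is one srs of size n; keep its maximum
max_sample <- apply(matrix(runif(n * M), nrow = n), 2, max)
t_grid <- c(0.5, 0.8, 0.95)
ecdf_max <- ecdf(max_sample)(t_grid)   # empirical cdf of X_(n)
theo <- punif(t_grid)^n                # [F_X(t)]^n
rbind(ecdf_max, theo)
```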
Exercise 2.3 Consider a srs \((X_1,\ldots,X_n)\) of \(\mathcal{U}(-\theta,\theta),\) \(\theta>0.\) Derive the pdf of
Then, plot the pdf’s for \(\theta=2\) and \(n=2,5,10,30.\) What do you observe?
Exercise 2.4 Define the distribution \(\mathrm{Cauchy}(\mu,\sigma)\) as
\[\begin{align*} \mathrm{Cauchy}(\mu,\sigma)\stackrel{d}{=}\mu+\sigma t_1, \end{align*}\]
where \(t_1\) is the Student’s \(t\) distribution (Definition 2.4) with one degree of freedom, \(\mu\in\mathbb{R}\) is the location, and \(\sigma\in\mathbb{R}_+\) is the scale (note that these parameters are not the mean and the standard deviation, which do not exist for the Cauchy distribution).
- Show that the pdf of \(\mathrm{Cauchy}(\mu,\sigma)\) is \[ f(x;\mu,\sigma)=\frac{1}{\pi \sigma\left[1+\left(\frac{x-\mu}{\sigma}\right)^2\right]}, \quad x\in\mathbb{R}. \]
- Show that the cdf of \(\mathrm{Cauchy}(\mu,\sigma)\) is \[ F(x;\mu,\sigma)=\frac{1}{2}+\frac{1}{\pi} \arctan \left(\frac{x-\mu}{\sigma}\right), \quad x\in\mathbb{R}. \]
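The two formulas can be checked numerically before they are derived. The sketch below codes the stated pdf and cdf under the hypothetical names dcauchy_ex and pcauchy_ex, with \(\mu=1\) and \(\sigma=2\) as arbitrary values. It checks them in three ways: against R's dcauchy() and pcauchy(), by integrating the pdf, and against a simulation of \(\mu+\sigma t_1\) through rt().

```r
# pdf and cdf stated in the exercise (hypothetical helper names)
dcauchy_ex <- function(x, mu, sigma) 1 / (pi * sigma * (1 + ((x - mu) / sigma)^2))
pcauchy_ex <- function(x, mu, sigma) 1 / 2 + atan((x - mu) / sigma) / pi
mu <- 1; sigma <- 2                   # arbitrary illustrative values
x <- c(-5, 0, 1, 3, 10)
# Agreement with R's location-scale Cauchy functions
ok_pdf <- isTRUE(all.equal(dcauchy_ex(x, mu, sigma), dcauchy(x, location = mu, scale = sigma)))
ok_cdf <- isTRUE(all.equal(pcauchy_ex(x, mu, sigma), pcauchy(x, location = mu, scale = sigma)))
# The cdf at 3 should equal the integral of the pdf up to 3
int_check <- integrate(dcauchy_ex, -Inf, 3, mu = mu, sigma = sigma)$value - pcauchy_ex(3, mu, sigma)
# Simulation of mu + sigma * t_1 versus the claimed cdf
set.seed(2)
sim <- mu + sigma * rt(1e5, df = 1)
sim_diff <- ecdf(sim)(x) - pcauchy_ex(x, mu, sigma)
c(ok_pdf, ok_cdf)
sim_diff
```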
Exercise 2.5 Consider a srs \((X_1,\ldots,X_n)\) of \(\mathrm{Cauchy}(\mu,\sigma).\) Derive the pdf of
Then, plot the pdf’s for \(\mu=0,\) \(\sigma=1,2,\) and \(n=2,5,10,30.\) What do you observe?
Exercise 2.6 Consider a srs \((X_1,\ldots,X_n)\) of \(\mathrm{Pois}(\lambda).\) Compute the cdf and the pmf of
Then, plot the pmf’s for \(\lambda=1\) and \(n=2,5,10,30.\) What do you observe?
Exercise 2.7 Let \(X\) be the rv that describes the number of days a patient stays in an intensive care unit after an operation. It is known that the distribution of \(X\) is:
\(r\) | \(1\) | \(2\) | \(3\) |
---|---|---|---|
\(\mathbb{P}(X=r)\) | \(0.3\) | \(0.4\) | \(0.3\) |
Find:
- The mean of the population.
- The standard deviation of the population.
- Let \((X_1,X_2)\) be a srs of two patients. Find the distribution of the sample mean from the joint distribution of \(X_1\) and \(X_2.\)
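The three items reduce to finite sums over the table, so they can be computed directly. This sketch obtains the joint pmf of \((X_1,X_2)\) from independence, \(\mathbb{P}(X_1=i,X_2=j)=\mathbb{P}(X=i)\,\mathbb{P}(X=j)\). It then adds up the joint probabilities over the pairs that give the same value of \(\bar{X}\).

```r
# Population distribution of X (days in the ICU)
r <- 1:3
p <- c(0.3, 0.4, 0.3)
mu <- sum(r * p)                      # population mean
sigma <- sqrt(sum((r - mu)^2 * p))    # population standard deviation
# Joint pmf of (X_1, X_2) under independence and the value of Xbar at each pair
joint <- outer(p, p)
xbar <- outer(r, r, "+") / 2
# Aggregate joint probabilities over equal values of the sample mean
dist_xbar <- tapply(as.vector(joint), as.vector(xbar), sum)
dist_xbar
# The mean of Xbar should coincide with the population mean
sum(as.numeric(names(dist_xbar)) * dist_xbar)
```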
Exercise 2.8 The monthly savings (in euros) of a student is a normal rv with mean \(\mu=100\) and standard deviation \(\sigma=50.\) Sixteen students were selected at random, with \(\bar{X}\) being the sample mean of the measured savings.
- What is the distribution of \(\bar{X}\)?
- Compute the probability that \(\bar{X}\) is larger than \(125.\)
- Compute the probability that \(\bar{X}\) is between \(90\) and \(130.\)
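Since the population is normal, \(\bar{X}\sim\mathcal{N}(\mu,\sigma^2/n)\). Both probabilities are then evaluations of pnorm() with standard deviation \(\sigma/\sqrt{n}=12.5\). A minimal sketch:

```r
mu <- 100; sigma <- 50; n <- 16
se <- sigma / sqrt(n)                 # standard deviation of Xbar ~ N(mu, sigma^2 / n)
p_above_125 <- pnorm(125, mean = mu, sd = se, lower.tail = FALSE)   # P(Xbar > 125)
p_90_130 <- pnorm(130, mu, se) - pnorm(90, mu, se)                  # P(90 < Xbar < 130)
c(p_above_125, p_90_130)
```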
Exercise 2.9 Several government officials believe that the salary increment (in percentage) of employees in the banking sector follows a normal distribution with standard deviation \(3.37.\) A sample of \(n=16\) employees from the sector is taken. Find the probability that the sample standard deviation is:
- Smaller than \(1.99;\)
- Larger than \(2.89.\)
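Both items reduce to \(\chi^2_{n-1}\) probabilities through Fisher's Theorem. The sketch assumes that "sample standard deviation" means the quasistandard deviation \(S'\), so that \((n-1)S'^2/\sigma^2\sim\chi^2_{n-1}\). If \(S\) is meant instead, use \(nS^2/\sigma^2\sim\chi^2_{n-1}\), which amounts to replacing \(n-1\) by \(n\) in the statistic.

```r
sigma <- 3.37; n <- 16
# Assumption: the statement refers to S', with (n - 1) S'^2 / sigma^2 ~ chi^2_{n-1}
stat <- function(s) (n - 1) * s^2 / sigma^2
p_below_1.99 <- pchisq(stat(1.99), df = n - 1)                      # P(S' < 1.99)
p_above_2.89 <- pchisq(stat(2.89), df = n - 1, lower.tail = FALSE)  # P(S' > 2.89)
c(p_below_1.99, p_above_2.89)
```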
Exercise 2.10 Assuming that the births of boys and girls are equally likely, find the probability that in the next \(200\) births:
- Less than \(40\%\) of them are boys;
- Between \(43\%\) and \(57\%\) are girls;
- More than \(54\%\) are boys.
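The number of boys \(B\) (and likewise of girls) in \(200\) births is \(\mathrm{Bin}(200,0.5)\). The sketch below puts the exact binomial probabilities next to the CLT normal approximation with continuity correction. How to treat the boundaries is an assumption: "less than \(40\%\)" is read as \(B\leq 79\), "between \(43\%\) and \(57\%\)" as \(86\leq G\leq 114\) inclusive, and "more than \(54\%\)" as \(B\geq 109\).

```r
n <- 200; p <- 0.5
# Exact binomial probabilities (boundary conventions are assumptions, see above)
exact <- c(pbinom(79, n, p),                          # B <= 79
           pbinom(114, n, p) - pbinom(85, n, p),      # 86 <= G <= 114
           pbinom(108, n, p, lower.tail = FALSE))     # B >= 109
# CLT approximation with continuity correction: counts approx N(np, np(1 - p))
m <- n * p; s <- sqrt(n * p * (1 - p))
approx <- c(pnorm(79.5, m, s),
            pnorm(114.5, m, s) - pnorm(85.5, m, s),
            pnorm(108.5, m, s, lower.tail = FALSE))
rbind(exact, approx)
```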
Exercise 2.11 A tobacco company claims that the mean nicotine content of its cigarettes is \(\mu=0.6\) mg per cigarette. The nicotine content is assumed to be a \(\mathcal{N}(\mu,\sigma^2)\) rv. An independent organization measures the nicotine content of a sample of \(n=16\) of their cigarettes and finds that the average nicotine content in that batch is \(\bar{X}=0.72\) and that the quasistandard deviation is \(S'=0.1.\) What was the probability of observing \(|\bar{X}-\mu| \geq 0.12\) if \(\mu\) is actually equal to \(0.6,\) as the company claims? What can you infer from this probability?
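With \(\sigma\) unknown, a natural route is the statistic \(T=(\bar{X}-\mu)/(S'/\sqrt{n})\sim t_{n-1}\). The sketch below plugs in the observed \(S'=0.1\), which is an assumption about how the exercise intends the probability to be computed. It then evaluates \(\mathbb{P}(|T|\geq t_{\mathrm{obs}})\) with pt().

```r
mu0 <- 0.6; n <- 16; s_quasi <- 0.1; dev <- 0.12
# Under mu = 0.6: T = (Xbar - mu) / (S' / sqrt(n)) ~ t_{n-1}, with the observed S' plugged in
t_obs <- dev / (s_quasi / sqrt(n))
p_dev <- 2 * pt(t_obs, df = n - 1, lower.tail = FALSE)   # P(|T| >= t_obs)
c(t_obs, p_dev)
```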
Exercise 2.12 The daily heating expenses of two similar-sized company departments follow normal distributions with mean \(10\) euros for both departments and standard deviations of \(1\) and \(1.5\) euros for the first and the second department, respectively. To audit the expenses, they are measured at both departments on \(10\) days chosen at random. Compute:
- The probability that in the \(10\) days, the average expense of the first department is above the average expense of the second by at least \(10\) euros.
- The probability that the sample variance of the first department is smaller than two times the sample variance of the second.
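A sketch of both computations, assuming that the two departments are independent. The difference \(\bar{X}_1-\bar{X}_2\) is normal with variance \(\sigma_1^2/n+\sigma_2^2/n\). For the variances, \((S_1'^2/\sigma_1^2)/(S_2'^2/\sigma_2^2)\sim\mathcal{F}_{n-1,n-1}\). Because both samples have size \(10\), \(S_1^2/S_2^2=S_1'^2/S_2'^2\), so it does not matter which variance definition is meant. With the values as stated, the first probability is numerically zero.

```r
n <- 10; sd1 <- 1; sd2 <- 1.5
# Xbar1 - Xbar2 ~ N(0, sd1^2 / n + sd2^2 / n), assuming independent departments
sd_diff <- sqrt(sd1^2 / n + sd2^2 / n)
p_mean <- pnorm(10, mean = 0, sd = sd_diff, lower.tail = FALSE)
# S1'^2 < 2 S2'^2  <=>  (S1'^2 / sd1^2) / (S2'^2 / sd2^2) < 2 sd2^2 / sd1^2, an F_{9,9} event
p_var <- pf(2 * sd2^2 / sd1^2, df1 = n - 1, df2 = n - 1)
c(p_mean, p_var)
```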
Exercise 2.13 The lifetime of certain electronic components follows a normal distribution with mean \(1600\) hours and standard deviation \(400\) hours.
- Given a srs of \(16\) components, find the probability that \(\bar{X}\geq 1500\) hours.
- Given a srs of \(16\) components, what is the number of hours \(h\) such that the probability that \(\bar{X}\geq h\) is \(0.15\)?
- Given a srs of \(16\) components, what is the number of hours \(h\) such that the probability that \(S'\geq h\) is \(0.10\)?
- Given a srs of \(121\) components, find the probability that at least half of the sample components have a lifetime longer than \(1500\) hours.
- Find the sample size required to ensure that, with probability \(0.92,\) the average lifetime in the sample is larger than \(1500\) hours.
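The sketch below works through the five items in order, using the normal and \(\chi^2\) results for normal samples. For the \(121\)-component item it assumes the components are independent, so the number of components lasting more than \(1500\) hours is \(\mathrm{Bin}(121,q)\) with \(q=\mathbb{P}(X>1500)\). It reads "at least half" as at least \(61\) components.

```r
mu <- 1600; sigma <- 400
# n = 16: Xbar ~ N(mu, sigma^2 / 16) and 15 S'^2 / sigma^2 ~ chi^2_15
n <- 16; se <- sigma / sqrt(n)
p_a <- pnorm(1500, mu, se, lower.tail = FALSE)     # P(Xbar >= 1500)
h_b <- qnorm(0.15, mu, se, lower.tail = FALSE)     # h with P(Xbar >= h) = 0.15
h_c <- sigma * sqrt(qchisq(0.10, df = n - 1, lower.tail = FALSE) / (n - 1))  # P(S' >= h) = 0.10
# n = 121: count of components lasting more than 1500 h is Bin(121, q)
q <- pnorm(1500, mu, sigma, lower.tail = FALSE)
p_d <- pbinom(60, size = 121, prob = q, lower.tail = FALSE)  # P(count >= 61)
# Smallest n with P(Xbar > 1500) >= 0.92, i.e., sqrt(n) (mu - 1500) / sigma >= z_{0.92}
n_e <- ceiling((qnorm(0.92) * sigma / (mu - 1500))^2)
c(p_a, h_b, h_c, p_d, n_e)
```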
Exercise 2.14 Given a srs of size \(10\) from a normal distribution with standard deviation \(2,\) compute the probability that the sample and the population means differ by more than \(0.5\) units. Compute the sample size required to ensure that, with probability \(0.9,\) the sample and the population means differ by less than \(0.1\) units.
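Both parts use \(\bar{X}-\mu\sim\mathcal{N}(0,\sigma^2/n)\). The second part asks for the smallest \(n\) with \(0.1\sqrt{n}/\sigma\geq z_{0.05}\), where \(z_{0.05}\) is the upper \(0.05\)-quantile of \(\mathcal{N}(0,1)\). A minimal sketch:

```r
sigma <- 2; n <- 10
p_diff <- 2 * pnorm(0.5, 0, sigma / sqrt(n), lower.tail = FALSE)  # P(|Xbar - mu| > 0.5)
# Smallest n with P(|Xbar - mu| < 0.1) >= 0.9
n_req <- ceiling((qnorm(0.95) * sigma / 0.1)^2)
c(p_diff, n_req)
```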
Exercise 2.15 The effectiveness (measured in days) of a certain drug is distributed as \(\mathcal{N}(14,\sigma^2).\) The drug is given to \(16\) patients and the observed quasistandard deviation in the sample is \(1.4\) days. The minimum average effectiveness required for its commercialization is \(13\) days. Determine:
- The probability that the average effectiveness does not attain the required minimum.
- The probability that the variance is underestimated by more than \(20\%.\)
- Does the previous probability increase or decrease with the sample size?
- The sample size such that the probability in the second part is \(0.05.\)
- A reason why there is so much concern about variance estimation.
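A sketch for the first four items. For the first one it assumes that the observed \(S'=1.4\) is plugged into \(T=(\bar{X}-\mu)/(S'/\sqrt{n})\sim t_{n-1}\). "Underestimated by more than \(20\%\)" is read as \(S'^2<0.8\sigma^2\), which is the \(\chi^2_{n-1}\) event \((n-1)S'^2/\sigma^2<0.8(n-1)\). The required sample size is found by a search over a grid of \(n\).

```r
mu <- 14; n <- 16; s_quasi <- 1.4
# First item (assumption: observed S' plugged in): P(Xbar < 13) via t_{n-1}
p_min <- pt((13 - mu) / (s_quasi / sqrt(n)), df = n - 1)
# Second item: P(S'^2 < 0.8 sigma^2) = P(chi^2_{n-1} < 0.8 (n - 1))
p_under <- function(n) pchisq(0.8 * (n - 1), df = n - 1)
p_under(n)
# Third item: evolution of the probability with the sample size
p_under(c(16, 32, 64, 128))
# Fourth item: smallest n on a grid with probability at most 0.05
n_grid <- 2:1000
n_req <- n_grid[which(p_under(n_grid) <= 0.05)[1]]
n_req
```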
Exercise 2.16 The bearing balls of a given manufacturer weigh \(0.5\) grams on average and have a standard deviation of \(0.02\) grams. Find the probability that the total weights of two batches of \(1000\) balls differ by more than \(2\) grams.
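The statement does not give the weight distribution, so this sketch relies on the CLT and assumes the balls and the two batches are independent. Each batch total is then approximately \(\mathcal{N}(1000\cdot 0.5,\,1000\cdot 0.02^2)\), and the difference of the two totals is approximately \(\mathcal{N}(0,\,2\cdot 1000\cdot 0.02^2)\).

```r
n <- 1000; mu <- 0.5; sigma <- 0.02
# CLT: difference of two independent batch totals is approximately N(0, 2 n sigma^2)
sd_diff <- sqrt(2 * n * sigma^2)
p_diff <- 2 * pnorm(2, 0, sd_diff, lower.tail = FALSE)   # P(|difference| > 2)
p_diff
```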
Exercise 2.17 A factory produces a certain chemical product, whose amount of impurities has to be controlled. To that end, \(20\) batches of the product are examined. If the standard deviation of the percentage of impurities in the sample is above \(2.5\%,\) then the production chain will have to be carefully examined. It is assumed that the percentage of impurities is normally distributed.
- What is the probability that the production chain will have to be examined if the population standard deviation is \(2\%\)?
- What is the probability that the average percentage of impurities in the sample is above \(5\%\) if the average population percentage is \(1\%\)?
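A sketch of the first item only, since the second item does not state a population standard deviation. It assumes that the trigger is the quasistandard deviation exceeding \(2.5\%\), so that \((n-1)S'^2/\sigma^2\sim\chi^2_{19}\) when \(\sigma=2\%\).

```r
n <- 20; sigma <- 2
# Assumption: examination is triggered by S' > 2.5, with (n - 1) S'^2 / sigma^2 ~ chi^2_{n-1}
p_exam <- pchisq((n - 1) * 2.5^2 / sigma^2, df = n - 1, lower.tail = FALSE)
p_exam
```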
Exercise 2.18 Let \((X_1,X_2)\) be a srs of an \(\mathrm{Exp}(\lambda)\) rv. What is the distribution of the ratio statistic \(X_1/X_2\)?
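A simulation gives an empirical cdf of \(X_1/X_2\) to compare with the one you derive. The sketch uses R's rate parametrization in rexp(). It simulates two different values of \(\lambda\), which shows that the ratio does not depend on \(\lambda\), and by exchangeability \(\mathbb{P}(X_1/X_2\leq 1)=1/2\).

```r
# Empirical cdf of X_1 / X_2 for two rates (rate parametrization of rexp())
set.seed(3)
M <- 1e5
ratio_sample <- function(lambda) rexp(M, rate = lambda) / rexp(M, rate = lambda)
t_grid <- c(0.5, 1, 2, 5)
F1 <- ecdf(ratio_sample(1))(t_grid)
F5 <- ecdf(ratio_sample(5))(t_grid)
rbind(F1, F5)   # compare both rows with your derived cdf evaluated at t_grid
```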
Exercise 2.19 Let \((X_1,\ldots,X_n)\) be a srs of a rv distributed as \(\mathcal{N}(\mu,\sigma^2).\) Show that:
\[\begin{align*} \mathbb{E}[S^2]&=\frac{n-1}{n}\sigma^2, & \mathbb{V}\mathrm{ar}[S^2]&=\frac{2(n-1)}{n^2}\sigma^4,\\ \mathbb{E}[S'^2]&=\sigma^2, & \mathbb{V}\mathrm{ar}[S'^2]&=\frac{2}{n-1}\sigma^4. \end{align*}\]
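The four identities can be checked by simulation before proving them. The sketch uses arbitrary values \(n=5\), \(\mu=0\), \(\sigma=2\) and computes both the quasivariance \(S'^2\) (divisor \(n-1\)) and the sample variance \(S^2\) (divisor \(n\)) for each simulated sample.

```r
set.seed(4)
n <- 5; mu <- 0; sigma <- 2; M <- 1e5   # arbitrary illustrative choices
samples <- matrix(rnorm(n * M, mean = mu, sd = sigma), nrow = n)  # one srs per column
# Quasivariance S'^2 (divides by n - 1) and sample variance S^2 (divides by n)
S2q <- colSums(sweep(samples, 2, colMeans(samples))^2) / (n - 1)
S2 <- (n - 1) / n * S2q
res <- rbind(simulated = c(mean(S2), var(S2), mean(S2q), var(S2q)),
             theory = c((n - 1) / n * sigma^2, 2 * (n - 1) / n^2 * sigma^4,
                        sigma^2, 2 / (n - 1) * sigma^4))
res
```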
Exercise 2.20 An environmental protection agency is interested in establishing norms for the permissible amounts of chemical products in lakes and rivers. A commonly employed toxicity metric is the concentration of a pollutant that kills half of the test specimens in a given time interval (usually \(96\) hours for fish). This metric is denoted as LC50 (Lethal Concentration that kills \(50\%\) of the test specimens). Previous studies have observed that \(\log(\mathrm{LC50})\) follows a normal distribution. Let \(S_1'^2\) be the sample quasivariance of a srs of \(10\) values of \(\log(\mathrm{LC50})\) for copper and let \(S_2'^2\) be the sample quasivariance of a srs of \(8\) values of \(\log(\mathrm{LC50})\) for lead. Both samples were obtained from the same fish species. Assume that the population variance for the copper measurements is twice that for lead. Assuming that \(S_1'^2\) and \(S_2'^2\) are independent, find two numbers \(a\) and \(b\) such that
\[\begin{align*} \mathbb{P}\left(a\leq \frac{S_1'^2}{S_2'^2}\leq b\right)=0.90. \end{align*}\]
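Since \((S_1'^2/\sigma_1^2)/(S_2'^2/\sigma_2^2)\sim\mathcal{F}_{9,7}\) and \(\sigma_1^2=2\sigma_2^2\), the ratio \(S_1'^2/S_2'^2\) is distributed as \(2\,\mathcal{F}_{9,7}\). The pair \((a,b)\) is not unique. This sketch picks the equal-tailed choice, with \(5\%\) probability in each tail.

```r
n1 <- 10; n2 <- 8
ratio_sigma2 <- 2   # sigma_1^2 / sigma_2^2, from the statement
# S1'^2 / S2'^2 ~ ratio_sigma2 * F_{n1 - 1, n2 - 1}; equal 5% tails (one possible choice)
a <- ratio_sigma2 * qf(0.05, df1 = n1 - 1, df2 = n2 - 1)
b <- ratio_sigma2 * qf(0.95, df1 = n1 - 1, df2 = n2 - 1)
c(a, b)
```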
Exercise 2.21 Fifty people have simulated samples of size three from a rv \(\mathcal{N}(\mu, \sigma^2).\) Each of the samples gave the value of the statistic
\[\begin{align*} \hat{\mu}=\frac{X_1+3X_2-X_3}{5}, \end{align*}\]
that is going to be used as an estimator of \(\mu.\) The fifty values of \(\hat{\mu}\) are represented in a histogram, and it turns out that the normal distribution that best fits the data has mean \(-1.68\) and standard deviation \(1.59.\) With this information, is it possible to estimate the distribution \(\mathcal{N}(\mu, \sigma^2)\) from which the data came?
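A moment-matching sketch. For independent \(X_i\sim\mathcal{N}(\mu,\sigma^2)\) and weights \(w=(1,3,-1)/5\), the estimator satisfies \(\mathbb{E}[\hat{\mu}]=\mu\sum_i w_i\) and \(\mathbb{V}\mathrm{ar}[\hat{\mu}]=\sigma^2\sum_i w_i^2\). Matching these to the fitted mean and standard deviation gives estimates of \(\mu\) and \(\sigma\). These are only estimates, because the fitted values come from fifty draws.

```r
w <- c(1, 3, -1) / 5            # muhat = sum(w * X)
m_fit <- -1.68; s_fit <- 1.59   # mean and sd of the fitted normal, from the statement
# E[muhat] = sum(w) mu and Var[muhat] = sum(w^2) sigma^2 for independent X_i ~ N(mu, sigma^2)
mu_est <- m_fit / sum(w)
sigma_est <- s_fit / sqrt(sum(w^2))
c(mu_est, sigma_est)
```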
Exercise 2.22 Let \(X\sim \mathcal{N}(0,1)\) and \(Y\sim \chi_{\nu}^2\) be two independent rv’s. Show that the density of the rv defined as
\[\begin{align*} T=\frac{X}{\sqrt{Y/\nu}} \end{align*}\]
is
\[\begin{align*} f_T(t)=\frac{\Gamma\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu \pi}\,\Gamma\left(\frac{\nu}{2}\right)}\left(1+\frac{t^2}{\nu}\right)^{-(\nu+1)/2},\quad t\in \mathbb{R}. \end{align*}\]
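The target density can be checked numerically before deriving it. The sketch below codes \(f_T\) under the hypothetical name f_T and takes \(\nu=4\) as an arbitrary choice. It checks that \(f_T\) integrates to one, that it agrees with R's dt(), and that it matches a direct simulation of \(X/\sqrt{Y/\nu}\).

```r
# Density claimed in the exercise (hypothetical helper name)
f_T <- function(t, nu) {
  gamma((nu + 1) / 2) / (sqrt(nu * pi) * gamma(nu / 2)) * (1 + t^2 / nu)^(-(nu + 1) / 2)
}
nu <- 4                                  # arbitrary illustrative choice
total <- integrate(f_T, -Inf, Inf, nu = nu)$value
ok_dt <- isTRUE(all.equal(f_T(c(-2, 0, 1.5), nu), dt(c(-2, 0, 1.5), df = nu)))
# Simulation of T = X / sqrt(Y / nu), X ~ N(0, 1) independent of Y ~ chi^2_nu
set.seed(5)
M <- 1e5
T_sim <- rnorm(M) / sqrt(rchisq(M, df = nu) / nu)
diff_cdf <- ecdf(T_sim)(1) - integrate(f_T, -Inf, 1, nu = nu)$value
c(total, ok_dt, diff_cdf)
```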
Exercise 2.23 Using R, verify by simulation that the following equalities in distribution hold:
- \(\mathcal{F}_{1,\nu}\stackrel{d}{=}t_{\nu}^2,\) for all \(\nu>0.\)
- \(1/\mathcal{F}_{\nu_1,\nu_2}\stackrel{d}{=}\mathcal{F}_{\nu_2,\nu_1},\) for all \(\nu_1,\nu_2>0.\)
To do so, simulate \(M=10^4\) values from Snedecor’s \(\mathcal{F}\) distribution using rf() and from Student’s \(t\) distribution using rt(), and compare the histograms of the simulated values for several degrees of freedom.
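One way to set this up, for a single illustrative choice of degrees of freedom per claim (\(\nu=5\) and \((\nu_1,\nu_2)=(3,7)\)). Both samples are heavy-tailed, so the sketch draws the histograms on a log scale. As a numerical complement, it also reports the differences between the empirical cdfs on a few points. Repeat with other degrees of freedom as the exercise asks.

```r
set.seed(6)
M <- 1e4
grid <- c(0.1, 0.5, 1, 2, 5)   # points where the empirical cdfs are compared
# First claim for nu = 5: F_{1, nu} versus t_nu^2
nu <- 5
F_1nu <- rf(M, df1 = 1, df2 = nu)
t2 <- rt(M, df = nu)^2
hist(log(F_1nu), breaks = 50, freq = FALSE, col = rgb(1, 0, 0, 0.3),
     main = "log F_{1,5} (red) vs. log t_5^2 (blue)", xlab = "log value")
hist(log(t2), breaks = 50, freq = FALSE, col = rgb(0, 0, 1, 0.3), add = TRUE)
diff1 <- ecdf(F_1nu)(grid) - ecdf(t2)(grid)
# Second claim for (nu1, nu2) = (3, 7): 1 / F_{nu1, nu2} versus F_{nu2, nu1}
inv_F <- 1 / rf(M, df1 = 3, df2 = 7)
F_rev <- rf(M, df1 = 7, df2 = 3)
diff2 <- ecdf(inv_F)(grid) - ecdf(F_rev)(grid)
rbind(diff1, diff2)
```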
Exercise 2.24 Using R, verify by simulation the claim of Fisher’s Theorem that \(\bar{X}\) and \(S^2\) are independent for a normal population. To do so, simulate \(M=10^4\) values of \((\bar{X},S^2)\) for \(n=10\) and the choices of \(\mu\) and \(\sigma\) that you want. Then, inspect whether there is dependence in the sample \(\{(\bar{X}_i,S_i^2)\}_{i=1}^M.\) Use rnorm() to simulate values from a normal distribution.
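A possible setup, with \(\mu=1\) and \(\sigma=3\) as arbitrary choices. Besides the scatterplot, the sketch computes two correlations. Correlation only detects linear dependence, so it also correlates \((\bar{X}-\mu)^2\) with \(S^2\). Correlations near zero are consistent with independence but do not prove it.

```r
set.seed(7)
M <- 1e4; n <- 10; mu <- 1; sigma <- 3   # arbitrary illustrative choices
samples <- matrix(rnorm(n * M, mean = mu, sd = sigma), nrow = n)  # one srs per column
xbar <- colMeans(samples)
S2 <- colMeans(sweep(samples, 2, xbar)^2)   # sample variance S^2 (divides by n)
plot(xbar, S2, pch = ".")
r1 <- cor(xbar, S2)               # linear association
r2 <- cor((xbar - mu)^2, S2)      # association through the squared deviation
c(r1, r2)
```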
Exercise 2.25 Let \(X\sim F\) be a continuous random variable with a strictly increasing cdf \(F\) (so that \(F^{-1}\) exists) and pdf \(f.\) Let \(U\sim \mathcal{U}(0,1).\) Using transformations, show that:
- \(F(X)\sim\mathcal{U}(0,1).\)
- \(F^{-1}(U)\sim F.\)
The second result is very useful for simulation: a srs from \(F\) can be obtained by applying \(F^{-1}\) to a srs \((U_1,\ldots,U_n)\) from \(\mathcal{U}(0,1).\) That is, \((F^{-1}(U_1),\ldots,F^{-1}(U_n))\) is a srs from \(F.\) Derive \(F^{-1}\) for
- \(\mathrm{Exp}(\lambda)\) and
- \(\mathrm{Cauchy}(\mu,\sigma),\)
and code R functions to simulate samples from each distribution. Check that the simulated samples follow the desired distributions by comparing the histograms of the samples with the theoretical pdf’s.
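A generic inverse-transform sampler fits the description above: apply a quantile function to \(\mathcal{U}(0,1)\) draws. In this sketch, r_inverse is a hypothetical helper name, and R's built-in qexp() and qcauchy() stand in for the closed-form inverses you are asked to derive. Once derived, your own functions can replace them without touching the checks. The Cauchy histogram is restricted to \([-10,10]\) for readability and is compared with the correspondingly truncated density.

```r
# Generic inverse-transform sampler: apply a quantile function Finv to U(0, 1) draws
r_inverse <- function(n, Finv, ...) Finv(runif(n), ...)
set.seed(8)
# Stand-ins: qexp() and qcauchy(); replace them by your derived closed-form inverses
x_exp <- r_inverse(1e4, qexp, rate = 2)
x_cau <- r_inverse(1e4, qcauchy, location = 0, scale = 1)
hist(x_exp, breaks = 50, freq = FALSE)
curve(dexp(x, rate = 2), add = TRUE, col = 2)
# Heavy Cauchy tails: show the central part against the truncated density
hist(x_cau[abs(x_cau) < 10], breaks = 50, freq = FALSE)
curve(dcauchy(x) / (pcauchy(10) - pcauchy(-10)), add = TRUE, col = 2)
D_exp <- unname(ks.test(x_exp, "pexp", rate = 2)$statistic)
D_cau <- unname(ks.test(x_cau, "pcauchy")$statistic)
c(D_exp, D_cau)
```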