Chapter 8 Solutions

Chapter 1

Q1-1: Introductory Questions

  1. What is a model?
  2. What is a statistical model?
  3. What is an advanced statistical model?
  4. What is a variable?
  5. What is a random variable?

Answer

  1. Quite broad. The Cambridge dictionary lists the following definitions of model (noun)
  • something that a copy can be based on because it is an extremely good example of its type.
  • a person who wears clothes so that they can be photographed or shown to possible buyers, or a person who is employed to be photographed or painted.
  • a physical object, usually smaller than the real object, that is used to represent something.
  • a simple description of a system or process that can be used in calculations or predictions of what might happen.
  2. A statistical model is a set (also known as a family) of (probability) distributions.

  3. Advanced is clearly a subjective word here…

  • Can a model be advanced? If so, what does that mean?
  • Perhaps it is the modelling that is advanced? Requiring more in-depth, expert knowledge, for example.
  • Let’s look at the Aim of the Course: To provide advanced methodological and practical knowledge in the field of statistical modelling, covering a wide range of modelling techniques which are essential for the professional statistician.
  4. and 5. A variable in statistics is the same as a variable in mathematics more generally: an unspecified element of a given set whose elements are the possible values of the variable. The difference is that in statistics we place a probability distribution on this space of values. A variable with a probability distribution on its space of values is sometimes called a random variable (although the usual mathematical definition is not in these terms). However, I believe this terminology can be redundant and confusing, and would exercise caution in using it, for the following two reasons:
  • Firstly, nothing has happened to the variable. Rather, an extra structure (the probability distribution) has been added to the space of values, and it is this structure that we must focus on for statistics. As a result, just as in mathematics more generally, we can, and will, get away with not distinguishing between a variable and its value, using \(x\), for example, for both. Nevertheless, just as in mathematics more generally, such a distinction is sometimes useful. If it is useful, then a variable will be denoted in upper case: the usual notation \(x\) will be replaced by the notation \(X = x\), both of them indicating that the variable \(X\) has the value \(x\).

  • Secondly, the terminology encourages the notion that there are two types of quantity in the world: “random” quantities, to be represented by “random” variables; and “non-random” quantities, to be represented by “ordinary” variables. With the possible exception of quantum mechanics, which is irrelevant to our purposes, this is false, or, rather, meaningless: try thinking of an empirical test to decide whether a quantity is “random” or “non-random”; there are only quantities, represented by variables. The purpose of probability distributions is to describe our knowledge of the values of these variables.

Q1-2: Types of Variable

Decide whether the following variables are categorical or numerical, and classify further if possible:

  1. gender of the next lamb born at the local farm.
  • Answer: categorical, unranked, binary.
  2. number of times a dice needs to be thrown until the first \(6\) is observed.
  • Answer: numerical, discrete.
  3. amount of fluid (in ounces) dispensed by a machine used to fill bottles with lemonade.
  • Answer: numerical, continuous.
  4. thickness of penny coins in millimetres.
  • Answer: numerical, continuous.
  5. assignment grades of a 3H Mathematics course (from A to D).
  • Answer: categorical, ranked.
  6. marital status of some random sample of citizens.
  • Answer: categorical, unranked.

Q1-3: Properties of Probability Distributions

  1. Prove the results for the expectation and covariance structure of the multinomial distribution stated in Section 1.4.3.1.

Answer: Consider first a Multinoulli random variable \(\boldsymbol{W}\), that is, a random variable following a Multinomial distribution with \(n = 1\), parameterised by probability vector \(\boldsymbol{\pi} = (\pi_1,...,\pi_k)\). Note that \(\boldsymbol{W}\) is a vector of \(k-1\) \(0\)’s and a single \(1\), with the \(W_j\) taking value \(1\) with probability \(\pi_j\).

Almost by definition, we have that \({\mathrm E}[\boldsymbol{W}] = \boldsymbol{\pi}\).

Now, for \(j = 1,...,k\), we have \[\begin{eqnarray} {\mathrm{Var}}[W_j] & = & {\mathrm E}[W_j^2] - ({\mathrm E}[W_j])^2 \\ & = & {\mathrm E}[W_j] - ({\mathrm E}[W_j])^2 \\ & = & \pi_j(1 - \pi_j) \end{eqnarray}\] For \(j \neq j'\): \[\begin{eqnarray} {\mathrm{Cov}}[W_j,W_{j'}] & = & {\mathrm E}[W_j W_{j'}] - {\mathrm E}[W_j] {\mathrm E}[W_{j'}] \\ & = & 0 - \pi_j \pi_{j'} \\ & = & - \pi_j \pi_{j'} \end{eqnarray}\] where we have used the fact that if \(j \neq j'\), at least one of \(W_j\) and \(W_{j'}\) is equal to \(0\). It therefore follows that \[\begin{equation} {\mathrm{Var}}[\boldsymbol{W}] = \Sigma = \textrm{diag}(\boldsymbol{\pi}) - \boldsymbol{\pi} \boldsymbol{\pi}^T \end{equation}\]

Now, \(\boldsymbol{Y} = \sum_{l=1}^n \boldsymbol{Y}_l\), where \(\boldsymbol{Y}_l, l = 1,...,n\) are i.i.d. multinoulli. Therefore \[\begin{eqnarray} {\mathrm E}[\boldsymbol{Y}] & = & {\mathrm E}[\sum_{l=1}^n \boldsymbol{Y}_l] \\ & = & \sum_{l=1}^n {\mathrm E}[ \boldsymbol{Y}_l ] \\ & = & n \boldsymbol{\pi} \end{eqnarray}\] with the third line from the identically distributed nature of \(\boldsymbol{Y}_l\).

Also, \[\begin{eqnarray} {\mathrm{Var}}[\boldsymbol{Y}] & = & {\mathrm{Var}}[\sum_{l=1}^n \boldsymbol{Y}_l] \\ & = & \sum_{l=1}^n {\mathrm{Var}}[ \boldsymbol{Y}_l ] \\ & = & n \Sigma \end{eqnarray}\] with the second line coming from the independence of \(\boldsymbol{Y}_l\) and the third line from the identically distributed nature of \(\boldsymbol{Y}_l\).
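These moment results are easy to check numerically. The following R sketch (the choices of \(n\), \(\boldsymbol{\pi}\) and the number of draws are arbitrary) simulates from the Multinomial distribution and compares the empirical moments against \(n\boldsymbol{\pi}\) and \(n\Sigma\):

```r
# Monte Carlo check of E[Y] = n*pi and Var[Y] = n*(diag(pi) - pi pi^T)
set.seed(1)
n <- 10
p <- c(0.2, 0.3, 0.5)

Y <- t(rmultinom(100000, size = n, prob = p))   # one multinomial draw per row

colMeans(Y)                          # should be close to n * p = (2, 3, 5)
cov(Y)                               # should be close to n * (diag(p) - p p^T)
n * (diag(p) - p %*% t(p))
```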

  2. Suppose \(X_1^2,...,X_m^2\) are \(m\) independent variables with chi-squared distributions \(X_i^2 \sim \chi^2(n_i)\). Show that \[\begin{equation} \sum_{i=1}^m X_i^2 \sim \chi^2(\sum_{i=1}^m n_i ) \end{equation}\]

Answer: This simply comes from noticing that each \(X_i^2\) is a sum of \(n_i\) squared standard normal variables, which when all summed together just becomes a bigger sum of \(\sum_{i=1}^m n_i\) squared standard normal variables. \[\begin{equation} \sum_{i=1}^m X_i^2 = \sum_{i=1}^m \sum_{j=1}^{n_i} Z_{ij}^2 \sim \chi^2(\sum_{i=1}^m n_i ) \end{equation}\]
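As a quick sanity check, the additivity property can be illustrated by simulation in R (the degrees of freedom below are an arbitrary choice):

```r
# Sum of independent chi-squared variables compared against chi^2(sum of dfs)
set.seed(1)
dfs <- c(2, 3, 7)
S   <- rowSums(sapply(dfs, function(d) rchisq(100000, df = d)))

c(mean(S), var(S))                   # close to sum(dfs) = 12 and 2 * sum(dfs) = 24
ks.test(S, "pchisq", df = sum(dfs))  # no evidence against the chi^2(12) distribution
```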

  3. Suppose \(X^2\) has the distribution \(\chi^2(n)\). Prove that \[\begin{align} {\mathrm E}[X^2] & = n \\ {\mathrm{Var}}[X^2] & = 2n \end{align}\] Hint: Your solution may require you to assume or show that \({\mathrm E}[Z^4] = 3\), where \(Z \sim{\mathcal N}(0,1)\).

Answer: Note that \(X^2 = \sum_{i=1}^n Z_i^2\) with \(Z_i \sim {\mathcal N}(0,1)\). Therefore \[\begin{equation} {\mathrm E}[X^2] = \sum_{i=1}^n {\mathrm E}[Z_i^2] = n \end{equation}\] since \({\mathrm E}[Z_i^2] = {\mathrm{Var}}[Z_i] + ({\mathrm E}[Z_i])^2 = 1\).

Also, \[\begin{equation} {\mathrm{Var}}[X^2] = {\mathrm{Var}}[\sum_{i=1}^n Z_i^2] = \sum_{i=1}^n {\mathrm{Var}}[Z_i^2] = n {\mathrm{Var}}[Z_i^2] \end{equation}\] with \[\begin{equation} {\mathrm{Var}}[Z_i^2] = {\mathrm E}[Z_i^4] - ({\mathrm E}[Z_i^2])^2 = 3 - 1^2 = 2 \end{equation}\]

To show that \({\mathrm E}[Z_i^4] = 3\), we choose to show that \({\mathrm E}[Z^{n+1}] = n {\mathrm E}[Z^{n-1}]\), which clearly yields the required result. With \(\phi(z) = \frac{1}{\sqrt{2 \pi}} e^{-z^2/2}\), we have \[\begin{eqnarray} {\mathrm E}[Z^{n+1}] & = & \int_{-\infty}^{\infty} z^{n+1} \phi(z) dz \\ & = & \int_{-\infty}^{\infty} z^{n} z \phi(z) dz \\ & = & - \int_{-\infty}^{\infty} z^{n} \phi'(z) dz \\ & = & -z^n \phi(z) \biggr|_{-\infty}^{\infty} + \int_{-\infty}^{\infty} n z^{n-1} \phi(z) dz \\ & = & 0 + n {\mathrm E}[Z^{n-1}] \end{eqnarray}\] where the third line comes from the fact that \(\phi'(z) = - z \phi(z)\), and the fourth line comes from integration by parts with \(u = z^n\) and \(v' = \phi'(z)\).
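Both \({\mathrm E}[Z^4] = 3\) and the stated chi-squared moments can also be checked by simulation; a minimal R sketch (the choice \(n = 7\) is arbitrary):

```r
set.seed(1)
z <- rnorm(1e6)
mean(z^4)                    # should be close to 3

x2 <- rchisq(1e6, df = 7)
c(mean(x2), var(x2))         # should be close to n = 7 and 2n = 14
```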

Chapter 2

Q2-1: Quickfire Questions

  1. What is the difference between a Poisson, Multinomial and product Multinomial sampling scheme?

Answer: See Sections 2.1.2 and 2.3.

  2. What does an odds of 1.8 mean relative to success probability \(\pi\)?

Answer: An odds of 1.8 means that the probability of success is 1.8 times the probability of failure; equivalently, \(\pi/(1-\pi) = 1.8\), so \(\pi = 1.8/2.8 \approx 0.64\).

Q2-2: Sampling Schemes

Write down statements for the expectation and variance of a variable following a product Multinomial sampling scheme as described in Section 2.3.3.

Answer: Let \(\boldsymbol{N}_i = (N_{i1},...,N_{iJ})\) and \(\boldsymbol{\pi}_i^\star = (\pi_{1|i},...,\pi_{J|i})\) for \(i = 1,...,I\). Then \[\begin{eqnarray} {\mathrm E}[\boldsymbol{N}_i] & = & n_{i+} \boldsymbol{\pi}_i^\star \\ {\mathrm{Var}}[\boldsymbol{N}_i] & = & n_{i+} (\textrm{diag}(\boldsymbol{\pi}_i^\star) - \boldsymbol{\pi}_i^\star \boldsymbol{\pi}_i^{\star,T}) \\ {\mathrm{Cov}}[\boldsymbol{N}_i,\boldsymbol{N}_{i'}] & = & 0_M \qquad i \neq i' \end{eqnarray}\] where \(0_M\) is a matrix of zeroes.

Q2-3: Fatality of Road Traffic Accidents

Table 7.1 shows fatality results for drivers and passengers in road traffic accidents in Florida in 2015, according to whether the person was wearing a shoulder and lap belt restraint versus not using one. Find and interpret the odds ratio.

Answer: The sample odds ratio is calculated as \[\begin{equation} \frac{n_{11} n_{22}}{n_{12} n_{21}} = \frac{433 \times 554883}{8049 \times 570} = 52.37 \end{equation}\]

Thus, the odds of a road traffic accident being fatal are estimated[1] to be 52.37 times greater if no restraint is used than if one is used.
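A short R sketch reproducing this calculation (only the four counts are taken from the worked calculation above; the row and column labels are assumptions about the layout of Table 7.1):

```r
tab <- matrix(c(433, 8049,
                570, 554883),
              nrow = 2, byrow = TRUE,
              dimnames = list(Restraint = c("None", "Used"),
                              Outcome   = c("Fatal", "Nonfatal")))

(tab[1, 1] * tab[2, 2]) / (tab[1, 2] * tab[2, 1])   # sample odds ratio, about 52.37
```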

Q2-4: Difference of Proportions or Odds Ratios?

A 20-year study of British male physicians (Doll and Peto (1976)) noted that the proportion who died from lung cancer was 0.00140 per year for cigarette smokers and 0.00010 per year for non-smokers. The proportion who died from heart disease was 0.00669 for smokers and 0.00413 for non-smokers.

  1. Describe and compare the association of smoking with lung cancer and with heart disease using the difference of proportions.

  2. Describe and compare the association of smoking with lung cancer and heart disease using the odds ratio.

  3. Which response (lung cancer or heart disease) is more strongly related to cigarette smoking, in terms of increased proportional risk to the individual?

  4. Which response (lung cancer or heart disease) is more strongly related to cigarette smoking, in terms of the reduction in deaths that could occur with an absence of smoking?

Answer:

  1. Difference of proportions:
  • Lung cancer: 0.0014 - 0.0001 = 0.0013;

  • Heart disease: 0.00669 - 0.00413 = 0.00256.

Using the difference of proportions, the data suggests that cigarette smoking has a bigger impact on heart disease.

  2. Odds ratio:
  • Lung cancer (L): The sample odds for smokers (S) is given by \(\omega_{L,S} = 0.0014/0.9986\), and the sample odds for non-smokers (N) is \(\omega_{L,N} = 0.0001/0.9999\). The sample odds ratio is therefore \(\omega_{L,S}/\omega_{L,N} = 14.02\).

  • Heart disease (H): The sample odds for smokers (S) is given by \(\omega_{H,S} = 0.00669/0.99331\), and the sample odds for non-smokers (N) is \(\omega_{H,N} =0.00413/0.99587\). The sample odds ratio is therefore \(\omega_{H,S}/\omega_{H,N} = 1.624\).

The odds of dying from lung cancer are estimated to be 14.02 times higher for smokers than for non-smokers, whilst the odds of dying from heart disease are estimated to be 1.624 times higher for smokers than non-smokers. Thus, using the sample odds ratio, the data suggests that cigarette smoking has a bigger impact on lung cancer.

  3. For a measure based on increased proportional risk, we use the sample odds ratios above: lung cancer has an odds ratio of 14.02, compared with 1.624 for heart disease. Therefore, the increased proportional risk to the individual smoker is much higher for lung cancer than for heart disease, relative to the corresponding risks for a non-smoker.

  4. The difference of proportions describes excess deaths due to smoking. That is, if \(N\) is the number of smokers in the population, we predict there would be \(0.00130N\) fewer deaths per year from lung cancer if they had never smoked, and \(0.00256N\) fewer deaths per year from heart disease. Thus (based on this study), elimination of cigarette smoking is predicted to have the biggest impact on deaths due to heart disease.
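The calculations in parts (a) and (b) can be reproduced with a few lines of R (a sketch using only the proportions quoted in the question):

```r
p_lung  <- c(smoker = 0.00140, nonsmoker = 0.00010)
p_heart <- c(smoker = 0.00669, nonsmoker = 0.00413)

# Part (a): differences of proportions
unname(p_lung["smoker"]  - p_lung["nonsmoker"])     # 0.00130
unname(p_heart["smoker"] - p_heart["nonsmoker"])    # 0.00256

# Part (b): odds ratios
odds <- function(p) p / (1 - p)
unname(odds(p_lung["smoker"])  / odds(p_lung["nonsmoker"]))    # about 14.02
unname(odds(p_heart["smoker"]) / odds(p_heart["nonsmoker"]))   # about 1.62
```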

Q2-5: Asymptotic Distribution of X^2

  1. Show that Equation (2.6) in Section 2.4.2 holds.

Answer: To verify Equation (2.6), let \[\begin{equation} W = \Sigma^\star \, (\Sigma^\star)^{-1} \end{equation}\] where \((\Sigma^\star)^{-1}\) denotes the claimed form of the inverse. Then, for the diagonal entries and the off-diagonal entries (\(j \neq l\)) respectively, we can verify that \[\begin{eqnarray} W_{jj} & = & \pi_j(1-\pi_j) \left( \frac{1}{\pi_j} + \frac{1}{\pi_k} \right) - \sum_{i = 1, i \neq j}^{k-1} \frac{\pi_i \pi_j}{\pi_k} \\ & = & 1 - \pi_j + \frac{\pi_j}{\pi_k} (1 - \pi_j - \sum_{i = 1, i \neq j}^{k-1} \pi_i) \\ & = & 1 - \pi_j + \frac{\pi_j}{\pi_k} \pi_k \\ & = & 1 \\ W_{jl} & = & \pi_j(1-\pi_j) \frac{1}{\pi_k} - \pi_j \pi_l \left( \frac{1}{\pi_l} + \frac{1}{\pi_k} \right) - \frac{1}{\pi_k} \sum_{i = 1, i \neq j,l}^{k-1} \pi_i \pi_j \\ & = & \frac{\pi_j}{\pi_k} ( 1 - \pi_j - \pi_l - \sum_{i = 1, i \neq j,l}^{k-1} \pi_i) - \pi_j \frac{\pi_l}{\pi_l} \\ & = & \frac{\pi_j}{\pi_k} \pi_k - \pi_j \\ & = & 0 \end{eqnarray}\] so that \(W\) is indeed the identity matrix.
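A quick numerical check of the claimed inverse is also possible in R. The sketch below assumes \(\Sigma^\star = \textrm{diag}(\boldsymbol{\pi}^\star) - \boldsymbol{\pi}^\star \boldsymbol{\pi}^{\star T}\) for the first \(k-1\) categories, and uses an arbitrary probability vector:

```r
p      <- c(0.1, 0.2, 0.3, 0.4)            # arbitrary example with k = 4
k      <- length(p)
p_star <- p[-k]

Sigma_star  <- diag(p_star) - p_star %*% t(p_star)
claimed_inv <- diag(1 / p_star) + matrix(1 / p[k], k - 1, k - 1)

round(Sigma_star %*% claimed_inv, 12)      # the (k-1) x (k-1) identity matrix
```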

  2. Show that Equation (2.7) in Section 2.4.2 holds.

Answer: To verify Equation (2.7) we have that \[\begin{eqnarray} m(\bar{\boldsymbol{Y}} - \pi^\star)^T (\Sigma^\star)^{-1} (\bar{\boldsymbol{Y}} - \pi^\star) & = & m \frac{1}{\pi_k} \sum_{i,j = 1}^{k-1} (\bar{X}_i - \pi_i)(\bar{X}_j - \pi_j) \\ && \,\,\, + \,\, m \sum_{i=1}^{k-1} \frac{1}{\pi_i} (\bar{X}_i - \pi_i)^2 \\ \end{eqnarray}\] where we have that \[\begin{eqnarray} \sum_{i,j = 1}^{k-1} (\bar{X}_i - \pi_i)(\bar{X}_j - \pi_j) & = & \sum_{i = 1}^{k-1} \left( (\bar{X}_i - \pi_i) \sum_{j = 1}^{k-1} (\bar{X}_j - \pi_j) \right) \\ & = & - (\bar{X}_k - \pi_k) \sum_{i = 1}^{k-1} (\bar{X}_i - \pi_i) \\ & = & (\bar{X}_k - \pi_k)^2 \end{eqnarray}\] thus verifying the result.

Q2-6: Maximum Likelihood by Lagrange Multipliers

  1. Consider a Multinomial sampling scheme, and suppose that \(X\) and \(Y\) are independent. We need to find the MLE of \(\boldsymbol{\pi}\), but now we have that \[\begin{equation} \pi_{ij} = \pi_{i+} \pi_{+j} \end{equation}\] The log likelihood is \[\begin{eqnarray} l(\boldsymbol{\pi}) & \propto & \sum_{i,j} n_{ij} \log (\pi_{ij}) \\ & = & \sum_{i} n_{i+} \log (\pi_{i+}) + \sum_{j} n_{+j} \log (\pi_{+j}) \end{eqnarray}\] Use the method of Lagrange multipliers to show that \[\begin{eqnarray} \hat{\pi}_{i+} & = & \frac{n_{i+}}{n_{++}} \\ \hat{\pi}_{+j} & = & \frac{n_{+j}}{n_{++}} \end{eqnarray}\]

Answer: The Lagrange function is \[\begin{eqnarray} \mathcal{L}(\boldsymbol{\pi}, \boldsymbol{\lambda}) & = & l(\boldsymbol{\pi}) - \lambda_1 \bigl( \sum_{i} \pi_{i+} - 1 \bigr) - \lambda_2 \bigl( \sum_{j} \pi_{+j} - 1 \bigr) \\ & = & \sum_{i} n_{i+} \log (\pi_{i+}) + \sum_{j} n_{+j} \log (\pi_{+j}) - \lambda_1 \bigl( \sum_{i} \pi_{i+} - 1 \bigr) - \lambda_2 \bigl( \sum_{j} \pi_{+j} - 1 \bigr) \nonumber \end{eqnarray}\]

Local optima \(\hat{\boldsymbol{\pi}}, \hat{\boldsymbol{\lambda}}\) will satisfy: \[\begin{eqnarray} \frac{\partial \mathcal{L}(\hat{\boldsymbol{\pi}}, \hat{\boldsymbol{\lambda}})}{\partial \pi_{i+}} & = & 0 \qquad i = 1,...,I \\ \frac{\partial \mathcal{L}(\hat{\boldsymbol{\pi}}, \hat{\boldsymbol{\lambda}})}{\partial \pi_{+j}} & = & 0 \qquad j = 1,...,J \\ \frac{\partial \mathcal{L}(\hat{\boldsymbol{\pi}}, \hat{\boldsymbol{\lambda}})}{\partial \lambda_1} & = & 0 \\ \frac{\partial \mathcal{L}(\hat{\boldsymbol{\pi}}, \hat{\boldsymbol{\lambda}})}{\partial \lambda_2} & = & 0 \end{eqnarray}\] which in this case means satisfy: \[\begin{eqnarray} \frac{n_{i+}}{\pi_{i+}} - \lambda_1 & = & 0 \qquad i = 1,...,I \\ \frac{n_{+j}}{\pi_{+j}} - \lambda_2 & = & 0 \qquad j = 1,...,J \\ \sum_i \pi_{i+} & = & 1 \\ \sum_j \pi_{+j} & = & 1 \end{eqnarray}\] and hence \[\begin{eqnarray} n_{i+} & = & \hat{\lambda_1} \hat{\pi}_{i+} \\ \implies \sum_{i} n_{i+} & = & \hat{\lambda}_1 \sum_{i} \hat{\pi}_{i+} \\ \implies \hat{\lambda}_1 & = & n_{++} \end{eqnarray}\] and similarly that \[\begin{equation} \hat{\lambda}_2 = n_{++} \end{equation}\]

Thus \[\begin{eqnarray} \hat{\pi}_{i+} & = & \frac{n_{i+}}{n_{++}} \\ \hat{\pi}_{+j} & = & \frac{n_{+j}}{n_{++}} \end{eqnarray}\]

  2. Using the method of Lagrange multipliers, show that Equation (2.13) of Section 2.4.4.1.1 holds.

Answer:

Under the hypothesis that \(\pi_{j|i} = \pi_{+j}\) for all \(i\), the log likelihood is \[\begin{eqnarray} l(\boldsymbol{\pi}) & = & \sum_{i,j} n_{ij} \log (\pi_{j|i}) \\ & = & \sum_{i,j} n_{ij} \log (\pi_{+j}) \\ & = & \sum_{j} n_{+j} \log (\pi_{+j}) \end{eqnarray}\]

Since \(\sum_j \pi_{+j} = 1\), the Lagrange function is \[\begin{eqnarray} \mathcal{L}(\boldsymbol{\pi}, \lambda) & = & l(\boldsymbol{\pi}) - \lambda(\sum_j \pi_{+j} - 1) \\ & = & \sum_{j} n_{+j} \log (\pi_{+j}) - \lambda(\sum_j \pi_{+j} - 1) \end{eqnarray}\]

Local optima \(\hat{\boldsymbol{\pi}}, \hat{\lambda}\) will satisfy: \[\begin{eqnarray} \frac{\partial \mathcal{L}(\hat{\boldsymbol{\pi}}, \hat{\lambda})}{\partial \pi_{+j}} & = & 0 \qquad j = 1,...,J \\ \frac{\partial \mathcal{L}(\hat{\boldsymbol{\pi}}, \hat{\lambda})}{\partial \lambda} & = & 0 \end{eqnarray}\] which implies \[\begin{eqnarray} \frac{n_{+j}}{\hat{\pi}_{+j}} - \hat{\lambda} & = & 0 \qquad j = 1,...,J \\ \sum_{j} \hat{\pi}_{+j} - 1 & = & 0 \end{eqnarray}\] and hence \[\begin{eqnarray} \sum_{j=1}^J n_{+j} & = & \hat{\lambda} \\ \implies \hat{\lambda} & = & n_{++} \end{eqnarray}\] and thus \[\begin{equation} \hat{\pi}_{+j} = \frac{n_{+j}}{n_{++}} \end{equation}\]

Q2-7: Second-Order Taylor Expansion

Show that Approximation (2.15) of Section 2.4.5.1 holds.

Answer: Let \(n_{ij} = \hat{E}_{ij} + \delta_{ij}\), and note that a second-order Taylor expansion of \(\log(1+x)\) about \(x = 0\) is given by \(\log(1+x) \approx x - \frac{1}{2} x^2\).

Then: \[\begin{eqnarray*} G^2 & = & 2 \sum_{i,j} n_{ij} \log \bigl( \frac{n_{ij}}{\hat{E}_{ij}} \bigr) \\ & = & 2 \sum_{i,j} (\hat{E}_{ij} + \delta_{ij}) \log \bigl( 1 + \frac{\delta_{ij}}{\hat{E}_{ij}} \bigr) \\ & \approx & 2 \sum_{i,j} (\hat{E}_{ij} + \delta_{ij}) \bigl( \frac{\delta_{ij}}{\hat{E}_{ij}} - \frac{\delta_{ij}^2}{2\hat{E}_{ij}^2} \bigr) \\ & \approx & 2 \sum_{i,j} \bigl( \delta_{ij} + \frac{\delta_{ij}^2}{2\hat{E}_{ij}} \bigr) \end{eqnarray*}\] up to second-order. Since \(\delta_{ij} = n_{ij} - \hat{E}_{ij}\) and \(\sum \delta_{ij} = 0\), we have that \[\begin{equation} G^2 \approx \sum_{i,j} \frac{(n_{ij} - \hat{E}_{ij})^2}{\hat{E}_{ij}} = X^2 \end{equation}\] as required.
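The closeness of \(G^2\) and \(X^2\) is easy to see numerically; the R sketch below uses a purely hypothetical two-way table of counts:

```r
n <- matrix(c(30, 45, 25,
              20, 55, 40), nrow = 2, byrow = TRUE)
E <- outer(rowSums(n), colSums(n)) / sum(n)   # fitted values under independence

c(X2 = sum((n - E)^2 / E),
  G2 = 2 * sum(n * log(n / E)))               # the two statistics should be close
```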

Q2-8: Relative Risk

  1. In 1998, a British study reported that “Female smokers were 1.7 times more vulnerable than Male smokers to get lung cancer.” We don’t investigate whether this is true or not, but is 1.7 the odds ratio or the relative risk[2]? Briefly (one sentence maximum) explain your answer.

Answer: I would say that 1.7 is the relative risk, as the statement seems to claim that the probability of one event is 1.7 times the probability of the other. This assumes that “vulnerability” refers to the absolute probability of the event occurring (for one group relative to the other), rather than to a ratio of the odds of it occurring. Having said this, there is vagueness in the wording, and it could be interpreted differently. Such vagueness in this one line alone highlights how careful we need to be to phrase our results precisely, so as not to confuse, or (deliberately…?) mislead people.

  2. A National Cancer institute study about tamoxifen and breast cancer reported that the women taking the drug were \(45\%\) less likely to experience invasive breast cancer than were women taking a placebo. Find the relative risk for (i) those taking the drug compared with those taking the placebo, and (ii) those taking the placebo compared with those taking the drug.

Answer: i) Defining \(\pi_D\) and \(\pi_P\) to be the probabilities of invasive breast cancer given the drug and the placebo respectively, we have that \[\begin{equation} \pi_D = (1-0.45) \pi_P \implies \pi_D = 0.55 \pi_P \implies \pi_D/\pi_P = 0.55 \end{equation}\]

ii) From part (i), we have that \[\begin{equation} \pi_P/\pi_D = 1/0.55 = 1.82 \end{equation}\]

Q2-9: The Titanic

For adults who sailed on the Titanic on its fateful voyage, the odds ratio between gender (categorised as Female (F) or Male (M)), and survival (categorised as yes (Y) or no (N)) was 11.4 (Dawson (1995)).

  1. It is claimed that “The Probability of survival for women was 11.4 times that for men”. i) What is wrong with this interpretation? ii) What should the correct interpretation be? iii) When would the quoted interpretation be approximately correct?

Answer: i) and ii) The probability of survival for women was not 11.4 times that for men. The (sample[3]) odds of survival for women (not accounting for other factors) were 11.4 times greater than for men.

iii) Let \(\pi_{YF}, \pi_{NF}, \pi_{YM}, \pi_{NM}\) be the probabilities of survival or not for women and men respectively. The quoted interpretation would be approximately correct when the probabilities of success of both events are small, hence \(\pi_{NF} \approx 1\) and \(\pi_{NM} \approx 1\) so that: \[\begin{equation} \frac{\pi_{YF}/\pi_{NF}}{\pi_{YM}/\pi_{NM}} \approx \frac{\pi_{YF}}{\pi_{YM}} \end{equation}\]
  2. The odds of survival for women were 2.9. Find the proportion of each gender who survived.

Answer: We have that \[\begin{eqnarray} && \frac{\pi_{YF}}{\pi_{NF}} = \frac{\pi_{Y|F}}{\pi_{N|F}} = 2.9 \\ & \implies & \pi_{Y|F} = 2.9(1 - \pi_{Y|F}) \\ & \implies & \pi_{Y|F} = \frac{2.9}{3.9} = \frac{29}{39} \end{eqnarray}\]

We also have that: \[\begin{eqnarray} && \frac{\pi_{Y|F}/\pi_{N|F}}{\pi_{Y|M}/\pi_{N|M}} = 11.4 \\ & \implies & \frac{\pi_{Y|M}}{\pi_{N|M}} = \frac{2.9}{11.4} = \frac{29}{114} \\ & \implies & \pi_{Y|M} = \frac{29}{114} (1 - \pi_{Y|M}) \\ & \implies & \frac{143}{114} \pi_{Y|M} = \frac{29}{114} \\ & \implies & \pi_{Y|M} = \frac{29}{143} \end{eqnarray}\]
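A two-line R check of these proportions (a sketch using only the two quoted odds values):

```r
odds_F <- 2.9                      # odds of survival for women
odds_M <- odds_F / 11.4            # implied odds of survival for men

c(female = odds_F / (1 + odds_F),  # 29/39,  about 0.744
  male   = odds_M / (1 + odds_M))  # 29/143, about 0.203
```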

Q2-10: Test and Reality

For a diagnostic test of a certain disease, let \(\pi_1\) denote the probability that the diagnosis is positive given that a subject has the disease, and let \(\pi_2\) denote the probability that the diagnosis is positive given that a subject does not have the disease. Let \(\tau\) denote the probability that a subject has the disease.

  1. More relevant to a patient who has received a positive diagnosis is the probability that they truly have the disease. Given that a diagnosis is positive, show that the probability that a subject has the disease (called the positive predictive value) is \[\begin{equation} \frac{\pi_1 \tau}{\pi_1 \tau + \pi_2 (1-\tau)} \end{equation}\]

Answer: As defined in the question, let \[\begin{eqnarray} \pi_1 & = & P(\textrm{positive diagnosis given presence of disease}) = P(T^+|D^+) \\ \pi_2 & = & P(\textrm{positive diagnosis given absence of disease}) = P(T^+|D^-) \\ \tau & = & P(\textrm{presence of disease}) = P(D^+) \end{eqnarray}\]

Then, using Bayes theorem, we have that \[\begin{eqnarray} P(D^+|T^+) & = & \frac{P(T^+|D^+) P(D^+)}{P(T^+)} \\ & = & \frac{P(T^+|D^+) P(D^+)}{P(T^+|D^+) P(D^+) + P(T^+|D^-) P(D^-)}\\ & = & \frac{\pi_1 \tau}{\pi_1 \tau + \pi_2 (1-\tau)} \end{eqnarray}\]

  2. Suppose that a diagnostic test for a disease has both sensitivity and specificity equal to 0.95, and that \(\tau = 0.005\). Find the probability that a subject truly has the disease given a positive diagnostic test result.

Answer: Recall that \[\begin{eqnarray} \textrm{Sensitivity:} & \, & P(T^+ | D^+) = 0.95 \\ \textrm{Specificity:} & \, & P(T^- | D^-) = 0.95 \end{eqnarray}\] and we also have that \(\tau = P(D^+) = 0.005\).

Then \[\begin{equation} P(D^+|T^+) = \frac{0.95 \times 0.005}{0.95 \times 0.005 + (1-0.95)(1-0.005)} = 0.087 \end{equation}\]
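The positive predictive value formula translates directly into R; ppv() below is a hypothetical helper implementing the expression derived in part (a):

```r
ppv <- function(pi1, pi2, tau) (pi1 * tau) / (pi1 * tau + pi2 * (1 - tau))

ppv(pi1 = 0.95, pi2 = 1 - 0.95, tau = 0.005)   # about 0.087
```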

  3. Create a \(2 \times 2\) contingency table of cross-classified probabilities for presence or absence of the disease and positive or negative diagnostic test result.

Answer: See Table 8.1.

Table 8.1: Contingency table of probabilities cross-classifying presence or absence of disease against positive or negative diagnostic test.
Test: Positive Negative Sum
Disease: Presence 0.00475 0.00025 0.005
Disease: Absence 0.04975 0.94525 0.995
Sum 0.05450 0.94550 1.000
  4. Calculate the odds ratio and interpret.

Answer:

\[\begin{equation} r_{12} = \frac{0.00475 \times 0.94525}{0.00025 \times 0.04975} = 361 \end{equation}\]

The odds of a positive test result are 361 times higher for a subject for whom the disease is present than a subject for whom the disease is absent. Equivalently, the odds of presence of the disease are 361 times higher for a subject with a positive test result than a subject with a negative test result.

Q2-11: Happiness and Income

Table 7.2 shows data from a General Social Survey cross-classifying a person’s perceived happiness with their family income.

  1. Perform a \(\chi^2\) test of independence between the two variables.

Answer: Table 8.2 shows the observed data with row and column sum totals.

Table 8.2: \(n_{ij}\).
Happiness: Not too Happy Pretty Happy Very Happy Sum
Income: Above Average 21 159 110 290
Income: Average 53 372 221 646
Income: Below Average 94 249 83 426
Sum 168 780 414 1362

Table 8.3 shows the estimated cell values under independence.

Table 8.3: Estimated \(E_{ij}\) values.
Happiness: Not too Happy Pretty Happy Very Happy Sum
Income: Above Average 35.77093 166.0793 88.14978 290
Income: Average 79.68282 369.9559 196.36123 646
Income: Below Average 52.54626 243.9648 129.48899 426
Sum 168.00000 780.0000 414.00000 1362

Table 8.4 shows the \(X_{ij}^2\) values for each cell.

Table 8.4: \(X_{ij}^2\) values.
Happiness: Not too Happy Pretty Happy Very Happy
Income: Above Average 6.099373 0.3017620 5.416147
Income: Average 8.935086 0.0112936 3.091592
Income: Below Average 32.702862 0.1039235 16.690423

We thus get that \[\begin{equation} X^2 = \sum_{i,j} X_{ij}^2 = 73.352... \end{equation}\]

Comparing to a chi-square distribution with 4 degrees of freedom, we have that \[\begin{equation} P(\chi^{2,\star}_4 \geq 73.352) \leq 0.0005 \end{equation}\] hence the test provides strong evidence to reject \(\mathcal{H}_0\) that the two variables are independent.

  2. Calculate and interpret the adjusted residuals for the four corner cells of the table.

Answer: Adjusted standardised residuals are presented in Table 8.5.

Table 8.5: Adjusted standardised Residuals.
Happiness: Not too Happy Pretty Happy Very Happy
Income: Above Average -2.973173 -0.9472192 3.144277
Income: Average -4.403194 0.2242210 2.906749
Income: Below Average 7.367666 0.5948871 -5.907023

The top-left and bottom-right cell adjusted residuals provide evidence that fewer people are in those cells in the population than if the variables were independent. The top-right and bottom-left cell adjusted residuals provide evidence that more people are in those cells in the population than if the variables were independent. Although not required for the question, by calculating all residuals, we can see that there is also evidence of more people on average income who are very happy, and fewer people on average income who are not too happy, than if the variables were independent.
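Both the test statistic and the adjusted standardised residuals can be reproduced in R with chisq.test(), using the counts of Table 8.2 (a sketch; the stdres component contains the adjusted residuals):

```r
happiness <- matrix(c(21, 159, 110,
                      53, 372, 221,
                      94, 249,  83),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(Income    = c("Above average", "Average", "Below average"),
                                    Happiness = c("Not too happy", "Pretty happy", "Very happy")))

test <- chisq.test(happiness)
test$statistic    # X^2 = 73.35 on 4 degrees of freedom
test$p.value      # far below 0.0005
test$expected     # the fitted values of Table 8.3
test$stdres       # the adjusted standardised residuals of Table 8.5
```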

Q2-12: Tea! (Fisher’s Exact Test of Independence)

Regarding the quote (not repeated here to save space) in the corresponding problem from Fisher (1937):

  1. From the text, we know that there are 4 cups with milk added first and 4 with tea infusion added first. In how many distinct orders, in terms of type, can these 8 cups be presented for tasting?

Answer: There is a total of \(\frac{8!}{4!4!} = 70\) distinct orderings of these cups.

  2. Note that the lady also knows that there are four cups of each type, and must group them into two sets of four (those she thinks had milk added first, and those she thinks had tea infusion added first). Given that the lady guesses milk first three times when indeed the milk was added first, cross-classify the lady’s guesses against the truth in a \(2 \times 2\) contingency table.

Answer: Observe that we are in a sampling scenario with fixed row and column sums. Therefore we have all the information we need to display the results of this experiment in a contingency table, as given by Table 8.6.

Table 8.6: Cross-classification of guess versus truth in the tea tasting experiment of Fisher (1937).
Truth: Milk first Tea first Sum
Guess: Milk first 3 1 4
Guess: Tea first 1 3 4
Sum 4 4 8
  3. Fisher presented an exact[4] solution for testing the null hypothesis \[\begin{equation} \mathcal{H}_0: r_{12} = 1 \end{equation}\] against the one-sided alternative \[\begin{equation} \mathcal{H}_1: r_{12} > 1 \end{equation}\] for contingency tables with fixed row and column sums.
    What hypothesis does \(\mathcal{H}_0\) correspond to in the context of the tea tasting test described above? Write down an expression for \(P(N_{11} = t)\) under \(\mathcal{H}_0\). Thus, perform a (Fisher’s exact) hypothesis test to test the lady’s claim that she can indeed discriminate whether the milk or tea infusion was first added to the cup.

Answer: \(\mathcal{H}_0\) corresponds to the situation where the lady is purely guessing whether milk or tea infusion was added first (with no ability to discriminate based on taste).

Under \(\mathcal{H}_0\), given \(n_{1+}\), \(n_{+1}\) and \(n_{++}\), we have that \(N_{11} \sim \mathcal{H} g (N = n_{++}, M = n_{1+}, q = n_{+1})\)[5] so that: \[\begin{equation} P(N_{11} = t) = \frac{\binom{n_{1+}}{t}\binom{n_{++}-n_{1+}}{n_{+1}-t}}{\binom{n_{++}}{n_{+1}}} \qquad \max(0,n_{+1}+n_{1+}-n_{++}) \leq t \leq \min(n_{+1},n_{1+}) \end{equation}\]

A \(p\)-value is obtained by calculating the sum of the extreme probabilities, where extreme is in the direction of the alternative hypothesis. Let \(t_{obs}\) denote the observed value of \(N_{11}\), then \[\begin{eqnarray} P(N_{11} \geq t_{obs}) & = & P(N_{11} \geq 3) \\ & = & P(N_{11} = 3) + P(N_{11} = 4) \\ & = & \frac{\binom{4}{3}\binom{8-4}{4-3}}{\binom{8}{4}} + \frac{\binom{4}{4}\binom{8-4}{4-4}}{\binom{8}{4}} \\ & = & \frac{17}{70} \end{eqnarray}\]

Hence the test does not provide evidence of the lady’s ability at any standard level of significance.

  4. Suppose the lady had correctly classified all eight cups as having either milk or tea infusion added first. Would Fisher’s exact hypothesis test provide evidence of her ability now?

Answer: You may repeat the above steps of part (c), and see that: \[\begin{equation} P(N_{11} \geq 4) = P(N_{11} = 4) = \frac{1}{70} \end{equation}\] hence the test would provide evidence of the lady’s ability at the 5% level of significance. However, the smallest attainable \(p\)-value with four cups of each type is \(1/70 \approx 0.014\), so this test could never provide evidence at the \(1\%\) level of significance. For this, we would need more cups in the test, for example, five of each type - feel free to play around with this scenario. How many cups of each type (assuming there is an even split) would be required such that if the lady misclassifies one cup of each type, the hypothesis test still provides evidence for her ability at the \(1\%\) level of significance?
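The exact calculations for parts (c) and (d) can be reproduced in R, either directly from the hypergeometric probabilities or via fisher.test() (a sketch):

```r
# Part (c): P(N11 >= 3) under random guessing
dhyper(3, m = 4, n = 4, k = 4) + dhyper(4, m = 4, n = 4, k = 4)   # 17/70, about 0.243

tea <- matrix(c(3, 1,
                1, 3), nrow = 2, byrow = TRUE)
fisher.test(tea, alternative = "greater")$p.value                 # also 17/70

# Part (d): all eight cups classified correctly
fisher.test(matrix(c(4, 0, 0, 4), nrow = 2),
            alternative = "greater")$p.value                      # 1/70, about 0.014
```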

Q2-13: US Presidential Elections

Table 7.3 cross-classifies a sample of votes in the 2008 and 2012 US Presidential elections. Test the null hypothesis that vote in 2008 was independent from vote in 2012 by estimating, and finding a \(95\%\) confidence interval for, the population odds ratio.

Answer:

We wish to test the following hypotheses: \[\begin{eqnarray} \mathcal{H}_0: & \quad& r_{12} = 1 \\ \mathcal{H}_1: &\quad & r_{12} \neq 1 \end{eqnarray}\]

An estimate for this odds ratio is given by \[\begin{equation} \hat{r}_{12} = \frac{802 \times 494}{34 \times 53} = 219.8602 \end{equation}\]

A \((1-\alpha)\) confidence interval for \(\log r_{12}\) is given by \[\begin{eqnarray*} \log \hat{r}_{12} \pm z_{\alpha/2} \sqrt{\frac{1}{n_{11}} + \frac{1}{n_{21}} + \frac{1}{n_{12}} + \frac{1}{n_{22}}} & = & \log 219.8602 \pm 1.96 \sqrt{\frac{1}{802} + \frac{1}{34} + \frac{1}{53} + \frac{1}{494}} \\& = & (4.9480, 5.8380) \end{eqnarray*}\] so that a Wald confidence interval for \(r_{12}\) is given by \[\begin{equation} (e^{4.9480}, e^{5.8380}) = (140.9, 343.1) \end{equation}\]

Since \(r_{0,12} = 1\) lies outside of this interval, the test rejects \(\mathcal{H}_0\) at the \(5\%\) level of significance.
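A short R sketch reproducing the estimate, the interval on the log scale, and the Wald interval for the odds ratio (using the counts quoted above):

```r
n11 <- 802; n12 <- 34; n21 <- 53; n22 <- 494

r_hat  <- (n11 * n22) / (n12 * n21)                       # about 219.86
se_log <- sqrt(1/n11 + 1/n12 + 1/n21 + 1/n22)
ci_log <- log(r_hat) + c(-1, 1) * qnorm(0.975) * se_log   # about (4.948, 5.838)
exp(ci_log)                                               # about (140.9, 343.1)
```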


References

Dawson, R. J. M. 1995. “The Unusual Episode Data Revisited.” Journal of Statistics Education 3 (3).
Doll, R., and R. Peto. 1976. “Mortality in Relation to Smoking - 20 Years’ Observations on Male British Doctors.” British Medical Journal 2 (6051): 1525–36.
Fisher, R. A. 1937. The Design of Experiments. Oliver and Boyd.

[1] Crucially, using the data, we estimate…

[2] Relative risk was introduced in Section 2.1.3.2.

[3] In brackets here as in this case we are also talking about the population of interest.

[4] Exact in the sense that the probabilities of any possible outcome can be calculated exactly.

[5] Note that this distribution follows in the context of this scenario since we view the lady as randomly guessing which \(q=n_{+1}=4\) cups had milk added first from the total of \(N=n_{++}=8\) cups and seeing how many are the type of interest, namely those for which milk really was added first \(M=n_{+1}=4\). Having explained it this way, we could also view it as \(\mathcal{H} g (N = n_{++}, M = n_{+1}, q = n_{1+})\).