A.8 Answer: TW 8 tutorial

Answers for Sect. 8.2

  1. This is just one box of matches (one observation), but the claim is about the population mean. Some boxes will have more than \(45\), and some fewer.

  2. Jake is correct in one sense: You can't have \(0.9\) of a match. But the value is the mean number, and that can be a decimal. Suppose \(10\) boxes had \(49\) matches, and \(10\) boxes had \(50\) matches... is the mean \(49\), or is it \(50\)? Neither is correct; the mean is \(49.5\).

  3. Jake is confusing the sample and population mean. The claim is that the population mean is \(45\). The sample produced a mean of \(44.9\).

    Why should the mean of two different things be the same? It's like expecting your height and your Mum's height to be the same: they are both heights, but of different things. Why should they be the same?

    Of course, every sample will produce a different sample mean. This sample may just have an unusually low number of matches.

  4. Either (1) the manufacturer is lying; or (2) the manufacturer is not lying, and this sample just happens to have fewer matches than usual: (bad) luck.

  5. A CI gives an indication of how much the sample mean could vary from sample to sample, and hence a range of plausible values for the population mean.

  6. The standard error for the mean is \(0.124\div\sqrt{25} = 0.0248\). So the approximate \(95\)% CI is: \(44.9 \pm (2\times 0.0248)\), or \(44.9\pm 0.0496\), or from \(44.85\) to \(44.95\).

  7. No. A \(95\)% CI may or may not contain the population mean. Of course, the manufacturer may indeed be lying... but we'd need to be cautious about making such a bold claim on just this evidence. Ideally, we would repeat this study a few times or take a larger sample. But it is looking suspicious...

    If we took many, many sets of \(25\) matchboxes and computed a \(95\)% CI from each set, about \(95\)% of those CIs would contain the population mean \(\mu\).

  8. \(\bar{x}\) is the mean of the sample, so \(\bar{x} = 44.9\).

  9. \(\mu\) is the mean of the population: the true mean, if you like. \(\mu\) is claimed to be \(45\), but the value of \(\bar{x}\) will, of course, vary from sample to sample.
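
The arithmetic in answer 6, and the repeated-sampling idea behind a CI, can be checked with a short Python sketch. The first part reproduces the standard error and the approximate \(95\)% CI. The second part is a hypothetical simulation: it assumes, for illustration only, that the number of matches per box follows a normal distribution with mean \(45\) and standard deviation \(0.124\) (real counts are whole numbers), draws many samples of \(25\) boxes, and records how often the interval "sample mean \(\pm\) 2 standard errors" contains \(\mu\). The helper `coverage()` is hypothetical, written just for this sketch.

```python
import math
import random

# Summary values from answer 6: sample standard deviation, sample size, sample mean
s, n, xbar = 0.124, 25, 44.9

se = s / math.sqrt(n)                          # standard error of the sample mean
lower, upper = xbar - 2 * se, xbar + 2 * se    # approximate 95% CI: mean +/- 2 SE
print(f"SE = {se:.4f}; approximate 95% CI: {lower:.2f} to {upper:.2f}")


def coverage(mu=45.0, sigma=0.124, n=25, reps=10_000, seed=1):
    """Hypothetical helper: proportion of 'mean +/- 2 SE' intervals containing mu.

    Assumes (for illustration only) that matches per box are normally
    distributed with mean mu and standard deviation sigma.
    """
    rng = random.Random(seed)                  # seeded so the result is reproducible
    hits = 0
    for _ in range(reps):
        sample = [rng.gauss(mu, sigma) for _ in range(n)]
        m = sum(sample) / n                    # sample mean
        sd = math.sqrt(sum((x - m) ** 2 for x in sample) / (n - 1))  # sample SD
        half_width = 2 * sd / math.sqrt(n)
        if m - half_width <= mu <= m + half_width:
            hits += 1
    return hits / reps


print(f"Proportion of intervals containing mu: {coverage():.3f}")
```

The proportion printed should be close to \(95\)% (typically a little below, because \(2\) is an approximate multiplier and the standard deviation is estimated from each sample). The intervals change from sample to sample; \(\mu\) does not.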

Answers for Sect. 8.3

  1. The two groups are completely different: different rats are in each group, so the samples are independent.

  2. The parameter of interest is the difference between the population mean lifetimes, say \(\mu_R - \mu_F\).

  3. The 95% CI is the bottom one: from \(223.34\) to \(346.13\) days.

  4. The best of these is Option (e)... but in practice, we usually think about CIs in terms of Option (d).

  5. The CI explanation can be improved by (i) indicating which diet leads to larger average lifetimes; and (ii) providing sample summary information. Here is a better answer:

    "The \(95\)% confidence interval for the difference between the population mean lifetimes of rats on the restricted diet (sample mean: \(968.8\) days; std dev: \(284.6\) days) and on the free-eating diet (sample mean: \(684.0\) days; std dev: \(134.1\) days) is from \(223.34\) to \(346.13\) days, longer for rats on the restricted diet."

  6. Since the samples are large, the only requirement is that the two samples are independent (which is reasonable here). (The figure is not needed.)

  7. The boxplots show the variation in the lifetimes of individual rats. The error bar chart displays the variation that the sample means would be expected to show from sample to sample.
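
As a quick check on the direction and size of the effect in answer 5, the difference between the sample means can be compared with the centre of the reported CI. This sketch assumes the software used a standard \(t\)-based interval, which is symmetric about the difference in sample means; small discrepancies arise because the sample means are rounded to one decimal place.

```python
# Sample mean lifetimes (days) from answer 5
mean_restricted = 968.8   # restricted diet
mean_free = 684.0         # free-eating diet

# Reported 95% CI for mu_R - mu_F (days)
ci_lower, ci_upper = 223.34, 346.13

diff = mean_restricted - mean_free        # point estimate of mu_R - mu_F
midpoint = (ci_lower + ci_upper) / 2      # centre of a symmetric CI
half_width = (ci_upper - ci_lower) / 2    # margin of error

print(f"Difference in sample means: {diff:.1f} days")
print(f"CI centre: {midpoint:.1f} days; margin of error: {half_width:.1f} days")
```

A positive difference means that rats on the restricted diet lived longer, on average, in these samples.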

Answers for Sect. 8.4

Some answers are given in Tables A.2 and A.3.

  1. See Table A.2.
  2. Use a side-by-side barchart, for example, if necessary.
  3. Odds of boys maturing late: \(352 \div (2\,864 - 352) = 0.1401\): boys are \(0.1401\) times as likely to mature late as not.
  4. Odds of girls maturing late: \(336 \div (2\,664 - 336) = 0.1443\): girls are \(0.1443\) times as likely to mature late as not.
  5. Hence, the odds ratio comparing boys to girls is \(0.1401 \div 0.1443 = 0.971\).
  6. The parameter of interest is the population odds ratio of late maturing, comparing boys to girls.
  7. From software: OR is \(0.971\), and \(95\)% CI is from \(0.828\) to \(1.139\).
  8. See Table A.3. The difference could be explained by sampling variation, or by a real difference between boys and girls in the population.
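
The percentages, odds, odds ratio and CI in answers 3 to 7 can be reproduced from the counts in Table A.2. The software used is not stated; this sketch assumes the common Woolf (log odds ratio) method with multiplier \(1.96\), which gives the reported \(0.828\) to \(1.139\) after rounding. The helper `odds()` is hypothetical.

```python
import math

# Counts from Table A.2
boys_late, boys_total = 352, 2864
girls_late, girls_total = 336, 2664


def odds(late, total):
    """Odds of maturing late: number who matured late / number who did not."""
    return late / (total - late)


pct_boys = 100 * boys_late / boys_total       # percentage maturing late
pct_girls = 100 * girls_late / girls_total
odds_boys = odds(boys_late, boys_total)
odds_girls = odds(girls_late, girls_total)
odds_ratio = odds_boys / odds_girls           # boys compared to girls

# Woolf (log) method: SE of log(OR) from the four cell counts of the 2x2 table
cells = [boys_late, boys_total - boys_late, girls_late, girls_total - girls_late]
se_log_or = math.sqrt(sum(1 / count for count in cells))
lower = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
upper = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"Percentage late: boys {pct_boys:.1f}, girls {pct_girls:.1f}")
print(f"Odds late: boys {odds_boys:.4f}, girls {odds_girls:.4f}")
print(f"OR = {odds_ratio:.3f}; approximate 95% CI: {lower:.3f} to {upper:.3f}")
```
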

TABLE A.2: Maturation and gender

| | Matured late | Did not mature late | Total |
|:---|---:|---:|---:|
| Males | 352 | 2512 | 2864 |
| Females | 336 | 2328 | 2664 |
| Total | 688 | 4840 | 5528 |

TABLE A.3: Maturation and gender: Numerical summary (Enter percentages to one decimal place. Enter odds and odds ratio to three decimal places.)

| | Percentage maturing late | Odds maturing late | Sample size |
|:---|---:|---:|---:|
| Males | 12.3 | 0.1401 | 2864 |
| Females | 12.6 | 0.1443 | 2664 |
| Odds ratio | | 0.97088 | |

Answers for Sect. 8.5

\(n=(2\times 7.145\div 0.5)^2 = 816.8\); sample sizes are always rounded up, so use guesses from \(817\) students.
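
The sample-size calculation can be written as a small function. The source does not say what \(7.145\) and \(0.5\) represent; this sketch assumes \(7.145\) is an estimate of the population standard deviation and \(0.5\) is the required margin of error (the half-width of an approximate \(95\)% CI), which is what the formula suggests. The function name `sample_size()` is hypothetical.

```python
import math


def sample_size(sd, margin, multiplier=2):
    """Hypothetical helper: smallest n with multiplier * sd / sqrt(n) <= margin.

    Computes (multiplier * sd / margin) ** 2 and rounds UP, since rounding
    down would give a margin of error slightly larger than required.
    """
    return math.ceil((multiplier * sd / margin) ** 2)


print(sample_size(7.145, 0.5))
```
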

Answers for Sect. 8.6.1

  1. Researchers (Nataraja et al. 1999) examined the strength of fibre-reinforced concrete using a study design called an experiment. In batch 1, a sample of size \(30\) was used; the sample mean number of blows until the first crack appeared in the test cylinders was \(98\), and the variation in the number of blows was summarised by a standard deviation of \(54\). Because the data are a sample, the sample mean estimates the population mean with some sampling error.

  2. A type of study called an experiment (Ryan et al. 2010) compared the handwriting legibility of school children with cerebral palsy when using specialist school furniture and when using standard school furniture (which acted as a control). They used a random sample of size \(30\) from children registered at their facility in Canada. The sample mean difference in legibility was \(-0.1\), and a \(95\)% confidence interval was from \(-0.8\) to \(0.6\). Using the standard furniture, the smallest value recorded for legibility was \(19\) and the largest was \(34\), so the range was \(15\).
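
The "some sampling error" in the first description can be given a rough size using the same approximate CI recipe as in Sect. 8.2. This is only a sketch: it assumes that "sample mean \(\pm\) 2 standard errors" is a reasonable approximation for these data, which the description does not confirm.

```python
import math

# Batch 1 summary values from the concrete study description
n, mean_blows, sd_blows = 30, 98, 54

se = sd_blows / math.sqrt(n)                     # standard error of the sample mean
lower, upper = mean_blows - 2 * se, mean_blows + 2 * se
print(f"SE = {se:.2f} blows; approximate 95% CI: {lower:.1f} to {upper:.1f} blows")
```
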

Answers for Sect. 8.6.2

  1. Observational.
  2. Relational.
  3. Two completely separate samples are compared.
  4. \(-0.774\) to \(-0.560\) inches (differences are Dominos less Eagle Boys).
  5. The 95% CI for the difference in population mean pizza diameters between EB and DOM pizzas is from \(0.560\) to \(0.774\) inches, larger for EB.
  6. Since the sample sizes are large (both \(125\)), we do not require that the populations have normal distributions.
  7. The sample sizes are large (\(n = 125\) in each), so we don't need the populations to be normally distributed; we don't need the histogram.
  8. Probably yes. Amount of topping on the pizza? Which tastes better? Whether the samples were randomly selected or not?

Answers for Sect. 8.6.3

  1. Odds HIE: \(263\div 151 = 1.742\). HIE is \(1.74\) times as likely to be for GM foods as against.
  2. Odds LIE: \(258\div 222 = 1.162\). LIE is \(1.16\) times as likely to be for GM foods as against.
  3. \(\text{Odds}(\text{HIE in favour})\div\text{Odds}(\text{LIE in favour}) = 1.742\div 1.162 = 1.5\) (\(1.499\) in the table). The odds that HIE is for GM foods are \(1.5\) times the odds that LIE is for GM foods.
  4. From the sample, we estimate the OR in the population to be between \(1.145\) and \(1.961\). (Loosely, though technically incorrect: the true OR is likely to be between \(1.145\) and \(1.961\).) Importantly, this interval does not include \(1\).
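
The odds ratio and CI in answers 3 and 4 can be reproduced from the four counts. The source does not say how the interval was computed; this sketch assumes the Woolf (log odds ratio) interval with multiplier \(1.96\), which reproduces \(1.145\) to \(1.961\) after rounding.

```python
import math

# Counts from the answers above: HIE 263 for, 151 against; LIE 258 for, 222 against
hie_for, hie_against = 263, 151
lie_for, lie_against = 258, 222

odds_ratio = (hie_for / hie_against) / (lie_for / lie_against)

# Woolf (log) method, assumed here: SE of log(OR) from the four counts
se_log_or = math.sqrt(1 / hie_for + 1 / hie_against + 1 / lie_for + 1 / lie_against)
log_or = math.log(odds_ratio)
lower = math.exp(log_or - 1.96 * se_log_or)
upper = math.exp(log_or + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.3f}; approximate 95% CI: {lower:.3f} to {upper:.3f}")
print("CI excludes 1:", not (lower <= 1 <= upper))
```

The interval is computed on the log scale and then transformed back, which is why it is not symmetric about the odds ratio.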

References

Nataraja MC, Dhang N, Gupta AP. Statistical variations in impact resistance of steel fiber-reinforced concrete subjected to drop weight test. Cement and Concrete Research. 1999;29:989–95.