## A.7 Answer: TW 7 tutorial

### Answers for Sect. 7.2

Answers implied by H5P.

\(\displaystyle \text{s.e.}(\hat{p}) = \sqrt{ \big(\hat{p}\times(1 - \hat{p}) \big)/n}\), where \(\hat{p}\) is the *sample proportion*; \(n\) the *sample size*; "s.e." the "standard error".
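
As a minimal sketch, the formula above can be written as a small Python function. The function name `se_proportion` and the example values are illustrative assumptions, not part of the tutorial.

```python
import math

def se_proportion(p_hat: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p_hat * (1 - p_hat) / n)."""
    if not 0 <= p_hat <= 1:
        raise ValueError("p_hat must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p_hat * (1 - p_hat) / n)

# Illustrative values only (not from the tutorial): p_hat = 0.5, n = 100.
example_se = se_proportion(0.5, 100)
print(round(example_se, 4))  # 0.05
```

Note that the square root is taken last, over the whole fraction.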

### Answers for Sect. 7.3

- \(123/(404 - 123) = 123/281 = 0.44\).
- \(\hat{p} = 123/404 = 0.304455\).
- The **odds**: The probability of surviving is about \(0.44\) times the probability of dying (i.e., it is lower). Or: For every \(100\) that die, about \(44\) survive.
- No: sampling variation!
- \(\text{s.e.}(\hat{p}) = \sqrt{0.304455 \times (1 - 0.304455)/404} = \sqrt{0.00052416} = 0.022894\), or about \(0.023\).

A definition can be found in the textbook *Glossary*. Essentially, each sample is likely to produce a different value for the sample proportion, \(\hat{p}\) (the estimate of the population proportion, \(p\)), and that is what we mean by "sampling variation".

- (Not provided.)
- The values of \(\hat{p}\) will have an approximate normal distribution, with a standard deviation equal to the standard error (\(0.023\)) and centred around the true proportion \(p\). Since we don't know \(p\), we need to centre it around our best guess of \(p\), which is \(\hat{p} = 0.304\): a bell-shaped curve centred at \(0.304\) with a standard deviation of \(0.023\).
- A \(95\)% CI: \(0.304455 \pm (2\times0.022894)\), or \(0.30446\pm0.04579\), or \(0.26\) to \(0.35\).
- One way of communicating this: “The population proportion of patients surviving after BVM treatment has a \(95\)% chance of lying between \(26\)% and \(35\)%.” This is not strictly correct, but acceptable and very, very commonly used (as explained in the textbook).
- The numbers of surviving and non-surviving patients both exceed \(5\).
- Larger, to get a tighter (more precise) CI than the one calculated.
- See table below.
- Use a stacked or side-by-side barchart, for example. But a chart is not really needed: just give the information in text.
- The odds of *not* surviving in the ETI group are \(306/110 = 2.8\) (and \(281/123 = 2.3\) in the BVM group), so the OR is \(2.3/2.8 = 0.82\). The odds of a BVM patient not surviving are \(0.8\) times as great as the odds of an ETI patient not surviving.
- A greater sample size would give a more precise estimate. But rather than a greater sample size (which would still be helpful), probably more important is to consider other relevant issues that have not been discussed so far: relative costs; ease of use; confounding variables; potential side effects; etc.

| Method | Survived | Did not survive | Total |
|---|---|---|---|
| BVM | \(123\) | \(281\) | \(404\) |
| ETI then BVM | \(110\) | \(306\) | \(416\) |
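
The arithmetic in this section can be checked with a short script. This is only a sketch: the variable names are assumptions, and the multiplier \(2\) is the approximate \(95\)% multiplier used in the answers above.

```python
import math

# Counts from the table above.
bvm_survived, bvm_died = 123, 281
eti_survived, eti_died = 110, 306

n_bvm = bvm_survived + bvm_died              # 404 patients in the BVM group
odds_survive_bvm = bvm_survived / bvm_died   # odds of surviving (BVM)
p_hat = bvm_survived / n_bvm                 # sample proportion surviving (BVM)

# Standard error and approximate 95% CI (multiplier of 2).
se = math.sqrt(p_hat * (1 - p_hat) / n_bvm)
ci_low, ci_high = p_hat - 2 * se, p_hat + 2 * se

# Odds of NOT surviving in each group, and their ratio (BVM relative to ETI).
odds_die_bvm = bvm_died / bvm_survived
odds_die_eti = eti_died / eti_survived
odds_ratio = odds_die_bvm / odds_die_eti

print(f"odds = {odds_survive_bvm:.2f}, p_hat = {p_hat:.3f}, s.e. = {se:.3f}")
print(f"95% CI: {ci_low:.2f} to {ci_high:.2f}")
print(f"OR = {odds_ratio:.2f}")
```

The odds ratio here is computed from the unrounded odds; it agrees with the rounded calculation to two decimal places.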

### Answers for Sect. 7.4

- \(\mu\) is the population mean diameter of *all* EB pizzas; \(\bar{x}\) is the mean diameter of the pizzas in the sample.
- \(\bar{x} = 11.486\) inches. It's not sensible to quote the diameter to \(0.001\) of an inch; what is sensible? We don't know the value of \(\mu\), and we never will. Our best *estimate* is the value of \(\bar{x}\).
- \(s = 0.24658\) inches. It's not sensible to quote the standard deviation to this many decimal places though. \(\sigma\) is the standard deviation of the population. We don't know the value of \(\sigma\), and we never will.
- \(\displaystyle\text{s.e.}(\bar{x}) = s/\sqrt{n} = 0.24658/\sqrt{125} = 0.02205\).
- The first measures the variation in the diameters of individual pizzas; the second measures the precision of the sample mean when used to estimate the population mean.
- Almost certainly not the same. Probably close to \(\bar{x} = 11.486\) inches. More precisely, probably within three standard errors (\(3\times 0.022\)) of \(\bar{x}\).
- Normal; mean \(\mu\); standard deviation equal to the standard error, \(0.02205\).
- The approximate \(95\)% CI is \(11.486\pm (2\times 0.02205)\) or \(11.486\pm0.044\), which is from \(11.44\) to \(11.53\) inches.
- Based on the sample, a \(95\)% confidence interval for the population mean for the pizza diameter is between \(11.44\) and \(11.53\) inches.
- \(n > 25\), **or** \(n\le 25\) and the population has a normal distribution.
- We do not need to **assume** that \(n > 25\) because we know that it is. (We do *not* require that the sample or the population has a normal distribution. We require that the *sample means* have an approximate normal distribution, which they will if \(n > 25\).) So the CI is statistically valid.
- The population mean diameter is probably not \(12\) inches, based on the CI.
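
The standard error and approximate CI above can be reproduced from the summary statistics alone. A minimal sketch follows; the helper name `approx_ci_mean` is an assumption, and the default multiplier of \(2\) is the approximate \(95\)% value used in these answers.

```python
import math

def approx_ci_mean(mean: float, sd: float, n: int, multiplier: float = 2.0):
    """Return (standard error, lower limit, upper limit) for an approximate CI."""
    se = sd / math.sqrt(n)
    return se, mean - multiplier * se, mean + multiplier * se

# Summary statistics quoted in the answers above (inches).
se, low, high = approx_ci_mean(mean=11.486, sd=0.24658, n=125)

print(f"s.e. = {se:.5f}")
print(f"95% CI: {low:.2f} to {high:.2f} inches")
print("12 inches inside the CI?", low <= 12 <= high)
```

The last line checks whether the claimed diameter of \(12\) inches lies inside the interval.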

### Answers for Sect. 7.5

- **Repeated-measures**: Every subject has two 2MWT measurements recorded.
- \(\mu_d\) is the mean difference in the target population; \(\bar{d}\) is the mean difference in this sample.
- The 2MWT is measured both before and after on the same subject.
- \(32\); \(12\); \(24\); \(30\); \(8\); \(14\); \(14\); \(28\); \(38\); \(49\). (Differences in the other direction are also acceptable; this just changes the signs of these differences, and so on. **Importantly, the direction should be stated somewhere.**)
- It makes more sense to define the differences this way, so that the difference is the *increase* in 2MWT.
- \(\bar{d} = 24.9\); \(s_d=13.03372\).
- \(\text{s.e.}(\bar{d}) = s_d/\sqrt{n} = 13.03372/\sqrt{10} = 4.121623\). This is the standard deviation of the sample mean difference, a measurement of how precisely the sample mean difference measures the population mean difference.
- Almost impossible. Sample mean differences vary from sample to sample around the true mean difference, with an approximate normal distribution with standard error \(4.12\). Since we don't know \(\mu_d\), the best we can say is that the sample mean difference will vary about our best guess of the population mean difference; in other words, the sample mean differences vary around \(24.9\) with a standard deviation of about \(4.12\).
- Normal; mean \(\mu_d\); standard deviation equal to the standard error, \(4.12\).
- \(24.9\pm (2\times 4.121623)\), or \(24.9\pm 8.243246\), or from \(16.65675\) to \(33.14325\) m.
- We are \(95\)% confident that the population 2MWT *increases* by a mean amount between \(16.7\) and \(33.1\) m.
- Either (or both) of these must be true:
  - the population of differences has a normal distribution, and/or
  - the sample size is large enough so the sample means have an approximate normal distribution (i.e., larger than \(25\)).

- Since \(n < 25\), we need to assume the population of differences has a normal distribution. A stem plot suggests this is not unreasonable, so the sample means possibly have an approx. normal distribution.
- (Recall we haven't done hypothesis testing in this context yet!) Looks pretty likely that the 2MWT distances are higher after receiving the implant.
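
The paired-data calculations above can be sketched with Python's standard library. The differences are those listed in the answers (defined as increases in 2MWT); the variable names are assumptions, and the multiplier \(2\) is the approximate \(95\)% value used throughout.

```python
import math
import statistics

# Differences (increase in 2MWT, in metres) as listed in the answers above.
differences = [32, 12, 24, 30, 8, 14, 14, 28, 38, 49]

n = len(differences)
d_bar = statistics.mean(differences)   # sample mean difference
s_d = statistics.stdev(differences)    # sample standard deviation (n - 1 divisor)
se = s_d / math.sqrt(n)                # standard error of the mean difference

# Approximate 95% CI using the multiplier 2.
low, high = d_bar - 2 * se, d_bar + 2 * se

print(f"d_bar = {d_bar}, s_d = {s_d:.5f}, s.e. = {se:.6f}")
print(f"95% CI: {low:.1f} to {high:.1f} m")
```

Reversing the direction of the differences would flip the signs of `d_bar` and both CI limits, but leave `s_d` and the standard error unchanged.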

### Answers for Sect. 7.8.1

- \(\bar{x} = 16.02\) m.
- \(s = 7.145\) m; \(\text{s.e.}(\bar{x}) = s/\sqrt{n} = 7.145/\sqrt{44} = 1.077\) m. The first is a measure of the variation in the original data; the second is a measure of the precision of the sample mean when estimating the population mean.
- The CI is from \(13.85\) to \(18.19\) m.
- \(95\)% CI for the population mean guess: \(13.85\) to \(18.19\) m.
- The population has a normal distribution, and/or \(n > 25\) or so.
- Since \(n > 25\), all is OK if the histogram isn't severely skewed; probably OK.
- Not really; the CI doesn't contain the true width. But was this just due to the metric units... or perhaps students are just very poor at estimating widths in general! In fact, the Professor also had the students estimate the width of the hall in imperial units, as a comparison.

### Answers for Sect. 7.8.2

- Relational.
- \(\hat{p} = 352/2864 = 0.12291\).
- \(\text{s.e.}(\hat{p}) = \sqrt{0.12291 \times (1 - 0.12291)/2864} = 0.006135\).
- An approximate \(95\)% CI is \(0.12291 \pm (2\times 0.006135)\), or \(0.12291\pm 0.01227\), or from \(0.111\) to \(0.135\). Either the '\(0.123\pm 0.012\)' form or the '\(0.111\) to \(0.135\)' form is fine; percentages or proportions are fine (but **the calculations must be done with the proportions, not the percentages**).
- We need the number of boys who are late maturers and who are *not* late maturers to both be greater than \(5\). This is true, so the calculations are valid.
- Smaller; the current sample size estimates \(p\) to within \(1.2\%\), and less accuracy needs fewer in the sample.
- \(n = 1/(0.02)^2 = 2500\) boys.
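
A short sketch of the calculations in this section. The rule \(n = 1/(\text{margin})^2\) follows from setting \(2\times\sqrt{0.5\times 0.5/n}\) equal to the margin, so it is a conservative estimate; the function name `sample_size_for_proportion` is an assumption.

```python
import math

def sample_size_for_proportion(margin: float) -> int:
    """Conservative sample size for an approximate 95% CI: n = 1 / margin**2."""
    # round() guards against floating-point noise before rounding up.
    return math.ceil(round(1 / margin**2, 6))

# Counts from the answers above: 352 late maturers among 2864 boys.
late, n = 352, 2864
p_hat = late / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 2 * se, p_hat + 2 * se   # approximate 95% CI (multiplier 2)

print(f"p_hat = {p_hat:.5f}, s.e. = {se:.6f}")
print(f"95% CI: {low:.3f} to {high:.3f}")
print(sample_size_for_proportion(0.02))      # 2500
```

The calculations use the proportion (`0.12291`), not the percentage (`12.291`).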

### Answers for Sect. 7.8.3

- Because each method is used in each sea state.
- \(\bar{d} = 0.06167\); \(s_d = 0.2901\). The mean difference is positive: Method 1 measurements slightly higher (on average) than Method 2.
- \(\text{s.e.}({\bar{d}}) = s_d/\sqrt{n} = 0.2901/\sqrt{18} = 0.0684\). \(\text{s.e.}({\bar{d}})\) measures the precision with which the sample mean difference estimates the population mean difference.
- \(0.06167\pm(2\times 0.0684)\), which is \(0.06167 \pm 0.137\), or from \(-0.075\) to \(0.199\) newton-metres.
- Since \(n < 25\), we require that the *differences* in the population have a normal distribution.
- The stem-and-leaf plot of the *sample* doesn't suggest the *population* is non-normal.
- Since the CI includes zero, the population mean difference could possibly be zero.

### Answers for Sect. 7.8.4

- \(\mu_d\) is the mean difference in the target population; \(\bar{d}\) is the mean difference in this sample.
- Each plasma \(\beta\) measurement is taken both before and after the race on the same runner.
- The *direction* of the difference should be clearly stated.
- \(\bar{d} = 18.736\); \(s_d = 8.3297\).
- \(\text{s.e.}(\bar{d}) = s_d/\sqrt{n} = 8.3297/\sqrt{11} = 2.5115\). This is the standard deviation of the sample mean difference, a measurement of how precisely the sample mean difference measures the population mean difference.
- Almost impossible. The sample mean differences would vary every time we took a sample, around the true mean difference, with a normal distribution having a standard error of about \(2.51\). Since we don't know the population mean difference, the best we can say is that the sample mean difference will vary about our best guess of the population mean difference. In other words, the sample mean differences will vary around \(18.7\) with a standard deviation of about \(2.51\).
- \(18.736\pm (2\times 2.5115)\), or \(18.736\pm5.023\), or from \(13.7\) to \(23.8\) pmol/litre.
- We are \(95\)% confident that the population plasma \(\beta\) concentration *increases* by a mean amount between \(13.7\) and \(23.8\) pmol/litre during the fun run.
- Either (or both) of these must be true: the population of differences has a normal distribution, and/or the sample size is large enough (larger than about \(25\)) that the sample means have an approximate normal distribution.
- Since \(n < 25\), we need to assume the population of differences has a normal distribution. The stem-and-leaf plot suggests this is not unreasonable, so the sample means quite possibly have an approximate normal distribution.
- Looks likely that the plasma \(\beta\) concentrations are higher after the race.
- \(n = (2\times 8.3297\div 2.5)^2 = 44.41\), so need data from \(45\) runners.
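
The sample-size formula in the last answer, \(n = (2\times s/\text{margin})^2\), can be evaluated as sketched below. The function name `sample_size_for_mean` is an assumption; the result is always rounded *up* to a whole number of runners.

```python
import math

def sample_size_for_mean(sd: float, margin: float, multiplier: float = 2.0) -> int:
    """Smallest n for which multiplier * sd / sqrt(n) is no larger than margin."""
    return math.ceil((multiplier * sd / margin) ** 2)

# s_d from the answers above; 2.5 pmol/litre is the margin used in the formula.
n_needed = sample_size_for_mean(sd=8.3297, margin=2.5)
print(n_needed)
```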

### Answers for Sect. 7.7

- \(\sqrt{0.70\times(1 - 0.70)/25} = \sqrt{0.0084} = 0.09165151\), or about \(0.09165\).
- \(\sqrt{0.25\times(1 - 0.25)/100} = \sqrt{0.001875} = 0.04330127\), or about 0.04330.
- \(0.08724964\), or about \(0.08725\).
- \(0.0534479\), or about \(0.05345\).

**Note**: Students commonly forget to take the square root.

**Note**: If your calculator gives an answer something like `1.875 E-03` or similar, it is using **scientific notation**.
It means \(1.875\times 10^{-3}\), or \(0.001875\).
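
As an illustration of the notation, Python reads and writes the same form (written without the space). This is only a sketch to show the equivalence.

```python
import math

# Python accepts calculator-style scientific notation (without the space).
value = float("1.875E-03")

print(value)                        # 0.001875
print(f"{value:.3E}")               # 1.875E-03
print(round(math.sqrt(value), 5))   # 0.0433
```

The last line takes the square root, which is the step that is commonly forgotten.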