## A.6 Answer: TW 6 tutorial

$\text{s.e.}(\hat{p}) = \sqrt{\frac{\hat{p}\times(1 - \hat{p})}{n}}$

• $$\hat{p}$$ is the sample proportion.
• $$n$$ is the sample size.
• "s.e." stands for "standard error".

1. $$123/(404-123) = 123/281 = 0.44$$.

2. $$\hat{p} = 123/404 = 0.304455$$.

3. The odds: The probability of surviving is about 0.44 times the probability of dying (i.e., it is lower). Or: For every 100 that die, about 44 survive.

4. No: sampling variation!

5. $$\text{s.e.}(\hat{p}) = \sqrt{0.304455 \times (1-0.304455)/404} = \sqrt{0.00052416} = 0.022894$$, or about 0.023.
A definition can be found in the Glossary. Essentially, each sample is likely to produce a different value for the sample proportion, $$\hat{p}$$ (the estimate of the population proportion, $$p$$), and that is what we mean by "sampling variation".

6. (Not provided.)

7. The values of $$\hat{p}$$ will have an approximate normal distribution, with a standard deviation equal to the standard error ($$0.023$$) and centred around the true proportion $$p$$. Since we don't know $$p$$, we need to centre it around our best guess of $$p$$, which is $$\hat{p}=0.304$$. So something like this:

8. A 95% CI: $$0.304455 \pm (2\times0.022894)$$, or $$0.30446\pm0.04579$$, or $$0.26$$ to $$0.35$$.

9. One way of communicating this: “The population proportion of patients surviving after BVM treatment has a 95% chance of lying between 26% and 35%.” This is not strictly correct, but acceptable and very, very commonly used (as explained in the textbook).

10. The numbers of surviving and non-surviving patients both exceed 5.

11. Larger, to get a tighter (more precise) CI than the one calculated.

12. See table below.

13. Use a stacked or side-by-side barchart, for example. But a chart is not really needed: Just give the information in text.

14. The odds of not surviving in the BVM group are $$(281/123) = 2.3$$, and the odds of not surviving in the ETI group are $$(306/110) = 2.8$$, so the OR is $$(2.3/2.8) = 0.82$$. The odds of a BVM patient not surviving are about 0.8 times as great as the odds of an ETI patient not surviving.

15. A greater sample size would give a more precise estimate, which would be helpful. Probably more important, however, is to consider other relevant issues that have not been discussed so far: relative costs; ease of use; confounding variables; potential side effects; etc.

| Method | Survived | Did not survive | Total |
|---|---:|---:|---:|
| BVM | 123 | 281 | 404 |
| ETI then BVM | 110 | 306 | 416 |
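The arithmetic in this set of answers can be checked with a short script. This is only a sketch: the function names `odds` and `proportion_ci` are illustrative (not from the textbook), and the multiplier of 2 is the approximate 95% multiplier used in the answers above.

```python
import math

# Counts taken from the table above.
bvm_survived, bvm_died = 123, 281
eti_survived, eti_died = 110, 306


def odds(with_outcome, without_outcome):
    """Odds: the number with the outcome divided by the number without it."""
    return with_outcome / without_outcome


def proportion_ci(successes, n, multiplier=2):
    """Return the sample proportion, its standard error, and an approximate CI."""
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat, se, (p_hat - multiplier * se, p_hat + multiplier * se)


# Odds of surviving in the BVM group.
odds_survive_bvm = odds(bvm_survived, bvm_died)

# Sample proportion surviving in the BVM group, with s.e. and approximate 95% CI.
p_hat, se, ci = proportion_ci(bvm_survived, bvm_survived + bvm_died)

# Odds ratio of NOT surviving, BVM compared to ETI.
odds_ratio = odds(bvm_died, bvm_survived) / odds(eti_died, eti_survived)

print(f"odds of surviving (BVM): {odds_survive_bvm:.2f}")
print(f"p-hat: {p_hat:.6f}, s.e.: {se:.6f}")
print(f"approx. 95% CI: {ci[0]:.2f} to {ci[1]:.2f}")
print(f"odds ratio (not surviving, BVM vs ETI): {odds_ratio:.2f}")
```

The odds ratio here is computed from unrounded odds; it agrees with the answer above to two decimal places.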

1. $$\mu$$ is the population mean diameter size of all EB pizzas; $$\bar{x}$$ is the mean diameter of the pizzas in the sample.

2. $$\bar{x} = 11.486$$ inches. It's not sensible to quote the diameter to $$0.001$$ of an inch; what is sensible?

We don't know the value of $$\mu$$, and we never will. Our best estimate is the value of $$\bar{x}$$.

3. $$s=0.24658$$ inches. It's not sensible to quote the standard deviation of the diameters to $$0.00001$$ of an inch though.

$$\sigma$$ is the standard deviation of the population. We don't know the value of $$\sigma$$, and we never will.

4. $$\displaystyle\text{s.e.}(\bar{x}) = s/\sqrt{n} = 0.24658/\sqrt{125} = 0.02205$$.

5. The first measures the variation in the diameters of individual pizzas; the second measures the precision of the sample mean when used to estimate the population mean.

6. Almost certainly not the same. Probably close to $$\bar{x}=11.486$$ inches. More precisely, probably within three standard errors ($$3\times 0.022$$) of $$\bar{x}$$.

7. Normal; mean $$\mu$$; std. dev is the standard error of 0.02205.

8. The approximate 95% CI is $$11.486\pm (2\times 0.02205)$$ or $$11.486\pm0.044$$, which is from 11.44 to 11.53 inches.

9. Based on the sample, a 95% confidence interval for the population mean pizza diameter is from $$11.44$$ to $$11.53$$ inches.

10. Either $$n>25$$; or $$n\le 25$$ and the population has a normal distribution.

11. We do not need to assume that $$n>25$$ because we know that it is. (We do not require that the sample or the population has a normal distribution. We require that the sample means have an approximate normal distribution, which they will if $$n>25$$.) So the CI is statistically valid.

12. Based on the CI, the population mean diameter is probably not 12 inches, since 12 inches lies outside the CI (11.44 to 11.53 inches).

13. Compute: $n=\left(\frac{2\times 0.24658}{0.04}\right)^2 = 152.004$ at least, so we would need 153 pizzas.
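As a check on the pizza calculations, the following sketch recomputes the standard error, the approximate 95% CI and the required sample size from the summary statistics quoted above. The helper name `sample_size_for_mean` is illustrative, and the multiplier of 2 is the approximate 95% multiplier used in these answers.

```python
import math

# Summary statistics quoted in the answers above (inches).
x_bar, s, n = 11.486, 0.24658, 125

# Standard error of the sample mean, and the approximate 95% CI.
se = s / math.sqrt(n)
margin = 2 * se
ci = (x_bar - margin, x_bar + margin)

# Is the 12-inch diameter inside the interval?
claimed_diameter = 12
claim_in_ci = ci[0] <= claimed_diameter <= ci[1]


def sample_size_for_mean(s, margin_wanted, multiplier=2):
    """Smallest whole n with multiplier * s / sqrt(n) <= margin_wanted."""
    return math.ceil((multiplier * s / margin_wanted) ** 2)


n_needed = sample_size_for_mean(s, 0.04)

print(f"s.e.: {se:.5f}")
print(f"approx. 95% CI: {ci[0]:.2f} to {ci[1]:.2f} inches")
print(f"12 inches inside CI? {claim_in_ci}")
print(f"pizzas needed for a margin of 0.04 inches: {n_needed}")
```

Rounding up with `math.ceil` reflects the "at least" in the answer: any fractional result means one more whole pizza is needed.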

1. Descriptive: Every subject is treated the same way; we are not comparing two groups that have been treated differently.
2. $$\mu_d$$ is the mean difference in the target population; $$\bar{d}$$ is the mean difference in this sample.
3. Each subject is measured both before and after.
4. $$32$$; $$12$$; $$24$$; $$30$$; $$8$$; $$14$$; $$14$$; $$28$$; $$38$$; $$49$$. (Differences in the other direction are also acceptable; this just changes the signs of these differences, and so on. Importantly, the direction should be stated somewhere.)
5. It makes more sense to define directions this way, so that the difference is the increase in 2MWT.
6. $$\bar{d} = 24.9$$; $$s_d=13.03372$$
7. $$\text{s.e.}(\bar{d}) = s_d/\sqrt{n} = 13.03372/\sqrt{10} = 4.121623$$. This is the standard deviation of the sample mean difference, a measurement of how precisely the sample mean difference measures the population mean difference.
8. Almost impossible. Sample means vary around the true mean difference every time we take a sample, with a normal distribution with standard error $$4.12$$. Since we don't know $$\mu_d$$, the best we can say is that the sample mean will vary about our best guess of the population mean; in other words, the sample means vary around $$24.9$$ with a standard deviation of about $$4.12$$.
9. Normal; mean $$\mu_d$$, std deviation is the standard error of 4.122.
10. $$24.9\pm (2\times 4.121623)$$, or $$24.9\pm 8.243246$$, or from $$16.65675$$ to $$33.14325$$ m.
11. We are 95% confident that the population 2MWT increases by a mean amount between $$16.7$$ and $$33.1$$m.
12. Either (or both) of these must be true:
    1. the population has a normal distribution, and/or
    2. the sample size is large enough that the sample means have an approximate normal distribution (larger than about 25).
13. Since $$n<25$$, we need to assume the population of differences has a normal distribution. A stem-and-leaf plot suggests this is not unreasonable, so the sample means quite possibly have an approx. normal distribution:
14. (Recall we haven't done hypothesis testing in this context yet!) Looks pretty likely that the 2MWT distances are higher after receiving the implant.
15. $$n = (2\times 13.03372\div 5)^2 = 27.18$$, so need data from at least $$28$$ amputees.
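The summary statistics in this set of answers can be reproduced from the ten differences listed in answer 4. This is a minimal sketch using Python's standard `statistics` module; it assumes the differences are in the direction given above (an increase in 2MWT is positive), and the desired margin of 5 m is the value used in answer 15.

```python
import math
import statistics

# The ten differences in 2MWT distance (m), as listed in answer 4.
differences = [32, 12, 24, 30, 8, 14, 14, 28, 38, 49]
n = len(differences)

# Sample mean and sample standard deviation (n - 1 divisor) of the differences.
d_bar = statistics.mean(differences)
s_d = statistics.stdev(differences)

# Standard error of the mean difference, and the approximate 95% CI.
se = s_d / math.sqrt(n)
ci = (d_bar - 2 * se, d_bar + 2 * se)

# Sample size needed to estimate the mean difference to within 5 m.
n_needed = math.ceil((2 * s_d / 5) ** 2)

print(f"mean difference: {d_bar:.1f}, s_d: {s_d:.5f}")
print(f"s.e.: {se:.6f}")
print(f"approx. 95% CI: {ci[0]:.1f} to {ci[1]:.1f} m")
print(f"amputees needed for a margin of 5 m: {n_needed}")
```

Reversing the direction of the differences would flip the signs of the mean and the CI limits, but leave the standard deviation, standard error and sample size unchanged.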

1. $$\bar{x}=16.02$$m.
2. $$s=7.145$$m; $$\text{s.e.}(\bar{x}) = s/\sqrt{n} = 7.145/\sqrt{44} = 1.077$$m. The first is a measure of the variation in the original data; the second is a measure of the precision of the sample mean when estimating the population mean.
3. 95% CI for the population mean guess: 13.85 to 18.19m.
4. The population of differences has a normal distribution, and/or $$n>30$$ or so.
5. Since $$n>30$$, all OK if the histogram isn't severely skewed. Probably OK.
6. Not really; the CI doesn't contain the true width. But was this just due to the metric units... or perhaps students are just very poor at estimating widths in general! In fact, the Professor also had the students estimate the width of the hall in imperial units, as a comparison.

1. Relational.
2. $$\hat{p} = 352/2864=0.12291$$.
3. $$\text{s.e.}(\hat{p}) = \sqrt{0.12291 \times (1-0.12291)/2864} = 0.006135$$.
4. An approximate 95% CI is $$0.12291 \pm (2\times0.006135)$$, or $$0.12291\pm0.01227$$, or from $$0.111$$ to $$0.135$$. Either the '$$0.123\pm 0.012$$' form or the '$$0.111$$ to $$0.135$$' form is fine; percentages or proportions are fine (but the calculations must be done with the proportions, not the percentages).
5. We need the number of boys who are late maturers and who are not late maturers to both be greater than 5. This is true, so the calculations are valid.
6. Smaller; the current sample size estimates $$p$$ to within $$1.2\%$$, and less accuracy needs fewer in the sample.
7. $$n = 1/(0.02)^2 = 2500$$ boys.
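These proportion calculations can be verified with a few lines of code. This is a sketch only: the variable names are illustrative, and `conservative_sample_size` is a hypothetical helper implementing the $$n = 1/(\text{margin})^2$$ rule used in answer 7 (which follows from the approximate multiplier of 2 and the largest possible value of $$\hat{p}(1-\hat{p})$$, namely $$0.25$$).

```python
import math

# Counts from the answers above: 352 late maturers among 2864 boys.
late_maturers, n = 352, 2864

p_hat = late_maturers / n
se = math.sqrt(p_hat * (1 - p_hat) / n)
ci = (p_hat - 2 * se, p_hat + 2 * se)

# Validity check: both counts must exceed 5.
ci_is_valid = late_maturers > 5 and (n - late_maturers) > 5


def conservative_sample_size(margin):
    """Sample size from n = 1 / margin^2, rounded up to a whole number.

    The inner round() guards against tiny floating-point excess
    (such as 2500.0000000000005) pushing the result up by one.
    """
    return math.ceil(round((1 / margin) ** 2, 6))


n_needed = conservative_sample_size(0.02)

print(f"p-hat: {p_hat:.5f}, s.e.: {se:.6f}")
print(f"approx. 95% CI: {ci[0]:.3f} to {ci[1]:.3f}")
print(f"CI statistically valid? {ci_is_valid}")
print(f"boys needed for a margin of 0.02: {n_needed}")
```

Note that the margin is entered as a proportion (`0.02`), not a percentage, in line with the remark in answer 4.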

1. Because each method is used in each sea state.
2. $$\bar{d}=0.06167$$; $$s_d = 0.2901$$. The mean difference is positive: Method 1 measurements slightly higher (on average) than Method 2.
3. $$\text{s.e.}({\bar{d}})= s_d/\sqrt{n} = 0.2901/\sqrt{18} = 0.0684$$. $$\text{s.e.}({\bar{d}})$$ measures the precision with which the sample mean difference estimates the population mean difference.
4. $$0.06167\pm(2\times 0.0684)$$, which is $$0.06167 \pm 0.137$$, or from $$-0.075$$ to $$0.199$$ newton-metres.
5. Since $$n<30$$, we require that the differences in the population have a normal distribution.
6. The stem-and-leaf plot of the sample doesn't suggest the population is non-normal.
7. Since the CI includes zero, the population mean difference could be zero.
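The reasoning in the last answer (checking whether zero lies inside the CI) can be sketched in code from the summary statistics quoted above. The function name `paired_ci` is illustrative, and the multiplier of 2 is the approximate 95% multiplier used throughout these answers.

```python
import math


def paired_ci(d_bar, s_d, n, multiplier=2):
    """Approximate CI for a population mean difference from summary statistics."""
    se = s_d / math.sqrt(n)
    return se, (d_bar - multiplier * se, d_bar + multiplier * se)


# Summary statistics of the Method 1 minus Method 2 differences.
se, (lower, upper) = paired_ci(d_bar=0.06167, s_d=0.2901, n=18)

# If zero is inside the interval, a population mean difference of zero is plausible.
zero_is_plausible = lower <= 0 <= upper

print(f"s.e.: {se:.4f}")
print(f"approx. 95% CI: {lower:.2f} to {upper:.2f}")
print(f"zero inside CI? {zero_is_plausible}")
```

An interval whose limits have opposite signs always contains zero, so checking the signs of the two limits is an equivalent test.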

1. $$\mu_d$$ is the mean difference in the target population; $$\bar{d}$$ is the mean difference in this sample.
2. Each runner's plasma $$\beta$$ concentration is measured both before and after the race.
3. The direction of the difference should be clearly stated.
4. $$\bar{d} = 18.736$$; $$s_d=8.3297$$
5. $$\text{s.e.}(\bar{d}) = s_d/\sqrt{n} = 8.3297/\sqrt{11} = 2.5115$$. This is the standard deviation of the sample mean difference, a measurement of how precisely the sample mean difference measures the population mean difference.
6. Almost impossible. The sample means would vary every time we took a sample, around the true mean difference with a normal distribution having a standard error of about $$2.51$$. Since we don't know the population mean, the best we can say is that the sample mean will vary about our best guess of the population mean; in other words, the sample means will vary around $$18.74$$ with a standard deviation of about $$2.51$$.
7. $$18.736\pm (2\times 2.5115)$$, or $$18.736\pm5.023$$, or from $$13.7$$ to $$23.8$$ pmol/litre.
8. We are 95% confident that the population $$\beta$$ plasma concentration increases by a mean amount between $$13.7$$ and $$23.8$$ pmol/litre, during the fun run.
9. Either (or both) of these must be true: the population has a normal distribution, and/or the sample size is large enough that the sample means have an approximate normal distribution (larger than about 25).
10. Since $$n<25$$, we need to assume the population of differences has a normal distribution. The stem-and-leaf plot suggests this is not unreasonable, so the sample means quite possibly have an approx. normal distribution:
11. Looks likely that the plasma $$\beta$$ concentrations are higher after the race.
12. $$n = (2\times 8.3297\div 2.5)^2 = 44.41$$, so need data from at least $$45$$ runners.

1. $$\sqrt{0.70\times(1 - 0.70)/25} = \sqrt{0.0084} = 0.09165151$$, or about 0.09165.
2. $$\sqrt{0.25\times(1 - 0.25)/100} = \sqrt{0.001875} = 0.04330127$$, or about 0.04330.

Note: Students commonly forget to take the square root.

Note: If your calculator gives an answer something like 1.875 E-03 or similar, it is using scientific notation. It means $$1.875\times 10^{-3}$$, or $$0.001875$$.
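Both notes can be illustrated with the second calculation above ($$\hat{p} = 0.25$$, $$n = 100$$). The sketch below shows the value before the square root is taken, how that value looks in scientific notation, and the final standard error; the variable names are illustrative.

```python
import math

p_hat, n = 0.25, 100

# The quantity under the square root: easy to mistake for the final answer.
under_root = p_hat * (1 - p_hat) / n

# The same number in scientific notation, as many calculators display it.
in_scientific_notation = f"{under_root:.3E}"

# The standard error requires the square root as the final step.
se = math.sqrt(under_root)

print(f"before the square root: {under_root}")
print(f"in scientific notation: {in_scientific_notation}")
print(f"standard error: {se:.5f}")
```

Converting the scientific-notation text back with `float(in_scientific_notation)` gives the same number as `under_root`, confirming that the two displays are the same value.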