# D Answers to end-of-chapter exercises

This Appendix contains answers to most (not all) exercises. Some are fully worked, and some are only brief solutions.

• Answers to Chap. 1 (Introduction): Sect. D.1.
• Answers to Chap. 2 (Research questions): Sect. D.2.
• Answers to Chap. 3 (Research design): Sect. D.3.
• Answers to Chap. 4 (Ethics): Sect. D.4.
• Answers to Chap. 5 (Sampling): Sect. D.5.
• Answers to Chap. 6 (Factors that influence the response variable): Sect. D.6.
• Answers to Chap. 7 (Designing experiments): Sect. D.7.
• Answers to Chap. 8 (Designing observational studies): Sect. D.8.
• Answers to Chap. 9 (Interpretation): Sect. D.9.
• Answers to Chap. 10 (Collecting data): Sect. D.10.
• Answers to Chap. 11 (Describing variables): Sect. D.11.
• Answers to Chap. 12 (Graphs): Sect. D.12.
• Answers to Chap. 13 (Numerical summaries for quantitative data): Sect. D.13.
• Answers to Chap. 14 (Numerical summaries for qualitative data): Sect. D.14.
• Answers to Chap. 15 (Making decisions): Sect. D.15.
• Answers to Chap. 16 (Probability): Sect. D.16.
• Answers to Chap. 17 (Sampling distributions): Sect. D.17.
• Answers to Chap. 18 (Sampling variation): Sect. D.18.
• Answers to Chap. 20 (CIs for one proportion): Sect. D.19.
• Answers to Chap. 22 (CIs for one mean): Sect. D.21.
• Answers to Chap. 23 (CIs for paired data): Sect. D.22.
• Answers to Chap. 24 (CIs for two independent means): Sect. D.23.
• Answers to Chap. 25 (CIs for odds ratios): Sect. D.24.
• Answers to Chap. 26 (Sample size estimation): Sect. D.25.
• Answers to Chap. 28 (Tests for one mean): Sect. D.26.
• Answers to Chap. 30 (Tests for paired means): Sect. D.28.
• Answers to Chap. 31 (Tests for two independent means): Sect. D.29.
• Answers to Chap. 32 (Tests for odds ratios): Sect. D.30.
• Answers to Chap. 34 (Relationships between two quantitative variables): Sect. D.31.
• Answers to Chap. 35 (Correlation): Sect. D.32.
• Answers to Chap. 36 (Regression): Sect. D.33.
• Answers to Chap. 38 (Writing research): Sect. D.35.

Answers to exercises in Sect. 1.9.

Answer to Exercise 1.1: The RQ requires numerical information to be answered, such as the average time taken to apply the tourniquets. This RQ would be answered using quantitative research.

Answer to Exercise 1.2: The RQ does not require numerical information to be answered. This RQ would be answered using qualitative research.

Answers to exercises in Sect. 2.13.

Answer to Exercise 2.1: See Table D.1.

TABLE D.1: Terms matched with their operational definitions

| Term | Definition |
|---|---|
| Rainwater | Rainwater from a rainwater collection tank on your property |
| Bottled | Water sold in bottles by food companies that is widely available to the public for purchase and consumption |
| Tap | Water you presently use throughout your dwelling (home) |
| Recycled | Highly purified wastewater deemed by scientists as safe for human consumption |
| Desalinated | Highly purified seawater deemed by scientists and public health officials as safe for human consumption |

Answer to Exercise 2.2: 1. P: University students. 2. O: Average resting diastolic blood pressure. 3. C: between students who regularly drive to UniSC and those who regularly ride their bicycles. 4. No intervention. 5. Relational. 6. Conceptual: What is meant by 'regularly'; 'university student' (on-campus and online? undergraduate and postgraduate? full-time and part-time?). Operational: how 'resting diastolic blood pressure' will be measured. 7. Resting diastolic blood pressure; whether they regularly drive to university or regularly ride their bicycles.

Answer to Exercise 2.3: 1. Some elements are not well defined, but perhaps: P: Children aged under 3 in a Peruvian peri-urban community; O: proportion of children with diarrhoea; C: nutritional status; No intervention. 2. Hard to be sure; perhaps something like: 'In children aged under 3 in a Peruvian peri-urban community, is there a relationship between diarrhoea status and nutritional status?'. 3. Relational. 4. How is 'diarrhoea status' measured? Likewise, how is 'nutritional status' measured? There are probably others. 5. Response: diarrhoea status; explanatory: nutritional status.

Answer to Exercise 2.4: Recall that the outcome is used to describe a group (the population), not the individuals.

1. The percentage of vehicles that crash. 2. The average jump height. 3. The average number of tomatoes per plant. 4. The percentage of people who own a car.

Answer to Exercise 2.5: Recall that the explanatory variable is what is actually measured on the individuals in the population.

1. The type of car fuel. 2. The type of coffee. 3. The dose of iron supplement. 4. The diet.

Answer to Exercise 2.6: 1. Does have a comparison (between a group of people in winter, and a different group of people in summer). The outcome is 'the percentage of people wearing hats'. 2. Does not have a comparison. Two subsets of the population are not being compared: instead, each person is measured twice. So an Outcome may be 'average change in cholesterol levels'. 3. Does not have a comparison. Two subsets of the population are not being compared: instead, each person gets two measurements. So an Outcome may be 'average difference between right- and left-leg balance times'. 4. Does have a comparison: The three subsets of the population are being compared: the three groups of tomato plants. The Outcome is 'average yield' (which could be measured in kg/plant, tomatoes/plant, kg/hectare, etc).

Answer to Exercise 2.7: The unit of observation is the animal; the animals, for example, are weighed.

The unit of analysis is the pen, as the food is allocated to the animals in the whole pen. In addition, the animals in the same pen are not independent: they compete for the same space, food, resources, and would all have similar environments that they share.

Answer to Exercise 2.8: The population surely is not 10 adults; that sounds like the sample. It does not make clear how many fonts are being compared (or which fonts are being used).

Perhaps try this:

Among Australian adults, is the average time taken to read a passage of text different when Arial font is used compared to Times Roman font?

Answer to Exercise 2.9: The RQ is about comparing groups, so it should talk about the average lung capacity of males and females. Perhaps:

Of students that study at UniSC, Sippy Downs, do males have a larger average lung capacity than females?

Answers to exercises in Sect. 3.10.

Answer to Exercise 3.1: The researchers could decide which beams go into Group A and into Group B. Researchers could also allocate treatments to the groups: they could select which treatment is applied to each group of beams. This is a true experiment.

Answer to Exercise 3.2: The researchers had no say in who was in hospital at the time: they could not allocate the patients to the two groups (overlay; mattress). This is a quasi-experiment.

1. P: Perhaps people in a suburb of the Sunshine Coast;
O: number of doctor's visits in the next six months;
C: between people owning a pet for those six months, and those who do not own a pet for those six months.
2. For an experiment, we would need to intervene to give subjects a pet, or not give them a pet.
3. For an observational study, we would not intervene: We would find the subjects who already owned a pet, or who did not already own a pet.

1. P: A bit vague from this small extract: people of some kind;
O: the average change in body weight over two years;
C: Between the four diets;
I: The diets seem to have been imposed.
2. Experimental: The diets have been manipulated and imposed by the researchers, with the intent of changing the outcome (the weight change).
3. Probably a true experiment.
4. The individuals: the diets are allocated to each individual.
5. The individuals: those from whom the weight change is taken.
6. The change in body weight over two years.
7. The type of diet.

Answers to exercises in Sect. 4.6.

Answers to exercises in Sect. 5.14.

Answer to Exercise 5.1: A tricky thing here is that some books are not physically in the library, as they have been borrowed.

1. Simple random sample: A list of all the books held by the UniSC library is needed. This may be possible for a librarian (it may not be, and the list would be huge), but it certainly is not possible for a student or non-library staff member. In principle though, number each book, and randomly select a sample from that list.
2. Stratified: Use locations (Sippy Downs; Fraser Coast; Caboolture; Gympie; Southbank; SCHI) as strata, and then take a random sample of the books at each location.
3. Cluster: Consider each set of shelves as a cluster, and randomly select some shelves, and determine the number of pages in each book on the selected shelves.
4. Convenience: Finding books in the libraries that are within reach, easily accessible and on the shelves.
5. Multi-stage: Consider taking a random sample of campuses, then a random sample of the sets of shelves in the selected libraries, then selecting a random shelf from each one, then a small number of random books from each shelf.
6. Multi-stage perhaps.

Answer to Exercise 5.2: 1. Multi-stage. 2. It's a bit like stratified... but not quite. 3. Convenience. 4. A combination of multi-stage and convenience. 5. The second last is poor, and the last is a slight improvement. The second is a bit odd but is probably OK. The first might be the best.

Answer to Exercise 5.3: 1. Convenience, but by approaching every 10th person they are trying to make it a little more representative... but they can do a lot better. 2. Convenience, but by approaching every 5th person and going every day for a week they are trying to make it a little more representative... but they can do a lot better. 3. Self-selecting. 4. Convenience. At least the researcher is trying to get a more representative sample, by going every day for two weeks, and at different times and locations each week, and approaching someone every 15 minutes. 5. The fourth is the best, but it is still far from 'random'. 6. None.

Answer to Exercise 5.4: A bit like cluster sampling (randomly taking a small sample from many groups, and taking everyone (or everything) in those selected groups)... but not every person in the selected schools would respond (they would decide if they responded).

A combination of cluster and voluntary response sampling.

## D.6 Answers: Overview of internal validity

Answers to exercises in Sect. 6.9.

Answer to Exercise 6.1: Presumably all are extraneous variables, as all are possibly related to the response variable (incidence of depression): That is why the researchers obtained this information. None can be lurking variables, as the researchers measure or observe all of them.

To be a confounding variable, the extraneous variable should be related to both the response variable (incidence of depression) and the explanatory variable (diet quality). As a result, all of the extraneous variables could potentially be confounding variables.

Answer to Exercise 6.2: Response variable: something like 'risk of developing a cancer of the digestive system'. Explanatory variable: 'whether or not the participants drank green tea at least three times a week'.

Lurking variable: 'health consciousness of the participants', because the researchers don't seem to have measured or observed this.

Answer to Exercise 6.3: Older children would probably be more likely to be smokers, and would be larger and older in general: age would be a confounding variable. Age is easy to record, and usually is recorded in these types of studies, so probably not a lurking variable. (The age, height and gender of each child is recorded.)

## D.7 Answers: Designing experimental studies

Answers to exercises in Sect. 7.11.

Answer to Exercise 7.2: Observer bias. The researcher is directly contacting the subjects, so may unintentionally influence their responses.

Answer to Exercise 7.3: 1. Randomly allocate the type of water to the subject (or the order in which the subjects taste-test each drink.) 2. The subjects do not know which type of water they are drinking. 3. The person providing the water and receiving the ratings does not know which type of water they are drinking. 4. Hard to find a control. 5. Any random sampling is good, if possible.

Answer to Exercise 7.4: 1. Response: The amount of sunscreen used; Explanatory: The time spent on sunscreen application. 2. They were looking at potential confounding variables. 3. If the mean of both the response and explanatory variables was different for females and males, then the sex of the participant would be a confounding variable, and this would need to be factored into the analysis of the data. 4. The participants are blinded to what is happening in the study.

## D.8 Answers: Designing observational studies

Answers to exercises in Sect. 8.10.

Answer to Exercise 8.1: 1. Since this is an observational study, we cannot allocate students to receive bottled or tap water (because then the study would be an experimental study). In an experiment we could randomly allocate students to receive either bottled or tap water and have them rate the taste (or even randomly allocate students to receive bottled or tap water first, then swap to the other type of water, and each student would then provide two ratings). 2. The students would not be aware of which water they would be drinking. 3. Neither the students nor the researchers who give the students the water would know which type of water the students are drinking. 4. We can't really set up a control here. 5. Any of the random sampling methods are possible, and are preferred. In practice, perhaps use a convenience sample, but try to get a sample as representative as possible (Sect. 5.9).

Answer to Exercise 8.2: Yes. Consider a study of the effect of smoking: non-smokers are the control. However, in an observational study, cases cannot be allocated to be controls.

Answer to Exercise 8.3: No. People can know they are being observed.

Answer to Exercise 8.4: The description indicates that patients probably knew they were involved, so the Hawthorne effect should be considered when interpreting the results.

Answers to exercises in Sect. 9.7.

Answer to Exercise 9.1: Population: 'UniSC students on-campus'. External validity refers to whether the results apply to other members of this population, not to people outside this population (such as members of the general public).

Answer to Exercise 9.2: 1. P: Aircraft passengers aged 18 and over. O: Unclear; something about 'composite of death or major traumatic injury'. C: Between wearing a parachute and wearing a backpack. I: Yes: Having participants wear the parachute or backpack. 2. Experimental: The researchers decide if the participants use a parachute or backpack. 3. Explanatory: 'whether or not a parachute is worn'. Response: harder to understand; is it 'whether or not the participant dies or sustains a major injury'? 4. These results won't apply in the real world; not ecologically valid. In the real world, parachutes are used at high altitude, for example. 5. The study is not very useful! 6. Speaking loosely: That jumping from a small plane that is on the ground, parachutes are equally effective as backpacks in keeping people safe.

Answer to Exercise 9.3: Because the sample is not a random sample, the researchers are (rightly) noting that the results may not generalise to all hospitals. Because the data was only collected at night, perhaps the data is not ecologically valid.

Answers to exercises in Sect. 10.5.

Answer to Exercise 10.1: People aged 18 do not have a category.

Answer to Exercise 10.2: The second. The first is leading: Should concerned dog owners...

Answer to Exercise 10.3: 1. The phrase 'Do you agree' is leading. Placing RIGHT DIRECTION in capitals is leading. Besides, everyone wants their country to head in the right direction... but what 'right direction' means varies from person to person. 2. The phrase 'Do you agree' is leading. Phrases like unwavering commitment, respect and incredible veterans and TROOPS are all leading and undefined. 3. The word revitalize is leading.

Answers to exercises in Sect. 11.5.

Answer to Exercise 11.1: Foliage biomass: quantitative continuous. Tree diameter (in cm): quantitative continuous. Age of the tree (in years): quantitative continuous. Origin of the tree: qualitative nominal.

Answer to Exercise 11.2: 1. Systolic blood pressure: quantitative continuous. 2. Program of enrolment: qualitative nominal. 3. Academic grade: qualitative ordinal. 4. Number of times people visited the doctor last year: quantitative discrete.

Answer to Exercise 11.3: 1. Age: qualitative ordinal. 2. Gender: qualitative nominal. 3. Location: qualitative nominal. 4. Social media use: qualitative ordinal. 5. BMI: quantitative continuous. 6. Total sitting time, in minutes per day: quantitative continuous.

Answer to Exercise 11.4: Gender: Qualitative nominal. Age: Quantitative continuous. Height: Quantitative continuous. Weight: Quantitative continuous. GMFCS: Qualitative ordinal.

Answer to Exercise 11.5: Fertilizer dose: Quantitative continuous. Soil nitrogen: Quantitative continuous. Fertilizer source: Qualitative nominal.

Answer to Exercise 11.6: Response of kangaroos: Qualitative ordinal. (Or perhaps nominal?) Height of drone: 'Height' is quantitative, but with just four values used it would probably be treated as qualitative ordinal. Mob sizes: Quantitative discrete. Sex: Qualitative nominal.

Answer to Exercise 11.7: Location is the only variable (something observed from the individuals). The number of people and the percentage of people who died at each location is a summary of the data collected from the individuals. 'Location' is a nominal, qualitative variable, with seven levels.

Answers to exercises in Sect. 12.12.

Answer to Exercise 12.1: None of them are bad graphs. I'd prefer the bar chart, but any are OK.

Answer to Exercise 12.2: A graph of the individual variables is always useful as a starting point: so a bar chart for the origin, and a histogram for the others.

But relationships are the main focus. Relationships between foliage biomass and tree origin: boxplot. Relationships between foliage biomass and the other variables: scatterplot. On the scatterplot, the different origins of the trees could be encoded by using different colours or plotting symbols.

Answer to Exercise 12.3: Gender and GMFCS: both qualitative; the others are quantitative. Relationships between two quantitative variables: use a scatterplot. Relationships between two qualitative variables: (say) a side-by-side bar chart. With one of each: boxplot. See Fig. D.1 for some examples.

Answer to Exercise 12.4: Fertilizer (quantitative): histogram (response variable). Soil nitrogen (quantitative): Histogram (explanatory variable). Source (qualitative nominal): Bar chart (explanatory variable). Relationships: Between fertilizer dose and soil nitrogen: scatterplot. Source could be encoded using different coloured points.

Answer to Exercise 12.5: A bar chart (or dot chart). A pie chart would not be appropriate, as respondents could select more than one option.

Answer to Exercise 12.6: In general, female basketball players are taller than female netball players (the first, second and third quartiles are all greater for basketball players). For the second and third quartiles, the differences look quite substantial. The minimum heights are similar.

Answer to Exercise 12.7: What do the different plotting symbols mean? The labels on the axes are not helpful. The vertical axis goes up to 35, but could easily stop at 20. See Fig. D.2.

Answer to Exercise 12.8: The graph is inappropriate! Both variables are qualitative, but the graph is a scatterplot (used for two quantitative variables). What does that plot even tell you?

A stacked or side-by-side bar chart should be used (Fig. D.3).

Answer to Exercise 12.9: 1. Response variable: Change in MADRS (quantitative continuous). 2. Explanatory variable: treatment group (qualitative nominal with three levels). 3. Response variable: Histogram. Explanatory: bar chart. Relationship: boxplot.

Answer to Exercise 12.10: See Fig. D.4.

Answer to Exercise 12.11: Variable is the 'Sport' (qualitative). The bars can be ordered any way. Skewness makes no sense: It only makes sense to talk about skewness for quantitative variables.

## D.13 Answers: Numerical summaries for quantitative data

Answers to exercises in Sect. 13.10.

Answer to Exercise 13.1: Probably the median, as the data are slightly skewed right, with some outliers. Both the mean and median can be quoted...

Answer to Exercise 13.2: 1. Sample mean: 0.467. 2. Sample median: 3.35. 3. Range: 29.6 (from -19.8 to 9.8). 4. Sample standard deviation: 10.40263. (SOI has no units of measurement.)

Answer to Exercise 13.3: A: II (median; IQR). B: I (mean; standard deviation). C: III (median; IQR).

Answer to Exercise 13.4: See Fig. D.5. Worker 2 is faster in general (more panels installed per minute), including one fast outlier. Workers 1 and 3 have similar medians, but Worker 3 is more consistent (smaller IQR).

Answer to Exercise 13.5: 1. $$\bar{x} = 3.09$$. 2. median: 2.0. 3. $$s = 2.77$$. 4. IQR: 4.

## D.14 Answers: Numerical summaries for qualitative data

Answers to exercises in Sect. 14.10.

Answer to Exercise 14.1: 1. Didn't vomit: 0.738 had beer then wine, 0.262 had wine only. They tell us the proportion that drank various things, among those who did and didn't vomit. 2. Beer then wine: 8.8% vomited and 91.2% didn't; Wine only: 21.4% vomited and 78.6% didn't. They tell us the percentage that vomited, for each drinking type. 3. $$(6 + 6)/(6 + 6 + 62 + 22) = 0.125$$. 4. $$6/22 = 0.2727$$. 5. $$6/62 = 0.096774$$. 6. $$0.27272/0.096774 = 2.82$$. 7. $$0.096774/0.27272 = 0.354$$.

Answer to Exercise 14.2: 1. $$91/(91 + 188) = 0.32616$$. 2. $$188/91 = 2.0659$$, or about 2.07. 3. $$22/13 = 1.6923$$, or about 1.69. 4. $$13/(13 + 22) = 0.37142$$, or about 37.1%. 5. $$2.0659/1.6923 = 1.22$$.
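The arithmetic in this answer can be checked with a short script. The following Python sketch recomputes the proportions, the two odds and the odds ratio from the counts quoted above. The function names and the suffixes `_a` and `_b` are illustrative placeholders only; the exercise itself defines what the two groups are.

```python
def proportion(count, other_count):
    """Proportion: the count divided by the total of both counts."""
    return count / (count + other_count)


def odds(count, other_count):
    """Odds: the count divided by the count of the complementary outcome."""
    return count / other_count


# Counts taken from the working above; the _a/_b labels are placeholders.
prop_a = proportion(91, 188)   # part 1
odds_a = odds(188, 91)         # part 2
odds_b = odds(22, 13)          # part 3
prop_b = proportion(13, 22)    # part 4
odds_ratio = odds_a / odds_b   # part 5

print(f"proportions: {prop_a:.4f} and {prop_b:.4f}")
print(f"odds: {odds_a:.4f} and {odds_b:.4f}")
print(f"odds ratio: {odds_ratio:.2f}")
```

Keeping the unrounded odds in `odds_a` and `odds_b` until the final division mirrors the advice elsewhere in this appendix: keep many decimal places in the working, and round only the final answer.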

Answer to Exercise 14.3: 1. 21/114, or about 18.4%. 2. 14/54, or about 25.9%. 3. 7/60, or about 11.7%. 4. 21/93, or about 0.226. 5. 14/40, or 0.35. 6. 7/53, or about 0.132. 7. 0.35/0.132, or about 2.7. 8. The odds of no August rainfall in Emerald are about 2.7 times as high in months with non-positive SOI.

Answer to Exercise 14.4: 1. 45.9%. 2. 61.4%. 3. 0.848. 4. 1.59. 5. 1.15. 6. 0.533. 7. The odds of reporting back pain from carrying school bags, comparing boys to girls.

Answers to exercises in Sect. 15.8.

Answer to Exercise 15.1: 1. Yes! Seems likely there is a problem (we can't be certain). 2. Assuming the die was fair, I would not expect to get a 6 ten times in a row; sounds highly unusual.

Answer to Exercise 15.2: 1. That the population mean is 12 inches, as claimed. We have no evidence to refute this claim. 2. First: the population mean diameter is $$\mu = 12$$ inches; the sample mean is not 12 inches due to sampling variation. Second: the population mean diameter isn't 12 inches, reflected in the sample. 3. 11.48 is 0.52 inches from the target of 12; seems unlikely that the sample mean would be that far from 12 inches through sampling variation alone. 4. $$\bar{x} = 11.25$$ inches is further from $$\mu = 12$$ than $$\bar{x} = 11.48$$: claim probably not supported. 5. Smaller sample sizes: sample mean would vary more (in general, larger samples give more precise estimates).

Answers to exercises in Sect. 16.8.

Answer to Exercise 16.1: 1. Probability draw a King: $$4/52 = 0.07692$$. 2. Odds draw a King: $$4/48 = 0.08333$$. 3. Probability draw a picture card: $$16/52 = 0.3077$$. 4. Odds draw a picture card: $$16/(52 - 16) = 0.4444$$. 5. Not independent. This is like Example 16.10. 6. Are independent. What happens on the die does not change what happens with the cards.
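The difference between a probability and odds in parts 1 to 4 can be made concrete with exact fractions. This is a minimal Python sketch, not part of the original answer; the function names are illustrative, and the count of 16 picture cards is taken from the working above.

```python
from fractions import Fraction


def probability(favourable, total):
    """Favourable outcomes divided by all outcomes."""
    return Fraction(favourable, total)


def odds(favourable, total):
    """Favourable outcomes divided by unfavourable outcomes."""
    return Fraction(favourable, total - favourable)


DECK = 52  # cards in a standard pack

for label, favourable in [("King", 4), ("picture card", 16)]:
    p = probability(favourable, DECK)
    o = odds(favourable, DECK)
    print(f"{label}: probability {p} = {float(p):.4f}; odds {o} = {float(o):.4f}")
```

The only difference between the two functions is the denominator: the probability divides by all 52 cards, while the odds divide by the cards that are not favourable.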

Answer to Exercise 16.2: Only a 50-50 chance if the events were equally likely... they clearly are not.

Answer to Exercise 16.3: 1. $$9/16$$; about 56.3%. 2. $$6/57$$; about 0.105. (Or, 10.5% if expressed as a percentage.) 3. The number of pilots in each age group.

Answer to Exercise 16.4: 1. Not independent events: If it rains, less likely to walk to work than if it doesn't rain. 2. Not independent events: A smoker is far more likely to suffer from lung cancer than a non-smoker. 3. Independent events: My rubbish is collected, rain or not.

Answer to Exercise 16.5: 1. Expect $$100\times 0.99 = 99$$ people to return a positive test result. 2. Expect $$100\times (1 - 0.98) = 2$$ people to return a positive test result.

A positive test result may or may not mean the person has the disease.

Answer to Exercise 16.6: The reasoning assumes that the three outcomes (HH, TT, HT) are equally likely, which is not true. For example, consider tossing a 20-cent coin (shown in lower-case, normal font) and a 1-dollar coin (shown in capitals, bold font). The four outcomes are: hH, hT, tH, tT.
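Listing the outcomes by computer shows why the three results are not equally likely. The Python sketch below enumerates the four equally likely outcomes for the two coins and counts how many give each result; the helper name `chance` is illustrative only.

```python
from fractions import Fraction
from itertools import product

# Lower case: the 20-cent coin; upper case: the 1-dollar coin.
outcomes = ["".join(pair) for pair in product("ht", "HT")]


def chance(event):
    """Probability of an event when every listed outcome is equally likely."""
    favourable = [o for o in outcomes if event(o)]
    return Fraction(len(favourable), len(outcomes))


p_two_heads = chance(lambda o: o.upper() == "HH")
p_two_tails = chance(lambda o: o.upper() == "TT")
p_one_each = chance(lambda o: o.upper() in ("HT", "TH"))

print(outcomes)
print(p_two_heads, p_two_tails, p_one_each)
```

One head and one tail can occur in two ways (hT and tH), so it is twice as likely as two heads or two tails.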

Answers to exercises in Sect. 17.12.

Answer to Exercise 17.1: 1. $$z = (8 - 8.8)/2.7 = -0.2963$$, or $$z = -0.30$$. From tables, the probability is 0.3821, or about 38.2%. 2. $$z = 0.07$$; probability is $$1 - 0.5279 = 0.4721$$, or about 47.2%. 3. The $$z$$-scores are $$z_1 = -0.67$$ and $$z_2 = 0.44$$; the probability is $$0.6700 - 0.2514 = 0.4186$$, or about 41.9%. (Draw a diagram!) 4. Using the tables 'backwards': $$z$$-score is about 1.04; corresponding tree diameter is $$x = 8.8 + (1.04\times 2.7) = 11.608$$, or about 11.6 inches. About 15% of trees will have diameters larger than about 11.6 inches.
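Parts 1 and 4 can also be checked with software rather than tables. The following Python sketch uses `statistics.NormalDist` from the standard library with the mean and standard deviation from the working above. It assumes that part 1 asks for the probability of a diameter smaller than 8 inches, which is what the table value of 0.3821 corresponds to. Software does not round the $$z$$-score to two decimal places, so the answers may differ slightly from the table-based answers in the third decimal place.

```python
from statistics import NormalDist

# Mean and standard deviation (in inches) from the working above.
diameters = NormalDist(mu=8.8, sigma=2.7)

# Part 1: z-score for 8 inches, and the probability of a smaller diameter.
z = (8 - diameters.mean) / diameters.stdev
p_below_8 = diameters.cdf(8)

# Part 4: the diameter exceeded by 15% of trees (the 85th percentile).
x_top_15 = diameters.inv_cdf(0.85)

print(f"z = {z:.2f}, P(diameter < 8) = {p_below_8:.3f}")
print(f"15% of trees are larger than about {x_top_15:.1f} inches")
```

Here `cdf` plays the role of reading the tables forwards (from a value to a probability), and `inv_cdf` plays the role of using the tables 'backwards' (from a probability to a value).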

Answer to Exercise 17.2: 1. $$z = (39 - 40)/1.64 = -0.6097561$$, or $$z=-0.61$$. Using tables: probability less than this value of $$z$$ is 0.2709, so the answer is $$1 - 0.2709 =0.7291$$, or about 72.9%. 2. $$z = (37 - 40)/1.64 = -1.83$$; probability is 0.0336, about 3.4%. 3. The two $$z$$-scores: $$z_1 = -4.878$$ and $$z_2 = -1.83$$. Drawing a diagram, probability is $$0.0336 - 0 = 0.0336$$, or about 3.4%. 4. The $$z$$-score: 1.64 (or 1.65). Gestation length: $$x = 40 + (1.64 \times 1.64) = 42.7$$ (same answer to one decimal place using $$z = 1.65$$). 5% of gestation lengths longer than about 42.7 weeks. 5. $$z$$-score is -1.64 (or -1.65). Gestation length: $$x = 40 + (-1.64 \times 1.64) = 37.3$$ (same answer to one decimal place using $$z = -1.65$$). 5% of gestation lengths shorter than about 37.3 weeks.

Answer to Exercise 17.3: $$z$$-score: about $$z = 2.05$$. Corresponding IQ: $$x = 100 + (2.05\times 15) = 130.75$$. An IQ greater than about 130 is required to join Mensa.

Answer to Exercise 17.4: An IQ score lower than about 80.8 leads to a rejection by the US military.

Answer to Exercise 17.5: 1: C; 2: A; 3: B; 4: D.

Answer to Exercise 17.6: 1: A; 2: C; 3: B; 4: D.

Answer to Exercise 17.7: Be very careful: work with the number of minutes from the mean of 5:30pm. The standard deviation is given in decimal hours (2.28 hours); converted to minutes, it is 120 minutes plus $$0.28\times 60 = 16.8$$ minutes, so the standard deviation is 136.8 minutes.

1. 9pm is 3 hours and 30 minutes from 5:30pm: 210 minutes. $$z$$-score: $$z = (210 - 0)/136.8 = 1.54$$; probability: $$1 - 0.9382 = 0.0618$$, or about 6.2%. 2. $$z = (5 - 5.5)/2.28 = -0.22$$; probability: 0.4129, or about 41.3%. 3. $$z$$-scores are $$z_1 = -0.22$$ and $$z_2 = 0.22$$; probability: $$0.5871 - 0.4129 = 0.1742$$, or about 17.4%. 4. $$z$$-score is $$0.52$$; time is $$x = 0 + (0.52\times 136.8) = 71.136$$ minutes after 5:30pm; about one hour and 11 minutes after 5:30pm, or 6:41pm. 5. $$z$$-score: $$-1.04$$; time is $$x = 0 + (-1.04\times 136.8) = -142.272$$, or 142.272 minutes before 5:30pm; about two hours and 22 minutes before 5:30pm, or 3:08pm.
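Converting between decimal hours, minutes and clock times is where mistakes are most likely in this exercise, so a small script can help with checking. The Python sketch below works in minutes from the mean of 5:30pm, as recommended above. The helper `clock_time` and the date used to build the timestamp are arbitrary choices for illustration.

```python
from datetime import datetime, timedelta
from statistics import NormalDist

MEAN_TIME = datetime(2000, 1, 1, 17, 30)  # 5:30pm; the date itself is arbitrary
SD_MINUTES = 2.28 * 60                    # 2.28 hours, which is 136.8 minutes

# Distribution of the number of minutes after 5:30pm.
offsets = NormalDist(mu=0, sigma=SD_MINUTES)


def clock_time(z):
    """Clock time for a given z-score, rounded to the nearest minute."""
    return MEAN_TIME + timedelta(minutes=round(z * SD_MINUTES))


# Part 1: probability of a time later than 9pm (210 minutes after 5:30pm).
p_after_9pm = 1 - offsets.cdf(210)
print(f"P(after 9pm) = {p_after_9pm:.3f}")

# Parts 4 and 5: convert the z-scores found from the tables to clock times.
for z in (0.52, -1.04):
    print(f"z = {z:+.2f} -> {clock_time(z):%H:%M}")
```

The probability differs a little from the table-based answer because the script does not round the $$z$$-score to two decimal places before finding the area.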

Answers to exercises in Sect. 18.8.

Answer to Exercise 18.1: 1. Standard deviation. 2. Standard error. More specifically, the standard error of the mean. 3. Standard deviation. 4. Standard error. More specifically, the standard error of the proportion.

Answer to Exercise 18.2: 1. No: Population proportions don't vary from sample to sample. 2. Yes: varies from sample to sample. 3. Yes: varies from sample to sample. 4. Yes: varies from sample to sample. 5. No: Population odds don't vary from sample to sample.

Answer to Exercise 18.3: The standard error of the mean is used to describe how much the sample mean is likely to vary from sample to sample. Alternatively, it describes how precisely the sample mean is estimating the (unknown) population mean.

## D.19 Answers: CIs for one proportion

Answers to exercises in Sect. 20.10.

Answer to Exercise 20.1: $$\hat{p} = 2182/6882 = 0.317059$$ and $$n=6882$$. So:

$\text{s.e.}(\hat{p}) = \sqrt{ \frac{0.317059 \times(1 - 0.317059)}{6882} } = 0.005609244.$ The CI is $$0.317059 \pm (2\times 0.005609244)$$, or $$0.317059\pm 0.01121849$$.

Rounding sensibly: $$0.317\pm 0.011$$ (notice we keep lots of decimal places in the working, but round the final answer).
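The same steps (sample proportion, standard error, then the estimate plus or minus two standard errors) can be sketched in Python. The function name is illustrative only, and the multiplier of 2 is the approximate 95% multiplier used in the working above.

```python
from math import sqrt


def approx_95_ci_proportion(successes, n, multiplier=2):
    """Approximate 95% CI for a proportion: p-hat +/- multiplier * s.e."""
    p_hat = successes / n
    std_error = sqrt(p_hat * (1 - p_hat) / n)
    margin = multiplier * std_error
    return p_hat, std_error, (p_hat - margin, p_hat + margin)


# Counts from the working above.
p_hat, std_error, (lower, upper) = approx_95_ci_proportion(2182, 6882)

print(f"p-hat = {p_hat:.3f}, s.e. = {std_error:.5f}")
print(f"approximate 95% CI: {lower:.3f} to {upper:.3f}")
```

The function keeps full precision internally and rounds only when printing, in line with the advice to round the final answer rather than the working.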

Answer to Exercise 20.2: $$\hat{p} = 8/154 = 0.05194805$$; $$\text{s.e.}(\hat{p}) = 0.017883$$; approximate 95% CI is $$0.05195 \pm (2 \times 0.017883)$$, or $$0.0519\pm 0.0358$$, equivalent to 0.016 to 0.088.

The CI is statistically valid.

Answer to Exercise 20.3: Use $$\hat{p} = 708/864 = 0.8194444$$ and $$n = 864$$. Standard error: $$\text{s.e.}(\hat{p}) = 0.01308604$$; approximate 95% CI is $$0.8194444 \pm (2\times 0.01308604)$$.

The CI is statistically valid.

Answer to Exercise 20.4: 1. Approximately $$n = 1/(0.05^2) = 400$$. 2. Approximately $$n = 1/(0.025^2) = 1600$$. 3. To halve the width of the interval, four times as many people are needed.
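The sample-size rule used above, $$n = 1/(\text{margin of error})^2$$, is easy to wrap in a small function. This Python sketch is illustrative only; it uses exact fractions because floating-point arithmetic can land a hair above or below a whole number such as 400, which would upset the rounding-up step.

```python
from fractions import Fraction
from math import ceil


def sample_size_for_proportion(margin_of_error):
    """Approximate n for a 95% CI: n = 1 / (margin of error)^2, rounded up."""
    # Fraction(str(...)) keeps the arithmetic exact (0.05 becomes 1/20).
    margin = Fraction(str(margin_of_error))
    return ceil(1 / margin**2)


for margin in (0.05, 0.025):
    print(margin, sample_size_for_proportion(margin))
```

Halving the margin of error from 0.05 to 0.025 multiplies the required sample size by four, because the margin of error is squared in the formula.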

Answer to Exercise 20.5: After 3000 hours: $$\hat{p} = 0.2143$$; $$\text{s.e.}(\hat{p}) = 0.06331$$. The CI is from 0.088 to 0.341. The statistical validity conditions are satisfied.

After 400 hours: $$\hat{p} = 0$$; $$\text{s.e.}(\hat{p}) = 0$$. The CI is from 0 to 0: clearly silly (implies no sampling variation). This is because the statistical validity conditions are not satisfied.

Answers to exercises in Sect. 21.5.

Answer to Exercise 21.1: The conclusion states that the interval is one in which they are reasonably sure (i.e., 95% sure) that the sample proportion will lie. But the researcher knows exactly what the sample proportion is: it is $$\hat{p} = 0.314$$.

CIs give intervals within which we are reasonably certain that the population value lies; they are needed because the population proportion is unknown.

(In addition, the CI is a 68% CI anyway, not a 95% CI as claimed.)

Answer to Exercise 21.2: The CI is not about individual trees; it is about a population parameter. Presumably, it should read something like 'This CI means that between 22.3% and 40.5% of trees are infected with apple scab'.

## D.21 Answers: CIs for one mean

Answers to exercises in Sect. 22.8.

Answer to Exercise 22.1: Standard error: $$\text{s.e.} = s/\sqrt{n} = 0.43/\sqrt{45} = 0.06410062$$ (keeping lots of decimal places in the working). Approximate 95% CI: $$2.85 \pm(2\times 0.06410062)$$, or $$2.85\pm 0.1282012$$, or from 2.72 litres to 2.98 litres.
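The calculation for a mean follows the same pattern as for a proportion, with the standard error $$s/\sqrt{n}$$. A minimal Python sketch, using the summary statistics from the working above (the function name is illustrative, and the multiplier of 2 is the approximate 95% multiplier):

```python
from math import sqrt


def approx_95_ci_mean(sample_mean, sample_sd, n, multiplier=2):
    """Approximate 95% CI for a mean: x-bar +/- multiplier * s / sqrt(n)."""
    std_error = sample_sd / sqrt(n)
    margin = multiplier * std_error
    return std_error, (sample_mean - margin, sample_mean + margin)


# Summary statistics (in litres) from the working above.
std_error, (lower, upper) = approx_95_ci_mean(2.85, 0.43, 45)

print(f"s.e. = {std_error:.5f}")
print(f"approximate 95% CI: {lower:.2f} to {upper:.2f} litres")
```

The same function can be reused for the other exercises in this section by changing the three summary statistics.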

Answer to Exercise 22.2: Standard error: $$\text{s.e.} = s/\sqrt{n} = 7571.74/\sqrt{58} = 994.2182$$ (keeping lots of decimal places in the working). Approximate 95% CI is: 4967.984 micrograms to 8944.86 micrograms.

Answer to Exercise 22.3: Approximate 95% CI for the mean brushing time: 29.9 seconds to 36.1 seconds.

Answer to Exercise 22.4: 1. Standard error: $$\text{s.e.}(\bar{x}) = 651.1/\sqrt{199} = 46.15526$$; approximate 95% CI: 754.1ml to 938.7ml. 2. They don't seem very good at estimating (the article reports that the guesses ranged from 50ml to 3000ml). 3. The sample size is much larger than 25; the CI should be statistically valid. 4. Using the margin-of-error as 50, and $$s = 651.1$$:

$\left( \frac{2\times 651.1}{50}\right)^2 = 678.2899.$ We would need about 679 participants (remembering to round up).

5. Using margin-of-error as 25, and $$s = 651.1$$:

$\left( \frac{2\times 651.1}{25}\right)^2 = 2713.16.$ Need about 2714 participants (remembering to round up). 6. To halve the margin of error, four times as many subjects are needed.
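The two sample-size calculations in parts 4 and 5 use the same formula with different margins of error, so they can be sketched as one Python function. The function name is illustrative only; the multiplier of 2 and the value $$s = 651.1$$ come from the working above.

```python
from math import ceil


def sample_size_for_mean(sample_sd, margin_of_error, multiplier=2):
    """n = (multiplier * s / margin of error)^2, always rounded up."""
    return ceil((multiplier * sample_sd / margin_of_error) ** 2)


for margin in (50, 25):
    n = sample_size_for_mean(651.1, margin)
    print(f"margin of error {margin}: about {n} participants")
```

`ceil` is used rather than ordinary rounding because the sample size must be at least as large as the calculated value.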

Answer to Exercise 22.5: None of these interpretations are acceptable. 1. CIs are not about how individual observations vary; they are about how a statistic varies (in this case, the sample mean). In addition, CIs are about populations and not samples. 2. CIs are not about how individual observations vary; they are about how a statistic varies (in this case, the sample mean). 3. This doesn't make sense: samples can't vary between two values. Sample statistics vary. In addition, CIs are about populations, not samples. 4. This doesn't make sense: populations can't vary between two values. Even population parameters don't vary. 5. The population parameter does not vary. It is a fixed (but unknown) value to be estimated. (If the value of the population mean was constantly changing, it would be very hard to estimate...) 6. We know exactly what the sample mean is ($$\bar{x}=1.3649$$mmol/L): we don't need an interval for the sample mean. 7. We know exactly what the sample mean is ($$\bar{x}=1.3649$$mmol/L): we don't need an interval for the sample mean.

Answer to Exercise 22.6: Neither is correct. To learn about the variation in individual trees, use the standard deviation rather than the standard error. The standard error tells us about the population mean diameter, not about individual trees.

Answer to Exercise 22.7: $$\text{s.e.}(\bar{x}) = 5.36768$$, so the approximate 95% CI is $$61.3\pm (2 \times 5.36768)$$, which is $$61.3\pm 10.74$$s (or 50.56s to 72.04s). Since $$n = 30$$, which is greater than 25, the CI is likely to be statistically valid.

## D.22 Answers: CIs for paired data

Answers to exercises in Sect. 23.13.

Answer to Exercise 23.1: Mean of the differences: 5.2; standard error: 3.06. Approximate 95% CI: $$5.2 \pm (2\times 3.06)$$, or $$5.2\pm 6.12$$, from -0.92 to 11.32. The mean taste rating is somewhere between being better with dip by up to 11.3mm on the 100mm visual analogue scale, and being better without dip by a little (up to 0.9mm on the 100mm visual analogue scale). (Understanding how the differences are defined is needed to understand where this came from.)

A useful summary might be like Table D.2.

TABLE D.2: A numerical summary for the broccoli data

| | Mean | Standard deviation | Standard error |
|---|---|---|---|
| Raw | 56 | 26.6 | 2.65 |
| With dip | 61.2 | 28.7 | 2.86 |
| Differences | 5.2 | | 3.06 |

Answer to Exercise 23.2: 1. Computing the differences as the Before minus the After measurements seems sensible: the differences are then the decreases in blood pressure, which is the purpose of the drug. 2. The differences (when defined as reductions): 9, 4, 21, 3, 20, 31, 17, 26, and so on. 3. Mean difference: 18.933; standard deviation: 9.027; standard error: $$9.027/\sqrt{15} = 2.331$$. Approximate 95% CI: $$14.271$$ to $$23.595$$ mm Hg. 4. Exact 95% CI: 13.934 to 23.93 mm Hg from output. 5. The first uses approximate multipliers. The second uses exact multipliers.

Answer to Exercise 23.3: 1. Approximate 95% CI for the reduction: $$0.66 \pm(2\times 0.37)$$, or -0.08 to 1.40: for women, the mean change could be anywhere from an increase of up to 0.08 to a reduction of up to 1.40 on the given scale. 2. The sample size is not larger than 25, but is close: the CI is probably reasonably statistically valid.

## D.23 Answers: CIs for two means

Answers to exercises in Sect. 24.14.

Answer to Exercise 24.1: 1. Table D.3. 2. From SPSS (bottom row of the output), the exact 95% CI for the difference between the mean direct HDL cholesterol concentrations is from 0.05438 to 0.11501 mmol/L, higher for non-smokers.

TABLE D.3: A summary table for the NHANES data; statistics in mmol/L

| | Mean | Standard deviation | Standard error | Sample size |
|---|---|---|---|---|
| Non-smokers | 1.3924 | 0.42792 | 0.01048 | 1668 |
| Smokers | 1.3077 | 0.42353 | 0.01137 | 1388 |
| Differences | 0.0847 | | 0.01546 | |

Answer to Exercise 24.2: 1. Placebo group: $$3.62/\sqrt{176} = 0.2728678$$ days; echinacea group: $$3.31/\sqrt{183} = 0.2446822$$ days. 2. $$0.53 \pm (2\times 0.367)$$, or -0.204 to 1.264 days. 3. Placebo minus echinacea: the difference between the means shows how much longer symptoms last with placebo, compared to echinacea. 4. $$6.34 \pm (2\times 0.2446822)$$, or 5.85 to 6.83 days. 5. Sample sizes are large, so the CIs are statistically valid.

The difference between the means is an average of 0.53 days; about half a day (quicker on echinacea). Probably not that important when a cold lasts for almost seven days.
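The standard error of the difference used in part 2 (0.367) is consistent with combining the two group standard errors as $$\sqrt{\text{s.e.}_1^2 + \text{s.e.}_2^2}$$. A minimal sketch of that calculation follows; the function names are illustrative only, and the multiplier of 2 is the approximate 95% multiplier used in these answers.

```python
import math

def se_mean(s, n):
    """Standard error of a sample mean."""
    return s / math.sqrt(n)

def approx_ci_difference(diff, se1, se2, multiplier=2):
    """Approximate CI for a difference between two independent means."""
    se_diff = math.sqrt(se1**2 + se2**2)   # s.e. of the difference
    margin = multiplier * se_diff
    return se_diff, diff - margin, diff + margin

# Exercise 24.2: placebo (s = 3.62, n = 176); echinacea (s = 3.31, n = 183)
se_placebo = se_mean(3.62, 176)
se_echinacea = se_mean(3.31, 183)
se_diff, low, high = approx_ci_difference(0.53, se_placebo, se_echinacea)
print(round(se_diff, 3))            # 0.367
print(f"{low:.2f} to {high:.2f}")   # -0.20 to 1.26 (days)
```

The printed limits can differ from the worked answer in the third decimal place, because the worked answer rounds the standard error to 0.367 before doubling it.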

Answer to Exercise 24.3: 1. Exercise group: $$1.4/\sqrt{10} = 0.4427189$$; splinting group: $$1.1/\sqrt{10} = 0.3478505$$. 2. Splinting minus exercise: the differences are how much greater the pain is with splinting. 3. $$0.3\pm (2\times 0.563)$$, or from -0.826 to 1.426: from 0.826 greater pain with exercise to 1.426 greater pain with splinting. 4. $$1.1\pm (2\times 0.3478505)$$: from 0.404 to 1.796. 5. Sample sizes are small; the CIs may not be statistically valid, and may be only roughly correct.

Answer to Exercise 24.5: 1. The parameter could be written as $$\mu_{\text{After}} - \mu_{\text{Before}}$$, the increase in the deceleration. 2. The approximate CI is $$-0.00162$$ to $$0.00562$$ m/s. That is, the difference between the mean decelerations is likely to be somewhere between $$-0.0016$$ m/s (i.e., a mean acceleration of 0.0016 m/s) and $$0.0056$$ m/s.

## D.24 Answers: CIs for odds ratios

Answers to exercises in Sect. 25.9.

Answer to Exercise 25.1: 1. $$99/62 = 1.596774$$; about 1.60. 2. $$216/115 = 1.878261$$; about 1.88. 3. $$1.596774/1.878261 = 0.850$$, as in the output. 4. A few ways; for example: For every 100 men with a smooth scar, there are about 85 women with a smooth scar. 5. (Graph not shown, but use a stacked or side-by-side bar chart.) 6. Table D.4. 7. Exact 95% CI for the OR, from the output: 0.576 to 1.255. 8. If the study was repeated many times (with the same numbers of men and women), about 95% of the CIs would contain the population OR. In practice: the population OR is probably between 0.576 and 1.255.

TABLE D.4: The odds and percentage of having smooth scars, for women and men

| | Odds with smooth scars | Percentage with smooth scars | Sample size |
|---|---|---|---|
| Women | 1.60 | 61.5% | 161 |
| Men | 1.88 | 65.3% | 331 |
| Odds ratio | 0.850 | | |
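The entries in Table D.4 can be reproduced from the counts used in the answer (99 women and 216 men with smooth scars; 62 women and 115 men without). The sketch below uses made-up helper names, and is only one way of organising the arithmetic.

```python
def odds(count_with, count_without):
    """Odds: the count with the outcome divided by the count without."""
    return count_with / count_without

def percentage(count_with, count_without):
    """Percentage with the outcome, out of everyone in the group."""
    return 100 * count_with / (count_with + count_without)

# Counts with and without smooth scars, from the answer to Exercise 25.1
odds_women = odds(99, 62)
odds_men = odds(216, 115)
odds_ratio = odds_women / odds_men   # women compared to men

print(round(odds_women, 2), round(percentage(99, 62), 1))    # 1.6 61.5
print(round(odds_men, 2), round(percentage(216, 115), 1))    # 1.88 65.3
print(round(odds_ratio, 3))                                  # 0.85
```

Swapping the order of the division gives the OR comparing men to women instead (the reciprocal), so it is worth stating which group is being compared to which.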

Answer to Exercise 25.2: The output can be interpreted in one of two ways (Sect. 25.2):

• Odds are the odds of swimming at the beach; OR compares these odds between those without an ear infection, to those with an ear infection.
• Odds are the odds of not having an ear infection; OR compares these odds for beach swimmers to non-beach swimmers.

Answer to Exercise 25.3: 1. Table D.5. 2. Table D.6. 3. OR: The odds of a 1800-hr turbine getting a fissure are 0.389 times the odds of a 3000-hr turbine getting a fissure. 4. CI from 0.133 to 1.14: the population OR that produced the sample OR is likely to be between these values.

TABLE D.5: The number of fissures for two sets of turbines, run for different numbers of hours

| | Fissures | No fissures | Total |
|---|---|---|---|
| About 1800 hours | 7 | 66 | 73 |
| About 3000 hours | 9 | 33 | 42 |
| Total | 16 | 106 | 122 |

TABLE D.6: The numerical summary for the fissures data

| | Odds with fissures | Percentage with fissures | Sample size |
|---|---|---|---|
| About 1800 hours | 0.1061 | 9.59% | 73 |
| About 3000 hours | 0.2727 | 21.43% | 42 |
| Odds ratio | 0.389 | | |

Answer to Exercise 25.4: Odds of no rainfall (non-positive SOI): $$14/40 = 0.35$$. Odds of no rainfall (negative SOI): $$7/53 = 0.1320755$$.

Required OR is $$0.35/0.1320755 = 2.65$$, as in output. 95% CI from 0.979 to 7.174.

Answer to Exercise 25.5: The 95% CI is from 0.151 to 0.408. This is the CI for the OR of not wearing a hat, comparing males to females: males are less likely to be not wearing a hat (rewording: males are more likely to be wearing a hat).

## D.25 Answers: Estimating sample sizes

Answers to exercises in Sect. 26.6.

1. Use $$\displaystyle n = \frac{1}{0.04^2} = 625$$; 625 needed.
2. Use $$\displaystyle n = \frac{1}{0.02^2} = 2500$$; 2500 needed. That is four times as many as when the margin-of-error was 0.04.
3. Use $$\displaystyle n = \frac{1}{0.01^2} = 10000$$; 10,000 needed. That is sixteen times as many as when the margin-of-error was 0.04.
4. To get an estimate half as wide, we need four times as many units of analysis in the sample.
5. To get an estimate a quarter as wide, we need sixteen times as many units of analysis in the sample.
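The pattern in these answers (halving the margin of error needs four times the sample size) can be seen by computing $$n = 1/(\text{margin of error})^2$$ for each case. The sketch below assumes only that formula, as used above; the function name is invented for illustration, and the intermediate rounding guards against floating-point noise before rounding up.

```python
import math

def sample_size_for_proportion(margin_of_error):
    """Sample size from n = 1 / (margin of error)^2, rounded up."""
    raw = 1 / margin_of_error**2
    # Remove floating-point noise so that exact answers are not rounded up
    return math.ceil(round(raw, 6))

for moe in (0.04, 0.02, 0.01):
    n = sample_size_for_proportion(moe)
    ratio = n / sample_size_for_proportion(0.04)
    print(moe, n, ratio)   # n is 625, then 2500, then 10000
```

The last column shows the sample size relative to the first case: 1, 4 and 16 times as many.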

1. Use $$\displaystyle n = \left(\frac{1}{0.01} \right)^2 = 10000$$; 10,000 needed.
2. Use $$\displaystyle n = \left(\frac{1}{0.02} \right)^2 = 2500$$; 2,500 needed.
3. Use $$\displaystyle n = \left(\frac{1}{0.10} \right)^2 = 100$$; 100 needed.
4. To have the same give-or-take, but to be more confident that this interval contained the value of $$\mu$$, we would need a larger sample.
5. The study is probably expensive (in both time and money), so 10,000 and 2500 are both probably unrealistic without lots of funding.

Using $$s = 0.43$$:

1. Use $$\displaystyle n = \left(\frac{2\times 0.43}{0.02} \right)^2 = 1849$$; 1849 needed.
2. Use $$\displaystyle n = \left(\frac{2\times 0.43}{0.05} \right)^2 = 295.84$$; 296 needed.
3. Use $$\displaystyle n = \left(\frac{2\times 0.43}{0.10} \right)^2 = 73.96$$; 74 needed.
4. The study is probably expensive (in both time and money), so 74 is probably the most realistic.

## D.26 Answers: Tests for one mean

Answers to exercises in Sect. 28.14.

Answer to Exercise 28.1: 1. $$H_0$$: $$\mu=7725$$; $$H_1$$: $$\mu\ne7725$$ (two-tailed). 2. $$\bar{x} = 6753.64$$ and $$\text{s.e.}(\bar{x}) = s/\sqrt{n} = 1142.123/\sqrt{11} = 344.363$$. 3. $$t = (6753.64 - 7725)/344.363 = -2.821$$, as in output. This is 'large'; expect a small $$P$$-value; software confirms this: two-tailed $$P=0.018$$. 4. Moderate evidence ($$P = 0.018$$) that the mean energy intake is not meeting the recommended daily energy intake (mean: 6753.6kJ; std. dev.: 1142.1kJ).
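The $$t$$-score in part 3 can be reproduced from the summary statistics. This is a sketch only, with a hypothetical function name; it computes the $$t$$-score but not the $$P$$-value, which still comes from software (or, roughly, from the 68--95--99.7 rule).

```python
import math

def t_score_one_mean(xbar, mu0, s, n):
    """t-score for one mean: (sample mean - hypothesised mean) / standard error."""
    se = s / math.sqrt(n)
    return (xbar - mu0) / se

# Exercise 28.1: xbar = 6753.64, hypothesised mean 7725, s = 1142.123, n = 11
t = t_score_one_mean(6753.64, 7725, 1142.123, 11)
print(round(t, 3))  # -2.821
```

The sign shows the direction: a negative $$t$$-score means the sample mean is below the hypothesised value.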

Answer to Exercise 28.2: $$H_0$$: $$\mu=120$$ and $$H_1$$: $$\mu\ne 120$$ (two-tailed), where $$\mu$$ is the mean time in seconds. Standard error: $$\text{s.e.}(\bar{x}) = 23.8/\sqrt{85} = 2.581472$$. $$t$$-score: $$(60.3 - 120)/2.581472 = -23.13$$, which is huge; $$P$$-value will be really small.

Very strong evidence ($$P<0.001$$) that children do not spend 2 minutes (on average) brushing their teeth (mean: 60.3s; std. dev.: 23.8s).

Answer to Exercise 28.3: $$H_0$$: $$\mu=50$$ and $$H_1$$: $$\mu>50$$ (one-tailed), where $$\mu$$ is the mean mental demand. Standard error: $$\text{s.e.}(\bar{x}) = 22.05/\sqrt{22} = 4.701076$$. $$t$$-score: $$(84 - 50)/4.701076 = 7.23$$, which is very large; $$P$$-value will be very small.

Very strong evidence ($$P < 0.001$$) that the mean mental demand is greater than 50. (Notice we say greater than, because of the RQ and the alternative hypothesis.)

Answer to Exercise 28.4: Physical: $$t = -1.28$$; Mental: $$t = 1.80$$. The $$P$$-values are both larger than 5%.

No evidence that the mean score for patients is different than the general population score.

Answer to Exercise 28.5: $$H_0$$: $$\mu = 12$$ and $$H_1$$: $$\mu \ne 12$$ (two-tailed), where $$\mu$$ is the mean weight in grams. Standard error: $$\text{s.e.}(\bar{x}) = 0.60652/\sqrt{43} = 0.09249343$$. $$t$$-score: $$(14.9577 - 12)/0.09249343 = 31.98$$, which is huge; $$P$$-value will be very small.

Very strong evidence ($$P<0.001$$) that the mean weight of a Fun Size Cherry Ripe bar is not 12 grams (mean: 14.9577g; std. dev.: 0.607g), and they may be larger.

Answer to Exercise 28.6: $$H_0$$: $$\mu = 1000$$ and $$H_1$$: $$\mu \ne 1000$$, where $$\mu$$ is the population mean guess of the spill volume. Standard error: 46.15526. $$t$$-score: $$(846.4 - 1000)/46.15526 = -3.33$$, which is very large (and negative), so the $$P$$-value will be very small.

Very strong evidence that the mean guess of blood volume is not 1000ml, the actual value. The sample is much larger than 25: the test is statistically valid.

Answer to Exercise 28.7: Hypotheses have the form $$H_0$$: $$\mu = \text{pre-determined target}$$, and $$H_1$$: $$\mu \ne \text{pre-determined target}$$. $$t$$-scores: $$t_1 = 0.318$$, $$t_2 = 2.347$$, $$t_3 = -0.466$$, $$t_4 = -0.726$$. $$P$$-values will be large, except for the second test.

No evidence that the instruments are dodgy, except perhaps for the first instrument for mid-level LH concentrations. Should be statistically valid.

While assessing the means is useful, how variable the measurements are is also useful (but beyond us).

Answers to exercises in Sect. 29.12.

Answer to Exercise 29.1: Using the 68--95--99.7 rule: 1. Very small; certainly less than 0.003 (99.7% between -3 and 3). 2. Very small; bit bigger than 0.003 (99.7% between -3 and 3). 3. Large-ish: Between 0.32 (68% between -1 and 1) and 0.05 (95% between -2 and 2), but closer to 0.32. 4. Bit smaller than 0.32 (68% between -1 and 1). 5. Very large! Almost 0.50. 6. Very small; much smaller than 0.003.

Answer to Exercise 29.2: The answers are half the values given to Exercise 29.1. Using the 68--95--99.7 rule: 1. Very small; certainly less than 0.0015 (99.7% between -3 and 3). 2. Very small; bit bigger than 0.0015 (99.7% between -3 and 3). 3. Large-ish: Between 0.16 (68% between -1 and 1) and 0.025 (95% between -2 and 2), but closer to 0.16. 4. Bit smaller than 0.16 (68% between -1 and 1). 5. Very large! Almost 0.25. 6. Very small; much smaller than 0.0015.
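The benchmark values used in the two answers above (0.32, 0.05 and 0.003 for two-tailed $$P$$-values, and half of each for one-tailed $$P$$-values) come from the areas under a normal distribution. A short sketch, using only the standard normal distribution and Python's built-in error function, shows where they come from; the function name is illustrative.

```python
import math

def central_area(k):
    """Area under the standard normal curve between -k and k."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    inside = central_area(k)       # about 68%, 95% and 99.7%
    two_tailed = 1 - inside        # area in both tails combined
    one_tailed = two_tailed / 2    # area in one tail only
    print(k, round(inside, 3), round(two_tailed, 3), round(one_tailed, 4))
```

These are normal-distribution areas, so they are only approximations for $$t$$-scores; for small samples, the $$P$$-values from software will be a little larger.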

Answer to Exercise 29.3: Using the 68--95--99.7 rule, the $$P$$-value is just under 0.05, and hence 'small'. If the $$t$$-score was 0.0499, the $$P$$-value would be just larger than 0.05 and hence 'big'.

The difference between 0.0501 and 0.0499 is trivial though... it is silly to jump from 'evidence supports the alternative hypothesis!' to the complete opposite conclusion 'evidence doesn't support the alternative hypothesis!' over such a minor difference.

Answer to Exercise 29.4: 1. Hypotheses are about population parameters like $$\mu$$, not sample statistics like $$\bar{x}$$. 2. Hypotheses are about parameters like $$\mu$$, not statistics like $$\bar{x}$$. The value of 36.8051 is a sample mean, but hypotheses are meant to be written before the data are collected. In any case, these hypotheses are asking to test if the sample mean is 36.8051... which we know it is. 3. 36.8051 is a sample mean, but hypotheses are meant to be written down before the data are collected. 4. 36.8051 looks like a sample mean, but hypotheses are meant to be written down before the data are collected. 5. Hypotheses are about parameters like $$\mu$$, not statistics like $$\bar{x}$$. 6. This would be fine, if the RQ was one-tailed... but it is two-tailed.

Answer to Exercise 29.5: 1. The conclusion is about the mean energy intake (population mean energy intake specifically). 2. Conclusions are never about sample statistics. We want to know what the statistic tells us about the population parameter. 3. The conclusion is about the population mean energy intake.

Answer to Exercise 29.6: 1. No evidence of a difference in lifetime between the two brands, as the $$P$$-value is "large". 2. No. The difference is 0.29 hours, or about 17 minutes. A difference of 17 minutes in over 5 hours of use is trivial. 3. Conclusion: no evidence of a difference between the mean lifetimes. That's cumbersome for advertising. A common advertising trick: "No other battery lasts longer!"... meaning there is no evidence of a difference in means. 4. Price!

## D.28 Answers: Tests for paired means

Answers to exercises in Sect. 30.13.

Answer to Exercise 30.1: $$H_0$$: $$\mu_d = 0$$ and $$H_1$$: $$\mu_d>0$$: differences are positive when the dip rating is better than the raw rating. $$t = (5.2 - 0)/3.06 = 1.699$$; the approximate one-tailed $$P$$-value, from using the 68--95--99.7 rule, is somewhere between 16% and 2.5%.

So we cannot be sure if the $$P$$-value is larger than 0.05... but it is likely that it is (since the calculated $$t$$-score is quite a long distance from $$z=1$$). The evidence probably doesn't support the alternative hypothesis.

Answer to Exercise 30.2: 1. Because it is the blood pressure reduction, and a reduction is what the drug is meant to produce, so expect the reductions to be positive numbers. 2. Differences shown below. 3. Histogram of differences: Fig. D.7. 4. $$H_0$$: $$\mu_d = 0$$ and $$H_1$$: $$\mu_d>0$$ (because the differences are reductions). 5. $$t=8.12$$. 6. $$P = 0.001\div 2 = 0.0005$$ (one-tailed test). 7. Very strong evidence ($$P=0.0005$$) that the drug reduces the average systolic blood pressure (mean reduction: 18.9 mm Hg) in the population.

Answer to Exercise 30.3: $$H_0$$: $$\mu_d = 0$$ and $$H_1$$: $$\mu_d > 0$$, where differences are positive when the intention to smoke is reduced after exercise. $$t = (0.66 - 0)/0.37 = 1.78$$; $$P$$-value larger than 0.05: the evidence doesn't support the alternative hypothesis.

No evidence ($$P > 0.05$$) that the mean 'intention to smoke' reduced after exercise in women (mean change in intention to smoke: -0.66; std. error: 0.37).

Answer to Exercise 30.4: $$H_0$$: $$\mu_d = 0$$ and $$H_1$$: $$\mu_d > 0$$, where differences refer to the reduction in ferritin. $$\bar{d} = -424.25$$ and $$s = 2092.693$$ and $$n = 20$$, so $$t = -0.90663$$. $$t$$ is 'small'; $$P > 0.05$$ (actually $$P = 0.376$$): the evidence doesn't support the alternative hypothesis. Since $$n < 25$$, the test may not be statistically valid (the histogram of data (Fig. D.8) suggests that the population might have a normal distribution), though the $$P$$-value is very large so it probably makes little difference.

## D.29 Answers: Tests for two means

Answers to exercises in Sect. 31.13.

Answer to Exercise 31.1: $$H_0$$: $$\mu_S - \mu_{NS} = 0$$ and $$H_1$$: $$\mu_S - \mu_{NS} \ne 0$$. From output: $$t = 5.478$$ and $$P < 0.001$$.

Very strong evidence to support $$H_1$$.

Answer to Exercise 31.2: 1. Table D.7. 2. $$H_0$$: $$\mu_C - \mu_{SO} = 0$$ and $$H_1$$: $$\mu_C - \mu_{SO} \ne 0$$. Then $$t=( (51 - 56) - 0 )/3.3044 = -1.513$$; $$P$$-value larger than 5%. Sample sizes are small; the test may not be statistically valid. 3. $$H_0$$: $$\mu_C - \mu_{SO} = 0$$ and $$H_1$$: $$\mu_C - \mu_{SO} \ne 0$$. Then $$t=( (36 - 47) - 0 )/4.0689 = -2.70$$; $$P$$-value smaller than 5%. Sample sizes are small; the test may not be statistically valid.

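TABLE D.7: The physical profile of conventional and special operation paramedics in Western Australia

| | Conventional | Special Operations |
|---|---|---|
| Sample size | 18 | 11 |
| Grip strength (in kg): mean | 51 | 56 |
| Grip strength (in kg): standard deviation | 8 | 9 |
| Grip strength (in kg): standard error | 1.89 | 2.71 |
| Push-ups (per minute): mean | 36 | 47 |
| Push-ups (per minute): standard deviation | 10 | 11 |
| Push-ups (per minute): standard error | 2.36 | 3.3 |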

Answer to Exercise 31.3: $$H_0$$: $$\mu_M - \mu_{F} = 0$$ and $$H_1$$: $$\mu_M - \mu_{F} \ne 0$$. From output, $$t = -2.285$$; (two-tailed) $$P$$-value is 0.024. Moderate evidence to support $$H_1$$.

Moderate evidence ($$P = 0.024$$) that the mean internal body temperature is different for females (mean: $$36.886^{\circ}\text{C}$$) and males (mean: $$36.725^{\circ}\text{C}$$).

The difference between the means, of 0.16 of a degree, is hardly of any practical importance in everyday use.

Answer to Exercise 31.4: 1. $$H_0$$: The means are equal: $$\mu_I = \mu_{NI}$$ or $$\mu_I - \mu_{NI} = 0$$. $$H_1$$: The means are not equal: $$\mu_I \ne \mu_{NI}$$ or $$\mu_I - \mu_{NI} \ne 0$$. 2. CI from -22.54 to -11.95: the mean sugar consumption is between 11.95 and 22.54 kg/person/year greater in industrialised countries. 3. Very strong evidence in the sample ($$P<0.001$$) that the mean annual sugar consumption per person is different for industrialised (mean: 41.8 kg/person/year) and non-industrialised (mean: 24.6 kg/person/year) countries (95% CI for the difference: 11.95 to 22.54 kg/person/year).

## D.30 Answers: Tests for odds ratios

Answers to exercises in Sect. 32.13.

Answer to Exercise 32.1: The missing entries: Odds: 1.15; Percentage: 58.1%.

$$\chi^2 = 4.593$$; approximately $$z = \sqrt{4.593/1} = 2.14$$; expect small $$P$$-value. Software gives $$P=0.032$$.

Evidence that the difference between the sample proportions is unlikely to be due to sampling variation. The test is statistically valid.

The sample provides moderate evidence ($$\text{chi-square}=4.593$$; two-tailed $$P = 0.032$$) that the population odds of finding a male sandfly in eastern Panama is different at 3 feet above ground (odds: 1.15) compared to 35 feet above ground (odds: 1.71; OR: $$0.67$$; 95% CI from $$0.47$$ to $$0.97$$).
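The conversion from $$\chi^2$$ to an approximate $$z$$-score used above is $$z \approx \sqrt{\chi^2/\text{df}}$$. A minimal sketch follows; the function name is made up, and the second example uses the values from the answer to Exercise 32.6 in this section.

```python
import math

def approx_z_from_chi_square(chi_square, df):
    """Approximate z-score equivalent to a chi-square value: sqrt(chi-square / df)."""
    return math.sqrt(chi_square / df)

print(round(approx_z_from_chi_square(4.593, 1), 2))   # 2.14 (Exercise 32.1)
print(round(approx_z_from_chi_square(20.923, 2), 2))  # 3.23 (Exercise 32.6)
```

The approximate $$z$$-score is only used to judge roughly whether the $$P$$-value is small (using the 68--95--99.7 rule); the reported $$P$$-value comes from software.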

Answer to Exercise 32.2: One option: $$H_0$$: The population OR is one; $$H_1$$: The population OR is not one. From software, $$\chi^2 = 0.667$$; $$P = 0.414$$, which is large.

No evidence ($$P = 0.414$$) that the odds of having a smooth scar is different for women and men (chi-square: 0.667). The test is statistically valid.

Answer to Exercise 32.4: One option: $$H_0$$: The population OR is one; $$H_1$$: The population OR is not one. From software, $$\chi^2 = 3.845$$; $$P = 0.050$$.

Moderate evidence ($$P = 0.05$$) that the odds of having no rainfall is different for non-positive SOI Augusts and negative-SOI Augusts (chi-square: 3.845). The test is statistically valid.

Answer to Exercise 32.5: 1. $$22/366 \times 100 = 6.0$$%. 2. $$79/386 \times 100 = 20.5$$%. 3. $$22/344 = 0.06395349$$, or about 0.0640. 4. $$79/307 = 0.257329$$, or about 0.257. 5. $$0.257/0.0640 = 4.02$$. 6. $$0.0640/0.257= 0.249$$. 7. From $$0.151$$ to $$0.408$$. 8. $$\chi^2 = 33.763$$ (approximately $$z = 5.81$$) and $$P < 0.001$$. 9. Strong evidence ($$P < 0.001$$; $$\chi^2 = 33.763$$; $$n = 752$$) that the odds of wearing a hat are different for males (odds: 0.257) and females (odds: 0.0640; OR: 0.249, 95% CI from 0.151 to 0.408). 10. Yes.

Answer to Exercise 32.6: 1. Low exposure (in order): 73.7%, 72.5%, 85.6%. High exposure (in order): 26.3%, 27.5%, 14.4%. 2. In order: 2.80, 2.64, 5.92. 3. Various ways; probably the easiest: $$H_0$$: No association between level of exposure and type of interaction (in the population). 4. Table D.8. 5. Approximately $$z=\sqrt{20.923/2} = 3.23$$: expect small $$P$$-value. 6. Very strong evidence in the sample of an association between level of exposure and type of interaction in the population ($$\chi^2 = 20.923$$; $$P < 0.001$$).

TABLE D.8: The expected counts for the phone-use data
Low exposure 275.7 275.7 272.61
High exposure 81.3 81.3 80.39

## D.31 Answers: Relationships between two quantitative variables

Answers to exercises in Sect. 34.6.

Answer to Exercise 34.1: Linear, positive, very little variation (i.e., strong relationship).

Of note: The observation in the bottom left is very different from the rest of the data, but still maintains the linear relationship.

Answer to Exercise 34.2: Non-linear; higher wind speed related to higher DC output (in general); a small to moderate amount of variation.

The DC output increases as wind speed increases, but not linearly.

Answer to Exercise 34.3: The relationship is probably linear... but a few observations at the top right look a bit different.

Variation seems to increase a little as the Age increases.

Of note: A few observations in the top right of the scatterplot seem to not follow the linear relationship.

Answer to Exercise 34.4: Approximately linear; positive relationship; variation seems to get larger for a larger number of cases.

## D.32 Answers: Correlation

Answers to exercises in Sect. 35.8.

Answer to Exercise 35.2: 1. $$R^2 = 0.881^2 = 77.6$$%. About 77.6% of the variation in punting distance can be explained by the variation in right-leg strength. 2. $$H_0$$: $$\rho = 0$$ and $$H_1$$: $$\rho \ne 0$$. $$P$$-value very small. Very strong evidence of a correlation in the population.

Answer to Exercise 35.3: The plot looks linear; $$n = 25$$; variation doesn't seem constant.

Answer to Exercise 35.4: 1. Very close to $$-1$$. 2. $$r = -\sqrt{0.9929} = -0.9964$$. ($$r$$ must be negative!) 3. Very small. This is a very large value for $$r$$ on a reasonably sized sample. 4. Yes.

Answer to Exercise 35.5: 1. Close to $$-1$$, but not super close. 2. $$r = -\sqrt{0.6707} = -0.819$$. ($$r$$ must be negative!) 3. Very small. This is a large value for $$r$$ on a reasonably sized sample. (The $$P$$-value turns out to be 0.000104.) 4. Since $$n < 25$$, the test may not be statistically valid.

## D.33 Answers: Regression

Answers to exercises in Sect. 36.13.

Answer to Exercise 36.1: 1. $$b_0 = 97.499$$ (the intercept); $$b_1=0.0764$$ (the slope). 2. $$\hat{y} = 97.499 + 0.0764x$$, where $$x$$ is the inlet temperature (in $$^\circ$$C) and $$y$$ is removal efficiency (in %). 3. When inlet temperature increases by 1 degree C, on average the removal efficiency increases by 0.076 percentage points. 4. $$H_0$$: $$\beta = 0$$; $$H_1$$: $$\beta \ne 0$$ (two-tailed; the RQ implies a two-tailed test). 5. $$t = 10.742$$, which is huge; $$P < 0.001$$. 6. $$0.076 \pm (2\times 0.007)$$, or $$0.076 \pm 0.014$$, or $$0.062$$ to $$0.090$$.

Answer to Exercise 36.2: 1. Intercept not about 110; that's where the line 'stops', but the intercept is the predicted value of $$y$$ when $$x = 0$$. We have to extend the line quite a bit. Using rise-over-run, guess slope is $$(190 - 110)/(180 - 110) = 1.14$$. 2. $$\hat{y} = -3.69 + 1.04x$$, where $$y$$ is punting distance (in feet), and $$x$$ is right leg strength (in pounds). 3. For each extra pound of leg strength, the punting distance increases, on average, by about 1 foot. 4. $$H_0$$: $$\beta=0$$; $$H_1$$: $$\beta\ne 0$$. (You could answer in terms of correlations.) The question is stated as a two-tailed question, but testing if stronger legs increase kicking distance seems sensible. 5. $$t = 6.16$$, which is huge; $$P = 0.0001$$ (two-tailed). 6. $$1.0427 \pm (2\times 0.1692)$$, or $$1.0427 \pm 0.3384$$, or 0.70 to 1.38. 7. Very strong evidence in the sample ($$t = 6.16$$; $$P = 0.0001$$ (two-tailed)) that punting distance is related to leg strength (slope: $$1.0427$$; $$n = 13$$).
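The $$t$$-score and approximate CI for the slope in parts 5 and 6 both come from the slope and its standard error. A short sketch of the arithmetic follows (the function name is invented, and the multiplier of 2 is the approximate 95% multiplier used in these answers).

```python
def slope_summary(b1, se_b1, multiplier=2):
    """t-score (testing a population slope of zero) and approximate CI for a slope."""
    t = b1 / se_b1
    margin = multiplier * se_b1
    return t, b1 - margin, b1 + margin

# Exercise 36.2: slope 1.0427, with standard error 0.1692
t, low, high = slope_summary(1.0427, 0.1692)
print(round(t, 2))                   # 6.16
print(round(low, 2), round(high, 2))  # 0.7 1.38
```

Because the approximate CI does not include zero, it tells the same story as the small $$P$$-value: a population slope of zero is not plausible.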

Answer to Exercise 36.3: 1. Way too many decimal places. $$r$$ is not relevant as relationship is non-linear. 2. Regression is inappropriate: the relationship is non-linear. 3. $$y$$ should be $$\hat{y}$$; the slope and intercept have been swapped (from the plot, the intercept for their line is about 0.4, which they give as the slope). 4. The whole thing is as dodgy-as...

Answer to Exercise 36.4: 1. $$\hat{y} = 17.47 - 2.59x$$, where $$x$$ is the percentage bitumen by weight, and $$y$$ is the percentage air voids by volume. 2. Slope: an increase in the bitumen weight by one percentage point decreases the average percentage air voids by volume by 2.59 percentage points. Intercept: dodgy (extrapolation); in principle, at 0% bitumen content by weight, the percentage air voids by volume is about 17.47%. 3. $$t = -74.9$$: Massive! Extremely strong evidence ($$P < 0.001$$) of a relationship. 4. $$\hat{y} = 17.4712 - (2.5937\times 5) = 4.5027$$, or about 4.5%. Expect a good prediction, as the relationship is strong. 5. $$\hat{y} = 17.4712 - (2.5937\times 6) = 1.909$$, or about 1.9%. Might be a poor prediction, since this is extrapolation.
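The two predictions in parts 4 and 5 use the regression equation directly. As a sketch (the function name is illustrative only):

```python
def predict(b0, b1, x):
    """Predicted mean response from a regression line: b0 + b1 * x."""
    return b0 + b1 * x

# Exercise 36.4: intercept 17.4712, slope -2.5937
b0, b1 = 17.4712, -2.5937
for bitumen in (5, 6):
    print(bitumen, round(predict(b0, b1, bitumen), 4))  # 4.5027, then 1.909
```

The code cannot tell whether a prediction is an extrapolation: that judgement needs the range of the bitumen percentages in the data, as noted in part 5.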

Answer to Exercise 36.5: 1. $$b_0$$: When someone spends no time on sunscreen application, an average of 0.27g has been applied; nonsense. $$b_1$$: Each extra minute spent on application adds an average of 2.21g of sunscreen: sensible. 2. The value of $$\beta_0$$ could be zero... which would make sense. 3. $$\hat{y} = 0.27 + (2.21\times 8) = 17.95$$; an average of about 18g. 4. About 64% of the variation in sunscreen amount applied can be explained by the variation in the time spent on application. 5. $$r = \sqrt{0.64} = 0.8$$, and need a positive value of $$r$$. A strong and positive correlation between the variables.

Answer to Exercise 36.6: 1. No. 2. Possibly; no idea of accuracy of predictions really. 3. Intercept: Weight of infant with chest circumference zero; silly. Slope: average increase in birth weight (in g) for each increase in chest circumference by one cm. 4. Intercept: grams; slope: grams/cm. 5. $$\hat{y} = 2538.7$$g. 6. Too many decimal places! Regression equation implies predicting to 0.0001 of a gram. $$r$$ has too many decimal places too.

Answers to exercises in Sect. 37.4.

Answer to Exercise 37.1: 1. Not ecologically valid. 2. Ethical. People understand that sometimes unexpected things happen. 3. Convenience; self-selected. However, nothing obvious to suggest the people in the study would record different accuracies than people not in the study. 4. Inclusion criteria. 5. Paired $$t$$-test. 6. Evidence in the sample that the mean difference in step-count between the two methods cannot be explained by chance: likely is a difference. 7. From the given information: Probably valid.

Answer to Exercise 37.2: 1. Students at that university (QUMS). 2. A random sampling method has been used, so results should be generalisable to the population (students at that university). 3. $$t$$-test comparing two means. 4. Three groups. Null hypothesis: the population mean hearing loss score is the same in all three groups. Alternative hypothesis: the mean hearing loss score is not the same in all three groups. 5. Standard error: $$\text{s.e.}(\bar{x}) = 3.08/\sqrt{745} = 0.1128$$; approximate 95% CI is $$19.8 \pm (2\times 0.1128)$$, or $$19.8 \pm 0.23$$. 6. Need the standard error for the difference between two means, which is not reported.

## D.35 Answers: Writing research

Answers to exercises in Sect. 38.19.

Answer to Exercise 38.1: The number of decimal places given (ten!) is ridiculous.

Answer to Exercise 38.2: The graph: odd colour choice; vertical axis label isn't helpful; horizontal axis isn't labelled at all; units of measurement not given; title and/or caption would be helpful.

The table: the two limits of the CI are under the Mean and Std dev columns, but that is not what they are; units of measurement are not given; no caption, or any way to know what the table is about; number of decimal places is inconsistent; sample sizes not given; difference, and probably the other rows too, should report a standard error.

Answer to Exercise 38.3: RQ: P, O, C and I are not clear or explicit; the two fonts being compared should be identified. Perhaps better:

For students, is the mean reading speed for text in the Georgia font the same as for text in Calibri font?

Abstract: statement poorly constructed (fonts are not fast or slow!). Perhaps:

The sample provided evidence that the mean reading speeds were different ($$P = ???$$), when comparing text in Georgia font (mean: ???) and Calibri font (mean: ???; 95% CI for the difference: ??? to ???).

Answer to Exercise 38.4: No units of measurement given (they are centimetres); jump-heights are given to 0.001 of a centimetre... which seems rather optimistic; table gives information for the differences, which is great, but it could also provide numerical summary information for each individual jump type too; a numerical summary shouldn't include a $$P$$-value, $$t$$-score, or confidence interval.

Answer to Exercise 38.5: Variables are qualitative, so means inappropriate; the appropriate summary is an odds ratio, so the values almost certainly refer to the CI for the OR. Without more information, we can't really be sure what the OR means though.

Answer to Exercise 38.6: Such a study cannot prove anything just by itself (only a sample studied); the two CIs for the hang time for each plane design are fine... but really, the difference is of interest: The appropriate CI is for the difference between the mean hang times.

Answer to Exercise 38.7: 1. Table: Reasonably good! 2. Figure: Poor (it is 3D). Use a stacked or side-by-side bar chart.

Answer to Exercise 38.8: Reasonably good, but there should not be gaps between the bars of the histogram.