36 Tests for comparing odds

So far, you have learnt to ask a RQ, design a study, describe and summarise the data, understand the decision-making process and to work with probabilities. You have been introduced to the construction of confidence intervals, and to hypothesis testing. In this chapter, you will learn to:

  • conduct hypothesis tests for an OR (i.e., comparing two proportions, or comparing two odds), using chi-square tests in software output.
  • determine whether the conditions for using these methods apply in a given situation.

36.1 Introduction: meals on-campus

In Sect. 29.1, a study was introduced that examined the eating habits of university students (Mann and Blotnicky 2017). Researchers classified \(n = 183\) students into groups according to two qualitative variables (Table 36.1): where they lived, and where they ate most of their meals.

Every cell in the \(2\times 2\) table contains different students, so the comparison is between individuals.

TABLE 36.1: Where university students live and eat
Lives with parents Doesn't live with parents Total
Most meals off-campus \(52\) \(105\) \(157\)
Most meals on-campus \(\phantom{0}2\) \(\phantom{0}24\) \(\phantom{0}26\)
Total \(54\) \(129\) \(183\)

Since both qualitative variables have two levels, the table is a \(2\times 2\) table. A graphical summary is shown in Fig. 29.1 (left panel), and a numerical summary in Table 36.2. (The details of the computations appear in Sect. 29.1).

TABLE 36.2: The odds and percentage of university students eating most meals off-campus
Odds of having most meals off-campus Percentage having most meals off-campus Sample size
Living with parents \(26.000\) \(96.3\) \(\phantom{0}54\)
Not living with parents \(\phantom{0}4.375\) \(81.4\) \(129\)
Odds ratio \(\phantom{0}5.943\)

The parameter is the population OR of eating most meals off-campus, comparing students living with their parents to students not living with their parents.

Understanding how software computes the odds ratio is important for understanding the output. In a \(2\times 2\) table, the jamovi output can be interpreted in either of these ways (i.e., both are correct):

  • The odds compare Row 1 counts to Row 2 counts, for both columns.
    Then the odds ratio compares the odds for Column 1 to the odds for Column 2.

  • The odds compare Column 1 counts to Column 2 counts.
    Then the odds ratio compares the odds for Row 1 to the odds for Row 2.

Odds and odds ratios are computed with the last row and last column values on the bottom of the fraction.

Example 36.1 (Odds and odds ratio in software) For the data in Table 36.1, the software output can be interpreted in either of these ways (i.e., both are correct):

  • The odds are the odds of eating most meals off-campus (Row 1) compared to on-campus (Row 2; on the bottom of the fraction)

    • for students living with their parents (Column 1): \(52/2 = 26\);
    • for students not living with their parents (Column 2): \(105/24 = 4.375\).

    So the OR is \(26/4.375 = 5.943\) (Column 2 on bottom of the fraction), as in the output (Fig. 36.1).

  • The odds are the odds of living with parents (Column 1) compared to not living with parents (Column 2; on the bottom of the fraction):

    • for those eating most meals off-campus: \(52/105 = 0.49524\);
    • for those eating most meals on-campus: \(2/24 = 0.083333\).

    So the OR is \(0.49524/0.083333 = 5.943\) (Row 2 on bottom of the fraction), as in the output (Fig. 36.1).

In other words, the odds and odds ratios use the last row or last column on the bottom of the fraction.
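The two readings in Example 36.1 can be checked with a few lines of Python. This is only a sketch of the arithmetic (it is not how jamovi is implemented), and the variable names are chosen here for illustration.

```python
# Observed counts from Table 36.1.
# Rows: where most meals are eaten; columns: living arrangement.
observed = [
    [52, 105],  # most meals off-campus: lives with parents, doesn't live with parents
    [2, 24],    # most meals on-campus
]

# Reading 1: odds of off-campus (Row 1) relative to on-campus (Row 2), in each column.
odds_with_parents = observed[0][0] / observed[1][0]     # 52 / 2
odds_without_parents = observed[0][1] / observed[1][1]  # 105 / 24
or_by_columns = odds_with_parents / odds_without_parents

# Reading 2: odds of living with parents (Column 1) relative to not (Column 2), in each row.
odds_off_campus = observed[0][0] / observed[0][1]       # 52 / 105
odds_on_campus = observed[1][0] / observed[1][1]        # 2 / 24
or_by_rows = odds_off_campus / odds_on_campus

print(f"Odds (columns): {odds_with_parents:.3f}, {odds_without_parents:.3f}")
print(f"OR via columns: {or_by_columns:.3f}; OR via rows: {or_by_rows:.3f}")
```

Both readings give the same odds ratio because each reduces to \((52\times 24)/(2\times 105)\).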

The RQ can be written using proportions, odds, or odds ratios. Means are not appropriate (the data contain two qualitative variables). Using the OR, the RQ could be written as

Is the population odds ratio of eating most meals off-campus, comparing students who live with their parents to students not living with their parents, equal to one?

Alternatively, and probably easier to understand, the RQ can be written in terms of comparing the odds in the two groups:

Are the population odds of students eating most meals off-campus the same for students living with their parents and for students not living with their parents?

Equivalently, the RQ can also be worded as comparing the percentage (or proportion) of students eating meals off-campus in each group, though this is less common. However, these are not directly related to the software output (which works with odds ratios). Another alternative, which sounds less direct but is useful for two-way tables larger than \(2\times 2\) (see Sect. 36.9), is worded in terms of relationships or associations (but not correlations) between the variables:

Is there a relationship (or association) between where students eat most of their meals and whether or not the student lives with their parents?

All of these are equivalent. Usually, for \(2 \times2\) tables, working with odds or odds ratios is best, because most software (including jamovi) readily produces output for the OR.

36.2 Statistical hypotheses and notation

For \(2\times 2\) tables of counts, the parameter is the population odds ratio. As usual, the null hypothesis is the 'no difference, no change, no relationship' position:

  • \(H_0\): The population OR is one; or (equivalently):
    The population odds are the same in each group.

This hypothesis proposes that the sample odds are not the same only due to sampling variation. This is the initial assumption. The alternative hypothesis is

  • \(H_1\): The population OR is not one; or (equivalently):
    The population odds are not the same in each group.

For analysing two-way tables of counts, the alternative hypotheses are always two-tailed.

The hypotheses can also be written in terms of differences in percentages (or proportions), though the software output is usually expressed in terms of odds. The hypotheses can also be written in terms of relationships or associations:

  • \(H_0\): In the population, there is no association between the two variables
  • \(H_1\): In the population, there is an association between the two variables

The RQ and hypotheses only need to be given in one of these ways. The RQ and hypotheses should be consistent; for example, if the RQ is written in terms of odds, the hypotheses should be written in terms of odds.

In our example then:

  • \(H_0\): The population odds of eating most meals off-campus are the same for students living with their parents and for students not living with their parents.

  • \(H_1\): The population odds of eating most meals off-campus are different for students living with their parents and for students not living with their parents.

As usual, the decision-making process starts by assuming the null hypothesis is true: that the population odds ratio is one (i.e., the population odds in each group are equal).

36.3 Finding expected counts

Assuming that the odds of having most meals off-campus is the same for both groups (that is, the population OR is one), how would the sample OR be expected to vary from sample to sample just because of sampling variation? If the null hypothesis is true, the odds are the same in both groups (and the percentages are the same in both groups). That is, the percentage of students eating most meals off-campus is the same for students living with and not living with their parents.

Let's consider the implication. From Table 36.1, \(157\) students out of \(183\) ate most meals off-campus, so that \(157\div 183 \times 100 = 85.79\)% of the students in the entire sample ate most of their meals off-campus.

If the percentage of students who eat most of their meals off-campus is the same for those who live with their parents and those who don't, then we'd expect \(85.79\)% of students in both groups to be eating most meals off-campus. (These were also found in Sect. 29.5.) That is, we would expect:

  • \(85.79\)% of the \(54\) students who live with their parents (i.e., \(46.33\)) to eat most meals off-campus; and
  • \(85.79\)% of the \(129\) students who don't live with their parents (i.e., \(110.67\)) to eat most meals off-campus.

In other words, the percentage (and hence the odds) is the same in each group. Those are the expected counts if the percentage was exactly the same in each group (Table 36.3), if the null hypothesis (the assumption) was true.

How close are the observed counts (Table 36.1) to the expected counts (Table 36.3)?

  • \(46.33\) of the \(54\) students who live with their parents are expected to eat most meals off-campus; yet we observed \(52\).
  • \(110.67\) of the \(129\) students who don't live with their parents are expected to eat most meals off-campus; yet we observed \(105\).

The observed and expected counts are similar, but not exactly the same. The difference between the observed and expected counts may be explained by sampling variation (that is, the null hypothesis explanation).

You do not have to compute the expected values when you answer one of these types of RQs (software does it in the background). However, seeing how the decision-making process works in this context is helpful.

In previous hypothesis tests, the sampling distribution had an approximate normal distribution (whose standard deviation is called the standard error). However, the sampling distribution of the odds ratio is more complicated, so it will not be presented. We will use software output instead.

TABLE 36.3: Where university students live and eat: Expected counts
Lives with parents Doesn't live with parents Total
Most meals off-campus \(46.328\) \(110.672\) \(157\)
Most meals on-campus \(\phantom{0}7.672\) \(\phantom{0}18.328\) \(\phantom{0}26\)
Total \(54.000\) \(129.000\) \(183\)

Consider the expected counts in Table 36.3. Confirm that the odds of having most meals off-campus is the same for students living with their parents, and for students not living with their parents.

  • Living with parents: \(46.328/7.672 = 6.039\).
  • Not living with parents: \(110.672/18.328 = 6.038\).

The odds are the same (the small difference is because the expected counts are only given to three decimal places).
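The expected counts in Table 36.3 can be reproduced with a short Python sketch. It uses the usual shortcut (row total times column total, divided by the overall total), which is the same calculation as taking \(85.79\)% of each column total; the variable names are illustrative only.

```python
# Observed counts from Table 36.1.
observed = [[52, 105], [2, 24]]

row_totals = [sum(row) for row in observed]        # [157, 26]
col_totals = [sum(col) for col in zip(*observed)]  # [54, 129]
n = sum(row_totals)                                # 183

# Expected count = (row total x column total) / overall total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(count, 3) for count in row])

# Under the null hypothesis, the odds are the same in both columns.
odds_col1 = expected[0][0] / expected[1][0]
odds_col2 = expected[0][1] / expected[1][1]
print(f"Odds from expected counts: {odds_col1:.4f} and {odds_col2:.4f}")
```

Using the unrounded expected counts, the two odds agree exactly (both equal \(157/26\)), which confirms that the small difference above is due only to rounding.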

36.4 Computing the value of the test statistic

The decision-making process compares what is expected if the null hypothesis about the parameter is true (Table 36.3) to what is observed in the sample (Table 36.1). Previously, when the summary statistics were means, the sampling distribution was a normal distribution, and a \(t\)-score was the test statistic. However, the data here are not summarised by means, the sampling distribution is not a normal distribution (but is related to a normal distribution), and a different test statistic is needed.

Here, the test-statistic is a 'chi-squared' statistic, written \(\chi^2\). A \(\chi^2\) statistic measures the overall size of the differences between the expected counts and observed counts, over the entire \(2\times 2\) table.

The Greek letter \(\chi\) is pronounced 'ki', as in kite (not "chi" as in China).
The test statistic \(\chi^2\) is pronounced as 'chi-squared'.

From the software (Fig. 36.1), \(\chi^2 = 6.934\). What does this value mean? The \(\chi^2\)-value is better understood by finding the equivalent \(z\)-score, which allows a \(P\)-value to be estimated using the \(68\)--\(95\)--\(99.7\) rule. In a \(2\times 2\) table of counts (when the 'degrees of freedom', or df, is equal to 1, as in the computer output), the square root of the \(\chi^2\) value is equivalent to a \(z\)-score; here, \(z\) is about \(\sqrt{6.934} = 2.63\). This is a large \(z\)-score, so expect a small \(P\)-value. For two-way tables of any size, a more general (but simple) calculation is needed.

FIGURE 36.1: The jamovi output for computing a CI and conducting a test

In a chi-squared test, with a given number of 'degrees of freedom' (df in the software output), the value of
\[ \sqrt{ \chi^2 \div {\text{df}}} \] is like a \(z\)-score. This allows the \(P\)-value to be estimated using the \(68\)--\(95\)--\(99.7\) rule.
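As a hedged illustration of where the \(\chi^2\) value comes from, the sketch below sums \((\text{observed} - \text{expected})^2/\text{expected}\) over the four cells of Table 36.1. This is the standard (Pearson) form of the statistic, which reproduces the value in Fig. 36.1 for these data; the degrees of freedom are computed as (rows \(-\) 1) \(\times\) (columns \(-\) 1). Variable names are illustrative.

```python
import math

# Observed counts from Table 36.1.
observed = [[52, 105], [2, 24]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Add up (observed - expected)^2 / expected over every cell of the table.
chi_squared = 0.0
for i, row_total in enumerate(row_totals):
    for j, col_total in enumerate(col_totals):
        expected = row_total * col_total / n
        chi_squared += (observed[i][j] - expected) ** 2 / expected

# Degrees of freedom for a two-way table of counts.
df = (len(row_totals) - 1) * (len(col_totals) - 1)

# The quantity that behaves like a z-score.
z_like = math.sqrt(chi_squared / df)

print(f"chi-squared = {chi_squared:.3f}, df = {df}, z-like value = {z_like:.2f}")
```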

36.5 Determining \(P\)-values

The difference between the observed sample statistic (the sample OR) and the hypothesised population parameter (the population OR of one) is summarised by \(\chi^2 = 6.934\), approximately equivalent to \(z = 2.63\). Using the \(68\)--\(95\)--\(99.7\) rule, a small \(P\)-value is expected.

The two-tailed \(P\)-value reported by jamovi (Fig. 36.1, under the column p) is indeed small: \(0.008\) to three decimals.

Recall that, for two-way tables of counts, the alternative hypotheses are always two-tailed, so a two-tailed \(P\)-value is always reported.
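For readers curious about how such a \(P\)-value can be computed, here is a minimal sketch for the special case df \(= 1\), where the \(\chi^2\) value is the square of a \(z\)-score and the two-tailed normal area can be used. Software uses the chi-squared distribution directly, so this is an illustration rather than jamovi's procedure.

```python
import math

chi_squared = 6.934  # value reported in Fig. 36.1 (df = 1)

# With df = 1, the square root of chi-squared acts as a z-score.
z = math.sqrt(chi_squared)

# Two-tailed area under the standard normal curve beyond |z|.
p_value = math.erfc(z / math.sqrt(2))

print(f"z = {z:.2f}, two-tailed P = {p_value:.3f}")
```

The shortcut applies only when df \(= 1\); for larger tables the chi-squared distribution with the appropriate df is needed.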

36.6 Writing conclusions

As usual, a very small \(P\)-value (\(0.008\) to three decimals) means strong evidence exists to support \(H_1\): the evidence suggests a difference in the population odds in the two groups. We write:

The sample provides strong evidence (\(\chi^2 = 6.934\); two-tailed \(P = 0.008\)) that the odds in the population of having most meals off-campus is different for students living with their parents (odds: \(26\)) and students not living with their parents (odds: \(4.375\); OR: \(5.94\); \(95\)% CI from \(1.35\) to \(26.1\)).

The conclusion includes three components (Sect. 33.8): The answer to the RQ; the evidence used to reach that conclusion ('\(\chi^2 = 6.934\); two-tailed \(P = 0.008\)'); and some sample summary statistics (including the \(95\)% CI for the odds ratio).

The conclusion also makes clear what the odds and the odds ratio mean. The odds are described as the 'odds... of having most meals off-campus', and the OR is described as comparing these odds between 'students living with their parents... and students not living with their parents'.
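The \(95\)% CI quoted in the conclusion comes from the software output. The chapter does not say how the software computes it; the sketch below assumes the common large-sample interval formed on the log-odds-ratio scale, with a multiplier of \(1.96\). That assumption reproduces limits of about \(1.35\) and \(26.1\) for these data, but it should be read as an illustration only.

```python
import math

# Counts from Table 36.1.
a, b = 52, 105  # most meals off-campus: with parents, not with parents
c, d = 2, 24    # most meals on-campus:  with parents, not with parents

odds_ratio = (a / c) / (b / d)

# Assumed method: approximate standard error of log(OR).
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

multiplier = 1.96  # for an approximate 95% interval
lower = math.exp(math.log(odds_ratio) - multiplier * se_log_or)
upper = math.exp(math.log(odds_ratio) + multiplier * se_log_or)

print(f"OR = {odds_ratio:.2f}; approximate 95% CI from {lower:.2f} to {upper:.1f}")
```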

For two-way tables, RQs are best framed in terms of ORs or odds (but can be framed in terms of proportions or percentages, or associations or relationships).

For consistency: if the RQ is about the odds ratio, the hypotheses and conclusion should be about the odds ratio; if the RQ is about odds, the hypotheses and conclusion should be about the odds; and so on.

36.7 Statistical validity conditions

As usual, these results hold under certain conditions. The test above is statistically valid if:

  • All expected counts are at least five.

Some books may give other (but similar) conditions.

The statistical validity condition refers to the expected (not the observed) counts. In jamovi, the expected counts must be explicitly requested to see if this condition is satisfied (Fig. 36.2).

FIGURE 36.2: The expected values, as computed in jamovi

For the student-eating data, the smallest observed count is \(2\) (living with parents; most meals on-campus), but the smallest expected count is \(7.67\), which is greater than five. The size of the expected counts (not the observed counts) is what matters for the statistical validity condition.

Example 36.2 (Statistical validity) For the university-student eating data, all the cells have an expected count of at least five so the statistical validity condition is satisfied.
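The validity check can be sketched in Python as follows. The function names (`expected_counts`, `check_validity`) are hypothetical helpers written for this illustration; in practice, the expected counts are read from the software output.

```python
def expected_counts(observed):
    """Expected counts: row total x column total / overall total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


def check_validity(observed, minimum=5):
    """Return (condition met?, smallest expected count)."""
    smallest = min(min(row) for row in expected_counts(observed))
    return smallest >= minimum, smallest


students = [[52, 105], [2, 24]]  # observed counts, Table 36.1
valid, smallest_expected = check_validity(students)
print(f"Smallest expected count: {smallest_expected:.2f}; condition met: {valid}")
```

Note that the check is applied to the expected counts: the observed count of \(2\) does not, by itself, make the test invalid.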

36.8 Example: turtle nests

(This study was seen in Sect. 29.6.) The hatching success of loggerhead turtles on Mediterranean beaches is often compromised by fungi and bacteria. A study (Candan, Katılmış, and Ergin 2021) compared the proportion of infected nests among nests relocated due to the risk of tidal inundation and among non-relocated (natural) nests (Table 36.4). The researchers were interested in knowing:

For Mediterranean loggerhead turtles, are the odds of infections the same for natural and relocated nests?

TABLE 36.4: Infected and non-infected turtle nests
Non-infected Infected
Natural \(29\) \(10\)
Relocated \(14\) \(\phantom{0}8\)

The parameter is the population odds ratio, comparing natural to relocated nests. A graphical summary is shown in Fig. 29.3. With the layout of Table 36.4, the last column ('Infected') is on the bottom of the fraction, so the odds are the odds of a nest being non-infected: \(29/10 = 2.90\) for natural nests and \(14/8 = 1.75\) for relocated nests. The numerical summary (Table 29.3, right table) then shows that the odds of a natural nest being non-infected are \(1.657\) times the odds of a relocated nest being non-infected. (Testing whether the odds of non-infection are the same in both groups is equivalent to testing whether the odds of infection are the same.) From the jamovi output (Fig. 36.3), the \(\chi^2\)-value is \(0.777\); this is like a \(z\)-score of \(z = \sqrt{0.777/1} = 0.88\), which is small, so expect a large \(P\)-value. Indeed, the \(P\)-value is \(0.378\) on the output. The smallest expected count is \(6.49\) (Fig. 36.3), so this test is statistically valid. We write:

There is no evidence of a difference in the odds of a nest being non-infected (\(\chi^2\): \(0.777\); \(P\)-value: \(0.378\); odds ratio: \(1.657\); \(95\)% CI: \(0.537\) to \(5.12\)) between natural nests (odds: \(2.90\); \(n = 39\)) and relocated nests (odds: \(1.75\); \(n = 22\)).

FIGURE 36.3: The jamovi output for the turtle-nesting data
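The numbers quoted for the turtle-nest example can be reproduced from Table 36.4 with the sketch below. It assumes the same unadjusted \(\chi^2\) calculation (summing \((\text{observed} - \text{expected})^2/\text{expected}\) over the cells) and uses the df \(= 1\) shortcut for the \(P\)-value; the variable names are illustrative.

```python
import math

# Table 36.4: rows are nest type; columns are (non-infected, infected).
turtles = [[29, 10],  # natural nests
           [14, 8]]   # relocated nests

row_totals = [sum(row) for row in turtles]
col_totals = [sum(col) for col in zip(*turtles)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_squared = sum(
    (turtles[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(2)
    for j in range(2)
)
z_like = math.sqrt(chi_squared)             # df = 1 for a 2x2 table
p_value = math.erfc(z_like / math.sqrt(2))  # two-tailed P-value when df = 1
smallest_expected = min(min(row) for row in expected)

# Odds place the last column (infected) on the bottom of the fraction,
# so these are odds of a nest being non-infected.
odds_natural = turtles[0][0] / turtles[0][1]
odds_relocated = turtles[1][0] / turtles[1][1]
odds_ratio = odds_natural / odds_relocated

print(f"chi-squared = {chi_squared:.3f}, z-like = {z_like:.2f}, P = {p_value:.3f}")
print(f"Smallest expected count = {smallest_expected:.2f}")
print(f"Odds: {odds_natural:.2f} and {odds_relocated:.2f}; OR = {odds_ratio:.3f}")
```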

36.9 Example: shopping bags

A study of \(400\) residents of Klang Valley, Malaysia, examined residents' approach to waste management (Choon, Tan, and Chong 2017). One RQ was:

For residents of Klang Valley, is age associated with whether people bring their own bags when shopping?

The data (Table 36.5) are given in a \(3\times 2\) table of counts. The software output is shown in Fig. 36.4; a graphical summary in Fig. 36.5. Most of the numerical summary must be produced manually (Table 36.6), since jamovi only produces odds ratios for \(2\times 2\) tables. The details of the calculations follow Table 36.6 (notice that the last column is on the bottom of the fraction for the odds, and the last row is on the bottom of the fraction for the odds ratios).

TABLE 36.5: Whether shoppers bring their own bags, and the shoppers' age
Brings own bags Does not bring own bags
30 and under \(126\) \(138\)
31 to 40 \(\phantom{0}50\) \(\phantom{0}32\)
Over 40 \(\phantom{0}41\) \(\phantom{0}13\)

FIGURE 36.4: jamovi output for the shopping-bags data

FIGURE 36.5: A side-by-side bar chart for the shopping-bags data

TABLE 36.6: Odds and percentage that people bring their own shopping bags by age groups. The odds ratios are computed relative to those 'Over \(40\)'
Odds Odds ratio Percentage Sample size
30 and under \(0.913\) \(0.289\) \(47.7\) \(264\)
31 to 40 \(1.563\) \(0.496\) \(61.0\) \(\phantom{0}82\)
Over 40 \(3.154\) \(75.9\) \(\phantom{0}54\)
  • For those '\(30\) and under': the odds of bringing a shopping bag is \(126/138 = 0.913\).
  • For those '\(31\) to \(40\)': the odds of bringing a shopping bag is \(50/32 = 1.563\).
  • For those 'Over \(40\)': the odds of bringing a shopping bag is \(41/13 = 3.154\).

Then the odds ratios can be computed:

  • The OR of bringing a shopping bag, comparing people '\(30\) and under' to people 'Over \(40\)': \(0.913/3.154 = 0.289\).
  • The OR of bringing a shopping bag, comparing people '\(31\) to \(40\)' to people 'Over \(40\)': \(1.563/3.154 = 0.496\).

In Table 36.6, the odds of bringing a shopping bag are relative to those 'Over \(40\)' (the last row). Since Table 36.6 has three groups to compare, three odds are needed. However, the summary has \(3 - 1 = 2\) odds ratios, since odds ratios compare pairs of odds. The level to which the other two are compared is called the reference level. In Table 36.6, the reference level is 'Over \(40\)' (i.e., on the bottom of the fraction when computing the odds ratios). (In a \(2\times 2\) table, with two groups to compare, the summary has only \(2 - 1 = 1\) odds ratio.)

These odds ratios mean:

  • The odds of bringing a shopping bag for those '\(30\) and under' is \(0.289\) times the odds of those 'Over \(40\)'; and
  • The odds of bringing a shopping bag for those '\(31\) to \(40\)' is \(0.496\) times the odds of those 'Over \(40\)'.
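The odds and odds ratios in Table 36.6 can be reproduced with a short Python sketch. The dictionary layout and the name `reference_level` are choices made here for illustration only.

```python
# Table 36.5: (brings own bags, does not bring own bags) for each age group.
bags = {
    "30 and under": (126, 138),
    "31 to 40": (50, 32),
    "Over 40": (41, 13),
}
reference_level = "Over 40"  # on the bottom of the fraction for each OR

# Odds of bringing own bags: last column on the bottom of the fraction.
odds = {group: brings / does_not for group, (brings, does_not) in bags.items()}

# One odds ratio for each non-reference group.
odds_ratios = {
    group: odds[group] / odds[reference_level]
    for group in bags
    if group != reference_level
}

for group, value in odds.items():
    print(f"Odds for '{group}': {value:.4f}")
for group, value in odds_ratios.items():
    print(f"OR, '{group}' relative to '{reference_level}': {value:.4f}")
```

Because the sketch divides unrounded odds, the second odds ratio comes out near \(0.495\) rather than the \(0.496\) obtained above by dividing the rounded odds; the difference is rounding only.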

The hypothesis can be worded in terms of odds:

  • \(H_0\): The odds of bringing a shopping bag is the same for all age groups.
  • \(H_1\): The odds of bringing a shopping bag is not the same for all age groups.

Alternatively, the hypotheses can be worded in terms of relationships or associations (but not correlations) between the two variables:

  • \(H_0\): No association exists between bringing a shopping bag and age group.
  • \(H_1\): An association exists between bringing a shopping bag and age group.

For a \(2\times 2\) table, the parameter is the odds ratio. For two-way tables larger than \(2\times 2\), defining a parameter is difficult; it requires a single number to measure the association between the variables, but we need two ORs to summarise the data. Effectively, the \(\chi^2\) statistic becomes the parameter that measures the size of the differences between all three odds. When no relationship exists in the population, \(\chi^2 = 0\); hence \(H_0:\) \(\chi^2 = 0\), and any non-zero value of \(\chi^2\) in the sample is then due to sampling variation alone. The alternative hypothesis is \(H_1\): \(\chi^2 > 0\).

From the software output, \(\chi^2 = 16.24\) and \(\text{df} = 2\), so this \(\chi^2\) value is approximately equivalent to a \(z\)-score of \(\sqrt{16.24\div 2} = 2.85\). This is a large \(z\)-score so, using the \(68\)--\(95\)--\(99.7\) rule, a small \(P\)-value is expected; indeed, jamovi reports \(P < 0.001\). This suggests very strong evidence in the sample that bringing a shopping bag is associated with age.
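The same calculation extends to the \(3\times 2\) table. The sketch below recomputes \(\chi^2\), the df and the \(z\)-like value from Table 36.5. For the \(P\)-value it relies on a shortcut that holds only when df \(= 2\) (the upper-tail area of the chi-squared distribution is then \(\exp(-\chi^2/2)\)); this is an illustration, not a description of jamovi's internals.

```python
import math

# Table 36.5: rows are age groups; columns are (brings own bags, does not).
bags = [[126, 138],  # 30 and under
        [50, 32],    # 31 to 40
        [41, 13]]    # Over 40

row_totals = [sum(row) for row in bags]
col_totals = [sum(col) for col in zip(*bags)]
n = sum(row_totals)
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_squared = sum(
    (bags[i][j] - expected[i][j]) ** 2 / expected[i][j]
    for i in range(len(row_totals))
    for j in range(len(col_totals))
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)
z_like = math.sqrt(chi_squared / df)

# Shortcut valid only for df = 2.
p_value = math.exp(-chi_squared / 2)

smallest_expected = min(min(row) for row in expected)

print(f"chi-squared = {chi_squared:.2f}, df = {df}, z-like = {z_like:.2f}")
print(f"P-value below 0.001: {p_value < 0.001}")
print(f"Smallest expected count = {smallest_expected:.1f}")
```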

The conclusion could be written as

The sample provides very strong evidence (\(\chi^2 = 16.24\); \(\text{df} = 2\); \(P < 0.001\)) that a relationship exists in the population between bringing a shopping bag and age.

While sample summary information could be added to this conclusion, the statements may then become cumbersome. Instead, pointing readers to the numerical summary (Table 36.6) is probably better. Furthermore, CIs are not reported since jamovi does not produce CIs for tables larger than \(2\times 2\).

All expected values exceed \(5\) (Fig. 36.4), so the results are statistically valid.

36.10 Chapter summary

To test a hypothesis about a population odds ratio, based on the value of the sample odds ratio, initially assume the value of the population odds ratio in the null hypothesis (usually one) to be true. Then, expected counts can be computed. Since the sample odds ratio varies from sample to sample, under certain statistical validity conditions, a quantity closely related to the sample odds ratio varies with an approximate normal distribution. This distribution describes what values of the sample odds ratio could be expected in the sample if the value of the population odds ratio in the null hypothesis was true. The test statistic is a \(\chi^2\) statistic, which compares the expected and observed counts.

The value of \(\sqrt{\chi^2/\text{df}}\) is like a \(z\)-score, where 'df' is the 'degrees of freedom' reported by software, and so an approximate \(P\)-value can be estimated using the \(68\)--\(95\)--\(99.7\) rule. Software reports the \(P\)-value to assess whether the data are consistent with the assumption.

36.11 Quick review questions

A study (Egbue, Long, and Samaranayake 2017) of the adoption of electric vehicles (EVs) by a certain group of professional Americans (Example 5.14) compiled the data in Table 36.7. Output from using jamovi is shown in Fig. 36.6.

TABLE 36.7: Responses to the question 'Would you purchase an electric vehicle in the next 10 years?' by education
Yes No
No post-grad \(24\) \(\phantom{0}8\)
Post-grad study \(51\) \(29\)

FIGURE 36.6: jamovi output for the EV study

  1. What is the \(\chi^2\) value?
  2. What is the equivalent \(z\)-score (to two decimal places)?
  3. Using the \(68\)--\(95\)--\(99.7\) rule, what is the approximate \(P\)-value?
  4. From the software output, what is the \(P\)-value?
  5. What is the alternative hypothesis?
  6. True or false: There is no evidence of a difference in the odds of buying an electric vehicle in the next 10 years, between those with and without post-graduate study.

36.12 Exercises

Selected answers are available in App. E.

Exercise 36.1 Researchers (Christensen, Herrer, and Telford 1972) studied the number of sandflies caught in light traps set at 3 and 35 feet above ground in eastern Panama. They asked:

In eastern Panama, are the odds of finding a male sandfly the same at 3 feet above ground as at 35 feet above ground?

The data are compiled into a table (Table 36.8), and summarised numerically (Table 36.9; partially edited) and graphically (Fig. 36.7). Use the jamovi output (Fig. 36.8) to evaluate the evidence, complete Table 36.9, and write a conclusion.

TABLE 36.8: The sex of sandflies at two heights
3 feet above ground 35 feet above ground
Males \(173\) \(125\)
Females \(150\) \(\phantom{0}73\)
TABLE 36.9: Odds and percentages of male sandflies at two heights above ground level
Odds Percentage Sample size
3 feet: \(323\)
35 feet: \(1.71\) \(63.1\) \(198\)
Odds ratio: \(0.67\)
FIGURE 36.7: A side-by-side barchart of the sandflies data

FIGURE 36.8: Using jamovi to compute a CI for the sandflies data

Exercise 36.2 [Dataset: ForwardFall] (This study also appeared in Exercise 29.2, where the odds ratio, and the CI for the odds ratio, were computed.) A forward-direction observational study in Western Australia compared the heights of scars from burns received (Wallace et al. 2017). The data are shown in Table 36.10. jamovi was used to analyse the data (Fig. 36.9).

  1. Perform a hypothesis test to determine if the odds of having a smooth scar are the same for women and men.
  2. Write down the conclusion.
  3. Is the test statistically valid?
TABLE 36.10: The number of men and women, with scars of different heights
Women Men
Scar height \(0\) mm (smooth) \(99\) \(216\)
Scar height more than \(0\) mm, less than \(1\) mm \(62\) \(115\)

FIGURE 36.9: Using jamovi to compute a CI for the scar-height data

Exercise 36.3 In a study of turbine failures (Myers, Montgomery, and Vining 2002; Nelson 1982), \(73\) turbines were run for around \(1800\) hrs, and seven developed fissures (small cracks). Forty-two different turbines were run for about \(3000\) hrs, and nine developed fissures.

  1. Use the jamovi output (Fig. 36.10, left panel) to test for a relationship.
  2. Compute, then carefully interpret, the OR.
  3. Write down, then carefully interpret, the test results.
  4. Is the CI statistically valid (Fig. 36.10, right panel)?

FIGURE 36.10: jamovi output for the turbine data (left); expected counts (right)

Exercise 36.4 (This study also appeared in Exercise 29.5.) The Southern Oscillation Index (SOI) is a standardised measure of the air pressure difference between Tahiti and Darwin, and has been shown to be related to rainfall in some parts of the world (Stone, Hammer, and Marcussen 1996), and especially Queensland (Stone and Auliciems 1992; P. K. Dunn 2001).

As an example (P. K. Dunn and Smyth 2018), the rainfall at Emerald (Queensland) was recorded for Augusts from 1889 to 2002 inclusive, in Augusts when the monthly average SOI was positive, and when the SOI was non-positive (that is, zero or negative), as shown in Table 36.11.

  1. Using the jamovi output in Fig. 36.11, perform a hypothesis test to determine if the odds of having no rain are the same in Augusts with non-positive SOI and in Augusts with positive SOI.
  2. Write down the conclusion.
  3. Is the test statistically valid?
TABLE 36.11: The SOI, and whether rainfall was recorded in Augusts between 1889 and 2002 inclusive
Non-positive SOI Positive SOI
No rainfall recorded \(14\) \(\phantom{0}7\)
Rainfall recorded \(40\) \(53\)

FIGURE 36.11: jamovi output for the Emerald-rain data

Exercise 36.5 [Dataset: HatSunglasses] (This study also appeared in Exercise 29.6.) A research study conducted in Brisbane (B. Dexter et al. 2019) recorded the number of people at the foot of the Goodwill Bridge, Southbank, who wore sunglasses and hats. The data were recorded between \(11\):\(30\)am and \(12\):\(30\)pm. Of the \(386\) males observed, \(79\) wore hats; of the \(366\) females observed, \(22\) wore hats.

  1. Compute the percentage of females wearing a hat.
  2. Compute the percentage of males wearing a hat.
  3. Compute the odds of a female wearing a hat.
  4. Compute the odds of a male wearing a hat.
  5. Compute the odds ratio of wearing a hat, comparing females to males.
  6. Compute the odds ratio of wearing a hat, comparing males to females.
  7. Find the \(95\)% CI for the appropriate OR.
  8. Using the jamovi output in Fig. 36.12, perform a hypothesis test to determine if the odds of wearing a hat is the same for females and males.
  9. Write down the conclusion.
  10. Is the test statistically valid?

FIGURE 36.12: jamovi output for the hats data

Exercise 36.6 A study (Lennon, Oviedo-Trespalacios, and Matthews 2017) asked people about their mobile-phone interactions while crossing the road as pedestrians. Part of the data are summarised in Table 36.12.

  1. Compute the column percentages.
  2. Compute the odds of low exposure to each behaviour.
  3. Write the hypothesis for conducting a hypothesis test.
  4. Compute the expected counts.
  5. After analysis in jamovi, the value of \(\chi^2\) is \(20.923\) with two degrees of freedom. What is the approximately-equivalent \(z\)-score? Would you expect a large or small \(P\)-value?
  6. The \(P\)-value is given as \(P < 0.001\). Write a conclusion.
TABLE 36.12: Mobile-phone behaviour of pedestrians. ('Low exposure' means the behaviour was displayed less than once per week; 'High exposure' means the behaviour was displayed once per week or more.)
Answer call Respond to text Reply to email
Low exposure \(263\) \(259\) \(302\)
High exposure \(\phantom{0}94\) \(\phantom{0}98\) \(\phantom{0}51\)

Exercise 36.7 [Dataset: PetBirds] (This study also appeared in Exercise 29.7.) A study examined people with lung cancer, and a matched set of similar controls who did not have lung cancer, and compared the proportion in each group that had pet birds (Kohlmeier et al. 1992). The data are shown again in Table 36.13.

Consider this RQ:

Are the odds of having a pet bird the same for people with lung cancer (cases) and for people without lung cancer (controls)?

  1. Carefully describe the parameter.
  2. Write the hypotheses in terms of odds.
  3. Determine the value of \(z\) that is approximately the same as this \(\chi^2\)-value.
  4. Use the software output to conduct a hypothesis test.
TABLE 36.13: The pet bird data
Adults with lung cancer Adults without lung cancer Total
Did not keep pet birds \(141\) \(328\) \(469\)
Kept pet birds \(\phantom{0}98\) \(101\) \(199\)
Total \(239\) \(429\) \(668\)

FIGURE 36.13: jamovi output for the pet-birds data

Exercise 36.8 [Dataset: B12Long] (This study was seen in Exercise 29.8.) A study in New Zealand (Gammon et al. 2012) asked:

Among a certain group of women, are the odds of being vitamin B12 deficient different for women on a vegetarian diet compared to women on a non-vegetarian diet?

The population was 'predominantly overweight/obese women of South Asian origin living in Auckland'. The data are shown in Table 29.11.

  1. Write down the hypotheses in terms of odds.
  2. Write down the parameter.
  3. Determine the \(\chi^2\) value and perform a hypothesis test to answer the RQ, using the output in Fig. 36.14.
  4. Compute the equivalent \(z\)-score for this \(\chi^2\)-value.
  5. Write down the conclusion.
  6. Is the test statistically valid?

FIGURE 36.14: jamovi output for the B12 data