29 CIs for odds ratios

So far, you have learnt to ask a RQ, design a study, classify and summarise the data, form confidence intervals, and conduct hypothesis tests. In this chapter, you will learn to:

  • form confidence intervals for odds ratios using software output.
  • determine whether the conditions for using the confidence intervals apply in a given situation.

29.1 Introduction: eating habits

Mann and Blotnicky (2017) examined the relationship between where students usually ate and where they lived. The researchers cross-classified the \(n = 183\) students (the units of analysis) according to two qualitative variables:

  • Where they lived: with their parents, or not with their parents;
  • Where they ate most meals: off-campus or on-campus.

Since both variables are qualitative, means are not appropriate for summarising the data. A two-way table of counts, called a contingency table, is appropriate (Table 29.1). Both qualitative variables have two levels, so the table is a \(2\times 2\) table.

TABLE 29.1: Where university students live and eat
| | Lives with parents | Doesn't live with parents | Total |
|---|---:|---:|---:|
| Most meals off-campus | \(52\) | \(105\) | \(157\) |
| Most meals on-campus | \(2\) | \(24\) | \(26\) |
| Total | \(54\) | \(129\) | \(183\) |

The odds (or proportion) of students who eat most meals off-campus can be compared between those who live with their parents and those who do not.

Every cell in the \(2\times 2\) table contains different students, so the comparison is between individuals.

The parameter is the odds ratio (OR); specifically, the odds ratio of eating most meals off-campus, comparing those living with parents to those not living with parents. Another sensible parameter would be the difference between the proportions (or percentages) in each group, but the odds ratio is usually used as the parameter (for reasons beyond the scope of this book). For this reason, writing the RQ in terms of odds ratios or odds is also most appropriate.

Take care when defining the odds ratio in the parameter! Recall (Sect. 13.4.3): software usually compares Row 1 to Row 2, and Column 1 to Column 2. For this reason, it makes sense to define your OR in the same way.

Using the OR, the RQ could be written as:

Among university students, what is the odds ratio of students eating most meals off-campus, comparing those who do and do not live with their parents?

The parameter is the population OR, comparing the odds of eating most meals off-campus for students living with their parents to students not living with their parents.

What are P, O, C and I for this RQ?

29.2 Summarising data

With two qualitative variables, an appropriate numerical summary includes the odds and percentages for the outcome (for each comparison group) and the sample sizes. From these data, the odds of eating most meals off-campus is:

  • \(52\div 2 = 26\) for students living with their parents.
  • \(105\div 24 = 4.375\) for students not living with their parents.

(Notice that, for the odds, the second row is always on the bottom of the fraction; for the odds ratio, the second column is on the bottom.) So the odds ratio (OR) of eating most meals off-campus (the first row), comparing students living with parents to students not living with parents, is \(26 \div 4.375 = 5.943\). The numerical summary (Table 29.2) shows the percentage and odds of eating most meals off-campus, comparing students living at home and those not living at home.
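The arithmetic above can be sketched in a few lines of Python. This is only an illustration, not how jamovi works internally; the function names `odds_by_column` and `odds_ratio` are hypothetical, and the table is assumed to be entered as in Table 29.1, with the outcome in the rows and the comparison groups in the columns.

```python
# Minimal sketch: odds and odds ratio from a 2x2 table of counts.
# Assumed layout (as in Table 29.1):
#   rows    = outcome (Row 1: most meals off-campus; Row 2: on-campus)
#   columns = groups  (Column 1: lives with parents; Column 2: does not)


def odds_by_column(table):
    """Odds of Row 1 relative to Row 2, computed separately in each column."""
    (a, b), (c, d) = table  # a, b: Row 1 counts; c, d: Row 2 counts
    return a / c, b / d


def odds_ratio(table):
    """Odds ratio comparing the Column 1 odds to the Column 2 odds."""
    odds_col1, odds_col2 = odds_by_column(table)
    return odds_col1 / odds_col2


meals = [[52, 105],  # most meals off-campus
         [2, 24]]    # most meals on-campus

odds_parents, odds_not_parents = odds_by_column(meals)
print(odds_parents, odds_not_parents)  # 26.0 4.375
print(round(odds_ratio(meals), 3))     # 5.943
```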

Understanding how software computes the odds ratio is very important for understanding the output. In jamovi, the odds ratio can be interpreted in either of these two ways (i.e., both are correct):

  • The odds are the odds of eating most meals off-campus (Row 1 of Table 29.1) compared to on-campus (Row 2): \(52/2 = 26\) (for those living with parents) and \(105/24 = 4.375\) (for those not living with parents).

    Then, the odds ratio compares these odds for students living with their parents (Column 1 of Table 29.1) to those not living with their parents (Column 2): the OR is \(26/4.375 = 5.943\), as in the output (Fig. 29.1, right panel).

  • The odds are the odds of living with parents (Column 1 of Table 29.1) compared to not living with parents (Column 2): \(52/105 = 0.49524\) (for those eating most meals off-campus) and \(2/24 = 0.083333\) (for those eating most meals on-campus).

    Then, the odds ratio compares these odds for students eating most meals off-campus (Row 1 of Table 29.1) to the odds of students eating most meals on-campus (Row 2): the OR is \(0.49524/0.083333 = 5.943\), as in the output (Fig. 29.1, right panel).

The odds and odds ratios are relative to the second row or second column.
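Both readings of the output give the same number for an algebraic reason: labelling the cells \(a\), \(b\) (Row 1) and \(c\), \(d\) (Row 2), both \((a/c)/(b/d)\) and \((a/b)/(c/d)\) simplify to the cross-product ratio \(ad/(bc)\). The Python sketch below checks this for Table 29.1; the function names are illustrative only.

```python
import math

# Cells of a 2x2 table, labelled as:
#            Column 1  Column 2
#   Row 1       a         b
#   Row 2       c         d


def or_odds_down_rows(a, b, c, d):
    # Odds are Row 1 / Row 2 within each column; then Column 1 odds / Column 2 odds
    return (a / c) / (b / d)


def or_odds_across_columns(a, b, c, d):
    # Odds are Column 1 / Column 2 within each row; then Row 1 odds / Row 2 odds
    return (a / b) / (c / d)


def or_cross_product(a, b, c, d):
    # Both of the above simplify algebraically to this cross-product ratio
    return (a * d) / (b * c)


a, b, c, d = 52, 105, 2, 24  # counts from Table 29.1
results = [f(a, b, c, d) for f in (or_odds_down_rows,
                                   or_odds_across_columns,
                                   or_cross_product)]
print([round(r, 3) for r in results])  # [5.943, 5.943, 5.943]
assert all(math.isclose(r, results[2]) for r in results)
```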

TABLE 29.2: The odds and percentage of university students eating most meals off-campus
| | Odds of having most meals off-campus | Percentage having most meals off-campus | Sample size |
|---|---:|---:|---:|
| Living with parents | \(26.000\) | \(96.3\) | \(54\) |
| Not living with parents | \(4.375\) | \(81.4\) | \(129\) |
| Odds ratio | \(5.943\) | | |
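Table 29.2 can be reproduced from the counts in Table 29.1. In this sketch (the helper name `summarise` is hypothetical), the percentage is the number eating most meals off-campus divided by the group's sample size, and the odds are that number divided by the number eating most meals on-campus; equivalently, odds \(= p/(1-p)\), where \(p\) is the proportion eating most meals off-campus.

```python
def summarise(off_campus, on_campus):
    """Return (odds, percentage, sample size) for eating most meals off-campus."""
    n = off_campus + on_campus
    p = off_campus / n             # proportion eating most meals off-campus
    odds = off_campus / on_campus  # the same value as p / (1 - p)
    return odds, 100 * p, n


groups = {
    "Living with parents": summarise(52, 2),
    "Not living with parents": summarise(105, 24),
}

for label, (odds, pct, n) in groups.items():
    print(f"{label:<25}{odds:8.3f}{pct:7.1f}{n:5d}")

odds_ratio = groups["Living with parents"][0] / groups["Not living with parents"][0]
print(f"{'Odds ratio':<25}{odds_ratio:8.3f}")
```

This gives odds of \(26.000\) and \(4.375\), percentages of \(96.3\) and \(81.4\), and an odds ratio of \(5.943\).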

An appropriate graph (Fig. 29.1, left panel) is a side-by-side bar chart or a stacked bar chart. For comparing the odds, the side-by-side bar chart is better. (A stacked bar chart is better for comparing proportions, but either is correct.)


FIGURE 29.1: The uni-student eating data. Left: A side-by-side bar chart; right: the jamovi output for computing a CI

29.3 Describing the sampling distribution

From the numerical summary table (Table 29.2), the odds of a student eating most meals off-campus is \(26\) for students living with their parents, and \(4.375\) for students not living with their parents. So the OR of eating most meals off-campus, comparing students living with parents to students not living with parents, is \(26 \div 4.375 = 5.943\). The odds are different in each group, and hence the OR is not one in the sample: the odds of eating most meals off-campus for students living with their parents is \(5.943\) times the odds for students not living with their parents.

Of course, every sample of students is likely to be different, so the OR varies from sample to sample; there is sampling variation, so the odds ratio has a sampling distribution and a standard error.

The sampling distribution of the sample OR is not a normal distribution. Fortunately, a simple transformation of the sample OR (its logarithm) has an approximately normal distribution, though we omit the details. For this reason, we will rely on software output to find CIs for odds ratios, and will not discuss the sampling distribution directly.
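The difference between the sample OR and its logarithm can be explored by simulation. The sketch below is purely illustrative: it assumes two hypothetical groups of sizes \(39\) and \(22\) whose population proportions are \(0.74\) and \(0.64\) (values chosen only for illustration), draws many samples of those sizes, and compares the skewness of the simulated sample ORs with the skewness of their logarithms. Samples containing an empty cell are skipped, since the OR is then zero or undefined. All function names are hypothetical.

```python
import math
import random


def skewness(xs):
    """Sample skewness: third central moment divided by the sd cubed."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2 ** 1.5


def binomial_draw(rng, n, p):
    """Number of 'successes' in n independent trials, each with probability p."""
    return sum(rng.random() < p for _ in range(n))


def simulate_sample_ors(n1, p1, n2, p2, reps=10_000, seed=2024):
    """Sample ORs from repeated samples of two groups (hypothetical setup)."""
    rng = random.Random(seed)
    ors = []
    while len(ors) < reps:
        a = binomial_draw(rng, n1, p1)
        b = n1 - a
        c = binomial_draw(rng, n2, p2)
        d = n2 - c
        if 0 in (a, b, c, d):
            continue  # OR is zero or undefined when a cell is empty
        ors.append((a / b) / (c / d))
    return ors


sample_ors = simulate_sample_ors(n1=39, p1=0.74, n2=22, p2=0.64)
skew_or = skewness(sample_ors)
skew_log_or = skewness([math.log(x) for x in sample_ors])
print(f"Skewness of sample ORs:     {skew_or:.2f}")
print(f"Skewness of log sample ORs: {skew_log_or:.2f}")
```

With these settings, the sample ORs should be clearly right-skewed, while the skewness of the log ORs should be close to zero, consistent with the log OR being approximately normal.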

29.4 Constructing confidence intervals using software

Using jamovi to find the OR (Fig. 29.1, right panel), the sample OR is \(5.94\) (as computed manually), and the approximate \(95\)% CI is from \(1.35\) to \(26.1\).
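The jamovi CI appears to match the common log-scale (Woolf) interval: form a CI for \(\log(\text{OR})\) using the standard error \(\sqrt{1/a + 1/b + 1/c + 1/d}\), then exponentiate both limits. The sketch below is our own reconstruction rather than jamovi's code; the assumption that jamovi uses this method rests only on the numbers agreeing. The function name `or_confint` is hypothetical.

```python
import math
from statistics import NormalDist


def or_confint(a, b, c, d, level=0.95):
    """Sample OR and log-scale (Woolf) CI for a 2x2 table.

    Cells are labelled a, b (Row 1) and c, d (Row 2), so OR = (a*d)/(b*c).
    Assumes every cell count is non-zero.
    """
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


or_hat, lower, upper = or_confint(52, 105, 2, 24)  # counts from Table 29.1
print(f"OR = {or_hat:.2f}; 95% CI from {lower:.2f} to {upper:.1f}")
# OR = 5.94; 95% CI from 1.35 to 26.1
```

The limits are symmetric about \(\log(5.94)\) on the log scale, but not about \(5.94\) itself: after exponentiating, the upper limit is much further from \(5.94\) than the lower limit.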

Recall the jamovi output can be interpreted in either of these ways:

  • Odds are Row 1 divided by Row 2.
    Then the odds ratio is computed as Column 1 odds divided by Column 2 odds (i.e., comparing Column 1 odds to Column 2 odds); or
  • Odds are Column 1 divided by Column 2.
    Then the odds ratio is computed as Row 1 odds divided by Row 2 odds (i.e., comparing Row 1 odds to Row 2 odds).

Both are correct, but one is usually easier to understand.

We write:

The OR of eating most meals off-campus, comparing students living with parents (odds: \(26.0\); \(n = 54\)) to students not living with parents (odds: \(4.38\); \(n = 129\)), is \(5.94\), with a \(95\)% CI from \(1.35\) to \(26.1\).

If many samples were taken and a CI computed from each, about \(95\)% of those CIs would straddle the population OR. Notice that the conclusion explains the meaning of the OR: it describes the odds of eating most meals off-campus, and compares students living with parents to those not living with parents.

Unlike the other CIs we have seen, the CI for an OR is not symmetric about the sample estimate.

Interpreting ORs can be challenging, so care is needed!

29.5 Statistical validity conditions

As usual, these results hold under certain conditions. The CI computed above is statistically valid if

  • All expected counts are at least five.

Some books may give other (but similar) conditions. Note that this condition is based on the expected frequencies, not the observed frequencies. The expected counts are what we would expect to find if there was no relationship between the two variables in the two-way table (Example 19.13).

If there was no relationship between the two variables for the student-meals data, students living with their parents and students not living with their parents would have the same percentage eating most meals off-campus. The overall percentage of students eating most meals off-campus is \(157/183\times 100 = 85.79\)% (from Table 29.1). If there was no relationship between the two variables, this percentage would apply to both groups. In other words, we would expect \(85.79\)% of the \(54\) students who live with their parents to eat most meals off-campus (which is \(46.33\)), and \(85.79\)% of the \(129\) students who do not live with their parents to eat most meals off-campus (which is \(110.67\)).

Compute the expected counts for the number of students eating most meals on-campus.

Usually, you do not have to compute these expected values, as software like jamovi can be used to produce the expected counts (see Fig. 29.2). This statistical validity condition is explained further in Sect. 36.3.
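Although software usually reports them, the expected counts are simple to compute: each is (row total) \(\times\) (column total) \(\div\) (overall total), which is equivalent to the percentage calculation described above. A minimal Python sketch for Table 29.1 follows; the function name `expected_counts` is hypothetical.

```python
def expected_counts(table):
    """Expected counts if the row and column variables are unrelated."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


meals = [[52, 105],  # most meals off-campus
         [2, 24]]    # most meals on-campus

expected = expected_counts(meals)
for row in expected:
    print([round(e, 2) for e in row])
# [46.33, 110.67]
# [7.67, 18.33]

valid = all(e >= 5 for row in expected for e in row)
print("All expected counts at least five:", valid)  # True
```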

Example 29.1 (Statistical validity) For the uni-students eating data, jamovi can be used to compute the expected counts (Fig. 29.2). None are less than five, and so the conclusion is statistically valid. (One observed count is less than five, but this is not relevant to checking for statistical validity.)


FIGURE 29.2: The expected counts in jamovi, for the uni-students data

29.6 Example: turtle nests

The hatching success of loggerhead turtles on Mediterranean beaches is often compromised by fungi and bacteria. A study (Candan, Katılmış, and Ergin 2021) compared the odds of infected nests between nests relocated due to the risk of tidal inundation and non-relocated (natural) nests (Table 29.3). The researchers were interested in knowing:

For Mediterranean loggerhead turtles, what is the odds ratio of infection, comparing natural to relocated nests?

TABLE 29.3: Infected and non-infected turtle nests
| | Non-infected | Infected |
|---|---:|---:|
| Natural | \(29\) | \(10\) |
| Relocated | \(14\) | \(8\) |

The parameter is the odds ratio of infection, comparing natural to relocated nests. The odds ratio can be defined in other ways also, but this definition is consistent with how software computes odds given Table 29.3 (i.e., first row to second row; first column to second column).

A graphical summary is shown in Fig. 29.3. A numerical summary table (Table 29.4) shows that the odds of a natural nest being infected is \(1.657\) times the odds of a relocated nest being infected. From the jamovi output (Fig. 29.4), the \(95\)% CI for this odds ratio is from \(0.537\) to \(5.12\). The smallest expected count is \(6.49\) (Fig. 29.4), so this CI is statistically valid. We write:

The OR of an infected nest, comparing natural nests (odds: \(2.90\); \(n = 39\)) to relocated nests (odds: \(1.75\); \(n = 22\)), is \(1.66\), with a \(95\)% CI from \(0.537\) to \(5.12\).

TABLE 29.4: The odds and percentage of infected nests
| | Odds infected | Percentage infected | Sample size |
|---|---:|---:|---:|
| Natural | \(2.900\) | \(74.36\) | \(39\) |
| Relocated | \(1.750\) | \(63.64\) | \(22\) |
| Odds ratio | \(1.657\) | | |

FIGURE 29.3: A graphical summary of the turtle-nest data


FIGURE 29.4: The jamovi output for the turtle-nesting data
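The steps in this example can be collected into one short Python sketch. As for the student-meals data, it assumes that the jamovi CI is the log-scale (Woolf) interval; that assumption is ours, based only on the numbers matching. The function name `analyse_2x2` is hypothetical. The counts are entered exactly as in Table 29.3, so the odds are Column 1 relative to Column 2 within each row, and the OR compares Row 1 (natural) to Row 2 (relocated).

```python
import math
from statistics import NormalDist


def analyse_2x2(table, level=0.95):
    """OR, log-scale CI and smallest expected count for a 2x2 table.

    Odds are Column 1 / Column 2 within each row; the OR compares Row 1 to Row 2.
    """
    (a, b), (c, d) = table
    or_hat = (a / b) / (c / d)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)
    ci = (math.exp(math.log(or_hat) - z * se),
          math.exp(math.log(or_hat) + z * se))
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    n = a + b + c + d
    smallest_expected = min(r * k / n for r in row_totals for k in col_totals)
    return or_hat, ci, smallest_expected


nests = [[29, 10],  # natural
         [14, 8]]   # relocated
or_hat, (lower, upper), smallest = analyse_2x2(nests)
print(f"OR = {or_hat:.3f}; 95% CI from {lower:.3f} to {upper:.2f}")
print(f"Smallest expected count: {smallest:.2f}")
# OR = 1.657; 95% CI from 0.537 to 5.12
# Smallest expected count: 6.49
```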

29.7 Chapter summary

29.8 Quick review questions

A study (Egbue, Long, and Samaranayake 2017) of the adoption of electric vehicles (EVs) by a certain group of professional Americans compiled the data in Table 29.5. Output from jamovi is shown in Fig. 29.5.

  1. What percentage of people without post-graduate study would buy an EV in the next \(10\) years? (do not add the percentage symbol)
  2. What are the odds that a person without post-graduate study would buy an EV in the next \(10\) years?
  3. Using the output, what is the OR of buying an electric vehicle in the next \(10\) years, comparing those without post-grad study to those with post-grad study?
  4. True or false: The CI means that the sample OR is likely to be between \(0.68\) and \(4.28\).
  5. True or false: The analysis is statistically valid.
TABLE 29.5: Responses to the question 'Would you purchase an electric vehicle in the next \(10\) years?' by education
| | Yes | No |
|---|---:|---:|
| No post-grad | \(24\) | \(8\) |
| Post-grad study | \(51\) | \(29\) |

FIGURE 29.5: jamovi output for the EV study

  1. The number without post-grad study: \(24 + 8 = 32\). The percentage of people without post-grad study who would buy an EV in the next \(10\) years is \(24/32 = 0.75\), or 75%.
  2. The odds that a person without post-grad study would buy an EV in the next \(10\) years is \(24/8 = 3\).
  3. The odds of people without post-grad study who would buy an electric vehicle is \(24/8 = 3\).
    The odds of people with post-grad study (the bottom row) who would buy an electric vehicle is \(51/29 = 1.7586\).
    So the OR is \(3/1.7586 = 1.706\).
  4. Not at all. We know exactly what the sample OR is (it is \(1.706\)). A CI gives an interval that is likely to straddle the population parameter, not the sample statistic.
  5. The CI is statistically valid if all the expected counts are at least five. The expected counts are not given, but they can be computed from Table 29.5 as in Sect. 29.5: the smallest is \(32\times 37\div 112 = 10.57\), so the statement is true.
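The answers above can be checked with a short Python sketch. As before, it assumes that jamovi's CI is the log-scale (Woolf) interval (our inference from matching numbers, not something documented here), and all variable names are illustrative. It also computes the expected counts, which are not given with the question.

```python
import math
from statistics import NormalDist

ev = [[24, 8],   # no post-grad study: yes, no
      [51, 29]]  # post-grad study:    yes, no
(a, b), (c, d) = ev

pct_yes_no_postgrad = 100 * a / (a + b)
odds_no_postgrad = a / b
odds_postgrad = c / d
or_hat = odds_no_postgrad / odds_postgrad

# Log-scale (Woolf) 95% CI: assumed to match the jamovi output
se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = NormalDist().inv_cdf(0.975)
lower = math.exp(math.log(or_hat) - z * se)
upper = math.exp(math.log(or_hat) + z * se)

# Expected counts: row total x column total / overall total
n = a + b + c + d
expected = [[(a + b) * (a + c) / n, (a + b) * (b + d) / n],
            [(c + d) * (a + c) / n, (c + d) * (b + d) / n]]
smallest = min(min(row) for row in expected)

print(pct_yes_no_postgrad, odds_no_postgrad)  # 75.0 3.0
print(f"OR = {or_hat:.3f}; 95% CI from {lower:.2f} to {upper:.2f}")
print(f"Smallest expected count: {smallest:.2f}")
# OR = 1.706; 95% CI from 0.68 to 4.28
# Smallest expected count: 10.57
```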

29.9 Exercises

Selected answers are available in App. E.

Exercise 29.1 A study of car crashes in a rural, mountainous county in western China (Wang et al. 2020) recorded the data in Table 29.6.

  1. Produce a numerical summary table for the data.
  2. Compute the odds of a crash involving a pedestrian in 2011.
  3. Compute the odds of a crash involving a pedestrian in 2015.
  4. Compute the odds ratio of a crash involving a pedestrian, comparing 2011 to 2015.
  5. Use the output to write down a CI for the odds ratio.
  6. Write a conclusion.
  7. Use Table 29.7 to determine if the CI is statistically valid.
TABLE 29.6: Types of crashes in different years
| | 2011 | 2015 |
|---|---:|---:|
| Involving pedestrians | \(15\) | \(37\) |
| Involving vehicles | \(35\) | \(85\) |
TABLE 29.7: Expected counts of types of crashes in different years
| | 2011 | 2015 |
|---|---:|---:|
| Involving pedestrians | \(15.12\) | \(36.88\) |
| Involving vehicles | \(34.88\) | \(85.12\) |

Exercise 29.2 A forward-direction observational study in Western Australia (Wallace et al. 2017) compared the heights of burn scars in women and men (Table 29.8).

jamovi was used to analyse the data (Fig. 29.6).

  1. Compute the odds of having a smooth scar (that is, height is \(0\) mm) for women.
  2. Compute the odds of having a smooth scar (that is, height is \(0\) mm) for men.
  3. Compute the odds ratio of having a smooth scar, comparing women to men.
  4. Interpret what this odds ratio means.
  5. Sketch a suitable graph to display the data.
  6. Construct an appropriate numerical summary table for the data.
  7. Write down the CI.
  8. Carefully interpret what this CI means.
TABLE 29.8: The number of men and women, with scars of different heights
| | Women | Men |
|---|---:|---:|
| Scar height 0 mm (smooth) | \(99\) | \(216\) |
| Scar height more than 0 mm, less than 1 mm | \(62\) | \(115\) |

FIGURE 29.6: jamovi output for the scar-height data

Exercise 29.3 [Dataset: EarInfection] A study of ear infections in Sydney swimmers (Smyth 2010) recorded whether people reported an ear infection or not, and where they usually swam. The jamovi output is shown in Fig. 29.7. Explain carefully the meaning of the OR and the corresponding CI.


FIGURE 29.7: jamovi output for the ear-infection data

Exercise 29.4 A study of turbine failures (Myers, Montgomery, and Vining 2002; Nelson 1982) ran \(73\) turbines for around \(1800\) hrs, and found that seven developed fissures (small cracks). They also ran a different set of \(42\) turbines for about \(3000\) hrs, and found that nine developed fissures.

  1. Construct the two-way table for the data.
  2. Use the jamovi output (Fig. 29.8) to construct a \(95\)% CI for the odds ratio.
  3. Compute, then carefully interpret, the OR.
  4. Write down, then carefully interpret, the CI for the OR.
  5. Is the CI statistically valid (Fig. 29.8)?

FIGURE 29.8: jamovi output for the turbine data

Exercise 29.5 [Dataset: EmeraldAug] The Southern Oscillation Index (SOI) is a standardised measure of the air pressure difference between Tahiti and Darwin, and is related to rainfall in some parts of the world (Stone, Hammer, and Marcussen 1996), and especially Queensland (Stone and Auliciems 1992; P. K. Dunn 2001).

Whether or not rain fell at Emerald (Queensland) in August was recorded for each year from 1889 to 2002 inclusive (P. K. Dunn and Smyth 2018), along with whether the monthly average SOI was positive or non-positive (that is, zero or negative), as shown in Table 29.9.

Using the jamovi output in Fig. 29.9:

  1. Find a \(95\)% CI for the OR.
  2. Carefully explain what this OR means.
TABLE 29.9: The SOI, and whether rainfall was recorded in Augusts between 1889 and 2002 inclusive
| | Non-positive SOI | Positive SOI |
|---|---:|---:|
| No rain | \(14\) | \(7\) |
| Rain | \(40\) | \(53\) |

FIGURE 29.9: jamovi output for the Emerald-rain data

Exercise 29.6 [Dataset: HatSunglasses] A research study conducted in Brisbane (B. Dexter et al. 2019) recorded the number of people at the foot of the Goodwill Bridge, Southbank, who wore hats between 11:30am and 12:30pm. Of the \(386\) males observed, \(79\) wore hats; of the \(366\) females observed, \(22\) wore hats.

Using the jamovi output in Fig. 29.10, find a \(95\)% CI for the OR, and carefully explain what OR this CI applies to. Also, construct the numerical summary table.


FIGURE 29.10: jamovi output for the hats data

Exercise 29.7 [Dataset: PetBirds] A study examined people with lung cancer, and a matched set of controls who did not have lung cancer, and compared the proportion in each group that kept pet birds (Kohlmeier et al. 1992). One RQ of the study was:

What is the odds ratio of keeping a pet bird, comparing people with lung cancer (cases) to people without lung cancer (controls)?

The data, compiled in a \(2\times2\) contingency table, are given in Table 29.10.

  1. Construct a numerical summary table.
  2. Sketch a graphical summary.
  3. Use the software output to find a \(95\)% CI, making sure to describe the odds ratio carefully.
  4. Is the CI statistically valid?
TABLE 29.10: The pet bird data
| | Adults with lung cancer | Adults without lung cancer |
|---|---:|---:|
| Did not keep pet birds | \(141\) | \(328\) |
| Kept pet birds | \(98\) | \(101\) |

FIGURE 29.11: jamovi output for the pet-birds data


FIGURE 29.12: The expected (and observed) counts as computed by jamovi for the pet-birds data

Exercise 29.8 [Dataset: B12Long] A study in New Zealand (Gammon et al. 2012) examined B12 deficiencies in 'predominantly overweight/obese women of South Asian origin living in Auckland', some of whom were on a vegetarian diet and some of whom were on a non-vegetarian diet. One RQ was:

What is the odds ratio of these women being B12 deficient, comparing vegetarians to non-vegetarians?

The data appear in Table 29.11, and the jamovi output in Figs. 29.13 and 29.14.

  1. Construct a numerical summary table.
  2. Sketch a graphical summary.
  3. Use the software output to find a \(95\)% CI, making sure to describe the odds ratio carefully.
  4. Is the CI statistically valid?
TABLE 29.11: The number of vegetarian and non-vegetarian women who are (and are not) B12 deficient
| | B12 deficient | Not B12 deficient | Total |
|---|---:|---:|---:|
| Vegetarians | \(8\) | \(26\) | \(34\) |
| Non-vegetarians | \(8\) | \(82\) | \(90\) |
| Total | \(16\) | \(108\) | \(124\) |

FIGURE 29.13: jamovi output for the B12 data


FIGURE 29.14: The expected counts from jamovi for the B12 data