29 CIs for odds ratios
So far, you have learnt to ask a RQ, design a study, classify and summarise the data, form confidence intervals, and conduct hypothesis tests. In this chapter, you will learn to:
 form confidence intervals for odds ratios using software output.
 determine whether the conditions for using the confidence intervals apply in a given situation.
29.1 Introduction: eating habits
A study examined the relationship between where students ate and where they lived (Mann and Blotnicky 2017). The researchers cross-classified the \(n = 183\) students (the units of analysis) according to two qualitative variables:
 Where they lived: with their parents, or not with their parents;
 Where they ate most meals: off-campus or on-campus.
Since both variables are qualitative, means are not appropriate for summarising the data. A two-way table of counts, called a contingency table, is appropriate (Table 29.1). Both qualitative variables have two levels, so the table is a \(2\times 2\) table.
Lives with parents  Doesn't live with parents  Total  

Most meals off-campus  \(52\)  \(105\)  \(157\) 
Most meals on-campus  \(\phantom{0}2\)  \(\phantom{0}24\)  \(\phantom{0}26\) 
Total  \(54\)  \(129\)  \(183\) 
The odds (or proportion) of students who eat most meals off-campus can be compared between those who live with their parents and those who do not live with their parents.
Every cell in the \(2\times 2\) table contains different students, so the comparison is between individuals.
The parameter is the odds ratio (OR); specifically, the odds ratio of eating most meals off-campus, comparing those living with parents to those not living with parents. Another sensible parameter would be the difference between the proportions (or percentages) in each group, but the odds ratio is usually used as the parameter (for reasons beyond the scope of this book). For this reason, writing the RQ in terms of odds ratios or odds is also most appropriate.
Take care defining the odds ratios in the parameter! Recall (Sect. 13.4.3): software usually compares Row 1 to Row 2, and Column 1 to Column 2. For this reason, it makes sense to define your OR in the same way.
Using the OR, the RQ could be written as:
Among university students, what is the odds ratio of students eating most meals off-campus, comparing those who do and do not live with their parents?
Since the OR is just a comparison of odds, the RQ could be written as:
Among university students, what are the odds of students eating most meals off-campus, comparing students who do and do not live with their parents?
Either way, the parameter is the population OR, comparing the odds of eating most meals off-campus for students living with their parents to students not living with their parents.
What are P, O, C and I for this RQ?
29.2 Summarising data
With two qualitative variables, an appropriate numerical summary includes the odds and percentages for the outcome (for each comparison group) and the sample sizes. From these data, the odds of eating most meals off-campus are:
 \(52\div 2 = 26\) for students living with their parents.
 \(105\div 24 = 4.375\) for students not living with their parents.
(Notice that the count from the second row is always on the bottom of the fraction.) So the odds ratio (OR) of eating most meals off-campus (the first row), comparing students living with parents to students not living with parents, is \(26 \div 4.375 = 5.943\). The numerical summary (Table 29.2) shows the percentage and odds of eating most meals off-campus, comparing students living at home and those not living at home.
Understanding how software computes the odds ratio is very important for understanding the output. In jamovi, the odds ratio can be interpreted in either of these two ways (i.e., both are correct):

The odds are the odds of eating most meals off-campus (Row 1 of Table 29.1) compared to on-campus (Row 2): \(52/2 = 26\) (for those living with parents) and \(105/24 = 4.375\) (for those not living with parents).
Then, the odds ratio compares these odds for students living with their parents (Column 1 of Table 29.1) to those not living with their parents (Column 2): the OR is \(26/4.375 = 5.943\), as in the output (Fig. 29.1, right panel).
The odds are the odds of living with parents (Column 1 of Table 29.1) compared to not living with parents (Column 2): \(52/105 = 0.49524\) (for those eating most meals off-campus) and \(2/24 = 0.083333\) (for those eating most meals on-campus).
Then, the odds ratio compares these odds for students eating most meals off-campus (Row 1 of Table 29.1) to the odds of students eating most meals on-campus (Row 2): the OR is \(0.49524/0.083333 = 5.943\), as in the output (Fig. 29.1, right panel).
The odds and odds ratios are relative to the second row or second column.
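As a check on the arithmetic above, the following minimal Python sketch computes the odds and the OR both ways from the counts in Table 29.1. It is only an illustration (the text itself relies on jamovi), and the variable names are hypothetical.

```python
# Counts from Table 29.1.
# Rows: where most meals are eaten; columns: living arrangement.
table = [
    [52, 105],  # Row 1: most meals off-campus (with parents, not with parents)
    [2, 24],    # Row 2: most meals on-campus  (with parents, not with parents)
]

# Interpretation 1: odds are Row 1 / Row 2 within each column,
# then the OR compares Column 1 to Column 2.
odds_col1 = table[0][0] / table[1][0]  # 52 / 2
odds_col2 = table[0][1] / table[1][1]  # 105 / 24
or_by_columns = odds_col1 / odds_col2

# Interpretation 2: odds are Column 1 / Column 2 within each row,
# then the OR compares Row 1 to Row 2.
odds_row1 = table[0][0] / table[0][1]  # 52 / 105
odds_row2 = table[1][0] / table[1][1]  # 2 / 24
or_by_rows = odds_row1 / odds_row2

print(f"Odds (living with parents):     {odds_col1:.3f}")
print(f"Odds (not living with parents): {odds_col2:.3f}")
print(f"OR via columns: {or_by_columns:.3f}")
print(f"OR via rows:    {or_by_rows:.3f}")
```

Both routes reduce to \((52\times 24)/(2\times 105)\), which is why they give the same OR.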
Odds of having most meals off-campus  Percentage having most meals off-campus  Sample size  

Living with parents  \(26.000\)  \(96.3\)  \(\phantom{0}54\) 
Not living with parents  \(\phantom{0}4.375\)  \(81.4\)  \(129\) 
Odds ratio  \(\phantom{0}5.943\) 
An appropriate graph (Fig. 29.1, left panel) is a side-by-side bar chart or a stacked bar chart. For comparing the odds, the side-by-side bar chart is better. (A stacked bar chart is better for comparing proportions, but either is correct.)
29.3 Describing the sampling distribution
From the numerical summary table (Table 29.2), the odds of a student eating most meals off-campus is \(26\) for students living with their parents, and \(4.375\) for students not living with their parents. So the OR of eating most meals off-campus, comparing students living with parents to students not living with parents, is \(26 \div 4.375 = 5.943\). The odds are different in each group, and hence the OR is not one in the sample: the odds of eating most meals off-campus for students living with their parents is \(5.943\) times the odds for students not living with their parents.
Of course, every sample of students is likely to be different, so the OR varies from sample to sample; there is sampling variation, so the odds ratio has a sampling distribution and a standard error.
The sampling distribution of the sample OR is not a normal distribution. Fortunately, a simple transformation of the sample OR does have an approximate normal distribution, though we omit the details. For this reason, we will not discuss the sampling distribution directly, and will instead rely on software output to find CIs for odds ratios.
29.4 Constructing confidence intervals using software
Using jamovi to find the OR (Fig. 29.1, right panel), the sample OR is \(5.94\) (as computed manually), and the (exact) \(95\)% CI is from \(1.35\) to \(26.1\).
Recall the jamovi output can be interpreted in either of these ways:
 Odds are Row 1 divided by Row 2. Then the odds ratio is computed as Column 1 odds divided by Column 2 odds (i.e., comparing Column 1 odds to Column 2 odds); or
 Odds are Column 1 divided by Column 2. Then the odds ratio is computed as Row 1 odds divided by Row 2 odds (i.e., comparing Row 1 odds to Row 2 odds).
Both are correct, but one is usually easier to understand.
We write:
The OR of eating most meals off-campus, comparing students living with parents (odds: \(26.0\); \(n = 54\)) to students not living with parents (odds: \(4.38\); \(n = 129\)), is \(5.94\), with a \(95\)% CI from \(1.35\) to \(26.1\).
There is a \(95\)% chance that this CI straddles the population OR. Notice that the meaning of the OR is explained in the conclusion: the odds of eating most meals off-campus, comparing students living with parents to students not living with parents.
Unlike the other CIs we have seen, the CI for an OR is not symmetrical about the sample OR.
Interpreting ORs can be challenging, so care is needed!
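One common large-sample way to build a CI for an OR works on the log scale, where the sampling distribution is approximately normal, and then transforms back. The Python sketch below assumes that approach (the text does not say how jamovi computes its interval), using the usual standard error for the log of the OR, \(\sqrt{1/a + 1/b + 1/c + 1/d}\), and a multiplier of \(1.96\). The function name `log_scale_ci` is hypothetical. For the student data, the result agrees with the reported interval after rounding.

```python
import math


def log_scale_ci(a, b, c, d, z=1.96):
    """Approximate CI for the OR of the 2x2 table [[a, b], [c, d]].

    Assumes the large-sample normal approximation for the log of the OR.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)  # back-transform the lower limit
    upper = math.exp(log_or + z * se_log_or)  # back-transform the upper limit
    return odds_ratio, lower, upper


# Student-meals counts from Table 29.1.
odds_ratio, lower, upper = log_scale_ci(52, 105, 2, 24)
print(f"OR = {odds_ratio:.2f}; approximate 95% CI from {lower:.2f} to {upper:.1f}")
```

Because this interval is symmetric for the log of the OR, it is not symmetric for the OR itself once the limits are transformed back.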
29.5 Statistical validity conditions
As usual, these results hold under certain conditions. The CI computed above is statistically valid if
 All expected counts are at least five.
Some books may give other (but similar) conditions. Note that this condition is based on the expected frequencies, not the observed frequencies. The expected counts are what we would expect to find if there was no relationship between the two variables in the two-way table (see Example 19.13).
If there was no relationship between the two variables for the student-meals data, students living with or not with their parents would have a similar percentage of meals eaten off-campus. First see that the overall percentage of students eating most meals off-campus is \(157/183\times 100 = 85.79\)% (from Table 29.1). If there was no relationship between the two variables, this percentage would be the same for students living with or not with their parents. In other words, we would expect \(85.79\)% of the \(54\) students who do live with their parents to eat most meals off-campus (an expected count of \(46.33\)), and we would expect \(85.79\)% of the \(129\) students who do not live with their parents to eat most meals off-campus (an expected count of \(110.67\)).
Compute the expected counts for the number of students eating most meals on-campus.
You do not have to compute these expected values, as software like jamovi can be used to produce the expected counts to check statistical validity (see Fig. 29.2). This statistical validity condition is explained further in Sect. 36.3.
Example 29.1 (Statistical validity) For the university-students eating data, jamovi can be used to compute the expected counts (Fig. 29.2). None are less than five, and so the conclusion is statistically valid. (One observed count is less than five, but this is not relevant to checking for statistical validity.)
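The expected counts can also be computed directly: if there was no relationship between the variables, each expected count is (row total \(\times\) column total) \(\div\) overall total. The following Python sketch does this for Table 29.1. The helper name `expected_counts` is hypothetical, and the sketch is not a description of how jamovi produces its output.

```python
def expected_counts(table):
    """Expected counts for a two-way table, assuming no relationship between the variables."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from Table 29.1 (columns: with parents, not with parents).
observed = [
    [52, 105],  # most meals off-campus
    [2, 24],    # most meals on-campus
]

expected = expected_counts(observed)
for row in expected:
    print([round(value, 2) for value in row])

# Validity condition used in this chapter: all expected counts are at least five.
smallest = min(min(row) for row in expected)
print("Statistically valid:", smallest >= 5)
```

The first row reproduces the expected counts of \(46.33\) and \(110.67\) found above.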
29.6 Example: turtle nests
The hatching success of loggerhead turtles on Mediterranean beaches is often compromised by fungi and bacteria. A study (Candan, Katılmış, and Ergin 2021) compared the odds of infected nests between nests relocated due to the risk of tidal inundation, and non-relocated nests (Table 29.3). The researchers were interested in knowing:
For Mediterranean loggerhead turtles, what are the odds of infection, comparing natural to relocated nests?
Noninfected  Infected  

Natural  \(29\)  \(10\) 
Relocated  \(14\)  \(\phantom{0}8\) 
The parameter is the odds ratio of infection, comparing natural to relocated nests. The odds ratio can be defined in other ways also, but this definition is consistent with how software computes odds given Table 29.3 (i.e., first row to second row; first column to second column).
A graphical summary is shown in Fig. 29.3. A numerical summary table (Table 29.4) shows that the odds of a natural nest being infected is \(1.657\) times the odds of a relocated nest being infected. From the jamovi output (Fig. 29.4), the \(95\)% CI for this odds ratio is from \(0.537\) to \(5.12\). The smallest expected count is \(6.49\) (Fig. 29.4), so this CI is statistically valid. We write:
The OR of an infected nest, comparing natural nests (odds: \(2.90\); \(n = 39\)) to relocated nests (odds: \(1.75\); \(n = 22\)), is \(1.66\) with a \(95\)% CI from \(0.537\) to \(5.12\).
Odds infected  Percentage infected  Sample size  

Natural  \(2.900\)  \(74.36\)  \(39\) 
Relocated  \(1.750\)  \(63.64\)  \(22\) 
Odds ratio:  \(1.657\) 
29.8 Quick review questions
A study (Egbue, Long, and Samaranayake 2017) of the adoption of electric vehicles (EVs) by a certain group of professional Americans compiled the data in Table 29.5. Output from using jamovi is shown in Fig. 29.5.
 What percentage of people without postgraduate study would buy an EV in the next \(10\) years? (do not add the percentage symbol)
 What are the odds that a person without postgraduate study would buy an EV in the next \(10\) years?
 Using the output, what is the OR of buying an electric vehicle in the next \(10\) years, comparing those without postgrad study to those with postgrad study?
 True or false: The CI means that the sample OR is likely to be between \(0.68\) and \(4.28\).
 True or false: The analysis is likely to be statistically valid?
Yes  No  

No postgrad  \(24\)  \(\phantom{0}8\) 
Postgrad study  \(51\)  \(29\) 
 The number without postgrad study: \(24 + 8 = 32\). The percentage of people without postgrad study who would buy an EV in the next \(10\) years is \(24/32 = 0.75\), or 75%.
 The people with postgrad study are in the bottom row. The odds of people without postgrad study who would buy an EV in the next \(10\) years is \(24/8 = 3\).
 The odds of people without postgrad study who would buy an electric vehicle is \(24/8 = 3\). The odds of people with postgrad study who would buy an electric vehicle is \(51/29 = 1.7586\). So the OR is \(3/1.7586 = 1.706\).
 False. We know exactly what the sample OR is (it is \(1.706\)). A CI always gives an interval that is likely to contain the population parameter.
 The CI is statistically valid if all the expected counts are at least five. So we don't really know for sure from the given information. But the observed counts are all reasonably large, so it is very probably statistically valid.
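These answers can be checked numerically. The Python sketch below recomputes the percentage, odds and OR from Table 29.5, and also derives the expected counts from the observed counts (as row total \(\times\) column total \(\div\) overall total). It is an illustration only, and the variable names are hypothetical.

```python
# Observed counts from Table 29.5 (columns: would buy an EV? Yes, No).
no_postgrad = [24, 8]
postgrad = [51, 29]

pct_no_postgrad = 100 * no_postgrad[0] / sum(no_postgrad)
odds_no_postgrad = no_postgrad[0] / no_postgrad[1]
odds_postgrad = postgrad[0] / postgrad[1]
odds_ratio = odds_no_postgrad / odds_postgrad

# Expected counts, assuming no relationship between study level and buying intention.
rows = [no_postgrad, postgrad]
row_totals = [sum(row) for row in rows]
col_totals = [sum(col) for col in zip(*rows)]
n = sum(row_totals)
smallest_expected = min(r * c / n for r in row_totals for c in col_totals)

print(f"Percentage (no postgrad): {pct_no_postgrad:.0f}")
print(f"Odds (no postgrad): {odds_no_postgrad:.3f}")
print(f"Odds (postgrad):    {odds_postgrad:.4f}")
print(f"OR: {odds_ratio:.3f}")
print(f"Smallest expected count: {smallest_expected:.2f}")
```

Under these assumptions the smallest expected count is about \(10.6\), which is consistent with the analysis being statistically valid.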
29.9 Exercises
Selected answers are available in App. E.
Exercise 29.1 A study of car crashes in a rural, mountainous county in western China (Wang et al. 2020) recorded the data in Table 29.6.
 Produce a numerical summary table for the data.
 Compute the odds of a crash involving a pedestrian in 2011.
 Compute the odds of a crash involving a pedestrian in 2015.
 Compute the odds ratio of a crash involving a pedestrian, comparing 2011 to 2015.
 Use the output to write down a CI for the odds ratio.
 Write a conclusion.
 Use Table 29.7 to determine if the CI is statistically valid.
2011  2015  

Involving pedestrians  \(15\)  \(37\) 
Involving vehicles  \(35\)  \(85\) 
2011  2015  

Involving pedestrians  \(15.11\)  \(36.88\) 
Involving vehicles  \(34.88\)  \(85.12\) 
Exercise 29.2 A forward-direction observational study in Western Australia (Wallace et al. 2017) compared the heights of scars from burns received (Table 29.8).
jamovi was used to analyse the data (Fig. 29.6).
 Compute the odds of having a smooth scar (that is, height is \(0\) mm) for women.
 Compute the odds of having a smooth scar (that is, height is \(0\) mm) for men.
 Compute the odds ratio of having a smooth scar, comparing women to men.
 Interpret what this odds ratio means.
 Sketch a suitable graph to display the data.
 Construct an appropriate numerical summary table for the data.
 Write down the CI.
 Carefully interpret what this CI means.
Women  Men  

Scar height 0mm (smooth)  \(99\)  \(216\) 
Scar height more than 0mm, less than 1mm  \(62\)  \(115\) 
Exercise 29.3 [Dataset: EarInfection]
A study of ear infections in Sydney swimmers (Smyth 2010) recorded whether people reported an ear infection or not, and where they usually swam.
The jamovi output is shown in Fig. 29.7.
Explain carefully the meaning of the OR and the corresponding CI.
Exercise 29.4 A study of turbine failures (Myers, Montgomery, and Vining 2002; Nelson 1982) ran \(73\) turbines for around \(1800\) hrs, and found that seven developed fissures (small cracks). They also ran a different set of \(42\) turbines for about \(3000\) hrs, and found that nine developed fissures.
Exercise 29.5 [Dataset: EmeraldAug]
The Southern Oscillation Index (SOI) is a standardised measure of the air pressure difference between Tahiti and Darwin, and is related to rainfall in some parts of the world (Stone, Hammer, and Marcussen 1996), and especially Queensland (Stone and Auliciems 1992; P. K. Dunn 2001).
The rainfall at Emerald (Queensland) was recorded for Augusts from 1889 to 2002 inclusive (P. K. Dunn and Smyth 2018), when the monthly average SOI was positive, and when the SOI was non-positive (that is, zero or negative), as shown in Table 29.9.
Using the jamovi output in Fig. 29.9:
 Find a \(95\)% CI for the OR.
 Carefully explain what this OR means.
Nonpositive SOI  Positive SOI  

No rain  \(14\)  \(\phantom{0}7\) 
Rain  \(40\)  \(53\) 
Exercise 29.6 [Dataset: HatSunglasses]
A research study conducted in Brisbane (B. Dexter et al. 2019) recorded the number of people at the foot of the Goodwill Bridge, Southbank, who wore hats between \(11\):\(30\)am and \(12\):\(30\)pm.
Of the \(386\) males observed, \(79\) wore hats; of the \(366\) females observed, \(22\) wore hats.
Using the jamovi output in Fig. 29.10, find a \(95\)% CI for the OR, and carefully explain what OR this CI applies to. Also, construct the numerical summary table.
Exercise 29.7 [Dataset: PetBirds]
A study examined people with lung cancer, and a matched set of controls who did not have lung cancer, and compared the proportion in each group that kept pet birds (Kohlmeier et al. 1992).
One RQ of the study was:
What is the odds ratio of keeping a pet bird, comparing people with lung cancer (cases) to people without lung cancer (controls)?
The data, compiled in a \(2\times2\) contingency table, are given in Table 29.10.
 Construct a numerical summary table.
 Sketch a graphical summary.
 Use the software output to find a \(95\)% CI, making sure to describe the odds ratio carefully.
 Is the CI likely to be statistically valid?
Adults with lung cancer  Adults without lung cancer  

Did not keep pet birds  \(141\)  \(328\) 
Kept pet birds  \(\phantom{0}98\)  \(101\) 
Exercise 29.8 [Dataset: B12Long]
A study in New Zealand (Gammon et al. 2012) examined B12 deficiencies in 'predominantly overweight/obese women of South Asian origin living in Auckland', some of whom were on a vegetarian diet and some of whom were on a nonvegetarian diet.
One RQ was:
What is the odds ratio of these women being B12 deficient, comparing vegetarians to nonvegetarians?
The data appear in Table 29.11, and the jamovi output in Figs. 29.13 and 29.14.
 Construct a numerical summary table.
 Sketch a graphical summary.
 Use the software output to find a \(95\)% CI, making sure to describe the odds ratio carefully.
 Is the CI likely to be statistically valid?
B12 deficient  Not B12 deficient  Total  

Vegetarians  \(\phantom{0}8\)  \(\phantom{0}26\)  \(\phantom{0}34\) 
Nonvegetarians  \(\phantom{0}8\)  \(\phantom{0}82\)  \(\phantom{0}90\) 
Total  \(16\)  \(108\)  \(124\) 