25 CIs for odds ratios

So far, you have learnt to ask a RQ, identify different ways of obtaining data, design the study, collect the data, describe the data, summarise data graphically and numerically, and understand the tools of inference.

In this chapter, you will learn about confidence intervals for odds ratios. You will learn to:

  • produce confidence intervals for odds ratios using jamovi and SPSS output.
  • determine whether the conditions for using the confidence intervals apply in a given situation.

25.1 Introduction: Odds ratios

A study417 examined the eating habits of university students. One issue studied was the relationship between eating on-campus, and where the student lived.

The researchers cross-classified the \(n=183\) students into groups: each student (the unit of analysis) was observed on two qualitative variables:

  • Where they lived: With their parents, or not with their parents;
  • Whether they ate most meals off-campus, or most meals on-campus.

Since both variables are qualitative, means are not appropriate for summarising the data. However, the students can be classified into a two-way table of counts (Table 25.1), called a contingency table. Both qualitative variables have two levels, so the table is a \(2\times 2\) table.

TABLE 25.1: Where university students live and eat
Lives with parents Doesn't live with parents Total
Most meals off-campus 52 105 157
Most meals on-campus 2 24 26
Total 54 129 183

The purpose of the research is to study the odds (or proportion) of students who eat most meals off-campus, comparing those who live with their parents to those who do not live with their parents.

Notice that the two groups (either students who live with parents or do not live with parents; or students who eat most meals off-campus or on-campus) contain different students.

Hence, the comparison here is between individuals.

The parameter of interest could be the difference between the proportions (or percentages) in each group, a comparison between the odds in each group, or the odds ratio.

However, for reasons that we can't delve into, usually the odds ratio (OR) is used as the parameter. One important reason is that software produces output related to the sample OR.

To compare two groups with regard to another qualitative variable, software usually works with odds rather than percentages or proportions.

For this reason, writing the RQ in terms of odds is also most appropriate.

Using the OR, the RQ could be written as

Among university students, what is the odds ratio of students eating most meals off-campus, comparing those who do and do not live with their parents?

The parameter is the population OR, comparing the odds of eating most meals off-campus for students living with their parents to students not living with their parents.

Take care in defining the odds ratios in the parameter!

Recall (Sect. 14.3) that software usually compares Row 1 to Row 2, and Column 1 to Column 2.

What are P, O, C and I for this RQ?

A study418 examined the eating habits of university students. One issue they studied was the relationship between eating on-campus, and where the student lived.

In particular, a RQ of interest was:

Among university students, what is the difference in the proportion of students who eat the minority of their meals on campus, between those who live with their parents and those who do not live with their parents?

What graphs would be best for displaying these data?

  • A bar chart.
  • A stacked bar chart.
  • A side-by-side bar chart.
  • A scatterplot.

25.2 Numerical and graphical summaries: Comparing odds

With two qualitative variables, an appropriate numerical summary includes the odds and percentages from each comparison group for the outcome of interest, and the sample sizes.

From these data, the odds of eating most meals off-campus is:

  • \(52\div 2 = 26\) for students living with their parents.
  • \(105\div24 = 4.375\) for students not living with their parents.

So the odds ratio (OR) of eating most meals off-campus, comparing students living with parents to students not living with parents, is \(26 \div 4.375 = 5.943\).

The numerical summary (Table 25.2) shows the percentage and odds of eating most meals off-campus, comparing students living at home and those not living at home.

Understanding how software computes the odds ratio is very important for understanding the output.

In jamovi and SPSS, the odds ratio can be interpreted in either of these two ways (i.e., both ways are correct):

  • The odds are the odds of eating most meals off-campus (Row 1 of Table 25.1) compared to on-campus (Row 2).
    Then, the odds ratio compares these odds for students living with their parents (Column 1 of Table 25.1) to those not living with their parents (Column 2 of Table 25.1).
    That is, the odds are \(52/2= 26\) (for those living with parents) and \(105/24 = 4.375\) (for those not living with parents), so the OR is then \(26/4.375 = 5.943\), as in the output (Fig. 25.2).

  • The odds are the odds of living with parents (Column 1 of Table 25.1) compared to not living with parents (Column 2).
    Then, the odds ratio compares these odds for students eating most meals off-campus (Row 1 of Table 25.1) to the odds of students eating most meals on-campus (Row 2 of Table 25.1).
    That is, the odds of living with parents are \(52/105 = 0.49524\) (for those eating most meals off-campus) and \(2/24 = 0.083333\) (for those eating most meals on-campus), so the OR is then \(0.49524/0.083333 = 5.943\), as in the output (Fig. 25.2).

In other words, the odds and odds ratios are relative to the first row or first column.
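The two readings of the odds ratio described above can be checked with a few lines of arithmetic. The following is a minimal sketch in Python (not jamovi or SPSS code), using the counts from Table 25.1; the variable names are made up for illustration.

```python
# Counts from Table 25.1.
# Rows: where most meals are eaten; columns: living arrangement.
off_with, off_without = 52, 105   # most meals off-campus
on_with, on_without = 2, 24       # most meals on-campus

# Way 1: odds are Row 1 to Row 2 within each column;
# the OR then compares Column 1 odds to Column 2 odds.
odds_off_with = off_with / on_with            # 52 / 2 = 26
odds_off_without = off_without / on_without   # 105 / 24 = 4.375
or_by_columns = odds_off_with / odds_off_without

# Way 2: odds are Column 1 to Column 2 within each row;
# the OR then compares Row 1 odds to Row 2 odds.
odds_with_off = off_with / off_without        # 52 / 105
odds_with_on = on_with / on_without           # 2 / 24
or_by_rows = odds_with_off / odds_with_on

print(round(or_by_columns, 3), round(or_by_rows, 3))
```

Both calculations reduce to \((52\times 24)/(2\times 105)\), which is why the two interpretations always give the same odds ratio.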

TABLE 25.2: The odds and percentage of university students eating most meals off-campus
Odds of having most meals off-campus Percentage having most meals off-campus Sample size
Living with parents 26 96.3 54
Not living with parents 4.375 81.4 129
Odds ratio 5.943

An appropriate graph (Fig. 25.1) is a side-by-side bar chart or a stacked bar chart.

For comparing the odds, the side-by-side bar chart is better. (A stacked bar chart is better for comparing proportions, but either is fine.)

FIGURE 25.1: A plot of the uni-student eating data: A side-by-side bar chart

25.3 Sampling distribution: Comparing odds

From the numerical summary table (Table 25.2), the odds of a student eating most meals off-campus is:

  • \(26\) for students living with their parents.
  • \(4.375\) for students not living with their parents.

So the OR of eating most meals off-campus, comparing students living with parents to students not living with parents, is \(26 \div 4.375 = 5.943\). The odds are different in each group, and hence the OR is not one. The OR means that the odds of eating most meals off-campus for students living with their parents are 5.943 times the odds for students not living with their parents.

Of course, every sample of students is likely to be different, so the OR varies from sample to sample; that is, there is sampling variation. This means that the odds ratio has a sampling distribution and a standard error.

Unfortunately, the sampling distribution of the sample OR is not a normal distribution419. Fortunately, a simple transformation of the sample OR (its logarithm) has an approximate normal distribution. Rather than work with this transformation by hand, we will use software output for finding the CI for the odds ratio, and not discuss the sampling distribution directly.

In other words, we will rely on software to find CIs for odds ratios.

25.4 Confidence intervals: Comparing odds

As noted, we rely on software to find the CI for the odds ratio, such as jamovi (Fig. 25.2) and the second table of the SPSS output (labelled, obscurely, Risk Estimate; Fig. 25.3). Both show that the sample OR is 5.94, and the (exact) 95% CI is from 1.35 to 26.1. (The SPSS output shows other information too, some of which will be useful later.)

FIGURE 25.2: The jamovi output for computing a CI

FIGURE 25.3: The SPSS output for computing a CI

Recall that jamovi and SPSS compute the odds ratio as either

  • 'Odds are Row 1 to Row 2; odds ratio compares Column 1 odds to Column 2 odds', or
  • 'Odds are Column 1 to Column 2; odds ratio compares Row 1 odds to Row 2 odds'.

The OR can be interpreted correctly either way.

We write:

Based on the sample, a 95% CI for the OR comparing the odds of eating most meals off-campus is from 1.35 to 26.1 (living with parents, compared to not living with parents).

This means there is a 95% chance that this CI straddles the population OR.

Notice that the meaning of the OR is explained in the conclusions: the odds of eating most meals off-campus, and comparing students living with parents to not living with parents.

Unlike the other CIs we have seen, the CI for an OR is not symmetrical about the sample OR420.
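To see where an interval like this can come from, and why it is lopsided, here is a minimal sketch in Python. It assumes the common large-sample (Wald) interval computed on the log-odds-ratio scale; the text does not say which method jamovi or SPSS uses, so treat this as an illustration that happens to reproduce the reported limits for the uni-student data. The function name `odds_ratio_ci` is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Sample OR and an approximate CI for the 2x2 table [[a, b], [c, d]].

    Assumption: a Wald interval on the log scale, with
    SE(log OR) = sqrt(1/a + 1/b + 1/c + 1/d); z=1.96 gives roughly 95%.
    """
    odds_ratio = (a / c) / (b / d)          # Column 1 odds over Column 2 odds
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    log_or = math.log(odds_ratio)
    lower = math.exp(log_or - z * se_log_or)
    upper = math.exp(log_or + z * se_log_or)
    return odds_ratio, lower, upper


# Counts from Table 25.1: Row 1 is (52, 105), Row 2 is (2, 24).
or_hat, lower, upper = odds_ratio_ci(52, 105, 2, 24)
print(round(or_hat, 2), round(lower, 2), round(upper, 1))
```

The interval is symmetric around the logarithm of the sample OR, but taking exponentials stretches the upper side more than the lower side, so the limits are not equally far from 5.94.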

Interpreting ORs can be confusing, so take care!

Example 25.1 (Crashes in China) A study of car crashes in a rural, mountainous county in western China421 recorded the data in the table below.

Type of crash 2011 2015
Involving pedestrians 15 37
Involving vehicles 35 85

Clearly the number of crashes is larger in 2015. However, the interest is in comparing the odds (or percentage) of crashes involving pedestrians in 2011 and 2015. (Of course, comparing the odds (or percentages) involving vehicles is also possible.)

The data can be summarised as shown below.

Year Percentage involving pedestrians Odds involving pedestrians Sample size
In 2011 30.0 0.429 50
In 2015 30.3 0.435 122
Odds ratio: 0.985

In this table, the odds are the odds that a crash involves a pedestrian.

The odds ratio is the odds of a crash involving pedestrians in 2011, compared to the odds of a crash involving pedestrians in 2015. In this situation, this is the parameter of interest.

Both the percentage and odds columns, and the odds ratio, suggest that the relative proportion of crashes involving pedestrians is very similar in 2011 and 2015.

The odds ratio is 0.985, but this value would change from sample to sample. From software, the 95% CI for the odds ratio is from 0.480 to 2.018. We would write:

The population odds ratio for a crash involving pedestrians (comparing 2011 to 2015) has a 95% chance of being between 0.480 and 2.018.

25.5 Statistical validity conditions: Comparing odds

As usual, these results hold under certain conditions. The CI computed above is statistically valid if

  • All expected counts are at least five.

Some books may give other (but similar) conditions.

In addition to the statistical validity condition, the CI will be

The statistical validity condition is a bit tricky to understand (but is explained further in Sect. 32.3). SPSS will let you know if the expected count condition is not met, underneath the first output table in Fig. 25.3.

In jamovi, the expected counts must be explicitly requested to see if this condition is satisfied.

Example 25.2 (Statistical validity) In Fig. 25.3 (for the uni-students data), the text under the first table of SPSS output (labelled Chi-Square Tests) says

0 cells (0.0%) have expected count less than 5.

That is, all the cells have expected counts of at least five, so the statistical validity condition is satisfied. Notice from Table 25.1 that the observed counts are not all greater than five (one cell has a count of 2). The statistical validity condition is about the expected counts though, not the observed counts.

In jamovi, the expected counts must be requested explicitly (Fig. 25.4), but again none are less than five.

In either case, the conclusion is statistically valid.
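The expected counts that the software reports can also be computed by hand. The sketch below, in Python, assumes the usual definition for a two-way table (row total times column total, divided by the overall total) and applies it to the observed counts in Table 25.1; the function name `expected_counts` is made up for illustration.

```python
def expected_counts(table):
    """Expected counts under no relationship between the two variables.

    Assumption: each expected count is (row total * column total) / n.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]


# Observed counts from Table 25.1 (without the totals).
observed = [[52, 105],
            [2, 24]]

expected = expected_counts(observed)
smallest = min(min(row) for row in expected)

print([[round(x, 2) for x in row] for row in expected])
print("All expected counts at least five:", smallest >= 5)
```

Although one observed count is only 2, the corresponding expected count is larger than five, which is why the condition is met for these data.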

FIGURE 25.4: The expected counts in jamovi, for the uni-students data

Example 25.3 (Car crashes in China) In Example 25.1, all the observed counts are larger than five.

The expected counts are shown below. Since all expected counts are larger than five, the CI will be statistically valid:

Type of crash 2011 2015
Involving pedestrians 15.11 36.88
Involving vehicles 34.88 85.12

These counts are what we would expect to find if there was no relationship between the type of crash and the year; that is, if the proportion of crashes involving pedestrians was the same in 2011 and 2015.

The observed counts are very close to these expected counts, meaning that what we observe is very close to what we would expect if there was no relationship.

25.6 Example: Pet birds

A study examined people with lung cancer, and a matched set of controls who did not have lung cancer, and compared the proportion in each group that kept pet birds.422 One RQ of the study was:

What is the odds ratio of keeping a pet bird, comparing people with lung cancer (cases) to people without lung cancer (controls)?

The parameter is the population OR, comparing the odds of keeping a pet bird, for adults with lung cancer to adults who do not have lung cancer.

The data, compiled in a \(2\times2\) contingency table, are given in Table 25.3.

The numerical summary (Table 25.4) contains percentages, odds and the odds ratios; some of these may need to be computed manually from the data. The graphical summary (Fig. 25.5) shows a difference between the two groups in the sample.

Software computes the CI for the population odds ratio (jamovi: Fig. 25.6; SPSS: Fig. 25.7) based on the sample. The sample OR is 2.257, and the 95% CI is from 1.605 to 3.174.

We write:

Based on the sample, a 95% CI for the OR of keeping a pet bird is from 1.605 to 3.174 (comparing people with lung cancer to those without lung cancer).

That is, the plausible values for the population OR that could have produced the sample OR are between 1.605 and 3.174.
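The entries of the numerical summary can be computed directly from the counts in the contingency table (Table 25.3). A minimal sketch in Python follows; the dictionary layout and names are illustrative only.

```python
# Counts from Table 25.3.
kept = {"With lung cancer": 98, "Without lung cancer": 101}
not_kept = {"With lung cancer": 141, "Without lung cancer": 328}

summary = {}
for group in kept:
    n = kept[group] + not_kept[group]             # sample size in this group
    summary[group] = {
        "odds": kept[group] / not_kept[group],    # kept birds : did not keep birds
        "percent": 100 * kept[group] / n,         # percentage keeping birds
        "n": n,
    }

# OR compares the odds for cases to the odds for controls.
odds_ratio = summary["With lung cancer"]["odds"] / summary["Without lung cancer"]["odds"]

for group, row in summary.items():
    print(f"{group}: odds {row['odds']:.4f}, {row['percent']:.1f}%, n = {row['n']}")
print(f"Odds ratio: {odds_ratio:.3f}")
```

Note that the odds divide by the number who did not keep birds, while the percentage divides by the group's sample size.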

TABLE 25.3: The pet bird data
Adults with lung cancer Adults without lung cancer
Kept pet birds 98 101
Did not keep pet birds 141 328

TABLE 25.4: The odds and percentage of subjects keeping pet birds
Odds of keeping pet bird Percentage keeping pet bird Sample size
With lung cancer: 0.6950 41.0% 239
Without lung cancer: 0.3079 23.5% 429
Odds ratio: 2.26

FIGURE 25.5: A plot of the pet-birds data

FIGURE 25.6: jamovi output for the pet-birds data

FIGURE 25.7: SPSS output for the pet-birds data

The CI is statistically valid if all expected counts are at least five. The text under the first table of SPSS output (Fig. 25.7) indicates that this expected-counts condition is met.

25.7 Example: B12 deficiency

A study in New Zealand423 examined B12 deficiencies in 'predominantly overweight/obese women of South Asian origin living in Auckland', some of whom were on a vegetarian diet and some of whom were on a non-vegetarian diet. One RQ was:

What is the odds ratio of these women being B12 deficient, comparing vegetarians to non-vegetarians?

The parameter is the population OR, comparing the odds of being B12 deficient, for vegetarians to non-vegetarians.

The data appear in Table 25.5. From the jamovi output (Fig. 25.9) or SPSS output (Fig. 25.10), the OR (and 95% CI) is \(3.15\) (\(1.08\) to \(9.24\)). The numerical summary table (Table 25.6) and graphical summary (Fig. 25.8) can hence be constructed.

TABLE 25.5: The number of vegetarian and non-vegetarian women who are (and are not) B12 deficient
B12 deficient Not B12 deficient Total
Vegetarians 8 26 34
Non-vegetarians 8 82 90
Total 16 108 124

FIGURE 25.8: A side-by-side barchart comparing the number of women B12 deficient

TABLE 25.6: The odds and percentage of subjects that are B12 deficient
Odds B12 deficient Percentage B12 deficient Sample size
Vegetarians: 0.3077 23.5% 34
Non-vegetarians: 0.0976 8.9% 90
Odds ratio: 3.15

FIGURE 25.9: jamovi output for the B12 data

FIGURE 25.10: SPSS output for the B12 data

To check if these results are statistically valid, notice that the text under the first table of SPSS output (Fig. 25.10) says:

1 cells (25.0%) have expected count less than 5. The minimum expected count is 4.39.

This is a warning that one expected count is less than five. Nonetheless, only one cell has an expected count less than five, and only just under five, so we shouldn't be too concerned about statistical validity (but it should be noted).
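The SPSS message can be reproduced from the observed counts in Table 25.5. The following Python sketch assumes the usual expected-count formula (row total times column total, divided by the overall total) and then counts how many cells fall below five; the variable names are illustrative.

```python
# Observed counts from Table 25.5 (without the totals).
# Rows: vegetarians, non-vegetarians; columns: B12 deficient, not deficient.
observed = [[8, 26],
            [8, 82]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Flatten the expected counts into one list, cell by cell.
cells = [r * c / n for r in row_totals for c in col_totals]

n_small = sum(1 for x in cells if x < 5)      # cells with expected count below 5
pct_small = 100 * n_small / len(cells)
min_expected = min(cells)

print(f"{n_small} cells ({pct_small:.1f}%) have expected count less than 5. "
      f"The minimum expected count is {min_expected:.2f}.")
```

The cell that falls short is the one for B12-deficient vegetarians, where a small row total (34) meets a small column total (16).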

We write:

Based on the sample, a 95% CI for the OR of being B12 deficient is from 1.08 to 9.24 (comparing vegetarians to non-vegetarians).

25.8 Quick review questions

A study424 of the adoption of electric vehicles (EVs) by a certain group of professional Americans (Example 5.14) compiled the data in Table 25.7. Output from using jamovi is shown in Fig. 25.11.

TABLE 25.7: Responses to the question 'Would you purchase an electric vehicle in the next 10 years?' by education
Yes No
No post-grad 24 8
Post-grad study 51 29

FIGURE 25.11: jamovi output for the EV study

  1. The percentage of people without post-grad study who would buy an EV in the next 10 years is:

  2. The odds that a person without post-grad study would buy an EV in the next 10 years is:

  3. Using the output, what is the OR of buying an electric vehicle in the next 10 years, comparing those without post-grad study to those with post-grad study?

  4. True or false: The CI means that the sample OR is likely to be between 0.68 and 4.28.

  5. True or false: The analysis is likely to be statistically valid.

Answers:

  1. The number without post-grad study: \(24 + 8 = 32\). The percentage of people without post-grad study who would buy an EV in the next 10 years is \(24/32 = 0.75\), or 75%.
  2. The people without post-grad study are in the top row. The odds of people without post-grad study who would buy an EV in the next 10 years is \(24/8 = 3\).
  3. The odds of people without post-grad study who would buy an electric vehicle is \(24/8 = 3\).
    The odds of people with post-grad study who would buy an electric vehicle is \(51/29 = 1.7586\).
    So the OR is \(3/1.7586 = 1.706\).
  4. False. We know exactly what the sample OR is (it is 1.706). CIs always give an interval in which the population parameter is likely to lie.
  5. The CI is statistically valid if all the expected counts are at least five. So we don't really know for sure from the given information. But the observed counts are all reasonably large, so it is very probably statistically valid.

25.9 Exercises

Selected answers are available in Sect. D.24.

Exercise 25.1 A prospective observational study in Western Australia425 compared the heights of scars from burns received (Table 25.8).

jamovi was used to analyse the data (Fig. 25.12).

  1. Compute the odds of having a smooth scar (that is, height is 0mm) for women.
  2. Compute the odds of having a smooth scar (that is, height is 0mm) for men.
  3. Compute the odds ratio of having a smooth scar, comparing women to men.
  4. Interpret what this odds ratio means.
  5. Sketch a suitable graph to display the data.
  6. Construct an appropriate numerical summary table for the data.
  7. Write down the CI.
  8. Carefully interpret what this CI means.

FIGURE 25.12: jamovi output for the scar-height data

TABLE 25.8: The number of men and women, with scars of different heights
Women Men
Scar height 0mm (smooth) 99 216
Scar height more than 0mm, less than 1mm 62 115

Exercise 25.2 A study of ear infections in Sydney swimmers426 recorded whether people reported an ear infection or not, and where they usually swam.

The SPSS output is shown in Fig. 25.13. Explain carefully the meaning of the OR and the corresponding CI.

FIGURE 25.13: SPSS output for the ear-infection data

Exercise 25.3 A study of turbine failures427 ran 73 turbines for around 1800 hours, and found that seven developed fissures (small cracks). They also ran a different set of 42 turbines for about 3000 hours, and found that nine developed fissures.

  1. Construct the two-way table for the data.
  2. Use the jamovi output (Fig. 25.14) to construct a 95% CI for the odds ratio.
  3. Compute, then carefully interpret, the OR.
  4. Write down, then carefully interpret, the CI for the OR.
  5. Is the CI likely to be statistically valid (Fig. 25.15)?

FIGURE 25.14: jamovi output for the turbine data: output

FIGURE 25.15: jamovi output for the turbine data: expected counts

Exercise 25.4 The Southern Oscillation Index (SOI) is a standardised measure of the air pressure difference between Tahiti and Darwin, and is related to rainfall in some parts of the world,428 and especially Queensland.429

The rainfall at Emerald (Queensland) was recorded for Augusts from 1889 to 2002 inclusive,430 when the monthly average SOI was positive, and when the SOI was non-positive (that is, zero or negative), as shown in Table 25.9.

Using the jamovi output in Fig. 25.16:

  1. Find a 95% CI for the OR.
  2. Carefully explain what this OR means.

TABLE 25.9: The SOI, and whether rainfall was recorded in Augusts between 1889 and 2002 inclusive
Non-positive SOI Positive SOI
No rainfall recorded 14 7
Rainfall recorded 40 53

FIGURE 25.16: jamovi output for the Emerald-rain data

Exercise 25.5 A research study conducted in Brisbane431 recorded the number of people at the foot of the Goodwill Bridge, Southbank, who wore sunglasses and hats between 11:30am and 12:30pm. Table 25.10 records the number of females and males wearing hats.

Using the SPSS output in Fig. 25.17, find a 95% CI for the OR, and carefully explain what OR this CI applies to. Also, construct the numerical summary table.

TABLE 25.10: The number of people wearing hats, for males and females
No hat Hat
Male 307 79
Female 344 22

FIGURE 25.17: SPSS output for the hats data