25 CIs for odds ratios
So far, you have learnt to ask a RQ, identify different ways of obtaining data, design the study, collect the data, describe the data, summarise data graphically and numerically, and understand the tools of inference.
In this chapter, you will learn about confidence intervals for odds ratios. You will learn to:
- produce confidence intervals for odds ratios using jamovi and SPSS output.
- determine whether the conditions for using the confidence intervals apply in a given situation.
25.1 Introduction: Odds ratios
A study^{417} examined the eating habits of university students. One issue studied was the relationship between eating on-campus, and where the student lived.
The researchers cross-classified the \(n=183\) students into groups: each student (the unit of analysis) was observed on two qualitative variables:
- Where they lived: With their parents, or not with their parents;
- Whether they ate most meals off-campus, or most meals on-campus.
Since both variables are qualitative, means are not appropriate for summarising the data. However, the students can be classified into a two-way table of counts (Table 25.1), called a contingency table. Both qualitative variables have two levels, so the table is a \(2\times 2\) table.
| | Lives with parents | Doesn't live with parents | Total |
---|---|---|---|
Most meals off-campus | 52 | 105 | 157 |
Most meals on-campus | 2 | 24 | 26 |
Total | 54 | 129 | 183 |
The purpose of the research is to compare the odds (or proportion) of students who eat most meals off-campus between those who live with their parents and those who do not live with their parents.
Notice that the two groups (either students who live with parents or do not live with parents; or students who eat most meals off-campus or on-campus) contain different students.
Hence, the comparison here is between individuals.
The parameter of interest could be the difference between the proportions (or percentages) in each group, a comparison between the odds in each group, or the odds ratio.
However, for reasons that we can't delve into, usually the odds ratio (OR) is used as the parameter. One important reason is that software produces output related to the sample OR.
To compare two groups with regard to another qualitative variable, software usually works with odds rather than percentages or proportions.
For this reason, writing the RQ in terms of odds is also most appropriate.
Using the OR, the RQ could be written as
Among university students, what is the odds ratio of students eating most meals off-campus, comparing those who do and do not live with their parents?
The parameter is the population OR, comparing the odds of eating most meals off-campus for students living with their parents to students not living with their parents.
Take care in defining the odds ratios in the parameter!
Recall (Sect. 14.3) that software usually compares Row 1 to Row 2, and Column 1 to Column 2.
What are P, O, C and I for this RQ?
A study^{418} examined the eating habits of university students. One issue they studied was the relationship between eating on-campus, and where the student lived.
In particular, a RQ of interest was:
Among university students, what is the difference in the proportion of students who eat the minority of their meals on campus, between those who live with their parents and those who do not live with their parents?
What graphs would be best for displaying these data?
- A bar chart.
- A stacked bar chart.
- A side-by-side bar chart.
- A scatterplot.
25.2 Numerical and graphical summaries: Comparing odds
With two qualitative variables, an appropriate numerical summary includes the odds and percentages from each comparison group for the outcome of interest, and the sample sizes.
From these data, the odds of eating most meals off-campus is:
- \(52\div 2 = 26\) for students living with their parents.
- \(105\div24 = 4.375\) for students not living with their parents.
So the odds ratio (OR) of eating most meals off-campus, comparing students living with parents to students not living with parents, is \(26 \div 4.375 = 5.943\).
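The arithmetic above can be checked with a few lines of code. This is only a sketch: the variable names are made up for illustration, and the counts are those in Table 25.1.

```python
# Counts from Table 25.1 (rows: where most meals are eaten; columns: living arrangement).
off_with, off_without = 52, 105   # most meals off-campus
on_with, on_without = 2, 24       # most meals on-campus

# Odds of eating most meals off-campus within each living-arrangement group.
odds_with = off_with / on_with            # 52 / 2
odds_without = off_without / on_without   # 105 / 24

# Odds ratio: living with parents compared to not living with parents.
odds_ratio = odds_with / odds_without

print(f"Odds (living with parents):     {odds_with:.3f}")
print(f"Odds (not living with parents): {odds_without:.3f}")
print(f"Odds ratio:                     {odds_ratio:.3f}")
```

Swapping which group is divided by which gives the reciprocal of the OR, which is why the comparison must always be stated alongside the value.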
The numerical summary (Table 25.2) shows the percentage and odds of eating most meals off-campus, comparing students living at home and those not living at home.
Understanding how software computes the odds ratio is very important for understanding the output.
In jamovi and SPSS, the odds ratio can be interpreted in either of these two ways (i.e., both ways are correct):
- The odds are the odds of eating most meals off-campus (Row 1 of Table 25.1) compared to on-campus (Row 2). Then, the odds ratio compares these odds for students living with their parents (Column 1 of Table 25.1) to those not living with their parents (Column 2 of Table 25.1). That is, the odds are \(52/2= 26\) (for those living with parents) and \(105/24 = 4.375\) (for those not living with parents), so the OR is then \(26/4.375 = 5.943\), as in the output (Fig. 25.2).
- The odds are the odds of living with parents (Column 1 of Table 25.1) compared to not living with parents (Column 2). Then, the odds ratio compares these odds for students eating most meals off-campus (Row 1 of Table 25.1) to the odds of students eating most meals on-campus (Row 2 of Table 25.1). That is, the odds of living with parents are \(52/105 = 0.49524\) (for those eating most meals off-campus) and \(2/24 = 0.083333\) (for those eating most meals on-campus), so the OR is then \(0.49524/0.083333 = 5.943\), as in the output (Fig. 25.2).
In other words, the odds and odds ratios are relative to the first row or first column.
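The two readings give the same value because each reduces to the same cross-product of the four counts. As a sketch, label the counts in Table 25.1 as \(a\) (Row 1, Column 1), \(b\) (Row 1, Column 2), \(c\) (Row 2, Column 1) and \(d\) (Row 2, Column 2); these labels are introduced here only for illustration. Then:

\[
\text{OR} = \frac{a/c}{b/d} = \frac{a/b}{c/d} = \frac{a\times d}{b\times c}
          = \frac{52\times 24}{105\times 2} = \frac{1248}{210} = 5.943.
\]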
| | Odds of having most meals off-campus | Percentage having most meals off-campus | Sample size |
---|---|---|---|
Living with parents | 26 | 96.3 | 54 |
Not living with parents | 4.375 | 81.4 | 129 |
Odds ratio | 5.943 |
An appropriate graph (Fig. 25.1) is a side-by-side bar chart or a stacked bar chart.
For comparing the odds, the side-by-side bar chart is better. (A stacked bar chart is better for comparing proportions, but either is fine.)
25.3 Sampling distribution: Comparing odds
From the numerical summary table (Table 25.2), the odds of a student eating most meals off-campus is:
- \(26\) for students living with their parents.
- \(4.375\) for students not living with their parents.
So the OR of eating most meals off-campus, comparing students living with parents to students not living with parents, is \(26 \div 4.375 = 5.943\). The odds are different in each group, and hence the OR is not one. The OR means that the odds of eating most meals off-campus for students living with their parents is 5.943 times the odds for students not living with their parents.
Of course, every sample of students is likely to be different, so the OR varies from sample to sample; that is, there is sampling variation. This means that the odds ratio has a sampling distribution and a standard error.
Unfortunately, the sampling distribution of the sample OR is not a normal distribution^{419}. Fortunately, a simple transformation of the sample OR has an approximately normal distribution. For this reason, we will use software output for finding the CI for the odds ratio, and not discuss the sampling distribution directly.
In other words, we will rely on software to find CIs for odds ratios.
25.4 Confidence intervals: Comparing odds
As noted, we rely on software to find the CI for the odds ratio, such as jamovi (Fig. 25.2) and the second table of the SPSS output (labelled, obscurely, Risk Estimate; Fig. 25.3). Both show that the sample OR is 5.94, and the (exact) 95% CI is from 1.35 to 26.1. (The SPSS output shows other information too, some of which will be useful later.)
Recall that jamovi and SPSS compute the odds ratio as either
- 'Odds are Row 1 to Row 2; odds ratio compares Column 1 odds to Column 2 odds', or
- 'Odds are Column 1 to Column 2; odds ratio compares Row 1 odds to Row 2 odds'.
The OR can be interpreted correctly either way.
We write:
Based on the sample, a 95% CI for the OR comparing the odds of eating most meals off-campus is from 1.35 to 26.1 (living with parents, compared to not living with parents).
This means there is a 95% chance that this CI straddles the population OR.
Notice that the meaning of the OR is explained in the conclusions: the odds of eating most meals off-campus, and comparing students living with parents to not living with parents.
Unlike the other CIs we have seen, the CI for an OR is not symmetrical about the sample value^{420}.
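The lack of symmetry can be seen numerically. The sketch below does not claim to reproduce how jamovi or SPSS work internally. It assumes one common approach: an approximate (Wald) interval computed on the logarithm of the OR, then transformed back. The cell labels `a`, `b`, `c`, `d` and the multiplier 1.96 are assumptions of this sketch.

```python
import math

# Counts from Table 25.1.
a, b = 52, 105   # most meals off-campus: with parents, not with parents
c, d = 2, 24     # most meals on-campus:  with parents, not with parents

odds_ratio = (a / c) / (b / d)

# Assumption: an approximate (Wald) interval built on the log-OR scale,
# using the standard error sqrt(1/a + 1/b + 1/c + 1/d).
log_or = math.log(odds_ratio)
se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
z = 1.96  # multiplier for an approximate 95% CI

# Symmetric on the log scale; exponentiate to return to the OR scale.
lower = math.exp(log_or - z * se_log_or)
upper = math.exp(log_or + z * se_log_or)

print(f"OR = {odds_ratio:.2f}; approximate 95% CI from {lower:.2f} to {upper:.1f}")
print(f"Distance from lower limit to OR: {odds_ratio - lower:.2f}")
print(f"Distance from OR to upper limit: {upper - odds_ratio:.2f}")
```

Under these assumptions the limits agree with the software output quoted above (1.35 to 26.1), and the upper limit sits much further from the sample OR than the lower limit does.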
Interpreting ORs can be confusing, so take care!
Example 25.1 (Crashes in China) A study of car crashes in a rural, mountainous county in western China^{421} recorded the data in the table below.
Type of crash | 2011 | 2015 |
---|---|---|
Involving pedestrians | 15 | 37 |
Involving vehicles | 35 | 85 |
Clearly the number of crashes is larger in 2015. However, the interest is in comparing the odds (or percentage) of crashes involving pedestrians in 2011 and 2015. (Of course, comparing the odds (or percentages) involving vehicles is also possible.)
The data can be summarised as shown below.
Year | Percentage involving pedestrians | Odds involving pedestrians | Sample size |
---|---|---|---|
In 2011 | 30.0 | 0.429 | 50 |
In 2015 | 30.3 | 0.435 | 122 |
Odds ratio: | | 0.985 | |
In this table, the odds are the odds that a crash involves a pedestrian.
The odds ratio is the odds of a crash involving pedestrians in 2011, compared to the odds of a crash involving pedestrians in 2015. In this situation, this is the parameter of interest.
Both the percentage and odds columns, and the odds ratio, suggest that the relative proportion of crashes involving pedestrians is very similar in 2011 and 2015.
The odds ratio is 0.985, but this value would change from sample to sample. From software, the 95% CI for the odds ratio is from 0.480 to 2.018. We would write
The population odds ratio for a crash involving pedestrians (comparing 2011 to 2015) has a 95% chance of being between 0.480 and 2.018.
25.5 Statistical validity conditions: Comparing odds
As usual, these results hold under certain conditions. The CI computed above is statistically valid if
- All expected counts are at least five.
Some books may give other (but similar) conditions.
In addition to the statistical validity condition, the CI will be
- internally valid if the study was well designed; and
- externally valid if the sample is a simple random sample and is internally valid.
The statistical validity condition is a bit tricky to understand (but is explained further in Sect. 32.3). SPSS will let you know if the expected count condition is not met, underneath the first output table in Fig. 25.3.
In jamovi, the expected counts must be explicitly requested to see if this condition is satisfied.
Example 25.2 (Statistical validity) In Fig. 25.3 (for the uni-students data), the text under the first table of SPSS output (labelled Chi-Square Tests) says
0 cells (0.0%) have expected count less than 5.
That is, all the cells have expected counts of at least five, so the statistical validity condition is satisfied. Notice from Table 25.1 that the observed counts are not all greater than five (one cell has a count of 2). The statistical validity condition is about the expected counts though, not the observed counts.
In jamovi, the expected counts must be requested explicitly (Fig. 25.4), but again none are less than five.
In either case, the conclusion is statistically valid.
Example 25.3 (Car crashes in China) In Example 25.1, all the observed counts are larger than five.
The expected counts are shown below. Since all expected counts are larger than five, the CI will be statistically valid:
Type of crash | 2011 | 2015 |
---|---|---|
Involving pedestrians | 15.12 | 36.88 |
Involving vehicles | 34.88 | 85.12 |
These counts are what we would expect to find if there was no relationship between the type of crash and the year; that is, if the proportion of crashes involving pedestrians was the same in 2011 and 2015.
The observed counts are very close to these expected counts, meaning that what we observe is very close to what we would expect if there was no relationship.
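The expected counts can be computed directly from the observed counts in Example 25.1. The sketch below assumes the usual formula for a two-way table: each expected count is the row total multiplied by the column total, divided by the overall total. The variable names are illustrative only.

```python
# Observed counts from Example 25.1 (rows: type of crash; columns: year).
observed = [
    [15, 37],   # involving pedestrians: 2011, 2015
    [35, 85],   # involving vehicles:    2011, 2015
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for a cell = (row total x column total) / overall total.
expected = [[r * c / n for c in col_totals] for r in row_totals]

for row in expected:
    print([round(value, 2) for value in row])

# Statistical validity condition: every expected count is at least five.
smallest = min(min(row) for row in expected)
print("All expected counts at least 5:", smallest >= 5)
```

The same check applies to any two-way table of counts: only the `observed` list needs to change.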
25.6 Example: Pet birds
A study examined people with lung cancer, and a matched set of controls who did not have lung cancer, and compared the proportion in each group that kept pet birds.^{422} One RQ of the study was:
What is the odds ratio of keeping a pet bird, comparing people with lung cancer (cases) to people without lung cancer (controls)?
The parameter is the population OR, comparing the odds of keeping a pet bird, for adults with lung cancer to adults who do not have lung cancer.
The data, compiled in a \(2\times2\) contingency table, are given in Table 25.3.
The numerical summary (Table 25.4) contains percentages, odds and the odds ratios; some of these may need to be computed manually from the data. The graphical summary (Fig. 25.5) shows a difference between the two groups in the sample.
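When the odds and percentages must be computed manually, a short script can do the work. The sketch below builds the entries of a numerical summary from the counts in Table 25.3; the dictionary layout and names are illustrative choices, not part of any software output.

```python
# Counts from Table 25.3 (rows: kept pet birds or not; columns: cases, controls).
kept = {"With lung cancer": 98, "Without lung cancer": 101}
did_not_keep = {"With lung cancer": 141, "Without lung cancer": 328}

summary = {}
for group in kept:
    n = kept[group] + did_not_keep[group]            # sample size for the group
    summary[group] = {
        "odds": kept[group] / did_not_keep[group],   # odds of keeping a pet bird
        "percent": 100 * kept[group] / n,            # percentage keeping a pet bird
        "n": n,
    }

# Odds ratio: people with lung cancer compared to people without lung cancer.
odds_ratio = summary["With lung cancer"]["odds"] / summary["Without lung cancer"]["odds"]

for group, row in summary.items():
    print(f"{group}: odds {row['odds']:.4f}, {row['percent']:.1f}%, n = {row['n']}")
print(f"Odds ratio: {odds_ratio:.3f}")
```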
Software computes the CI for the population odds ratio (jamovi: Fig. 25.6; SPSS: Fig. 25.7) based on the sample. The sample OR is 2.257, and the 95% CI is from 1.605 to 3.174.
We write:
Based on the sample, a 95% CI for the OR of keeping a pet bird is from 1.605 to 3.174 (comparing people with lung cancer to those without lung cancer).
That is, the plausible values for the population OR that could have produced the sample OR are between 1.605 and 3.174.
| | Adults with lung cancer | Adults without lung cancer |
---|---|---|
Kept pet birds | 98 | 101 |
Did not keep pet birds | 141 | 328 |
| | Odds of keeping pet bird | Percentage keeping pet bird | Sample size |
---|---|---|---|
With lung cancer: | 0.6950 | 41.0% | 239 |
Without lung cancer: | 0.3079 | 23.5% | 429 |
Odds ratio: | 2.26 |
The CI is statistically valid: the text under the first table of SPSS output (Fig. 25.7) indicates that the expected-counts condition is met. The results will be externally valid if the sample is somewhat representative of some population.
25.7 Example: B12 deficiency
A study in New Zealand^{423} examined B12 deficiencies in 'predominantly overweight/obese women of South Asian origin living in Auckland', some of whom were on a vegetarian diet and some of whom were on a non-vegetarian diet. One RQ was:
What is the odds ratio of these women being B12 deficient, comparing vegetarians to non-vegetarians?
The parameter is the population OR, comparing the odds of being B12 deficient, for vegetarians to non-vegetarians.
The data appear in Table 25.5. From the jamovi output (Fig. 25.9) or SPSS output (Fig. 25.10), the OR (and 95% CI) is \(3.15\) (\(1.08\) to \(9.24\)). The numerical summary table (Table 25.6) and graphical summary (Fig. 25.8) can hence be constructed.
| | B12 deficient | Not B12 deficient | Total |
---|---|---|---|
Vegetarians | 8 | 26 | 34 |
Non-vegetarians | 8 | 82 | 90 |
Total | 16 | 108 | 124 |
| | Odds B12 deficient | Percentage B12 deficient | Sample size |
---|---|---|---|
Vegetarians: | 0.3077 | 23.5% | 34 |
Non-vegetarians: | 0.0976 | 8.9% | 90 |
Odds ratio: | 3.15 |
To check whether these results are statistically valid, notice that the text under the first table of SPSS output (Fig. 25.10) says:
1 cells (25.0%) have expected count less than 5. The minimum expected count is 4.39.
This is a warning that one expected count is less than five. However, only one cell is affected, and its expected count is only just under five, so we shouldn't be too concerned about statistical validity (but it should be noted).
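The reported minimum can be checked by hand. As a sketch, assume the usual formula for an expected count (row total multiplied by column total, divided by the overall total). The smallest expected count then comes from the smallest row total and the smallest column total in Table 25.5, which is the cell for B12-deficient vegetarians:

\[
\frac{34 \times 16}{124} = \frac{544}{124} \approx 4.39.
\]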
We write:
Based on the sample, a 95% CI for the OR of being B12 deficient is from 1.08 to 9.24 (comparing vegetarians to non-vegetarians).
25.8 Quick review questions
A study^{424} of the adoption of electric vehicles (EVs) by a certain group of professional Americans (Example 5.14) compiled the data in Table 25.7. Output from using jamovi is shown in Fig. 25.11.
| | Would buy an EV in the next 10 years: Yes | Would buy an EV in the next 10 years: No |
---|---|---|
No post-grad | 24 | 8 |
Post-grad study | 51 | 29 |
The percentage of people without post-grad study who would buy an EV in the next 10 years is:
The odds that a person without post-grad study would buy an EV in the next 10 years is:
Using the output, what is the OR of buying an electric vehicle in the next 10 years, comparing those without post-grad study to those with post-grad study?
True or false: The CI means that the sample OR is likely to be between 0.68 and 4.28.
True or false: The analysis is likely to be statistically valid.
Answers:
- The number without post-grad study is \(24 + 8 = 32\). The percentage of people without post-grad study who would buy an EV in the next 10 years is \(24/32 = 0.75\), or 75%.
- The people without post-grad study are in the top row. The odds that a person without post-grad study would buy an EV in the next 10 years are \(24/8 = 3\).
- The odds that a person without post-grad study would buy an EV are \(24/8 = 3\). The odds that a person with post-grad study would buy an EV are \(51/29 = 1.7586\). So the OR is \(3/1.7586 = 1.706\).
- False. We know exactly what the sample OR is (it is 1.706). A CI always gives an interval in which the population parameter is likely to lie.
- The CI is statistically valid if all the expected counts are at least five. The expected counts are not given, so we cannot be certain; but the observed counts are all reasonably large, so the CI is very probably statistically valid.
25.9 Exercises
Selected answers are available in Sect. D.24.
Exercise 25.1 A prospective observational study in Western Australia^{425} compared the heights of scars from burns received (Table 25.8).
jamovi was used to analyse the data (Fig. 25.12).
- Compute the odds of having a smooth scar (that is, height is 0mm) for women.
- Compute the odds of having a smooth scar (that is, height is 0mm) for men.
- Compute the odds ratio of having a smooth scar, comparing women to men.
- Interpret what this odds ratio means.
- Sketch a suitable graph to display the data.
- Construct an appropriate numerical summary table for the data.
- Write down the CI.
- Carefully interpret what this CI means.
| | Women | Men |
---|---|---|
Scar height 0mm (smooth) | 99 | 216 |
Scar height more than 0mm, less than 1mm | 62 | 115 |
Exercise 25.2 A study of ear infections in Sydney swimmers^{426} recorded whether people reported an ear infection or not, and where they usually swam.
The SPSS output is shown in Fig. 25.13. Explain carefully the meaning of the OR and the corresponding CI.
Exercise 25.3 A study of turbine failures^{427} ran 73 turbines for around 1800 hours, and found that seven developed fissures (small cracks). They also ran a different set of 42 turbines for about 3000 hours, and found that nine developed fissures.
Exercise 25.4 The Southern Oscillation Index (SOI) is a standardised measure of the air pressure difference between Tahiti and Darwin, and is related to rainfall in some parts of the world,^{428} and especially Queensland.^{429}
The rainfall at Emerald (Queensland) was recorded for Augusts from 1889 to 2002 inclusive,^{430} both when the monthly average SOI was positive and when it was non-positive (that is, zero or negative), as shown in Table 25.9.
Using the jamovi output in Fig. 25.16:
- Find a 95% CI for the OR.
- Carefully explain what this OR means.
| | Non-positive SOI | Positive SOI |
---|---|---|
No rainfall recorded | 14 | 7 |
Rainfall recorded | 40 | 53 |
Exercise 25.5 A research study conducted in Brisbane^{431} recorded the number of people at the foot of the Goodwill Bridge, Southbank, who wore sunglasses and hats between 11:30am and 12:30pm. Table 25.10 records the number of females and males wearing hats.
Using the SPSS output in Fig. 25.17, find a 95% CI for the OR, and carefully explain what OR this CI applies to. Also, construct the numerical summary table.
| | No hat | Hat |
---|---|---|
Male | 307 | 79 |
Female | 344 | 22 |