14 Relationships: qualitative data comparisons between individuals
So far, you have learnt to ask a RQ, design a study, collect the data, and describe the data. In this chapter, you will learn to:
- compare qualitative data between individuals using the appropriate graphs.
- compare qualitative data between individuals using odds ratios and summary tables.
14.1 Introduction
Relational RQs compare groups. This chapter considers how to compare qualitative variables in different groups. Tables and graphs are very useful for this purpose.
14.2 Two-way tables
When more than one qualitative variable is recorded for each individual, the data can be collated into a table. When two qualitative variables are cross-tabulated, the result is called a two-way table. As always, the categories should be exhaustive (cover all values) and exclusive (each observation belongs to one and only one category).
Example 14.1 (Two-way tables) A medical study (Charig et al. 1986) compared two treatments for kidney stones to determine which had a higher success rate. Data were collected from \(700\) UK patients, on two qualitative variables:
- the treatment method ('A' or 'B'): The explanatory variable.
- the result ('success' or 'failure' of the procedure): The response variable.
Both variables are qualitative with two levels, and each treatment was used on \(350\) patients. Treatment A was used from 1972 to 1980, and Treatment B from 1980 to 1985; that is, treatments were not randomly allocated, and so confounding may be present. For this reason, the researchers also recorded the size of the kidney stone ('small' or 'large') as one possible confounding variable. Firstly, consider just the small stones (Julious and Mullee 1994), displayed in the two-way table in Table 14.1.
| | Success | Failure | Total |
|---|---|---|---|
| Method A | \(\phantom{0}81\) | \(\phantom{0}6\) | \(\phantom{0}87\) |
| Method B | \(234\) | \(36\) | \(270\) |
| Total | \(315\) | \(42\) | \(357\) |
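A two-way table is simply a count of every combination of categories. The following is a minimal Python sketch of that cross-tabulation, assuming hypothetical patient-level records rebuilt from the counts in Table 14.1 (the original patient-level data are not given here); all variable names are illustrative only.

```python
from collections import Counter

# Hypothetical raw data: one (method, result) pair per patient, rebuilt
# from the counts in Table 14.1 (small stones only).
records = (
    [("A", "success")] * 81 + [("A", "failure")] * 6
    + [("B", "success")] * 234 + [("B", "failure")] * 36
)

# Cross-tabulate: count each (method, result) combination.
counts = Counter(records)

methods = ["A", "B"]
results = ["success", "failure"]

# The two-way table as a nested dict: table[method][result].
table = {m: {r: counts[(m, r)] for r in results} for m in methods}

# Row totals, column totals and the grand total.
row_totals = {m: sum(table[m].values()) for m in methods}
col_totals = {r: sum(table[m][r] for m in methods) for r in results}
grand_total = sum(row_totals.values())

print(f"{'':10}{'Success':>9}{'Failure':>9}{'Total':>7}")
for m in methods:
    print(f"{'Method ' + m:10}{table[m]['success']:>9}{table[m]['failure']:>9}{row_totals[m]:>7}")
print(f"{'Total':10}{col_totals['success']:>9}{col_totals['failure']:>9}{grand_total:>7}")
```

Because each record is counted in exactly one cell, the categories are exclusive; because every record falls into some cell, they are exhaustive.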
14.3 Summary tables by rows and columns
Each variable in a two-way table can be analysed separately, using percentages, proportions or odds (Sect. 13.4). For example, the Result variable in Table 14.1 can be summarised while ignoring the Method:
- the percentage of procedures that were successful is \(315/357\times 100 = 88.2\)%.
- the odds that a procedure was successful is \(315/42 = 7.5\); that is, there were \(7.5\) times as many successful procedures as unsuccessful procedures.
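Both calculations above use only the column totals of Table 14.1. A short Python sketch reproducing them:

```python
# Column totals from Table 14.1 (small stones, both methods combined).
successes = 315
failures = 42
total = successes + failures  # 357

percent_success = successes / total * 100
odds_success = successes / failures

print(f"Percentage successful: {percent_success:.1f}%")  # 88.2%
print(f"Odds of success: {odds_success:.1f}")            # 7.5
```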
However, to compare Methods A and B, these odds and percentages can be computed for each row (or column) separately.
Example 14.2 (Comparing by row) The data in Table 14.1 can be summarised numerically by computing proportions or percentages by row. The rows refer to the different Methods, so this compares the two methods.
For the small kidney stones (Table 14.1), the row percentages (Table 14.2) give the percentage of successes (and failures) for each Method, since the rows contain the counts for Methods A and B. Row proportions allow the proportions within the rows (i.e., for each Method) to be compared:
- Method A: \(81 \div 87 = 0.931\) (or \(93.1\)%) of operations in the sample were successful; and
- Method B: \(234\div 270 = 0.867\) (or \(86.7\)%) of operations in the sample were successful.
This suggests that, for small kidney stones, Method A is slightly more successful (\(93.1\)%) than Method B (\(86.7\)%) in the sample. These percentages are collated in Table 14.2.
Odds can also be computed:
- Method A: The odds of success is \(81/6 = 13.5\). This means there are \(13.5\) times as many successful procedures as failures for Method A.
- Method B: The odds of success is \(234/36 = 6.5\). This means there are \(6.5\) times as many successful procedures as failures for Method B.
This shows that the odds of success for Method A are about twice the odds of success for Method B.
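The row percentages and row odds in Example 14.2 can be reproduced with a few lines of Python. This is a sketch that hard-codes the counts from Table 14.1 rather than reading a data file:

```python
# Table 14.1 (small stones): (successes, failures) for each method.
table = {"A": (81, 6), "B": (234, 36)}

row_percent = {}
row_odds = {}
for method, (success, failure) in table.items():
    n = success + failure                      # row total
    row_percent[method] = success / n * 100    # row percentage of successes
    row_odds[method] = success / failure       # odds of success within the row
    print(f"Method {method}: {row_percent[method]:.1f}% successful, "
          f"odds {row_odds[method]:.1f}")
```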
| | Success | Failure | Total |
|---|---|---|---|
| Method A | \(93.1\) | \(6.9\) | \(100\) |
| Method B | \(86.7\) | \(13.3\) | \(100\) |
| | Success | Failure |
|---|---|---|
| Method A | \(25.7\) | \(14.3\) |
| Method B | \(74.3\) | \(85.7\) |
| Total | \(100.0\) | \(100.0\) |
Rather than comparing methods (in the rows), the procedure results can be compared (i.e., the columns).
Example 14.3 (Comparing by column) For the small kidney stones (Table 14.1), the column percentages (Table 14.3) give the percentage of procedures using each Method within each column (i.e., among the successes and among the failures), since the columns contain the procedure results. Column proportions allow the proportions (or percentages) within columns to be compared:
- Successful operations: \(81 \div 315 = 0.257\) (or \(25.7\)%) in the sample were with Method A; and
- Unsuccessful operations: \(6\div 42 = 0.143\) (or \(14.3\)%) in the sample were with Method A.
Row percentages seem more intuitive than column percentages here: they compare the success percentage for each treatment method.
Odds can also be computed:
- Successes: The odds of a success coming from Method A is \(81/234 = 0.346\). This means there are \(0.346\) times as many Method A procedures as Method B procedures among the successes.
- Failures: The odds of a failure coming from Method A is \(6/36 = 0.167\). This means there are \(0.167\) times as many Method A procedures as Method B procedures among the failures.
This shows that the odds of a procedure being a Method A procedure are about twice as large among the successes (\(0.346\)) as among the failures (\(0.167\)).
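Similarly, a Python sketch of the column calculations in Example 14.3, again hard-coding the counts from Table 14.1:

```python
# Table 14.1 (small stones): (Method A, Method B) counts within each result column.
columns = {"success": (81, 234), "failure": (6, 36)}

col_percent_A = {}
col_odds_A = {}
for result, (method_a, method_b) in columns.items():
    n = method_a + method_b                        # column total
    col_percent_A[result] = method_a / n * 100     # % of the column that used Method A
    col_odds_A[result] = method_a / method_b       # odds the procedure was Method A
    print(f"{result}: {col_percent_A[result]:.1f}% Method A, "
          f"odds {col_odds_A[result]:.3f}")
```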
14.4 Graphs
When a qualitative variable is compared across different groups (i.e., comparing between individuals), options for plotting include:
- Stacked bar charts (Sect. 14.4.1);
- Side-by-side bar charts (Sect. 14.4.2); or
- Dot charts (Sect. 14.4.3).
14.4.1 Stacked bar charts
The data can be graphed by using a bar for each level of one variable, and dividing (stacking) each bar according to the levels of the second variable.
Example 14.4 (Stacked bar charts) For the kidney-stone data in Example 14.1, a stacked bar chart can be created by producing a bar for each method, and stacking the successes and failures for each method (Fig. 14.1, top left panel).
Rather than using numbers, the percentages within each group can be used too (Fig. 14.1, top right panel).
14.4.2 Side-by-side bar charts
Instead of stacking the success and failure bars on top of each other, these bars can be placed side-by-side for each method.
Example 14.5 (Side-by-side bar charts) For the kidney-stone data in Example 14.1, a side-by-side bar chart can be created by producing two bars for each method (one for failures; one for successes), and placing these side-by-side (Fig. 14.1, bottom left panel). Again, numbers or percentages can be used.
14.4.3 Dot charts
Dots (or other symbols) can be used in place of the bars in a side-by-side bar chart.
14.4.4 Other variations
Many variations of these charts are possible, by making certain choices:
- use a stacked bar chart, side-by-side bar chart, or dot chart.
- use percentages or counts.
- place the counts (or percentages) on either the horizontal or vertical axis.
- decide which variable is used as the first division of the data.
The guiding principle remains: the purpose of a graph is to display information in the clearest, simplest possible way, to help the reader understand the message(s) in the data.
Creating graphs with a computer is recommended, as software makes it easy to try different variations to find the graph that best displays the message in the data.
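As an illustration of what a percentage stacked bar chart encodes (compare Fig. 14.1, top right panel), the sketch below draws one text bar per method using only the Python standard library. It is a hypothetical, dependency-free stand-in: in practice a statistics package or a plotting library such as matplotlib would be used, and the function name `stacked_bar` and the bar width of \(50\) characters are arbitrary choices.

```python
# Counts from Table 14.1 (small stones): (successes, failures) per method.
table = {"Method A": (81, 6), "Method B": (234, 36)}

def stacked_bar(success, failure, width=50):
    """Return one text bar: '#' for the success share, '.' for the failure share.

    Every bar has the same total width, so the bars show percentages
    (each bar represents 100% of that method's procedures)."""
    n = success + failure
    n_success = round(success / n * width)   # characters in the success segment
    return "#" * n_success + "." * (width - n_success)

for label, (success, failure) in table.items():
    pct = success / (success + failure) * 100
    print(f"{label:9} |{stacked_bar(success, failure)}| {pct:.1f}% success")
```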
14.5 Comparing odds: odds ratios
To summarise the small kidney stone data (Table 14.1) using odds:
- Method A: the odds of success are \(13.5\) (\(13.5\) times as many successes as failures).
- Method B: the odds of success are \(6.5\) (\(6.5\) times as many successes as failures).
The odds of success for Method A and Method B are quite different. In the sample, the odds of success for Method A are \(13.5\div 6.5 = 2.08\) times the odds of success for Method B; that is, about twice as large. This value is the odds ratio (OR). The sample odds ratio is a statistic, and the (unknown) population odds ratio is a parameter.
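The odds ratio calculation can be written as a small Python function. The function name `odds_ratio` and the \(a, b, c, d\) cell labels are assumptions made for this sketch; the formula divides the Row 1 odds by the Row 2 odds, which simplifies to \(ad/(bc)\).

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table laid out as

                 Col 1  Col 2
        Row 1      a      b
        Row 2      c      d

    i.e. (a / b) / (c / d), which simplifies to (a * d) / (b * c).
    """
    return (a / b) / (c / d)

# Table 14.1 (small stones): rows are Methods A and B; columns are success and failure.
or_small = odds_ratio(81, 6, 234, 36)
print(f"Odds ratio: {or_small:.2f}")   # 2.08
```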
Definition 14.1 (Odds Ratio (OR)) The odds ratio is the ratio of the odds of an event in one group, compared to the odds of the same event in a different group:
\[
\text{Odds ratio} =
\frac{\text{Odds of an event in Group A}}
{\text{Odds of the same event in Group B}}.
\]
Our computation of the odds ratio is the same as that used by jamovi, SPSS and many other statistical programs. The odds ratio can be interpreted in either of these ways (both are correct):
- The odds compare Row 1 counts to Row 2 counts, for both columns. The odds ratio then compares the Column 1 odds to the Column 2 odds.
- The odds compare Column 1 counts to Column 2 counts. The odds ratio then compares the Row 1 odds to the Row 2 odds.
Odds and odds ratios are computed with the first row and first column values on the top of the fraction.
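A quick numerical check, in Python, that both interpretations give the same odds ratio for Table 14.1 (the cell labels \(a\) to \(d\) are introduced only for this sketch). Algebraically, both versions simplify to \(ad/(bc)\).

```python
a, b = 81, 6     # Row 1 (Method A): successes, failures
c, d = 234, 36   # Row 2 (Method B): successes, failures

# Odds within each row (Column 1 vs Column 2), then compare Row 1 to Row 2.
or_by_rows = (a / b) / (c / d)

# Odds within each column (Row 1 vs Row 2), then compare Column 1 to Column 2.
or_by_cols = (a / c) / (b / d)

print(f"{or_by_rows:.3f} {or_by_cols:.3f}")   # 2.077 2.077
```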
Example 14.7 (Interpreting odds) For the small kidney stone data, the odds of a success for Method A is \(81\div6 = 13.5\), where the value of \(81\) (on the top of the fraction) is from Column 1 of the table (i.e., successes). Similarly, the odds of a success for Method B is \(234\div36 = 6.5\), where the value of \(234\) is from Column 1 of the table (i.e., successes).
The odds ratio is then computed as \(13.5\div 6.5\), where the odds of \(13.5\) is from Row 1 of the table (i.e., Method A).
The OR compares the odds of the same event (e.g., success) in two different groups (e.g., Method A and Method B). This means that a \(2\times 2\) table can be summarised using one number: the odds ratio (OR).
Take care interpreting odds ratios (or ORs):
- the odds ratio is greater than \(1\): the odds of the event are greater for the group on the top of the fraction than for the group on the bottom of the fraction.
- the odds ratio is equal to \(1\): the odds of the event are the same for both groups (on the top and the bottom of the fraction).
- the odds ratio is less than \(1\): the odds of the event are less for the group on the top of the fraction than for the group on the bottom of the fraction.
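The three cases can be captured in a small Python helper; `describe_or` is a hypothetical name, and the wording mirrors the list above. Swapping which group is on the top of the fraction replaces the odds ratio by its reciprocal, so the direction of the conclusion flips.

```python
def describe_or(odds_ratio, top="Method A", bottom="Method B", event="success"):
    """Plain-language reading of an odds ratio computed as (top odds) / (bottom odds)."""
    if odds_ratio > 1:
        return f"The odds of {event} are greater for {top} than for {bottom}."
    if odds_ratio < 1:
        return f"The odds of {event} are less for {top} than for {bottom}."
    # Exactly 1 (rare with real data, which is why sample ORs need interpretation).
    return f"The odds of {event} are the same for {top} and {bottom}."

or_a_vs_b = 13.5 / 6.5   # Method A on top: about 2.08
or_b_vs_a = 6.5 / 13.5   # Method B on top: about 0.48 (the reciprocal)

print(describe_or(or_a_vs_b))
print(describe_or(or_b_vs_a, top="Method B", bottom="Method A"))
```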
14.6 Numerical summary tables
The numerical summary information for comparing qualitative variables can be collated in a table. The data should be summarised by one of the qualitative variables, producing percentages and odds for the other.
Example 14.8 (Numerical summary table) For the small kidney-stone data, the summary of the data can be tabulated as in Table 14.4.
Based on the table, do you think a difference exists between the success rate for each method? Remember, the data in Table 14.4 are for a sample, but the RQ is asking about the population.
| | Percentage successful | Odds of success | Sample size |
|---|---|---|---|
| Method A | \(93.1\) | \(13.500\) | \(\phantom{0}87\) |
| Method B | \(86.7\) | \(\phantom{0}6.500\) | \(270\) |
| Odds ratio | | \(\phantom{0}2.077\) | |
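A sketch of how a summary table such as Table 14.4 could be assembled in Python from the counts in Table 14.1 (the layout and column widths are arbitrary choices for this sketch):

```python
# Table 14.1 (small stones): (successes, failures) per method.
table = {"Method A": (81, 6), "Method B": (234, 36)}

summary = {}
print(f"{'':12}{'Percentage':>11}{'Odds':>8}{'Sample size':>13}")
for group, (success, failure) in table.items():
    n = success + failure
    pct, odds = success / n * 100, success / failure
    summary[group] = (pct, odds, n)
    print(f"{group:12}{pct:>11.1f}{odds:>8.3f}{n:>13}")

# Odds ratio: the first row's odds go on the top of the fraction.
odds_ratio = summary["Method A"][1] / summary["Method B"][1]
print(f"{'Odds ratio':12}{'':>11}{odds_ratio:>8.3f}")
```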
14.7 Example: large kidney stones
The data in Table 14.1 are for small kidney stones. Data were also recorded for the large kidney stones (Table 14.5). As for small kidney stones, the success percentages can be computed for large kidney stones for Methods A and B (i.e., row percentages):
- Method A: Success proportion for large kidney stones: \(192/263 = 0.730\), or \(73.0\)%; and
- Method B: Success proportion for large kidney stones: \(55/80 = 0.688\), or \(68.8\)%.
For large kidney stones, then, Method A has a higher success proportion than Method B, just as with the small kidney stones.
| | Success | Failure | Total |
|---|---|---|---|
| Method A | \(192\) | \(71\) | \(263\) |
| Method B | \(\phantom{0}55\) | \(25\) | \(\phantom{0}80\) |
So... could the data for small (Table 14.1) and large kidney stones (Table 14.5) be combined, to produce a single two-way table of just Method and Result (Table 14.6), ignoring size?
| | Success | Failure | Total |
|---|---|---|---|
| Method A | \(273\) | \(77\) | \(350\) |
| Method B | \(289\) | \(61\) | \(350\) |
Compute the success proportions for Method A and Method B when small and large stones are combined (Table 14.6):
- For all stones combined, what is the success proportion for Method A?
- For all stones combined, what is the success proportion for Method B?
Which method has the higher success proportion for all stones combined?
Method B has a higher success proportion (\(289/350 = 0.826\)) than Method A (\(273/350 = 0.780\)), for all kidney stones combined.
To summarise:
- Method A is better for small stones (\(93.1\)% vs \(86.7\)%);
- Method A is better for large stones (\(73.0\)% vs \(68.8\)%); but
- Method B is better when all stones are combined (\(78.0\)% vs \(82.6\)%)...
That seems strange: Method A performs better for small and large kidney stones, but Method B performs better when size is unknown (i.e., ignoring size).
The size of the stone is a confounding variable (Fig. 14.2), as it is associated with the method (small stones are treated more often with Method B) and with the result (small stones have a greater success proportion for both methods).
This confounding could have been avoided by randomly allocating a treatment method to patients. However, random allocation was not possible in this study, so the researchers used a different method to manage confounding: recording the size of the kidney stones; see Sect. 8.2.
In this example, incorporating information about a potential confounder (the size of the kidney stone) is important, otherwise the wrong (opposite) conclusion is reached: Method B would be considered better if the size of the stones was ignored, when the better method really is Method A.
This is called Simpson's paradox. If the size of the kidney stone had not been recorded, size would be a lurking variable, and the incorrect conclusion would have been reached.
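Simpson's paradox in these data can be reproduced by combining the two strata cell by cell and recomputing the success proportions. A Python sketch using the counts in Tables 14.1 and 14.5 (the helper name `success_prop` is illustrative):

```python
# (successes, failures) for each method, split by stone size (Tables 14.1 and 14.5).
small = {"A": (81, 6), "B": (234, 36)}
large = {"A": (192, 71), "B": (55, 25)}

def success_prop(success, failure):
    return success / (success + failure)

# Ignore size: add the counts cell by cell to get Table 14.6.
combined = {m: (small[m][0] + large[m][0], small[m][1] + large[m][1]) for m in small}

for label, tab in [("Small", small), ("Large", large), ("Combined", combined)]:
    p_a, p_b = success_prop(*tab["A"]), success_prop(*tab["B"])
    better = "A" if p_a > p_b else "B"
    print(f"{label:9} A: {p_a:.3f}  B: {p_b:.3f}  higher: Method {better}")
```

Method A has the higher success proportion within each stone size, yet Method B has the higher proportion once size is ignored, because Method B was used mostly on the (easier) small stones.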
14.8 Example: water access
A study of three rural communities in Cameroon (López-Serrano et al. 2022) recorded data about access to water (see Sects. 12.10 and 13.6). One of the main purposes of the study was to determine contributors to the incidence of diarrhea in young children (\(85\) households had children under \(5\)). A cross-tabulation (Table 14.7) shows the relationship with keeping livestock; the numerical summary table (Table 14.8) may suggest a difference between households that do and do not keep livestock. The comparison in Fig. 14.3 includes some categories with small sample sizes, so the percentages shown may not be precise estimates of the population values.
As usual, the data come from one of countless possible samples, but the RQ is about the population, so making a definitive decision is difficult.
| | No diarrhea | Diarrhea |
|---|---|---|
| Does not have livestock | \(17\) | \(\phantom{0}3\) |
| Has livestock | \(42\) | \(23\) |
| | Percentage with diarrhea | Odds of diarrhea | Sample size |
|---|---|---|---|
| Household does not have livestock | \(15.0\) | \(0.176\) | \(20\) |
| Household has livestock | \(35.4\) | \(0.548\) | \(65\) |
| Odds ratio | | \(0.322\) | |
14.9 Chapter summary
Qualitative data can be compared between different groups (between-individuals comparisons) using stacked bar charts, side-by-side bar charts or dot charts. The data can be displayed in a two-way table, then summarised numerically by comparing proportions, percentages and odds. Odds ratios can be used to compare the odds in two different groups.
14.10 Quick revision questions
A study (Alley et al. 2017) examined social media use, using a representative sample of Queenslanders at least \(18\) years of age (from the \(2013\) Queensland Social Survey; Table 14.9).
- Compute the sample proportion of urban residents who use social media.
- Compute the sample proportion of rural residents who use social media.
- Compute the sample odds of urban residents who use social media.
- Compute the sample odds of rural residents who use social media.
- Compute the sample odds ratio of using social media, comparing urban to rural residents.
| | Doesn't use SM | Uses SM | Total |
|---|---|---|---|
| Rural residents | \(\phantom{0}78\) | \(\phantom{0}89\) | \(167\) |
| Urban residents | \(416\) | \(568\) | \(984\) |
14.11 Exercises
Selected answers are available in App. E.
Exercise 14.1 A study of hangovers (Köchling et al. 2019) recorded, among other information, when people vomited after consuming alcohol. Table 14.10 shows how many people vomited after consuming beer followed by wine, and how many people vomited after consuming only wine.
- Compute the row proportions. What do these mean?
- Compute the column percentages. What do these mean?
- Compute the overall percentage of drinkers who vomited.
- Compute the odds that a wine-only drinker vomited.
- Compute the odds that a beer-then-wine drinker vomited.
- Compute the odds ratio, comparing the odds of vomiting for wine-only drinkers to beer-then-wine drinkers.
- Compute the odds ratio, comparing the odds of vomiting for beer-then-wine drinkers to wine-only drinkers.
| | Beer then wine | Wine only |
|---|---|---|
| Vomited | \(\phantom{0}6\) | \(\phantom{0}6\) |
| Didn't vomit | \(62\) | \(22\) |
Exercise 14.2 [Dataset: EmeraldAug]
In a study of wallabies at the East Point Reserve, Darwin, the sex of adult and young wallabies was recorded (Stirrat 2008).
In December 1993, \(91\) males and \(188\) female adult wallabies were recorded, and \(13\) male and \(22\) female young wallabies were recorded.
- Create the two-way table of counts.
- For adult wallabies, what proportion of adult wallabies were males?
- For adult wallabies, what are the odds that a female was observed?
- For young wallabies, what percentage of wallabies were males?
- For young wallabies, what are the odds that a female was observed?
- What is the odds ratio of observing an adult wallaby, comparing females to males?
Exercise 14.3 [Dataset: EmeraldAug]
The Southern Oscillation Index (SOI) is a standardised measure of the air pressure difference between Tahiti and Darwin, shown to be related to rainfall in some parts of the world (Stone, Hammer, and Marcussen 1996), and especially Queensland, Australia (Stone and Auliciems 1992; P. K. Dunn 2001).
The rainfall at Emerald (Queensland) was recorded for Augusts from 1889 to 2002 inclusive (P. K. Dunn and Smyth 2018), for months when the monthly average SOI was positive and non-positive (zero or negative); see Table 14.11.
- Compute the percentage of Augusts with no rainfall.
- Compute the percentage of Augusts with no rainfall, in Augusts with a non-positive SOI.
- Compute the percentage of Augusts with no rainfall, in Augusts with a positive SOI.
- Compute the odds of no August rainfall.
- Compute the odds of no August rainfall, in Augusts with a non-positive SOI.
- Compute the odds of no August rainfall, in Augusts with a positive SOI.
- Compute the odds ratio of no August rainfall, comparing Augusts with non-positive SOI to Augusts with a positive SOI.
- Interpret this OR.
| | Non-positive SOI | Positive SOI |
|---|---|---|
| No rainfall recorded | \(14\) | \(\phantom{0}7\) |
| Rainfall recorded | \(40\) | \(53\) |
Exercise 14.4 A study (Haselgrove et al. 2008) asked boys and girls in Western Australia about back pain from carrying school bags (Table 14.12).
- Compute the percentage of boys reporting back pain from carrying school bags.
- Compute the percentage of girls reporting back pain from carrying school bags.
- Compute the odds of boys reporting back pain from carrying school bags.
- Compute the odds of girls reporting back pain from carrying school bags.
- Compute the odds of a child reporting back pain.
- Compute the odds ratio of reporting back pain, comparing boys to girls.
- Interpret this OR.
| | Males | Females |
|---|---|---|
| No back pain | \(330\) | \(226\) |
| Back pain | \(280\) | \(359\) |
Exercise 14.5 Using the information in Table 13.2, create a stacked bar chart to display the responses to the three questions.
Exercise 14.6 A study of road kill (T. C. Russell, Herbert, and Kohen 2009) produced the data in Table 14.13.
- Identify the two variables, and classify them as nominal or ordinal.
- Sketch some graphs to display the data.
- What is the main message in the data? What graph shows this best?
| | Unknown | M | F |
|---|---|---|---|
| Autumn | \(75\) | \(25\) | \(21\) |
| Winter | \(74\) | \(27\) | \(22\) |
| Spring | \(71\) | \(10\) | \(18\) |
| Summer | \(58\) | \(10\) | \(12\) |
Exercise 14.7 The data in Table 14.14 come from a study of Iranian children aged \(6\) to \(18\) years (Kelishadi et al. 2017).
- Compute the proportion of females who skipped breakfast.
- Compute the proportion of males who skipped breakfast.
- Compute the odds of a female skipping breakfast.
- Compute the odds of a male skipping breakfast.
- Compute the odds ratio comparing the odds of skipping breakfast, comparing females to males.
- Interpret this OR.
- Construct a summary table.
| | Skips breakfast | Doesn't skip breakfast | Total |
|---|---|---|---|
| Females | \(2383\) | \(4257\) | \(6640\) |
| Males | \(1944\) | \(4902\) | \(6846\) |