Chapter 9 Thinking Conditionally
There are many examples where the choice of grouping matters when looking at data sets. In particular, Simpson’s Paradox is a situation where the conclusion changes depending on the level of aggregation at which we examine the data.
9.1 Simple Examples
9.1.1 Gender Bias in Admission Rates
In 1975, researchers looked into allegations of gender bias in graduate program admissions at the University of California, Berkeley. Data from 1973 admissions showed a statistically significant difference in acceptance rates between males and females.
Gender | Applicants | Percent Admitted |
---|---|---|
Males | 8442 | 44% |
Females | 4321 | 35% |
This appears to be fairly convincing evidence that females were being discriminated against. However, before drawing that conclusion we should look more closely at the department-level admission rates.
Gender | Department | Applicants | Percent Admitted | Which is Higher? |
---|---|---|---|---|
Males | A | 825 | 62% | |
Females | A | 108 | 82% | F |
Males | B | 560 | 63% | |
Females | B | 25 | 68% | F |
Males | C | 325 | 37% | M |
Females | C | 593 | 34% | |
Males | D | 417 | 33% | |
Females | D | 375 | 35% | F |
Males | E | 191 | 28% | M |
Females | E | 393 | 24% | |
Males | F | 373 | 6% | |
Females | F | 341 | 7% | F |
We first notice that males and females tended to apply to different programs. For example, 88% of department A applicants were male. Because females tended to apply to departments with lower acceptance rates, they had a lower university-wide acceptance rate. However, in most of the departments shown (4 of 6) they were actually accepted at a higher rate than males.
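The department table can be checked with a few lines of code. The following is a minimal Python sketch, not part of the original analysis: the variable and function names are illustrative, and because the table reports only rounded percentages, the admitted counts it reconstructs are approximations.

```python
# Department-level data copied from the table above:
# department -> {gender: (applicants, percent admitted)}
departments = {
    "A": {"M": (825, 62), "F": (108, 82)},
    "B": {"M": (560, 63), "F": (25, 68)},
    "C": {"M": (325, 37), "F": (593, 34)},
    "D": {"M": (417, 33), "F": (375, 35)},
    "E": {"M": (191, 28), "F": (393, 24)},
    "F": {"M": (373, 6), "F": (341, 7)},
}


def pooled_rate(gender):
    """Approximate admission rate for one gender across the six departments."""
    applicants = sum(d[gender][0] for d in departments.values())
    # Admitted counts are estimated from rounded percentages (an approximation).
    admitted = sum(d[gender][0] * d[gender][1] / 100 for d in departments.values())
    return admitted / applicants


# Departments in which the female admission rate exceeds the male rate.
female_higher = [name for name, d in departments.items() if d["F"][1] > d["M"][1]]

# Share of department A applicants who were male.
dept_a = departments["A"]
male_share_a = dept_a["M"][0] / (dept_a["M"][0] + dept_a["F"][0])

print("Female rate higher in:", female_higher)
print(f"Male share of department A applicants: {male_share_a:.0%}")
print(f"Pooled male rate:   {pooled_rate('M'):.1%}")
print(f"Pooled female rate: {pooled_rate('F'):.1%}")
```

Pooling these six departments gives roughly 45% for males and 30% for females, so the aggregate gap favors males even though the department-by-department comparison mostly favors females. These pooled figures differ from the university-wide 44% and 35% because the table covers only six departments.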
9.1.2 Kidney Stone Treatment
Suppose that we have two different treatments for kidney stones.
Treatment | Patients | Success Percent |
---|---|---|
A | 273/350 | 78% |
B | 289/350 | 83% |
So it appears that treatment B is the superior treatment. But behind the scenes, kidney stones come in different sizes, and the two treatments were not used on the same mix of cases: treatment A was mostly used to treat large stones (263 of its 350 patients), while treatment B was mostly used to treat small stones (270 of its 350 patients).
Treatment | Size | Patients | Success Percent |
---|---|---|---|
A | Small | 81/87 | 93% |
B | Small | 234/270 | 87% |
A | Large | 192/263 | 73% |
B | Large | 55/80 | 69% |
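So treatment A is the superior treatment in both the small and large kidney stone cases. But because treatment A is actually more expensive, it is more commonly used in the large stone case. Large stones are harder to treat successfully with either treatment, so A's overall success rate is pulled down by its harder caseload.
Without considering the groups, treatment A appears to be associated with worse outcomes than treatment B. When we account for stone size, the association reverses: within each group, treatment A is associated with better outcomes.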
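One way to see where the reversal comes from: each treatment's overall success rate is a weighted average of its small-stone and large-stone success rates, with weights equal to the share of its patients in each size group. The Python sketch below uses the counts from the tables above. The function names are illustrative assumptions, not part of the original analysis.

```python
# (successes, patients) from the table above, keyed by treatment and stone size.
counts = {
    "A": {"Small": (81, 87), "Large": (192, 263)},
    "B": {"Small": (234, 270), "Large": (55, 80)},
}


def overall_rate(treatment):
    """Pooled success rate: total successes over total patients."""
    successes = sum(s for s, n in counts[treatment].values())
    patients = sum(n for s, n in counts[treatment].values())
    return successes / patients


def weighted_average(treatment):
    """The same quantity, written as a weighted average of the per-size rates."""
    total = sum(n for s, n in counts[treatment].values())
    return sum((n / total) * (s / n) for s, n in counts[treatment].values())


for t in counts:
    total = sum(n for s, n in counts[t].values())
    large_share = counts[t]["Large"][1] / total
    print(f"Treatment {t}: overall {overall_rate(t):.0%}, "
          f"share of patients with large stones {large_share:.0%}")
```

Treatment A's average is weighted mostly toward the harder large-stone cases, while treatment B's is weighted mostly toward the easier small-stone cases. That difference in weights is enough to flip the pooled comparison.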
9.1.3 Build your own
Simpson’s paradox appears in a variety of situations. Come up with a scenario where the statistic changes interpretation based on the scale.
- Suppose that we look at patient fatalities between two surgeons. One surgeon is pretty average, while the other gets assigned the most dangerous and severe cases.
- Consider two baseball players where one player’s batting average is higher each year, but lower overall.
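The baseball scenario can be made concrete with a small Python sketch. The hits and at-bats below are invented purely for illustration and do not describe real players. The trick is that the two players have very different numbers of at-bats in each year.

```python
# Hypothetical (hits, at-bats) per season; these numbers are made up.
seasons = {
    "Player X": {"Year 1": (4, 10), "Year 2": (25, 100)},
    "Player Y": {"Year 1": (35, 100), "Year 2": (2, 10)},
}


def season_average(player, year):
    """Batting average for one player in one season."""
    hits, at_bats = seasons[player][year]
    return hits / at_bats


def overall_average(player):
    """Batting average pooled over both seasons."""
    hits = sum(h for h, ab in seasons[player].values())
    at_bats = sum(ab for h, ab in seasons[player].values())
    return hits / at_bats


for year in ("Year 1", "Year 2"):
    x = season_average("Player X", year)
    y = season_average("Player Y", year)
    print(f"{year}: X {x:.3f} vs Y {y:.3f}")

print(f"Overall: X {overall_average('Player X'):.3f} "
      f"vs Y {overall_average('Player Y'):.3f}")
```

Player X has the higher average in each individual year, yet the lower average overall, because most of X's at-bats fall in the weaker year while most of Y's fall in the stronger year.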
9.1.4 In Practice
When doing exploratory data analysis, we often find ourselves examining the relationship between two variables and seeing something interesting. The follow-up questions to ask yourself are:
- Does the relationship change for different genders?
- Does the relationship change for different socioeconomic groups?
- Does the relationship change across time?
- Does the relationship change across space?
- Does the relationship change across some grouping structure?
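In code, asking these questions usually amounts to computing the same summary twice: once pooled and once within each level of a candidate grouping variable. The sketch below is one possible way to do that in plain Python, reusing the kidney stone counts from Section 9.1.2. The helper `success_rates` and the record layout are hypothetical choices made for this illustration.

```python
from collections import defaultdict

# One record per (treatment, stone size) cell, from the kidney stone tables.
records = [
    {"treatment": "A", "size": "Small", "successes": 81, "patients": 87},
    {"treatment": "B", "size": "Small", "successes": 234, "patients": 270},
    {"treatment": "A", "size": "Large", "successes": 192, "patients": 263},
    {"treatment": "B", "size": "Large", "successes": 55, "patients": 80},
]


def success_rates(rows, keys):
    """Success rate for every combination of the given key columns."""
    totals = defaultdict(lambda: [0, 0])  # cell -> [successes, patients]
    for row in rows:
        cell = tuple(row[k] for k in keys)
        totals[cell][0] += row["successes"]
        totals[cell][1] += row["patients"]
    return {cell: s / n for cell, (s, n) in totals.items()}


# Pooled comparison, then the same comparison within each stone size.
pooled = success_rates(records, ["treatment"])
by_size = success_rates(records, ["treatment", "size"])

for cell, rate in pooled.items():
    print("pooled", cell, f"{rate:.0%}")
for cell, rate in by_size.items():
    print("by size", cell, f"{rate:.0%}")
```

Swapping `"size"` for another column (time period, region, and so on) repeats the check for a different grouping structure, provided the records carry that column.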
9.1.5 Coronavirus Example
In Spring of 2020, the coronavirus is causing considerable problems worldwide and has spread widely. We are interested in how this disease has spread across the globe and how it could potentially affect us (at Northern Arizona University) as of March 10th.
Fortunately, Johns Hopkins University has done the hard work of compiling case information from a variety of world and national resources and has provided the aggregated data via a publicly accessible GitHub site. All the following graphs and discussion are based on their data and are subject to the Johns Hopkins license, which states that their work shouldn’t be used commercially.