Chapter 2 Reading The News
2.1 Statistics In The Media
Chapter 2 of our textbook is about “reading the news”. The textbook identifies seven critical components to look for in reports of statistical studies. Unfortunately, these components are often missing, misinterpreted, or deliberately distorted.
The source of the research and its funding.
The researchers that had contact with the participants.
The individuals or objects studied and how they were selected.
The nature of the measurements that were made or the questions that were asked.
The setting in which the measurements were taken.
Differences in the groups that were compared and the factor(s) of interest.
The extent or size of any claimed effect or difference.
Let’s try to identify these seven components for the Physicians’ Health Study that we introduced last week. For some of these components, we often won’t know much.
http://phs.bwh.harvard.edu/phs1.htm
source – the study was conducted by researchers at Harvard, with funding from the National Cancer Institute and the National Heart, Lung, and Blood Institute
researchers – Dr. Charles Hennekens led a team of medical researchers and professionals from two different hospitals associated with Harvard
individuals – the study recruited over 22,000 male physicians between the ages of 40 and 84 (we’ve discussed the pros and cons of this method of sampling)
the major questions were (1) does taking a low dose of aspirin help reduce the chance of having a myocardial infarction and (2) does taking a beta-carotene supplement help reduce the chance of having cancer.
The individuals in the study were part of a \(2 \times 2\) factorial design and were randomly assigned to aspirin or placebo and to beta-carotene or placebo. The study was double-blind, so neither the individuals nor the researchers knew who was taking the treatment versus the placebo. The physicians were followed over several years, and health outcomes (such as a heart attack or cancer) were recorded.
A statistically significant difference was noticed between the aspirin group and placebo group, with the aspirin takers having significantly fewer heart attacks. The difference in cancer between the beta-carotene group and placebo group was quite small and was not significant.
There are various statistical measures to describe the difference in heart attacks.
Condition | Heart Attack | No Heart Attack | Attacks per 1000 |
---|---|---|---|
Aspirin | 104 | 10,933 | 9.42 |
Placebo | 189 | 10,845 | 17.13 |
The rate of heart attacks in the aspirin group was only 55% of what was observed in the placebo group: \[\frac{9.42}{17.13}=0.55\]
A medical journal would include more sophisticated statistics like the risk ratio, odds ratio, confidence intervals with margins of error for these statistics, or the results of a statistical procedure such as the chi-square test. See vassarstats.net/odds2x2.html for an online calculator of these, if you like. Why might an article meant for the general public be unlikely to include this information?
A difference in heart attack rates this large would have occurred by chance only about 1 in 100,000 times (this is a statistic called the \(p\)-value, which we cover later).
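As a rough illustration of the arithmetic in this section, the short Python sketch below recomputes the "attacks per 1000" column, the ratio of the two rates (the risk ratio), and the odds ratio directly from the four counts in the table. The variable and function names are only illustrative, and this is not how the study's authors carried out their analysis; it just shows where the numbers come from.

```python
# Counts from the table in this section (aspirin component of the study).
aspirin_attack, aspirin_no_attack = 104, 10_933
placebo_attack, placebo_no_attack = 189, 10_845


def risk(events, non_events):
    """Proportion of a group that experienced the event."""
    return events / (events + non_events)


aspirin_risk = risk(aspirin_attack, aspirin_no_attack)
placebo_risk = risk(placebo_attack, placebo_no_attack)

# Rates expressed per 1000 people, as in the last column of the table.
aspirin_per_1000 = 1000 * aspirin_risk
placebo_per_1000 = 1000 * placebo_risk

# Risk ratio: risk in the aspirin group relative to the placebo group.
risk_ratio = aspirin_risk / placebo_risk

# Odds ratio: odds of a heart attack with aspirin relative to placebo.
odds_ratio = (aspirin_attack / aspirin_no_attack) / (
    placebo_attack / placebo_no_attack
)

print(f"Aspirin: {aspirin_per_1000:.2f} attacks per 1000")
print(f"Placebo: {placebo_per_1000:.2f} attacks per 1000")
print(f"Risk ratio: {risk_ratio:.2f}")
print(f"Odds ratio: {odds_ratio:.2f}")
```

Because heart attacks were rare in both groups, the risk ratio and the odds ratio come out very close to each other here; that would not be true for a common outcome.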
2.2 Bad News Article
For a particularly blatant example of a bad news article, let’s look at hypothetical example #4, “Survey Finds Most Women Unhappy In Their Choice of Husbands”.
Here’s an actual article from Psychology Today on this topic. Unfortunately, the link to the survey done by the National Opinion Research Center (a reputable pollster associated with the U. of Chicago) is dead, making it hard for me to judge the quality of the data collection.
2.3 How To Defend Yourself Against Misleading Statistics In The News
Here’s a link to videos by David Spiegelhalter, a well-known British statistician at the University of Cambridge. He spent most of his career working on technical research dealing with clinical trials, Bayesian statistics, and statistical software, and has devoted the latter part of his career to increasing public understanding of risk and statistical issues.
https://www.youtube.com/watch?v=4f9G_8TK6Mo (June 2015)
https://www.youtube.com/watch?v=m_D9egJHfCw (July 2020)
https://www.youtube.com/watch?v=NHFWuRYcUp4 (March 2021)
The third video is from The Economist, with information from their data journalists on how they report numerical information to their readers, which also reaches politicians and policy makers.
We’ll watch at least part of the second or third video in class if we have enough time. If not, watch them on your own.
From the first video (pre-COVID)
The “chocolate hoax” (most, but not all, of the UK media did not fall for this intentionally poorly done study)
Health benefits to alcohol, or not? (no graphs in paper, why?)
Genetics of high blood pressure.
“The power of the press release”.
From the second video (post-COVID)
“The numbers have no way of speaking for themselves. We speak for them. We imbue them with meaning.” (Spiegelhalter is quoting Nate Silver)
COVID death statistics in England & Wales.
Risk by age.
(what about non-lethal outcomes?)
From the third video (post-COVID)
Impact on politicians, not just readers.
Web-scraping information from Google Maps on the “busy-ness” of locations in major cities (such as Termini, the train station in the center of Rome) to determine how much people’s behavior had changed during a “lockdown” due to COVID.
The “flatten the curve” diagram that was well-known at the beginning of the pandemic. The data journalist at The Economist modified a chart from the CDC. An academic then modified her curve by adding a line for “health care system capacity” (which is still relevant in 2021).
2.4 A Good Looking Graph
I found this graph last year on Twitter. It shows how one of the candidates for the Democratic Party’s presidential nomination reported the results of a recent poll.
https://twitter.com/daveweigel/status/1166328610912985088?s=20
The actual poll surveyed 800 adults, and the results above were from a subsample of 298 who self-identified as registered voters who identify with the Democratic Party, so the margin of error is quite large. In my professional opinion, a more appropriate headline would be “Three-Way Tie in Crowded Field, Based On One Poll With A Pretty Small Sample”, but maybe that’s not as catchy.
https://www.monmouth.edu/polling-institute/reports/monmouthpoll_US_082619/
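To get a feel for why a subsample of 298 gives a "quite large" margin of error, here is a minimal Python sketch using the usual textbook approximation for a proportion. It assumes a simple random sample, a 95% confidence level (z = 1.96), and the most conservative case of a 50% proportion. Real polls apply weighting and other adjustments, so their published margins can differ from this approximation; the function name `margin_of_error` is just illustrative.

```python
import math


def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a proportion.

    Assumes a simple random sample of size n; p=0.5 is the most
    conservative choice because it gives the widest margin.
    """
    return z * math.sqrt(p * (1 - p) / n)


full_sample = margin_of_error(800)  # all adults in the poll
subsample = margin_of_error(298)    # Democratic-identifying registered voters

print(f"n = 800: about +/- {100 * full_sample:.1f} percentage points")
print(f"n = 298: about +/- {100 * subsample:.1f} percentage points")
```

With a margin this wide on the subsample, differences of a few percentage points between candidates cannot be distinguished from sampling noise, which is the point behind the "three-way tie" headline.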