12.10 Case Study: The NHANES data

To demonstrate how graphs can help answer RQs, consider the following RQ:

Among Americans, is the average direct HDL cholesterol different for current smokers and non-smokers?

From the RQ, the Population is ‘Americans,’ the Outcome is the ‘average direct HDL cholesterol level,’ and the Comparison is ‘between those who currently do and do not smoke.’ There is no intervention; this is a relational RQ that can be answered using an observational study.

Think 12.5 (Confounding variables) As with any study, consider how to manage confounding by identifying possible extraneous variables that could be measured or observed.

Can you think of any possible extraneous variables that have the potential to be confounding variables?

In this study, clearly we cannot collect primary data ourselves. However, data to answer this RQ (and many others) are available from the American National Health and Nutrition Examination Survey (NHANES) (Center for Disease Control and Prevention (CDC) 1988–1994; Center for Disease Control and Prevention 1996; Pruim 2015). According to the NHANES webpage, this survey:

… examines a nationally representative sample of about 5,000 persons each year… located in counties across the country, 15 of which are visited each year.

— NHANES webpage (emphasis added)

The data available are equivalent to “a simple random sample from the American population” (Pruim 2015). In total, 10,000 observations on scores of variables are available (from the 2009/2010 and the 2011/2012 surveys). Fig. 12.35 shows the first 5000 observations on the first 40 variables only.
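Before drawing any graphs, the comparison in the RQ can be previewed numerically. The following is a minimal sketch only, not the analysis used in this book. It assumes each observation is a record with a smoking-status field and a direct HDL cholesterol field; the names `SmokeNow` and `DirectChol` are assumed to mirror the column names in the NHANES data, and the six complete records (plus two incomplete ones) are invented purely for illustration and are not NHANES values. The helper `summarise_by_group` is hypothetical.

```python
from statistics import mean, stdev

# Hypothetical records for illustration only: these are NOT NHANES values.
# Field names are assumed to match the NHANES columns (an assumption).
records = [
    {"SmokeNow": "Yes", "DirectChol": 1.10},
    {"SmokeNow": "Yes", "DirectChol": 1.30},
    {"SmokeNow": "Yes", "DirectChol": 0.90},
    {"SmokeNow": "No", "DirectChol": 1.50},
    {"SmokeNow": "No", "DirectChol": 1.20},
    {"SmokeNow": "No", "DirectChol": 1.50},
    {"SmokeNow": None, "DirectChol": 1.40},  # smoking status missing
    {"SmokeNow": "No", "DirectChol": None},  # HDL value missing
]


def summarise_by_group(rows, group_key, value_key):
    """Return {group: (n, mean, sd)}, skipping rows missing either field."""
    groups = {}
    for row in rows:
        group = row.get(group_key)
        value = row.get(value_key)
        if group is None or value is None:
            continue  # use complete cases only
        groups.setdefault(group, []).append(value)
    return {
        group: (
            len(values),
            mean(values),
            stdev(values) if len(values) > 1 else float("nan"),
        )
        for group, values in groups.items()
    }


summary = summarise_by_group(records, "SmokeNow", "DirectChol")
for group, (n, avg, sd) in sorted(summary.items()):
    print(f"{group}: n={n}, mean={avg:.2f}, sd={sd:.2f}")
```

Records with a missing smoking status or a missing HDL value are dropped here, so the group sizes reported may be smaller than the total number of observations. With the real data, the number of records dropped this way should be reported alongside the summaries.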