Preface
On doing good things with data
This essay presents views that I have developed throughout my career, especially during my 8-year tenure as the Associate Director for Data Science in CDC’s Center for Surveillance, Epidemiology, and Laboratory Services from January 2015 through CSELS’s dissolution in January 2023. I wrote this essay to bring together as a coherent whole several related ideas on how CDC should think about, talk about, and support a work culture oriented to doing good things with data. In my view, CDC’s single greatest area for gains from doing good things with data is connecting technical excellence and analytic rigor to doing science better, getting better at learning things about the world, and getting better at doing things with what has been learned. In my experience, CDC has tended to overemphasize technology and underemphasize critical reflection and practical wisdom in doing things with data.
Following an introduction, the next 4 sections expand on why we should care about doing good things with data, what data science is and is not, how to construct and support a culture for doing good things with data, and who plays various roles and carries out the functions of a culture for doing good things with data. In the penultimate section, I add my personal history with data science. Then I extend my discussion of machine learning and artificial intelligence as a salient, contemporary set of issues for doing good things with data. Finally, I cap this essay with an aspirational manifesto for creating and fostering a progressive culture for data in public health.
This essay is a snapshot in time. It reaches back to extensive reading and study that I undertook from 2014 through 2016, but it stops around 2021. The field continues to change rapidly. But as much as data science is about keeping up with fast-moving methods, tools, and technology, the schema for data science is itself stable. So, for example, where I describe how CDC should think about machine learning and artificial intelligence, I don’t specifically address recent large language (“chat”) models.
I dedicate this essay to the dozens of folks whom I have mentored since I joined CDC as a federal employee in February 2000. I believe in you. You’re the reason that I wholeheartedly believe that a progressive culture for data centers on learners—because learners believe.
Addendum 2025: As a snapshot in time, this essay precedes not only the rapid ascent of large language models (“AI”), but also substantive changes to CDC’s organizational structure. First came “Moving Forward” as a reaction to, and purported correction for, experiences in the Covid response. Then came the massive organizational changes ushered in by the 47th presidential administration in early 2025. I have opted to preserve the verb tenses and to ask you, the reader, to view this snapshot as aspirational more than prescriptive. When we are able to restore public service and to recognize and esteem both learning from data and data-savvy public servants, this essay and the manifesto that goes with it will continue to apply in principle even if some details shift or evolve.