10.4 Lazer et al. (2014) The Parable of Google Flu: Traps in Big Data Analysis

  • Lazer, David, Ryan Kennedy, Gary King, and Alessandro Vespignani. 2014. “Big Data. The Parable of Google Flu: Traps in Big Data Analysis.” Science 343 (6176): 1203–5.

  • Q: What's the topic of this study?

  • What is the research question?
    • General: What can cause errors in prediction?
    • Specific: Given that Google Flu Trends (GFT) is often held up as an exemplary use of big data, what lessons can we draw from this error?
  • What is the hypothesis?
    • The authors do not formulate one explicitly, but they probably had a hunch that the error relates to measurement and to the algorithm
  • What data do they use?
    • Data from Google Flu Trends and Google Correlate
  • What is their finding?
    • Prediction error was caused by big data hubris and algorithm dynamics
      • “‘Big data hubris’ is the often implicit assumption that big data are a substitute for, rather than a supplement to, traditional data collection and analysis.” (Lazer et al. 2014)
        • “core challenge is that most big data that have received popular attention are not the output of instruments designed to produce valid and reliable data amenable for scientific analysis.” (Lazer et al. 2014)
          • In short, the initial version of GFT was part flu detector, part winter detector: some of the matched search terms were merely seasonal rather than flu-related.
      • Algorithm dynamics: changes to the search algorithm; search behavior is endogenously cultivated by the platform; red team attacks (users trying to manipulate results; see Twitter Trending Topics)
  • Transparency and replicability: GFT cannot be replicated and is not transparent
    • No accumulation of knowledge possible
    • Researchers waiting for fuel in the form of data
  • Using Big Data to understand the unknown
    • The fine-grained geographical data would be most helpful
  • Study the algorithm
    • Studying the evolution of socio-technical systems embedded in our societies is intrinsically important and worthy of study. The algorithms underlying Google, Twitter, and Facebook help determine what we find out about our health, politics, and friends.
  • It’s Not Just About Size of the Data.
    • Instead of focusing on a “big data revolution,” perhaps it is time we were focused on an “all data revolution,” where we recognize that the critical change in the world has been innovative analytics, using data from all traditional and new sources, and providing a deeper, clearer understanding of our world.
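
The “part flu detector, part winter detector” point above can be made concrete with a small sketch. It is not the GFT method: the data are purely synthetic cosine curves, and the names `seasonal`, `winter_term`, and the chosen peak weeks are hypothetical. It only illustrates how a search term that is merely seasonal can track flu closely in ordinary years and then fail when an outbreak occurs off-season.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def seasonal(week, peak_week):
    """Synthetic yearly cycle (52 weeks) that peaks at `peak_week`."""
    return 1 + math.cos(2 * math.pi * (week - peak_week) / 52)


# Two ordinary (synthetic) years: flu peaks in mid-winter (week 4).
train_weeks = range(104)
flu_train = [seasonal(w, 4) for w in train_weeks]

# Hypothetical search term that is seasonal but unrelated to flu;
# assumed here to peak in week 2 of every year.
winter_term_train = [seasonal(w, 2) for w in train_weeks]

# In ordinary years the seasonal term looks like an excellent flu signal.
r_train = pearson(flu_train, winter_term_train)

# A hypothetical off-season year: the outbreak peaks in week 30,
# while the search term keeps its usual winter peak.
test_weeks = range(52)
flu_off_season = [seasonal(w, 30) for w in test_weeks]
winter_term_test = [seasonal(w, 2) for w in test_weeks]

r_off_season = pearson(flu_off_season, winter_term_test)

print(f"correlation in ordinary years:   {r_train:.2f}")
print(f"correlation in off-season year:  {r_off_season:.2f}")
```

The high correlation in the ordinary years comes entirely from the shared winter cycle, which is why validating only against seasonal years cannot separate a flu signal from a calendar signal.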