January 27, 2022

Big Data: “What We Can Learn From the Epic Failure of Google Flu Trends”

From a Wired.com Article by David Lazer and Ryan Kennedy:

Every day, millions of people use Google to dig up information that drives their daily lives, from how long their commute will be to how to treat their child’s illness. This search data reveals a lot about the searchers: their wants, their needs, their concerns—extraordinarily valuable information. If these searches accurately reflect what is happening in people’s lives, analysts could use this information to track diseases, predict sales of new products, or even anticipate the results of elections.


In 2008, researchers from Google explored this potential, claiming that they could “nowcast” the flu based on people’s searches. The essential idea, published in a paper in Nature, was that when people are sick with the flu, many search for flu-related information on Google, providing almost instant signals of overall flu prevalence.


In a paper published in 2014 in Science, our research teams documented and deconstructed the failure of Google to predict flu prevalence. Our team from Northeastern University, the University of Houston, and Harvard University compared the performance of Google Flu Trends (GFT) with very simple models based on the CDC’s data, finding that GFT had begun to perform worse. Moreover, we highlighted a persistent pattern of GFT performing well for two to three years and then failing significantly and requiring substantial revision.

The point of our paper was not to bury big data. Our own research has demonstrated the value of big data in modeling disease spread, real-time identification of emergencies, and identifying macroeconomic changes ahead of traditional methods. But while Google’s efforts in projecting the flu were well-meaning, they were remarkably opaque in terms of method and data, making it dangerous to rely on Google Flu Trends for any decision-making.

Read the Complete Article (1033 Words)

See Also: “Google Flu Trends Website Shuts Down; Will Send Data to Boston Children’s, Columbia, CDC” (August 24, 2015)

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.