May 21, 2022

Preprint: Using Social Media To Predict the Future: A Systematic Literature Review

The following article/literature review was recently shared on arXiv.


Using Social Media To Predict the Future: A Systematic Literature Review


Lawrence Phillips
Pacific Northwest National Laboratory

Chase Dowling
Pacific Northwest National Laboratory
University of Washington

Kyle Shaffer
Pacific Northwest National Laboratory

Nathan Hodas
Pacific Northwest National Laboratory

Svitlana Volkova
Pacific Northwest National Laboratory


via arXiv


Social media (SM) data provides a vast record of humanity’s everyday thoughts, feelings, and actions at a resolution previously unimaginable. Because user behavior on SM is a reflection of events in the real world, researchers have realized they can use SM in order to forecast, making predictions about the future. The advantage of SM data is its relative ease of acquisition, large quantity, and ability to capture socially relevant information, which may be difficult to gather from other data sources. Promising results exist across a wide variety of domains, but one will find little consensus regarding best practices in either methodology or evaluation. In this systematic review, we examine relevant literature over the past decade, tabulate mixed results across a number of scientific disciplines, and identify common pitfalls and best practices. We find that SM forecasting is limited by data biases, noisy data, lack of generalizable results, a lack of domain-specific theory, and underlying complexity in many prediction tasks. But despite these shortcomings, recurring findings and promising results continue to galvanize researchers and demand continued investigation. Based on the existing literature, we identify research practices which lead to success, citing specific examples in each case and making recommendations for best practices. These recommendations will help researchers take advantage of the exciting possibilities offered by SM platforms.

228 references.

Direct to Full Text (55 pages; PDF)

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.