Research Article: “Using Publicly Visible Social Media to Build Detailed Forecasts of Civil Unrest”
The following article was published in Security Informatics, a SpringerOpen journal.
Note: The article featured in this post includes a discussion of a technique for estimating “non-GPS-known user locations.”
Ryan Compton, lead author of this paper, is also the lead author of a paper focused on the technique he and his colleagues developed to find the location of posts by non-GPS-known users of Twitter. The research was presented at a conference and also published online last year (2014). In the past week, an updated version of the research paper was posted online. We’ve posted a direct link to the updated paper at the very bottom of this post.
Using Publicly Visible Social Media to Build Detailed Forecasts of Civil Unrest
Lalindra De Silva
University of Utah
We demonstrate how one can generate predictions for several thousand incidents of Latin American civil unrest, often many days in advance, by surfacing informative public posts available on Twitter and Tumblr.
The data mining system presented here runs daily and requires no manual intervention. Identification of informative posts is accomplished by applying multiple textual and geographic filters to a high-volume data feed consisting of tens of millions of posts per day which have been flagged as public by their authors. Predictions are built by annotating the filtered posts, typically a few dozen per day, with demographic, spatial, and temporal information.
Key to our textual filters is the fact that social media posts are necessarily short, making it possible to easily infer topic by simply searching for comentions of typically unrelated terms within the same post (e.g. a future date comentioned with an unrest keyword). Additional textual filters then proceed by applying a logistic regression classifier trained to recognize accounts belonging to organizations who are likely to announce civil unrest.
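The comention idea above can be illustrated with a minimal sketch. This is not the authors' code: the keyword list, the Spanish "day de month" date pattern, and the function names are all assumptions chosen for illustration, and a real filter would need far broader date and language coverage.

```python
import re
from datetime import date

# Hypothetical unrest keywords (Spanish); the paper's actual list is not given here.
UNREST_KEYWORDS = {"marcha", "protesta", "huelga", "paro"}

MONTHS = {
    "enero": 1, "febrero": 2, "marzo": 3, "abril": 4,
    "mayo": 5, "junio": 6, "julio": 7, "agosto": 8,
    "septiembre": 9, "octubre": 10, "noviembre": 11, "diciembre": 12,
}

# Matches dates written like "15 de agosto" (an assumed, simplified pattern).
DATE_RE = re.compile(r"\b(\d{1,2}) de (" + "|".join(MONTHS) + r")\b", re.IGNORECASE)


def future_dates(text, posted_on):
    """Return dates mentioned in text that fall after posted_on in the same year."""
    found = []
    for day, month_name in DATE_RE.findall(text):
        try:
            candidate = date(posted_on.year, MONTHS[month_name.lower()], int(day))
        except ValueError:
            continue  # skip impossible dates such as "31 de febrero"
        if candidate > posted_on:
            found.append(candidate)
    return found


def has_unrest_keyword(text):
    """True if any word in the post is one of the unrest keywords."""
    words = set(re.findall(r"\w+", text.lower()))
    return not UNREST_KEYWORDS.isdisjoint(words)


def is_candidate(text, posted_on):
    """A post passes only if a future date and an unrest keyword are comentioned."""
    return has_unrest_keyword(text) and bool(future_dates(text, posted_on))
```

Because posts are short, requiring both signals in the same post is a cheap proxy for topic: a keyword alone or a date alone is not enough to pass the filter.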
Geographic filtering is accomplished despite sparsely available GPS information and without relying on sophisticated natural language processing. A geocoding technique which infers non-GPS-known user locations via the locations of their GPS-known friends provides us with location estimates for 91,984,163 Twitter users at a median error of 6.65 km. We show that announcements of upcoming events tend to localize within a small geographic region, allowing us to forecast event locations which are not explicitly mentioned in text.
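To give a feel for inferring a location from GPS-known friends, here is a deliberately simplified stand-in. It is not the total variation minimization method from the geotagging paper: it only picks, for a single user, the friend location with the smallest total great-circle distance to the other friends (a medoid). The function names and the sample coordinates are assumptions.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) pairs given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def estimate_location(friend_locations):
    """Return the friend location closest, in total, to all other friends.

    Returns None when the user has no GPS-known friends.
    """
    if not friend_locations:
        return None
    return min(
        friend_locations,
        key=lambda c: sum(haversine_km(c, other) for other in friend_locations),
    )


# Three friends around Mexico City and one outlier in Bogota (illustrative values).
friends = [(19.43, -99.13), (19.40, -99.15), (19.45, -99.10), (4.71, -74.07)]
estimate = estimate_location(friends)
```

A median-like choice is used rather than a mean so that a single distant friend does not drag the estimate away from where most friends cluster.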
We annotate our forecasts with demographic information by searching the collected posts for demographic-specific keywords generated by hand as well as with the aid of DBpedia.
Our system has been in production since December 2012 and, at the time of this writing, has produced 4,771 distinct forecasts for events across ten Latin American nations. Manual examination of 2,859 posts surfaced by our method revealed that only 108 were discussing topics unrelated to civil unrest. Examination of 2,596 forecasts generated between 2013-07-01 and 2013-11-30 found 1,192 (45.9%) matched exactly the date and within a 100 km radius of a civil unrest event reported in traditional news media.
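The matching criterion and the reported rates can be made concrete with a short sketch. The record layout, the names `Record` and `is_match`, and the sample coordinates are assumptions; only the "same date, within 100 km" rule and the counts come from the text above.

```python
import math
from datetime import date
from typing import NamedTuple


class Record(NamedTuple):
    """A forecast or a reported event: a day plus a (lat, lon) in degrees."""
    day: date
    lat: float
    lon: float


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlat, dlon = p2 - p1, math.radians(lon2 - lon1)
    h = math.sin(dlat / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlon / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def is_match(forecast, event, radius_km=100.0):
    """True if the forecast has the exact date and lies within radius_km of the event."""
    same_day = forecast.day == event.day
    distance = haversine_km(forecast.lat, forecast.lon, event.lat, event.lon)
    return same_day and distance <= radius_km


# Rates implied by the counts reported above.
match_rate = round(100 * 1192 / 2596, 1)     # 45.9 (percent of forecasts matched)
unrelated_rate = round(100 * 108 / 2859, 1)  # 3.8 (percent of posts off-topic)
```

For example, a forecast for Mexico City would match an event reported the same day in Toluca (roughly 58 km away under this formula) but not one in Guadalajara, and not a Toluca event dated a day later.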
Direct to Full Text Article (10 pages; PDF)
Geotagging One Hundred Million Twitter Accounts with Total Variation Minimization (Version 2 via arXiv).
Thomas Fox-Brewster has posted a summary of the research on Forbes.com.
About Gary Price
Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.