Research Preprint: “‘I Updated the <ref>’: The Evolution of References in the English Wikipedia and the Implications for Altmetrics”
The following research preprint was recently shared on arXiv.
GESIS – Leibniz Institute for the Social Sciences
With this work, we present a publicly available dataset of the history of all the references (more than 55 million) ever used in the English Wikipedia until June 2019. We have applied a new method for identifying and monitoring references in Wikipedia, so that for each reference we can provide data about associated actions: creation, modifications, deletions, and reinsertions. The high accuracy of this method and the resulting dataset was confirmed via a comprehensive crowdworker labelling campaign. We use the dataset to study the temporal evolution of Wikipedia references as well as users’ editing behaviour. We find evidence of a mostly productive and continuous effort to improve the quality of references: (1) there is a persistent increase of reference and document identifiers (DOI, PubMedID, PMC, ISBN, ISSN, ArXiv ID), and (2) most of the reference curation work is done by registered humans (not bots or anonymous editors). We conclude that the evolution of Wikipedia references, including the dynamics of the community processes that tend to them, should be leveraged in the design of relevance indexes for altmetrics, and our dataset can be pivotal for such effort.
Direct to Full Text Article
58 pages; PDF.
About Gary Price
Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.