May 26, 2022

Research Article (Preprint): “Comparison of Bibliographic Data Sources: Implications for the Robustness of University Rankings”

The following article (preprint) was recently posted on bioRxiv.


Comparison of Bibliographic Data Sources: Implications for the Robustness of University Rankings


Chun-Kai (Karl) Huang, Cameron Neylon, Chloe Brookes-Kenworthy, Richard Hosking, Lucy Montgomery, Katie Wilson, Alkim Ozaygen

Affiliation of All Authors: Curtin University, Australia


via: bioRxiv


Universities are increasingly evaluated, both internally and externally on the basis of their outputs. Often these are converted to simple, and frequently contested, rankings based on quantitative analysis of those outputs. These rankings can have substantial implications for student and staff recruitment, research income and perceived prestige of a university. Both internal and external analyses usually rely on a single data source to define the set of outputs assigned to a specific university. Although some differences between such databases are documented, few studies have explored them at the institutional scale and examined the implications of these differences for the metrics and rankings that are derived from them. We address this gap by performing detailed bibliographic comparisons between three key databases: Web of Science (WoS), Scopus and, the recently relaunched Microsoft Academic (MSA). We analyse the differences between outputs with DOIs identified from each source for a sample of 155 universities and supplement this with a detailed manual analysis of the differences for fifteen universities. We find significant differences between the sources at the university level. Sources differ in the publication year of specific objects, the completeness of metadata, as well as in their coverage of disciplines, outlets, and publication type. We construct two simple rankings based on citation counts and open access status of the outputs for these universities and show dramatic changes in position based on the choice of bibliographic data sources. Those universities that experience the largest changes are frequently those from non-English speaking countries and those that are outside the top positions in international university rankings. Overall MSA has greater coverage than Scopus or WoS, but has less complete affiliation metadata. We suggest that robust evaluation measures need to consider the effect of choice of data sources and recommend an approach where data from multiple sources is integrated to provide a more robust dataset.

Direct to Full Text Article
60 pages; PDF.

About Gary Price

Gary Price ( is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.