May 28, 2022

New Article: “Mining and Analysing Invoice Data From Elsevier Relative To Hybrid Open Access”

From an article on the Scholarly Communication Analytics with R Blog by Najko Jahn, State and University Library Göttingen:


Mining and Analysing Invoice Data From Elsevier Relative To Hybrid Open Access


Publishers rarely make publication fee spending for hybrid journals transparent. Elsevier is a remarkable exception, as the publisher provides open and machine-readable data relative to its central invoicing with funding bodies and fee waivers at the article level. This blogpost illustrates how to mine Elsevier full-texts for these data with the data science tool R and presents new insights by analysing the resulting dataset: of 70,657 articles published open access in 1,753 hybrid journals from 2015 to date, around one third of the publication fees were paid through central agreements. Nevertheless, the majority of funding sources for hybrid open access remains unclear.

Direct to Full Text Article (approx. 3000 words)

Hat Tip: CAUL News

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.