May 24, 2022

New Report from Publishing Research Consortium: "Journal Article Mining: A Research Study into Practices, Policies, Plans. . . and Promises"

Title: “Journal Article Mining: A Research Study into Practices, Policies, Plans. . . and Promises”
Authors: Eefke Smit, STM, and Maurits van der Graaf
Source: Publishing Research Consortium

This study, commissioned by PRC, offers the first comprehensive look at what publishers and others are doing, and plan to do, in both data and text mining of the scholarly, mainly journal, literature. It draws on 29 interviews and 190 detailed responses to a survey. (via STM Web Site)

Key Findings:

  • Content mining is about to accelerate, will expand into new areas and develop further into automated information extraction and relationship analysis
  • The focus is shifting from the traditional life sciences (especially drug discovery) to the social sciences, humanities, business, marketing and even law
  • A majority of respondents to the survey supported three common solutions for facilitating content mining:
      • More content standardization for mining‐friendly formats
      • A shared content mining platform across publishers
      • Commonly agreed rules for the granting of mining permissions
  • Third‐party mining requests are received by most publishers (77% of all, 88% of large ones) but at a very low level (less than 10 per annum); most mining requests come from abstracting and indexing services followed by corporate R&D organisations.
  • Over 90% of publisher respondents grant research‐focused mining requests, nearly 60% of these in all or the majority of cases. Requests that create traffic drivers to their sites are granted in most or all cases by 60% of publisher respondents, but just over half of these publishers (51%) will refuse in all or most cases if the results of the mining would compete with their own services.
  • A majority of publishers do not see Open Access as a prerequisite for content mining

Direct to Summary/News Release (PDF)

Direct to Full Text (153 pages; PDF)

About PRC

The PRC is a group representing publishers and associations supporting global research into scholarly communication in order to enable evidence-based discussion and objective analysis. PRC’s objective is to support work that is scientific and pro-scholarship, in order to promote an understanding of the role of publishing and its impact on research and teaching.

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.