May 19, 2022

Research Paper and Video: “Comparing Published Scientific Journal Articles to Their Pre-print Versions”

The following preprint was recently posted to arXiv and has been accepted for publication at the Joint Conference on Digital Libraries (JCDL) 2016.

The authors of the paper discussed some of their findings at the Fall 2015 CNI Meeting in Washington D.C. A video of the presentation and the slides are at the bottom of this post.


Comparing Published Scientific Journal Articles to Their Pre-print Versions

Martin Klein

Peter Broadwell

Sharon E. Farb

Todd Grappone


via arXiv


Academic publishers claim that they add value to scholarly communications by coordinating reviews and contributing and enhancing text during publication. These contributions come at a considerable cost: U.S. academic libraries paid $1.7 billion for serial subscriptions in 2008 alone. Library budgets, in contrast, are flat and not able to keep pace with serial price inflation. We have investigated the publishers’ value proposition by conducting a comparative study of pre-print papers and their final published counterparts. This comparison had two working assumptions: 1) if the publishers’ argument is valid, the text of a pre-print paper should vary measurably from its corresponding final published version, and 2) by applying standard similarity measures, we should be able to detect and quantify such differences. Our analysis revealed that the text contents of the scientific papers generally changed very little from their pre-print to final published versions. These findings contribute empirical indicators to discussions of the added value of commercial publishers and therefore should influence libraries’ economic decisions regarding access to scholarly publications.
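To make the abstract's second assumption concrete, here is a minimal sketch of how two versions of a text can be compared with standard similarity measures. The abstract does not name the measures the authors used, so the two shown here (a sequence-matching ratio from Python's difflib and a Jaccard index over word sets) are illustrative assumptions, not the paper's method. The function names and sample sentences are hypothetical.

```python
from difflib import SequenceMatcher


def tokenize(text):
    """Lowercase the text and split it on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def sequence_similarity(preprint, published):
    """Order-sensitive similarity in [0, 1], computed over word tokens."""
    matcher = SequenceMatcher(None, tokenize(preprint), tokenize(published), autojunk=False)
    return matcher.ratio()


def jaccard_similarity(preprint, published):
    """Order-insensitive similarity in [0, 1]: shared vocabulary over combined vocabulary."""
    a, b = set(tokenize(preprint)), set(tokenize(published))
    if not a and not b:
        return 1.0  # two empty texts are treated as identical
    return len(a & b) / len(a | b)


if __name__ == "__main__":
    # Hypothetical sample sentences, not taken from the study's data.
    preprint = "We compare pre-print papers to their published versions"
    published = "We compare pre-print papers with their final published versions"
    print("sequence similarity:", round(sequence_similarity(preprint, published), 3))
    print("jaccard similarity:", round(jaccard_similarity(preprint, published), 3))
```

Scores near 1.0 on both measures would indicate that a text changed very little between versions, which is the kind of result the abstract reports.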

Direct to Full Text Paper (10 pages; PDF)

Video of Presentation at CNI Fall 2015 Meeting (Washington DC)

Title: A Comparison of Published Scientific Journal Articles to Their Pre-print

Presentation Slides (PPT via CNI)

Preprint (arXiv) vs. Postprint Analysis Tool (arXcompare)

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.