January 25, 2022

New From CrossRef Labs: First Public Release of “pdf-extract”

From the CrossTech Blog:

CrossRef Labs is happy to announce the first public release of “pdf-extract,” an open-source set of tools and libraries for extracting citation references (and, eventually, other semantic metadata) from PDFs. We first demonstrated this tool to CrossRef members at our annual meeting last year. See the pdf-extract labs page for a detailed introduction to this new set of tools.

The blog post adds that if you’re unable to download the software, Extracto, a web-based resource from CrossRef Labs, is available to extract citations from PDF files. However, the post also warns that Extracto is “running on [a] very feeble server using an erratic and slow internet connection.”

In their words:

The only guarantee that we can make about using it is that it will repeatedly fall over and annoy you. The weasel has spoken.

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C., metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors of ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.
