From the CrossTech Blog:
CrossRef Labs is happy to announce the first public release of “pdf-extract” an open source set of tools and libraries for extracting citation references (and, eventually, other semantic metadata) from PDFs. We first demonstrated this tool to CrossRef members at our annual meeting last year. See the pdf-extract labs page for a detailed introduction to this new set of tools.
The blog post adds that if you’re unable to download the software, Extracto, a web-based resource from CrossRef Labs, is available to extract citations from PDF files. However, the post also warns that Extracto is “running on very feeble server using an erratic and slow internet connection.”
In their words:
The only guarantee that we can make about using it is that it will repeatedly fall over and annoy you. The weasel has spoken.