Cool! NIH Manuscript Collection Now Optimized and Available for Text-Mining and More
NIH-supported scientists have made over 300,000 author manuscripts available on PubMed Central (PMC) since 2008. Now, NIH is making these papers accessible to the public in a format that will allow robust text analyses.
You can download the entire PMC collection of NIH-supported author manuscripts as a package in either XML or plain-text format.
The collection will encompass all NIH manuscripts posted to PMC since July 2008. While the public can access the articles’ full text and accompanying figures, tables, and multimedia on the PMC Web site, the newly available article packages include full text only, in a form that facilitates text-mining.
We developed this resource to increase the impact of NIH funding. Through this collection, scientists will be able to analyze these manuscripts, further apply the findings of NIH research, and generate new discoveries.
The PMC Author Manuscript Collection consists of articles in author manuscript form that have been made available in PMC in compliance with the NIH Public Access Policy or similar policies of other funders. The text of manuscripts in the Collection may be downloaded in XML and plain text formats. These files are available for text mining. They may also be used in a manner consistent with the principles of fair use under copyright law.
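As a rough illustration of the kind of text analysis the plain-text package makes possible, here is a minimal Python sketch that counts term frequencies across a folder of manuscript files. It assumes the package has already been downloaded and unpacked into a local directory of UTF-8 `.txt` files. The directory name, the function names, and the simple tokenizing rule are all hypothetical choices made for this example, not anything specified by NIH or PMC.

```python
import re
from collections import Counter
from pathlib import Path

# Assumed tokenizer: runs of two or more lowercase letters or hyphens,
# starting with a letter. Real text-mining work would likely use
# something more careful than this.
TOKEN = re.compile(r"[a-z][a-z\-]+")


def term_counts(text):
    """Count word tokens in the text of a single manuscript."""
    return Counter(TOKEN.findall(text.lower()))


def corpus_counts(root):
    """Sum term counts over every .txt file found under root."""
    total = Counter()
    for path in sorted(Path(root).rglob("*.txt")):
        # errors="replace" keeps one badly encoded file from stopping the run.
        text = path.read_text(encoding="utf-8", errors="replace")
        total.update(term_counts(text))
    return total
```

With the files unpacked into a folder called, say, `author_manuscripts_txt` (a made-up name), `corpus_counts("author_manuscripts_txt").most_common(20)` would list the twenty most frequent terms in the collection.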
About Gary Price
Gary Price (email@example.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.