January 18, 2022

New Article: “Text Mining Resources for the Life Sciences”

The following full text article (open access) was recently published in Database: The Journal of Biological Databases and Curation.


Text Mining Resources for the Life Sciences


Piotr Przybyła
University of Manchester (UK)

Matthew Shardlow
University of Manchester (UK)

Sophie Aubin
Institut National de la Recherche Agronomique (France)

Robert Bossy
Institut National de la Recherche Agronomique (France)

Richard Eckart de Castilho
Technische Universität Darmstadt (Germany)

Stelios Piperidis
Athena Research Center (Greece)

John McNaught
University of Manchester (UK)

Sophia Ananiadou
University of Manchester (UK)


Database: The Journal of Biological Databases and Curation
2016: baw145, doi: 10.1093/database/baw145


Text mining is a powerful technology for quickly distilling key information from vast quantities of biomedical literature. However, to harness this power the researcher must be well versed in the availability, suitability, adaptability, interoperability and comparative accuracy of current text mining resources.

In this survey, we give an overview of the text mining resources that exist in the life sciences to help researchers, especially those employed in biocuration, to engage with text mining in their own work.

We categorize the various resources under three sections: Content Discovery looks at where and how to find biomedical publications for text mining; Knowledge Encoding describes the formats used to represent the different levels of information associated with content that enable text mining, including those formats used to carry such information between processes; Tools and Services gives an overview of workflow management systems that can be used to rapidly configure and compare domain- and task-specific processes, via access to a wide range of pre-built tools.

We also provide links to relevant repositories in each section to enable the reader to find resources relevant to their own area of interest. Throughout this work we give a special focus to resources that are interoperable: those that have the crucial ability to share information, enabling smooth integration and reusability.

Direct to Full Text Article | PDF Version (30 pages)

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.