A study posted on bioRxiv found that, as expected, text mining full articles yielded significantly better information than mining abstracts alone. However, the study's authors described challenges arising from the way content was presented and from the need to obtain copyright permissions. In addition to content availability and license status, support for early adopters and training for future practitioners are also cited as barriers to the broad use of TDM for research purposes.
Source: Slogan on PLOS Text and Data Mining Page (Nov. 28, 2017)
The foundational value of CC BY licensing for TDM is that no additional permissions or documentation are required. Open Access facilitates TDM:
not on a case-by-case basis, but for all people, in all places, and at all times
without lengthy legal agreements or restrictions
by providing unrestricted reuse, remix and mining rights
With more than 200,000 fully Open Access research articles available for content mining, PLOS can help advance the discussion and application of content mining through real-world experiences.
Through our API, we provide article text and metadata in a single XML file formatted according to the Journal Article Tag Suite (JATS), the National Information Standards Organization (NISO) standard tag suite for archiving and exchanging journal article content.
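As a rough illustration of what mining a JATS file can look like, the Python sketch below separates the title, abstract, and body paragraphs of an article. It is a minimal sketch and not PLOS's own tooling: the sample XML is hand-written for this example rather than a real API response, and the names `SAMPLE_JATS`, `parse_jats`, and `_text` are hypothetical. Only the element names (`article-title`, `abstract`, `body`, `sec`, `p`) come from JATS itself.

```python
import xml.etree.ElementTree as ET

# A tiny hand-written JATS-style document (illustrative only, not a real article).
SAMPLE_JATS = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>An Example Article</article-title>
      </title-group>
      <abstract>
        <p>Short summary of the work.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec>
      <title>Introduction</title>
      <p>First paragraph of the <italic>full</italic> text.</p>
    </sec>
    <sec>
      <title>Results</title>
      <p>Second paragraph with more detail.</p>
    </sec>
  </body>
</article>"""


def _text(element):
    """Return all text inside an element, with whitespace collapsed."""
    return " ".join("".join(element.itertext()).split())


def parse_jats(xml_string):
    """Split a JATS article into title, abstract, and body paragraphs."""
    root = ET.fromstring(xml_string)
    title = root.find(".//article-title")
    abstract = root.find(".//abstract")
    body = root.find("body")  # <body> is a direct child of <article>
    return {
        "title": _text(title) if title is not None else "",
        "abstract": _text(abstract) if abstract is not None else "",
        "body_paragraphs": (
            [_text(p) for p in body.iter("p")] if body is not None else []
        ),
    }


record = parse_jats(SAMPLE_JATS)
print(record["title"])
print(len(record["abstract"].split()), "words in abstract")
print(sum(len(p.split()) for p in record["body_paragraphs"]), "words in body")
```

Keeping the abstract and the body as separate fields makes it straightforward to run the same mining step on each and compare the results, which is the kind of abstract-versus-full-text comparison the bioRxiv study describes.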
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C., metro area.
He earned his MLIS degree from Wayne State University in Detroit.
Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.