Report: “A New Tool for Deep-Down Data Mining” (GeoDeepDive)
From EOS (American Geophysical Union):
GeoDeepDive combines library science, computer science, and geoscience to dive into repositories of published text, tables, and figures and return valuable information.
Scientific publications contain measurements, descriptions, and images that have utility beyond the aims of the original work, particularly when they are aggregated into databases. For example, the Paleobiology Database contains field- and museum-based descriptions of more than 1.3 million fossil occurrences compiled from some 50,000 references, and sample-based geochemical data from the published literature are available in EarthChem. Both databases can be used to address fundamental scientific questions, but neither is complete. Plus, adding to existing literature-based data syntheses and constructing new ones is difficult and can be prohibitively time-consuming.
The primary goal of our U.S. National Science Foundation EarthCube building block project, GeoDeepDive, is to facilitate the creation and augmentation of literature-derived databases and to leverage published knowledge and past investments in data acquisition. The project combines library science (the aggregation and curation of digital documents and bibliographic metadata), geoscience (the generation of research questions and labeling of terms in externally managed scientific ontologies), and computer science (the use of high-throughput computing infrastructure and machine reading systems to parse and extract data from millions of documents).
Much More (including several figures) in the Complete Article (approx. 1800 words)
About Gary Price
Gary Price (firstname.lastname@example.org) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C., metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.