January 24, 2022

Research Tools: A Milestone for Semantic Scholar

Ed. Note: We have been writing about, demonstrating, and using Semantic Scholar daily since day one. It’s an outstanding resource. Along with its many search capabilities, the new research and citation alerts we receive regularly via email let us know about relevant materials that we go on to share on infoDOCKET. –G.P.

From the PNW.ai Blog:

Even the most diligent scientists need a quick primer on the latest research, which is why Semantic Scholar, the AI-powered platform for academic papers, can come in handy when you want to know the latest studies on, say, Covid-19 or Russian troll accounts. And this month, the rapidly evolving search engine turns six, while also hitting another milestone: uploading 200 million papers to its archives. “Semantic Scholar is a poster child for AI2’s mission: AI for the Common Good,” says Oren Etzioni, CEO of the Allen Institute for AI, which created the project. “When we launched it, we had no idea that it would serve upwards of 8 million users per month just a few years later.”

What began in 2015 as a database for some 3 million computer science papers has recently grown into much more. Along with adding neuroscience papers, then biomedicine, then all fields of science in 2019, the platform last year launched the CORD-19 dataset and paper, a comprehensive dataset of more than 300,000 full-text Covid-19-related papers, and more than 840,000 metadata entries in total, that’s available to anyone, thus facilitating further research on the pandemic. To date, this largest single collection — with some articles on coronaviruses that would otherwise languish behind paywalls and others that date back to the 1950s — has been downloaded more than 200,000 times, and has become the basis of the most popular Kaggle competition ever.

Semantic Scholar uses the latest machine learning and natural language processing techniques to automatically “read” emerging papers, analyze their content, and extract their contributions and limitations, saving researchers and reporters untold hours poring over text. Acting as a researcher’s Spotify, the platform also recommends to each scientist papers it thinks they’ll find interesting, then improves its matching abilities based on the researcher’s actions. Recently, the platform introduced a new feature that automatically summarizes each paper in its archive, creating a one-sentence “TL;DR” summary to answer the time-sucking question vexing every researcher: To read or not to read potentially relevant papers.

Learn More, Read the Complete Blog Post

See Also: The infoDOCKET Post Announcing the Launch of Semantic Scholar From November 2, 2015

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.