February 4, 2026 by Gary Price

Journal Article: Synthesizing Scientific Literature with Retrieval-Augmented Language Models

The paper linked below was published today in Nature.

It discusses the work by Ai2 and the University of Washington to develop OpenScholar (now part of Asta), which we've been covering here since it launched as a prototype in late 2024.

Title

Synthesizing Scientific Literature With Retrieval-Augmented Language Models

Authors

Akari Asai, Jacqueline He, Rulin Shao, Weijia Shi, Amanpreet Singh, Joseph Chee Chang, Kyle Lo, Luca Soldaini, Sergey Feldman, Mike D’Arcy, David Wadden, Matt Latzke, Jenna Sparks, Jena D. Hwang, Varsha Kishore, Minyang Tian, Pan Ji, Shengyan Liu, Hao Tong, Bohao Wu, Yanyu Xiong, Luke Zettlemoyer, Graham Neubig, Daniel S. Weld, Hannaneh Hajishirzi

Source

Nature (2026)
DOI: 10.1038/s41586-025-10072-4

Abstract

Scientific progress depends on the ability of researchers to synthesize the growing body of literature. Can large language models (LLMs) assist scientists in this task? Here we introduce OpenScholar, a specialized retrieval-augmented language model (LM) that answers scientific queries by identifying relevant passages from 45 million open-access papers and synthesizing citation-backed responses. To evaluate OpenScholar, we develop ScholarQABench, the first large-scale multi-domain benchmark for literature search, comprising 2,967 expert-written queries and 208 long-form answers across computer science, physics, neuroscience and biomedicine. Despite being a smaller open model, OpenScholar-8B outperforms GPT-4o by 6.1% and PaperQA2 by 5.5% in correctness on a challenging multi-paper synthesis task from the new ScholarQABench. Although GPT-4o hallucinates citations 78–90% of the time, OpenScholar achieves citation accuracy on par with human experts. OpenScholar’s data store, retriever and self-feedback inference loop improve off-the-shelf LMs: for instance, OpenScholar-GPT-4o improves the correctness of GPT-4o by 12%. In human evaluations, experts preferred OpenScholar-8B and OpenScholar-GPT-4o responses over expert-written ones 51% and 70% of the time, respectively, compared with 32% for GPT-4o. We open-source all artefacts, including our code, models, data store, datasets and a public demo.

Direct to Full Text Article

UPDATE: Post by Ai2: “OpenScholar Has Been Accepted to Nature”

OpenScholar pairs a model trained for scientific synthesis with retrieval-augmented generation (RAG). This allows it to search a large scientific corpus, incorporate relevant papers (including newer ones), and cite sources for the claims it makes. To ground answers in the literature, we constructed a corpus of 45 million open-access scientific papers and developed a full-text snippet index for OpenScholar to retrieve from, which we later made available through the Semantic Scholar API.

We also built ScholarQABench, the first large, multi-domain benchmark for evaluating systems on scientific synthesis and citation quality. ScholarQA-CS, the computer science portion of ScholarQABench, later evolved into ScholarQA-CS2, the long-form scientific QA benchmark now included in AstaBench.
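The retrieve-then-synthesize loop described above can be sketched as a minimal toy pipeline. This is illustrative only: the corpus, the keyword-overlap scoring, and the answer-assembly step below are stand-ins, not OpenScholar's actual index, retriever, or model.

```python
# Minimal, illustrative retrieval-augmented pipeline: rank passages from a
# (toy) corpus against a query, then attach a citation ID to each passage
# used in the synthesized response. All names and data here are hypothetical.

CORPUS = {
    "doe2024": "Retrieval-augmented language models ground answers in retrieved passages.",
    "lee2023": "Citation accuracy improves when every claim is tied to a source passage.",
    "kim2022": "Large corpora of open-access papers enable full-text snippet search.",
}

def retrieve(query: str, k: int = 2) -> list[tuple[str, str]]:
    """Rank corpus passages by naive keyword overlap with the query."""
    q_terms = set(query.lower().split())
    scored = sorted(
        CORPUS.items(),
        key=lambda item: -len(q_terms & set(item[1].lower().split())),
    )
    return scored[:k]

def answer(query: str) -> str:
    """Compose a citation-backed response from the top-ranked passages."""
    hits = retrieve(query)
    return " ".join(f"{text} [{pid}]" for pid, text in hits)

print(answer("How do retrieval-augmented language models ground answers?"))
```

A production system replaces the overlap score with a trained dense retriever over a full-text snippet index, and the string join with a language model that drafts and self-revises the response, but the data flow (query, ranked passages, cited synthesis) is the same.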

Read the Complete Post

Media Coverage

  • AI Tool Beats Giant LLMs In Literature Reviews — and Gets Citations Right (via Nature)

In the 14 months since OpenScholar was first published in the arXiv repository, AI firms such as OpenAI have used similar methods to tack ‘deep research’ tools onto their commercial LLMs, which has greatly improved their accuracy. But as a small and efficient system, running OpenScholar costs a fraction of the price of using OpenAI’s GPT-5 with deep research, co-author Hannaneh Hajishirzi, a computer scientist at the University of Washington in Seattle, tells the Nature podcast.

However, the authors acknowledge that OpenScholar has limitations. For example, it doesn’t always retrieve the most representative or relevant papers for a query, and it is limited by the scope of its database.

But if researchers are able to access the tool for free, “it can become one of the most popular apps for scientific searches,” says Mushtaq Bilal, a researcher at Silvi, a Copenhagen-based firm that has its own AI-based literature-review tool.

Filed under: Data Files, Journal Articles, News, Open Access

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.

© 2026 Library Journal. All rights reserved.

