New From Allen Institute For Artificial Intelligence (Ai2): Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset (preprint)
The preprint linked below was recently shared on arXiv.
Title
Understanding Usage and Engagement in AI-Powered Scientific Research Tools: The Asta Interaction Dataset
Authors
Dany Haddad, Dan Bareket, Joseph Chee Chang, Jay DeYoung, Jena D. Hwang, Uri Katz, et al.
Source
via arXiv
Abstract
AI-powered scientific research tools are rapidly being integrated into research workflows, yet the field lacks a clear lens into how researchers use these systems in real-world settings. We present and analyze the Asta Interaction Dataset, a large-scale resource comprising over 200,000 user queries and interaction logs from two deployed tools (a literature discovery interface and a scientific question-answering interface) within an LLM-powered retrieval-augmented generation platform. Using this dataset, we characterize query patterns, engagement behaviors, and how usage evolves with experience. We find that users submit longer and more complex queries than in traditional search, and treat the system as a collaborative research partner, delegating tasks such as drafting content and identifying research gaps. Users treat generated responses as persistent artifacts, revisiting and navigating among outputs and cited evidence in non-linear ways. With experience, users issue more targeted queries and engage more deeply with supporting citations, although keyword-style queries persist even among experienced users. We release the anonymized dataset and analysis with a new query intent taxonomy to inform future designs of real-world AI research assistants and to support realistic evaluation.
UPDATE Feb. 27 (Ai2 Blog Post About the Dataset/Paper)
How Do Researchers Actually Use AI-Powered Science Tools? Lessons from 250,000+ Queries
From an Ai2 Blog Post:
Queries are longer, more complex, and more demanding
Users of AI-powered tools submit dramatically longer and more complex queries compared to those submitted to traditional academic search engines:
Metric                        PaperFinder  ScholarQA  Semantic Scholar (traditional)
Avg. constraints per query           0.60       0.82       0.15
Avg. entities per query              4.00       5.14       2.25
Avg. relations per query             2.17       2.68       1.20
Avg. query length (words)           17.04      36.96       5.35
[Clip]
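As a rough illustration, a statistic like the average query length above could be reproduced from a flat export of the logs. This is a minimal sketch only: the file name and the "tool" and "query" column names are assumptions, not the dataset's documented schema.

```python
import pandas as pd

# Hypothetical flat export of the interaction logs; file name and the
# "tool" / "query" columns are assumptions, not the documented schema.
df = pd.read_csv("asta_interactions.csv")

# Average query length in words per tool, mirroring the last row of the table.
df["query_words"] = df["query"].str.split().str.len()
print(df.groupby("tool")["query_words"].mean())
```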
Beyond keywords: what researchers actually type into AI tools
Some of the most revealing findings came from simply reading what users type into the search box. Beyond the standard taxonomy, we found query patterns that show users probing the boundaries of what AI research tools can do. These behaviors reflect phrasing strategies shaped by general-purpose LLMs:
Tool abbreviations: PF = PaperFinder; SQA = ScholarQA
Pattern: Template Filling (PF)
Example query: “fill this tabel with 10 jurnal bellow:…” [table template with citations]
Why it’s interesting: Users paste structured templates (tables, forms) and expect the AI to populate them with literature data, treating the research tool as a data entry assistant.

Pattern: Template Filling (SQA)
Example query: “for sacubitril find all: ‘IUPAC Name: CAS Number: Molecular Formula:…’” [15+ fields]
Why it’s interesting: Users submit structured extraction tasks with 15+ fields, expecting the tool to act as a fact-extraction pipeline over the literature.

Pattern: Explicit Prompting (SQA)
Example query: “You are an expert research assistant specializing in computational geosciences and machine learning.”
Why it’s interesting: Users apply prompt engineering techniques (system prompts, persona assignment) learned from general-purpose LLMs, even though our tool doesn’t support custom system prompts.

Pattern: Explicit Prompting (PF)
Example query: “Find papers…The model must be capable of…”
Why it’s interesting: Users apply markdown-style emphasis (bold, caps) to stress constraints, revealing expectations shaped by conversational AI.

Pattern: Persona Adoption (SQA)
Example query: “Think of yourself as experienced professor…Please write me a phd proposal…devour Turnitin detection bots”
Why it’s interesting: Some users ask the tool to adopt an expert persona and even attempt to circumvent plagiarism detection, a behavior shaped by general-purpose LLM interactions.

Pattern: Collaborative Writing (SQA)
Example query: “I’m working on my paper…” [LaTeX section] “add papers from TSE, TOSEM, ICSE”
Why it’s interesting: Users paste their in-progress LaTeX manuscripts and ask the tool to find and insert citations from specific venues, using it as a collaborative writing partner.

Pattern: Research Lineage (PF)
Example query: “What are latest advances in research fields of these three papers?” [3 DOIs]
Why it’s interesting: Users paste DOIs and ask the tool to trace the research lineage forward, treating it as a citation graph explorer.

Pattern: Refinding (PF)
Example query: “hey whats the name of the paper that did a study on how people use llms by allowing the public to use their tokens on paid llms…”
Why it’s interesting: Users describe half-remembered papers in conversational language, using the tool as a “tip-of-my-tongue” paper finder, a task traditional search handles poorly.

Pattern: Refinding (PF)
Example query: “…paper using BERT that says we cant just look at top-k…which paper says this”
Why it’s interesting: Users recall a specific claim from a paper they read before and ask the tool to identify the source, a sophisticated citation recovery task.

These patterns reveal a key insight: users expect AI research tools to function as collaborative research partners with capabilities similar to general-purpose chatbots. They bring habits from general-purpose LLMs – such as prompt engineering, persona assignment, template filling, and collaborative writing – into a domain-specific platform. Some of these imported behaviors raise obvious concerns: the dataset includes queries that attempt to circumvent plagiarism detection. We include them because understanding how users actually behave, not just how we hope they behave, is the point.
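To make the taxonomy concrete, here is a toy tagger showing how simple heuristics might flag a few of the patterns above. This is not the authors’ labeling method; the regexes are illustrative assumptions only.

```python
import re

# Illustrative rules (assumptions, not the paper's taxonomy implementation).
PATTERNS = {
    "explicit_prompting": re.compile(r"^\s*you are (an?|the)\b", re.IGNORECASE),
    "persona_adoption":   re.compile(r"think of yourself as", re.IGNORECASE),
    "research_lineage":   re.compile(r"\b10\.\d{4,9}/\S+"),  # bare DOIs in the query
    "refinding":          re.compile(r"(name of the paper|which paper says)", re.IGNORECASE),
}

def tag_query(query: str) -> list[str]:
    """Return the (possibly empty) list of pattern labels a query matches."""
    return [label for label, rx in PATTERNS.items() if rx.search(query)]

print(tag_query("You are an expert research assistant..."))  # ['explicit_prompting']
```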
How users engage with results
We also analyzed what users do after submitting a query. The engagement patterns differ sharply from traditional search.
Results as persistent artifacts
One of our most striking findings is that users treat AI-generated outputs as persistent artifacts rather than ephemeral search results. Over 50% of SQA users and 42% of PF users revisit previous reports—substantially more than the rate of near-duplicate query submission (~19% and ~15%, respectively). Users come back to their results hours or days later, suggesting they bookmark and reference these outputs as part of their ongoing research workflow. This has direct implications for how we think about generated content: if users are returning to these outputs, we need better ways to help them manage and build on past reports and, more critically, mechanisms for keeping them current as new literature appears.
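A revisit rate like the one above could be estimated from event-level logs roughly as follows. This is a minimal sketch: the file name and the "user_id", "event", and "report_id" columns are assumed, not taken from the dataset's documented schema.

```python
import pandas as pd

# Hypothetical event-level export of the interaction logs.
events = pd.read_csv("asta_events.csv")
views = events[events["event"] == "view_report"]

# A user counts as "revisiting" if they opened the same report more than once.
repeat = views.groupby(["user_id", "report_id"]).size().gt(1)
revisiting_users = repeat[repeat].index.get_level_values("user_id").unique()

rate = len(revisiting_users) / events["user_id"].nunique()
print(f"Share of users who revisit a past report: {rate:.1%}")
```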
Learn Much More, Read the Complete Post