Research Article (preprint): “The Ideation–Execution Gap: Execution Outcomes of LLM-Generated versus Human Research Ideas”

June 28, 2025 by Gary Price

The research article (preprint) linked below was recently posted on arXiv.

Title

The Ideation–Execution Gap: Execution Outcomes of LLM-Generated Versus Human Research Ideas

Authors

Chenglei Si
Stanford University

Tatsunori Hashimoto
Stanford University

Diyi Yang
Stanford University

Source

via arXiv

DOI: 10.48550/arXiv.2506.20803

Abstract

Large Language Models (LLMs) have shown promise in accelerating the scientific research pipeline. A key capability for this process is the ability to generate novel research ideas, and prior studies have found settings in which LLM-generated research ideas were judged as more novel than human-expert ideas. However, a good idea should not simply appear to be novel, it should also result in better research after being executed. To test whether AI-generated ideas lead to better research outcomes, we conduct an execution study by recruiting 43 expert researchers to execute randomly-assigned ideas, either written by experts or generated by an LLM. Each expert spent over 100 hours implementing the idea and wrote a 4-page short paper to document the experiments. All the executed projects are then reviewed blindly by expert NLP researchers. Comparing the review scores of the same ideas before and after execution, the scores of the LLM-generated ideas decrease significantly more than expert-written ideas on all evaluation metrics (novelty, excitement, effectiveness, and overall; p < 0.05), closing the gap between LLM and human ideas observed at the ideation stage. When comparing the aggregated review scores from the execution study, we even observe that for many metrics there is a flip in rankings where human ideas score higher than LLM ideas. This ideation-execution gap highlights the limitations of current LLMs in generating truly effective research ideas and the challenge of evaluating research ideas in the absence of execution outcomes.

Direct to Abstract + Link to Full Text

Filed under: Journal Articles, News

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.
