Research Article (Preprint): “ChatGPT as Research Scientist: Probing GPT’s Capabilities as a Research Librarian, Research Ethicist, Data Generator and Data Predictor”

June 24, 2024 by Gary Price

The article (preprint) linked below was recently shared on arXiv.

Title

ChatGPT as Research Scientist: Probing GPT’s Capabilities as a Research Librarian, Research Ethicist, Data Generator and Data Predictor

Authors

Steven A. Lehr
Cangrade

Aylin Caliskan
University of Washington

Suneragiri Liyanage
Harvard University

Mahzarin R. Banaji
Harvard University

Source

via arXiv
Under revised review at PNAS

DOI

Abstract

How good a research scientist is ChatGPT? We systematically probed the capabilities of GPT-3.5 and GPT-4 across four central components of the scientific process: as a Research Librarian, Research Ethicist, Data Generator, and Novel Data Predictor, using psychological science as a testing field. In Study 1 (Research Librarian), unlike human researchers, GPT-3.5 and GPT-4 hallucinated, authoritatively generating fictional references 36.0% and 5.4% of the time, respectively, although GPT-4 exhibited an evolving capacity to acknowledge its fictions. In Study 2 (Research Ethicist), GPT-4 (though not GPT-3.5) proved capable of detecting violations like p-hacking in fictional research protocols, correcting 88.6% of blatantly presented issues, and 72.6% of subtly presented issues. In Study 3 (Data Generator), both models consistently replicated patterns of cultural bias previously discovered in large language corpora, indicating that ChatGPT can simulate known results, an antecedent to usefulness for both data generation and skills like hypothesis generation. Contrastingly, in Study 4 (Novel Data Predictor), neither model was successful at predicting new results absent in their training data, and neither appeared to leverage substantially new information when predicting more versus less novel outcomes. Together, these results suggest that GPT is a flawed but rapidly improving librarian, a decent research ethicist already, capable of data generation in simple domains with known characteristics but poor at predicting novel patterns of empirical data to aid future experimentation.

Direct to Access Full Text

Filed under: Data Files, News

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.
