May 17, 2025 by Gary Price

Report: “Historians Use Data Science to Mine The Past”

From a recently published article in PNAS (Proceedings of the National Academy of Sciences of the United States of America) by Carolyn Beans:

Historians today can perform computational tasks as they dig for insights—using, for example, computational models to reveal how often two words appear together in texts, launching network analyses to link individuals who appear in the same documents, or training computer vision models to recognize key features on digitized maps.

With these tools in hand, historians ask questions that would be difficult to answer with a lifetime of scrolling through microfiche. When and where did the originators of the anti-government expansion movement in the United States first meet? Who led the spy group that sent intel to the United States from Cuba in the late 1800s?

But practitioners of this field, known as digital history, understand that they are working with incomplete data. Most records have yet to be digitized, and many that are contain biases. And there are dangers in mining words from decades past without a deep understanding of their context.

[Clip]

One challenge is that databases to which scholars turn with these new methods may be incomplete in ways that aren’t obvious. For example, a newspaper might be included in a database, but if OCR didn’t capture its pages properly, the text wouldn’t be searchable.

Case in point: In 2023, research led by historians at The Alan Turing Institute in London, United Kingdom, revealed that searching the British Library’s digitized newspaper collection for information about life during the Industrial Revolution would return politically biased results. The reason: OCR was better at reading the fonts favored by more expensive and conservative papers than those used by less expensive, liberal ones (4).

Even when OCR works as planned, databases only capture a sliver of archival records worldwide, though those numbers are steadily growing, [historian Jo] Guldi says. In the United States, as of 2023, the Library of Congress had digitized more than 9 million records, including newspapers, manuscripts, and the personal papers of every president from George Washington to Calvin Coolidge (5).
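
As an illustration of the first technique named in the excerpt (how often two words appear together in texts), here is a minimal sketch. It is not taken from the PNAS article: the function name `cooccurrence_counts`, the crude tokenizer, and the toy corpus are all assumptions made for this example, and real projects would likely work from OCR'd text with far more careful preprocessing.

```python
import re
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents, vocabulary):
    """Count, for each pair of vocabulary terms, how many documents contain both.

    Hypothetical helper for illustration only.
    """
    vocab = {term.lower() for term in vocabulary}
    counts = Counter()
    for text in documents:
        # Crude tokenization (an assumption): lowercase, keep runs of letters.
        tokens = set(re.findall(r"[a-z]+", text.lower()))
        # Sort so each pair is always stored in the same order.
        present = sorted(tokens & vocab)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Toy corpus, invented for this example.
docs = [
    "The railway strike closed the mill.",
    "Mill owners opposed the strike.",
    "The railway opened a new line.",
]
counts = cooccurrence_counts(docs, ["railway", "strike", "mill"])
for pair, n in counts.most_common():
    print(pair, n)
```

Because the count is per document, a pair that never appears in searchable text contributes nothing, which is one way the OCR gaps described above could skew results without being obvious.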

Learn More, Read the Complete Article (about 2300 words)

Source: 10.1073/pnas.2508428122

Filed under: Conference Presentations, Data Files, Journal Articles, Libraries, Maps, News

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.

© 2026 Library Journal. All rights reserved.

