December 12, 2024 by Gary Price

A New Research Initiative From the Harvard Law School Library: The Institutional Data Initiative (IDI) Launches Today

From the IDI:

Today we’re launching the Institutional Data Initiative (IDI), a research initiative at the Harvard Law School Library. IDI is dedicated to supporting our peers as they steward humanity’s knowledge and seek to provide the broadest access to it in the age of AI, just as they’ve done for so much media over centuries, and across the technological revolutions within them.

IDI comprises a growing team of data scientists and community builders, first incubated at the Library Innovation Lab. We’ll collaborate with knowledge institutions—from libraries and universities to cultural groups and government agencies—to help structure, analyze, and publish their collections as data for all uses, including AI. We’ll work to develop AI-driven tools to scale and accelerate this work, evaluations to study its impacts, and best practices to foster responsible data use while affirming institutional stewardship.

Our initial activities include refining a collection of nearly one million public domain books, scanned at Harvard Library; a collaboration with Boston Public Library to make available millions of pages from hard-to-find historical newspapers; and a spring symposium hosted at Harvard Law School to build connections and explore areas of alignment between the institutional and AI communities.

[Clip]

At launch, we have data from nearly one million public domain books, scanned at Harvard Library as part of the Google Books project. Our structuring and analysis of the corpus is complete and we’re working with Google to release this treasure trove far and wide.

We’re also collaborating with Boston Public Library as they scan millions of pages from public domain newspapers. The layouts of newspapers make extracting their text notoriously difficult, so we’re applying new methods to increase accuracy and accessibility. Once extracted, we’ll research the impact this data has on the behavior and information recall of AI models so that other institutions can better understand the potential of their own collections.

Read the Complete Launch Announcement (about 1,500 words) | Archived Version (via The Wayback Machine)

See Also: Harvard’s Library Innovation Lab Launches Institutional Data Initiative (Interview with IDI Exec. Director, Greg Leppert)

See Also: Supporting New Open Data Initiatives: Institutional Data Initiative and CORE (via Microsoft)

Media Coverage

  • Harvard and Google to Release 1 Million Public-Domain Books as AI Training Dataset (via TechCrunch)
  • Harvard is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft (via WIRED)

Updated 6/12/2025

Filed under: Data Files, Interviews, Libraries, News, Profiles, Public Libraries, School Libraries

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009, he was Director of Online Information Services at Ask.com.

© 2026 Library Journal. All rights reserved.

