January 28, 2022 by Gary Price

HathiTrust Research Center Receives NEH support for Open Research Tools

From the School of Information Sciences, University of Illinois:

The HathiTrust Research Center (HTRC), cohosted by the iSchool at Illinois and the Luddy School of Informatics at Indiana University, has received a $325,000 Digital Humanities Advancement Grant from the National Endowment for the Humanities. One of 15 awarded nationwide, this grant will support the development of a new set of visualizations, analytical tools, and infrastructure to enable users to interact more directly with the rich data extracted from the HathiTrust Digital Library’s collection of more than 17.5 million digitized volumes.

The project, “Tools for Open Research and Computation with HathiTrust: Leveraging Intelligent Text Extraction” (TORCHLITE), will be led jointly by Professor J. Stephen Downie, associate dean for research at the iSchool and co-director of HTRC, and John Walsh, HTRC director and associate professor of information and library science at Indiana University. HTRC staff at both universities will collaborate over the next two years to accomplish TORCHLITE’s goals.

“TORCHLITE will enable us to increase dramatically, and to open more fully, public access to the massive, rich data that HTRC has created from the HathiTrust Digital Library corpus,” said Downie. “We have already developed innovative ways to transform, enhance, and provide access to data created from the many millions of scanned books held by HathiTrust. With TORCHLITE, we’ll create new methods for accessing this data, together with several easy-to-use tools to allow people to interact with it, analyze it, and visualize it in novel ways.”

The data of interest is contained in HTRC’s flagship “Extracted Features” (EF) dataset, which consists of rich metadata and statistical information inferred by algorithm from the digitized texts of the entire HathiTrust corpus. It documents every word on every page, including the number of times the word appears, its part of speech, and other formal features of the language on the page. The EF dataset, and methods for computing over it, have enabled many forms of full-text analysis, even of copyrighted materials. It contains nearly three trillion tokens (roughly speaking, words) representing more than six billion pages of text, making it arguably the largest open dataset of its kind that is readily available to researchers around the world.
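To make the description above concrete, here is a minimal Python sketch of how per-page word counts tagged by part of speech might be aggregated. The record layout (`seq`, `tokens`) and the sample values are illustrative assumptions, not the actual EF schema or real HathiTrust data.

```python
from collections import Counter

# Hypothetical, simplified page records (NOT the real EF schema):
# each page maps a token to {part-of-speech tag: count on that page}.
pages = [
    {"seq": 1, "tokens": {"whale": {"NN": 3}, "the": {"DT": 7}, "sea": {"NN": 2}}},
    {"seq": 2, "tokens": {"whale": {"NN": 1}, "sail": {"VB": 2, "NN": 1}}},
]


def volume_token_counts(pages):
    """Sum each token's occurrences across all pages, ignoring POS tags."""
    totals = Counter()
    for page in pages:
        for token, pos_counts in page["tokens"].items():
            totals[token] += sum(pos_counts.values())
    return totals


def counts_by_pos(pages):
    """Sum occurrences per part-of-speech tag across all pages."""
    totals = Counter()
    for page in pages:
        for pos_counts in page["tokens"].values():
            totals.update(pos_counts)  # Counter.update adds counts from a mapping
    return totals


totals = volume_token_counts(pages)
pos_totals = counts_by_pos(pages)
```

Because only counts are stored, and not the running text, this kind of analysis needs no access to the page text itself, which is consistent with the article's point about analyzing copyrighted materials.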

Beyond creating new methods for working with this enormous dataset, Downie emphasized what may be the more significant outcome: TORCHLITE will develop a framework on which digital humanities scholars, digital librarians, data scientists, and anyone else interested in textual analysis can build their own tools and deploy them in their own environments, accessing the HTRC data more directly and openly. TORCHLITE’s tools and methods will enable the retrieval of standard volume-level descriptors (such as title, publisher, date of publication, genre, and page count) along with page- and word-level linguistic and statistical information.
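As a rough illustration of combining volume-level descriptors with page-level statistics, the following Python sketch uses a hypothetical `Volume` record. The class, its field names, and the sample values are assumptions made for this example and do not describe the TORCHLITE API, which the article does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class Volume:
    """Hypothetical record: volume-level descriptors plus per-page token counts."""
    title: str
    publisher: str
    pub_date: int
    genre: str
    page_token_counts: list = field(default_factory=list)

    @property
    def page_count(self):
        # Page count is derived from the number of per-page entries.
        return len(self.page_token_counts)


def summarize(volume):
    """Return the volume's descriptors together with simple page-level statistics."""
    total = sum(volume.page_token_counts)
    mean = total / volume.page_count if volume.page_count else 0.0
    return {
        "title": volume.title,
        "publisher": volume.publisher,
        "pub_date": volume.pub_date,
        "genre": volume.genre,
        "page_count": volume.page_count,
        "total_tokens": total,
        "mean_tokens_per_page": mean,
    }


# Made-up sample values, for illustration only.
sample = Volume("Example Title", "Example Press", 1900, "fiction", [250, 300, 350])
summary = summarize(sample)
```

A dashboard or visualization tool of the kind the project describes could be built on a summary like this, though the real tools may organize the data quite differently.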

In addition to creating interactive, easy-to-use tools and dashboards, TORCHLITE will promote broad community engagement through a workshop and a mentored hackathon in autumn 2023, in the hope of encouraging individual researchers to develop their own tools using the project’s application programming interface (API).

HTRC is the official research arm of HathiTrust, a library consortium that hosts books owned and digitized by its member libraries, often in cooperation with the Google Books Project and other mass-digitization efforts. Its mission is to contribute to the common good by collecting, organizing, preserving, communicating, and sharing the record of human knowledge through data-intensive computational methods.

Filed under: Dashboards, Data Files, Digital Collections, Digital Preservation, Funding, Interactive Tools, Libraries, News, Patrons and Users, Publishing

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.

© 2023 Library Journal. All rights reserved.

