March 25, 2012 by Gary Price

Topic Modeling: Avalanches of Words, Sifted and Sorted

From The NY Times:

David M. Blei of Princeton University is among those who are teaching computers to sift through the digital pages of books and articles and categorize the contents by subject, even when that subject isn’t stated explicitly.
For decades, of course, librarians and many others have labeled books and documents with keywords. “But human categorization can only go so far,” said Dr. Blei, an associate professor in computer science. “We don’t have the human power to read and tag all this information.”
To cope with the information explosion, Dr. Blei and other researchers write algorithms so that computers can sift through millions of works and find their common themes by sorting related words into categories. It’s a field called probabilistic topic modeling.
[Clip]
The Bookworm-arXiv interface is the latest in a series of tools developed by the Cultural Observatory. Late in 2010, in collaboration with Google, the lab released the Google n-gram viewer, which lets people search for a phrase of up to five words in Google’s database of scanned books and see the frequency of the words over time in a graph, Dr. Aiden said.
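
The excerpt doesn't name a specific algorithm. As an illustration only, here is a minimal sketch of one widely used probabilistic topic model, latent Dirichlet allocation (LDA), fit with collapsed Gibbs sampling. This is not presented as Dr. Blei's implementation: the original LDA work used variational inference, and the function name `lda_gibbs` and the default `alpha`, `beta` and `iters` values are assumptions chosen for the example. Each word token is repeatedly reassigned to a topic in proportion to how common that topic is in its document and how common the word is in that topic, so related words drift into the same category.

```python
import random


def lda_gibbs(docs, num_topics, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Hypothetical sketch: collapsed Gibbs sampling for LDA.

    docs: list of documents, each a list of word strings.
    Returns (vocab, topic_word), where topic_word[k][w] estimates
    the probability of vocab[w] under topic k.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    V = len(vocab)
    corpus = [[word_id[w] for w in doc] for doc in docs]

    ndk = [[0] * num_topics for _ in corpus]      # tokens per (document, topic)
    nkw = [[0] * V for _ in range(num_topics)]    # tokens per (topic, word)
    nk = [0] * num_topics                         # tokens per topic
    z = []                                        # topic assigned to each token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(corpus):
        zd = []
        for w in doc:
            k = rng.randrange(num_topics)
            zd.append(k)
            ndk[d][k] += 1
            nkw[k][w] += 1
            nk[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(corpus):
            for i, w in enumerate(doc):
                # Take the token out of the counts before resampling it.
                k = z[d][i]
                ndk[d][k] -= 1
                nkw[k][w] -= 1
                nk[k] -= 1
                # Unnormalized conditional probability of each topic for this token.
                weights = [
                    (ndk[d][t] + alpha) * (nkw[t][w] + beta) / (nk[t] + V * beta)
                    for t in range(num_topics)
                ]
                k = rng.choices(range(num_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1
                nkw[k][w] += 1
                nk[k] += 1

    # Smoothed per-topic word distributions; each row sums to 1.
    topic_word = [
        [(nkw[k][w] + beta) / (nk[k] + V * beta) for w in range(V)]
        for k in range(num_topics)
    ]
    return vocab, topic_word


if __name__ == "__main__":
    docs = [
        ["library", "book", "catalog", "book"],
        ["library", "catalog", "shelf"],
        ["galaxy", "star", "orbit"],
        ["star", "orbit", "telescope", "galaxy"],
    ]
    vocab, topic_word = lda_gibbs(docs, num_topics=2)
    for k, row in enumerate(topic_word):
        top = sorted(range(len(vocab)), key=lambda w: row[w], reverse=True)[:3]
        print(k, [vocab[w] for w in top])
```

On a toy corpus like this the topics may or may not split cleanly; real collections need far more text, and production work would normally use an established library rather than a hand-rolled sampler.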

Project Discussed:

  • The Cultural Observatory’s Bookworm (Data via arXiv)
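
Bookworm and the n-gram viewer both chart how often a phrase appears over time. The post doesn't describe their internals, so the following is a hypothetical sketch, not either tool's code: it takes `(year, text)` pairs, counts a phrase of one to five words per year, and divides by the number of n-grams of the same length in that year's text. That normalization is an assumption here, chosen so that years with more text don't dominate the chart. The function names are illustrative.

```python
from collections import Counter


def ngrams(tokens, n):
    """All consecutive n-word tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def relative_frequency_by_year(docs, phrase):
    """Hypothetical sketch of an n-gram frequency timeline.

    docs: iterable of (year, text) pairs.
    Returns {year: occurrences of phrase / n-grams of the same length},
    skipping years whose text is too short to contain any such n-gram.
    """
    target = tuple(phrase.lower().split())
    n = len(target)
    if not 1 <= n <= 5:
        raise ValueError("phrase must be 1 to 5 words")
    hits = Counter()
    totals = Counter()
    for year, text in docs:
        grams = ngrams(text.lower().split(), n)
        totals[year] += len(grams)
        hits[year] += sum(1 for g in grams if g == target)
    return {year: hits[year] / totals[year] for year in sorted(totals) if totals[year]}
```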

Filed under: Data Files, News

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.

© 2022 Library Journal. All rights reserved.