January 9, 2013 by Gary Price

Data Mining: New Research From Google’s Peter Norvig Finds Most-Used English Words and Letters

Update: At the bottom of this post we’ve added a link to the full-text research paper referenced in the article below.
From TPM Idea Land:

“Etaoin srhldcu” may read like nonsense to most English speakers upon first blush, but as it turns out, the combination is quite significant. It represents, in order, the most used letters in the English language, according to a new survey of 743 billion words conducted by Google’s head of research Peter Norvig.
The survey, which was publicized by Google Research on Monday, was an update to the seminal 1965 survey of some 20,000 words gathered from a variety of printed sources — books, magazines, newspapers — conducted by Mark Mayzner, a former Bell Labs researcher.
[Clip]
Using the Google Books Ngram viewer (which shows word popularity over time), Norvig created a new dataset of some 97,565 unique words, collectively repeated 743.8 billion times, which he noted on his blog is about 37 million times as many occurrences as the 20,000-word sample that Mayzner assembled. Norvig’s sample also included over 3 trillion individual letters.

Read the Complete Article

Highlights (via Google Research on Google+)

– R, L, and C are more common than originally thought.
– The average English word is 4.79 letters long.
– The most common 4-gram is “tion”.
– The most common 7-gram is “present”.
– The most common 9-gram is “different”.
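
For readers curious how figures like these are derived, here is a minimal sketch in Python. It is not Norvig's code: the word counts are a made-up toy sample, the function names are hypothetical, and it assumes that letters and n-grams are counted inside words and weighted by how often each word occurs.

```python
from collections import Counter

# Hypothetical toy input: word -> number of occurrences (made-up counts,
# standing in for the 97,565 distinct words in the real dataset).
word_counts = {"the": 10, "of": 6, "and": 5, "nation": 2, "motion": 1}


def letter_counts(counts):
    """Count each letter, weighted by how often its word occurs."""
    letters = Counter()
    for word, n in counts.items():
        for ch in word:
            letters[ch] += n
    return letters


def average_word_length(counts):
    """Mean letters per word occurrence (not per distinct word)."""
    total_words = sum(counts.values())
    total_letters = sum(len(word) * n for word, n in counts.items())
    return total_letters / total_words


def ngram_counts(counts, size):
    """Count letter sequences of length `size` inside words, weighted by word frequency."""
    grams = Counter()
    for word, n in counts.items():
        for i in range(len(word) - size + 1):
            grams[word[i:i + size]] += n
    return grams


letters = letter_counts(word_counts)
print(letters.most_common(1))                       # most frequent letter in the toy sample
print(average_word_length(word_counts))             # average word length in the toy sample
print(ngram_counts(word_counts, 4).most_common(1))  # most frequent 4-gram in the toy sample
```

Weighting by occurrences is the important design choice here: a rare long word should barely move the average word length, while a very common short word such as “the” should dominate it.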

Charts

Direct to Chart: Most Used Letters in the English Language
Direct to Chart: Most Frequently Appearing Words in the English Language
Top 10 Words
1. The
2. Of
3. And
4. To
5. In
6. A
7. Is
8. That
9. For
10. It
See Also: Here’s the Full Text of Peter Norvig’s New Research Paper:
“English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU”

More background, findings, and charts.

Filed under: Data Files, Journal Articles, News

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.

© 2023 Library Journal. All rights reserved.