So Long and Goodbye! Google Is Ending Newspaper Digitization Project
Update (5/22): We’ve Added A Few Pieces of Info From the Letter Google Sent to Google Newspaper Digitization Partners. You’ll Find Them At the Bottom of this Entry.
Google’s effort to digitize all of the world’s newspapers is ending.
Search Engine Land also received this statement from Google:
We work closely with newspaper partners on a number of initiatives, and as part of the Google News Archives digitization program we collaborated to make older newspapers accessible and searchable online. These have included publications like the London Advertiser in 1895, L’Ami du Lecteur at the turn of the century, and the Milwaukee Sentinel from 1910 to 1995.
Users can continue to search digitized newspapers at http://news.google.com/archivesearch, but we don’t plan to introduce any further features or functionality to the Google News Archives and we are no longer accepting new microfilm or digital files for processing.
According to McGee, about 2,000 newspapers are currently listed in the Google newspaper directory.
The SEL article also points to this Boston Phoenix story that begins:
Google told partners in its News Archive project that it would cease accepting, scanning, and indexing microfilm and other archival material from newspapers, and was instead focusing its energies on “newer projects that help the industry, such as Google One Pass, a platform that enables publishers to sell content and subscriptions directly from their own sites.”
Some newspapers complained that Google, after quickly scanning their archives, was slow to process the scans. The Phoenix sent Google a stash of archives covering several decades; some fraction of those have made their way online.
It remains to be seen whether Google will complete the process of indexing the newspapers it has scanned*. We’d guess not. Are we mad at that? Ehhh, not really. The deal Google struck with partner newspapers stipulated that, somewhere down the line, a paper could purchase Google’s digital scans of its content for a fee. That fee is now being waived, and Google is not only giving publishers free access to the scanned files, but also the rights to publish them with other partners.
* For the searcher this means that not every issue of every paper Google lists in its directory is available.
Comment: New leadership is in place at Google, and new leadership can often bring changes. This is likely one of them. Although the newspaper digitization service is going to shut down, today’s news is an excellent reminder that Google is a money-making venture and, like any other business, makes business decisions that we often love but at other times might wish were different. The company has to work with a large number of groups that can have varying interests. They include shareholders, business partners, searchers/users, and others.
1. Google is a company like any other. With all of the useful/good things they do, they’re still a for-profit business. It can sometimes be easy to forget this important fact.
2. Google is often and correctly referred to as an advertising or marketing company. This doesn’t mean that they don’t or can’t do good and useful things; that wouldn’t work for anyone. However, as we said a moment ago, all companies have to make decisions based on a variety of factors that often, in one way or another, involve money. Around 95% of Google’s revenue comes from advertising/marketing.
3. Specifically, why they ended the digitization program (to focus on “newer projects that help the industry, such as ‘Google One Pass’”) is worth mentioning (of course there is likely more to it), but why they did it is really not the issue. What is the issue? That Google can end a program, service, or feature for WHATEVER reason they want unless the contract stipulates otherwise.
4. Because Google makes business decisions, things can change quickly, because business can change quickly. So, it might be useful to be aware of a variety of possible providers of a service or resource or, to put it another way, be very careful not to put all of your eggs in one basket. Keeping this in mind is what’s important. Of course, if one company were the only possible choice as an info provider, that would bring up a lot of issues for another time.
See Also: The Chronicling America Newspaper Digitization Project from the Library of Congress and NEH Continues to Digitize U.S. Newspapers from
See Also: A Week Ago We Pointed Out That the Australian Newspaper Digitization Program Had Recently Passed the Five Million Digitized Pages Mark
UPDATE: We’ve Been Able to Read the Full Text of the Letter Google Sent to Google News Archive Digitization Partners. A Few Pieces of Info from the Letter Follow:
- The Letter Says That Google Digitized 60 Million Newspaper Pages. Caveat (at least for now): As Mentioned Earlier in this Post, According to the Boston Phoenix Not All Digitized Material Has Been Indexed by Google. So, we’re not sure exactly what the 60 Million Number Means. We have contacted Google hoping to learn more.
- The Letter Says: Publishers Want to Sell Content, So Google’s Program Is Not As Appealing as What Others Offer. So, “To Keep Up With the Shift…” Google is Focusing Resources on Newer Projects To Help the Industry, e.g., Google One Pass
- Google Suggests ProQuest (a Google Partner) as a Company Publishers Might Want to Consider “To Explore New Opportunities.”
- Publishers Can Request Copies of Files Google Digitized and Can Use Them With No Fees (Google Waived a Per Page Fee as a Way to Express Their Gratitude to Participants) or Limitations
- The Letter Links to a Request Form With Info About What is Sent to Publishers If They Ask for Copies of the Digital Files. For Each Digitized Page Three Files Will Be Sent:
  - Two High-Resolution Image Files (the Original “Raw” Image and the “Cleaned” Image With a White Background and Sharper Text)
  - One HTML File Containing the OCRd Text (File is Suitable for Search But Not for Display)
- Google Points Out that One Reel of Microfilm Can Return a File With Up to 20GB of Digitized Material
About Gary Price
Gary Price (firstname.lastname@example.org) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at Ask.com, and is currently a contributing editor at Search Engine Land.