The books, printed centuries before Gutenberg mania swept through Europe, are some of the oldest in UC Berkeley’s collections.
In fact, some are among the oldest books, period.
“These are priceless materials,” said Peter Zhou, director of Berkeley’s C. V. Starr East Asian Library, or EAL. “Some of them are the only pieces of that publication in the world — the world has only one copy.”
And soon, these treasures, and more, will be free for anyone in the world to see.
Today, the UC Berkeley Library announces a monumental collaboration with Sichuan University, with funding from the Alibaba Foundation. The project aims to digitize most of the pre-1912 Chinese language materials from EAL’s collections, bringing them to life in vivid detail for researchers today and for generations to come.
Source: UC Berkeley Library
While chunks of EAL’s collections have been digitized and made available online over the years, the project with Sichuan University is the first of its kind because of its grand scope. Berkeley’s collection of Chinese volumes is one of the largest among research libraries in North America. Nearly 10,000 titles are from before 1912, and are in line to be digitized.
Under the agreement, Berkeley will digitize half a million pages per year for three years, with the possibility of the project continuing for another three years after that. The digitization work, to be done in-house at Berkeley, will capture images in high resolution, meeting or exceeding current standards for digital scholarship collections and long-term digital preservation. Each digitized treasure will be painstakingly enriched with information, or metadata — for example, when the item originated or other notes that illuminate its history.
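To put the agreement's scale in concrete terms, the short Python sketch below totals the page counts implied by the stated rate. It uses only the figures given above (half a million pages per year, three years, and a possible three-year continuation). The names `PAGES_PER_YEAR`, `INITIAL_YEARS`, `EXTENSION_YEARS`, and `total_pages` are illustrative, and the calculation assumes the rate would hold steady through any extension, which the announcement does not state.

```python
# Figures taken from the announcement.
PAGES_PER_YEAR = 500_000   # "half a million pages per year"
INITIAL_YEARS = 3          # the agreed term
EXTENSION_YEARS = 3        # possible continuation, not confirmed


def total_pages(years: int, pages_per_year: int = PAGES_PER_YEAR) -> int:
    """Return the number of pages digitized over the given number of years."""
    return years * pages_per_year


# Total for the agreed three-year term.
initial_total = total_pages(INITIAL_YEARS)

# Total if the project continues for three more years at the same rate
# (an assumption; the announcement gives no rate for the extension).
extended_total = total_pages(INITIAL_YEARS + EXTENSION_YEARS)

print(f"Initial term: {initial_total:,} pages")
print(f"With extension: {extended_total:,} pages")
```

At the stated rate, the initial term would cover 1.5 million pages, and a full six years would cover 3 million.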
The images will be converted to text through a process called optical character recognition, or OCR. OCR opens the door to needle-in-a-haystack keyword searches within an item, and lowers the barrier of access for people with print disabilities. Sichuan University and DAMO Academy, Alibaba’s research institute, have developed a cutting-edge system that harnesses machine learning to convert ancient Chinese characters into machine-readable text. The system is quick and efficient, recognizing characters 30 times as fast as a human can read, with 97.5 percent accuracy.
At Berkeley, the materials will then make their way to the Library’s Digital Collections portal, where they can be examined 24/7, by anyone, from anywhere.
Among the treasures — which include old and rare woodblock editions and manuscripts — are volumes printed from blocks engraved in the Song and Yuan dynasties. According to Zhou, North American libraries hold around 120 titles tracing back to these periods, which saw the birth of large-scale printing over a thousand years ago. Of those titles, Berkeley holds 44, or roughly a third.
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area.
He earned his MLIS degree from Wayne State University in Detroit.
Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.