January 26, 2022

New Resource From UC Berkeley: California Language Archive Offers Web Accessible Materials

From the UC Berkeley News Service:

As of [yesterday], many of the University of California, Berkeley’s vast language resources are accessible, free of charge, to anyone with Internet access via the new California Language Archive (CLA) website and its catalog of UC Berkeley materials – the largest indigenous language archive at a U.S. university.

The site is filled with downloadable digital content that includes rare audio recordings and written documentation. A few examples include 51 hours of Wintu songs and conversations, the hummingbird fire story recited in the nearly extinct language of Nisenan, and handwritten notes on Chochenyo that are based on linguist and ethnographer J.P. Harrington’s work with the language’s last good speaker.

“This very extensive information is valuable for scholars, and absolutely vital for Native American communities trying to revitalize endangered or no longer spoken languages,” said Andrew Garrett, a UC Berkeley professor specializing in historical linguistics and the driving force behind the CLA.

The campus’s extensive sound recordings and written data on indigenous California languages have typically been available to scholars, Native communities and others only during regular business hours, and have been scattered among multiple campus locations.


The archive has a special focus on California, but includes languages all the way from Alaska to South America and from the Pacific Ocean to the Atlantic. It is the online face of a collaboration/unification of two distinct UC Berkeley archives – the Berkeley Language Center (BLC) and the linguistics department’s Survey of California and Other Indian Languages research center, which curates the BLC’s linguistic field recordings.

The new site resolves nagging problems with incompatible catalogs and different content formats that have complicated attempts at coordinated use of the BLC’s nearly 2,000 hours of audio recordings and 8,000 audio clips in about 90 languages dating back to 1949, and the Survey’s 60,000 scanned images of manuscripts, notes and lexical “file slips” that can be used to compile a dictionary.

The most important content from the Survey has been digitized, Garrett said, but it will still take a few more years to properly scan and catalog all of the archive’s more than 150 linear feet of written documentation contained in 186 individual collections.


A map interface enables archive visitors to zoom around California looking for materials, and the site provides the precise geographical place where a recording was made. More work is being done to align the archive’s written materials to a location, which can be tricky, as a researcher’s records may reflect numerous sites.

An especially alluring feature for linguists is a genealogical tree that the CLA provides for each language of California and North America.

Read the Complete Overview

Direct to California Language Archive

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C., metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at Ask.com, and he is currently a contributing editor at Search Engine Land.