The Smithsonian Institution Begins Crowdsourcing the Transcription of Content Found “Inside” Digitized Images
The program formally launched for public participation today after one year in beta with about 1,000 volunteers creating more than 13,000 pages of transcribed material.
Here’s the Full Text of Today’s SI Announcement:
Today the Smithsonian launches its Transcription Center website to the public.
The website is designed to leverage the power of crowds to help the Smithsonian unlock the content inside thousands of digitized images of documents, such as handwritten Civil War journals, personal letters from famous artists, 100-year-old botany specimen labels and examples of early American currency.
The Smithsonian has already produced digital images for millions of objects, specimens and documents in its collection. Many of the digitized documents are handwritten or have text that computers cannot easily decipher. Transcription by humans is the only way to make the text of these items searchable, which will open them up for endless opportunities for research and discovery.
“We are thrilled to invite the public to be our partners in the creation of knowledge to help open our resources for professional and casual researchers to make new discoveries,” said Smithsonian Secretary Wayne Clough. “For years, the vast resources of the Smithsonian were powered by the pen; they can now be powered by the pixel.”
The Smithsonian’s collection is so vast that transcribing its content using its own staff could take decades. By harnessing the power of online volunteers that goal can become a reality. During the past year of beta testing with nearly 1,000 volunteers, the Transcription Center completed more than 13,000 pages of transcription. In one instance—transcribing the personal correspondence of members of the Monuments Men held in the Smithsonian’s Archives of American Art collection—49 volunteers finished the 200-page project in just one week. By some estimates, the volunteers are completing in a couple of days what it would take the Smithsonian months to complete without their help. Once a document is done, the work is reviewed by another volunteer before it is certified for accuracy by a Smithsonian expert.
Projects selected for transcription during the beta-test phase were chosen due to high demand from scientists, researchers and enthusiasts for certain items that presented accessibility challenges. For example, the Smithsonian’s National Museum of Natural History has one of the world’s largest bumble bee collections—nearly 45,000 specimens. Information about each bee, such as where it was collected and when it was collected, is extremely valuable to scientists studying the rapid decline of bee populations during the past few decades. The only way to obtain this information before digitization and transcription would be for a scientist to come to the museum and read each tiny, handwritten label (often as small as 3 millimeters by 7 millimeters) and record the information. Now, with the information digitized and transcribed, scientists anywhere in the world can understand more about the population history of the bumble bee and its recent population decline. The bumble bee transcription project is currently one of the highlighted projects on the site.
Curators at the Archives Center at the Smithsonian’s National Museum of American History chose to contribute the diary of Earl Shaffer, the first man to hike the entire length of the Appalachian Trail. Hiking enthusiasts, naturalists and other researchers frequently consult this now fragile document. Once the diary was digitized and uploaded to the Transcription Center, members of the online Reddit community devoted to the trail promoted the project. As a result, all 121 pages were transcribed in two weeks. The diary is now available for download, allowing the public to read, study and search for key words or landmarks and reducing the need for researchers to handle the delicate artifact.
Volunteers can register online today to help the Smithsonian transcribe a variety of projects relating to art, history, culture and science, including:
- For art lovers: Handwritten personal letters of artists from the Archives of American Art
  Read and transcribe personal letters from artists such as Mary Cassatt, Grandma Moses and Claes Oldenburg. Transcriptions of these letters will be part of the Archives' forthcoming book The Art of Handwriting. In an age of emails, texts and tweets, when handwritten letters have ceased to be a primary mode of person-to-person communication, this book will explore what can be learned from the handwriting of artists.
- For armchair archeologists: Field reports from Langdon Warner
  Langdon Warner was an American archeologist and art historian who specialized in East Asian art. He was also one of the Monuments Men who worked to protect monuments and cultural treasures in Japan during World War II. A professor at Harvard and Curator of Oriental Art at Harvard’s Fogg Museum, he is reputed to be one of the models for Steven Spielberg’s Indiana Jones.
- For bird lovers: Observation notebooks of James Eike
  James Eike was a Virginia bird watcher who kept impeccably detailed field observations of birds and the weather nearly every day from 1960 to 1983 near his home in Northern Virginia. In addition to being an important resource for ecologists, the notebooks also include tidbits about cultural events from that time, including the 1969 moon landing.
About Gary Price
Gary Price (firstname.lastname@example.org) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C., metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.