NARA: 1950 Census Release Will Offer Enhanced Digital Access, Public Collaboration Opportunity
From the National Archives and Records Administration:
With the scheduled April 1, 2022, release of 1950 Census records a little more than three months away, the National Archives is completing efforts to digitize those records and using technology to make them more accessible than ever.
[Clip]
The new website will include a name search function powered by a tool that combines Artificial Intelligence/Machine Learning (AI/ML) and Optical Character Recognition (OCR) technology. This is important for genealogists and other researchers who rely on census records for new information about the nation’s past.
“The OCR being used to transcribe the handwritten names from the census rolls is about as good as the human eye,” said Project Management Director Rodney Payne. “Some of the pages are legible, and others are difficult to decipher. So, the National Archives developed a transcription tool to enable users to submit name updates. This will allow other users to find specific names more easily, and it provides an opportunity for the public to help the agency share these records with the world.”
National Archives officials are encouraging interested members of the public to use the transcription tool and help the agency make the records as accurate as possible.
[Clip]
The National Archives is also working to provide bulk download access of the full 1950 Census dataset on launch day. This will be of interest to digital humanists, web developers, social scientists, and anyone wanting to explore aggregations of the records. Other organizations and companies will be able to use this functionality to provide 1950 Census data on their own websites.
When made available on the Amazon Web Services Registry of Open Data, the 1950 Census dataset—over 165 terabytes of data—will include the metadata index, the population schedules, the enumeration district maps, and the enumeration district descriptions for the 1950 Census records. This is approximately 10 times the size of the 1940 Census dataset.
Included in the dataset are approximately:
- 6.5 million digital TIFF images and corresponding JPEG derivative images of the microfilmed “1950 Census of Population and Housing” forms for U.S. states and territories
- 33,215 TIFF images and corresponding JPEG derivative images of the original paper “1950 Census of Population and Housing: Indian Reservation Schedule” forms
- 9,600 digitized images of the 1950 Census Enumeration District Maps, which are annotated maps of counties, cities, and other minor civil divisions that show enumeration districts, census tracts, and related boundaries and numbers used for each census
- 63,000 digitized images of the 1950 Census Enumeration District Descriptions, which are written descriptions of geographic areas included within enumeration districts
- 232,000 1950 Census Enumeration District Descriptions, which were produced by generating OCR output of the Enumeration District Description images. More than 25 NARA staff reviewed and cleaned up the OCR output.
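For developers planning to work with the bulk dataset once it appears on the Amazon Web Services Registry of Open Data, the following is a minimal sketch of how one might list files in a public S3 bucket without AWS credentials, using only the Python standard library and the S3 ListObjectsV2 REST call. The announcement does not name a bucket or describe its folder layout, so the bucket name and the `maps/` prefix below are hypothetical placeholders, not the real location of the data.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

# Hypothetical bucket name: the announcement does not give the real one.
BUCKET = "example-1950-census-bucket"

# XML namespace used by S3 list responses.
S3_NS = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}


def build_list_url(bucket, prefix="", max_keys=100):
    """Build an anonymous S3 ListObjectsV2 request URL for a public bucket."""
    query = urllib.parse.urlencode(
        {"list-type": "2", "prefix": prefix, "max-keys": str(max_keys)}
    )
    return f"https://{bucket}.s3.amazonaws.com/?{query}"


def parse_listing(xml_text):
    """Return (key, size_in_bytes) pairs from a ListObjectsV2 XML response."""
    root = ET.fromstring(xml_text)
    return [
        (
            item.findtext("s3:Key", namespaces=S3_NS),
            int(item.findtext("s3:Size", namespaces=S3_NS)),
        )
        for item in root.findall("s3:Contents", S3_NS)
    ]


def list_objects(bucket, prefix="", max_keys=100):
    """Fetch one page of object keys and sizes (requires network access)."""
    url = build_list_url(bucket, prefix, max_keys)
    with urllib.request.urlopen(url) as resp:
        return parse_listing(resp.read().decode("utf-8"))
```

Calling `list_objects(BUCKET, prefix="maps/")` would return one page of results once a real bucket name is substituted. With a dataset of more than 165 terabytes, it is likely more practical to list and download selected prefixes than to mirror everything; a full listing would also need to follow the continuation token that S3 returns for truncated pages, which this sketch omits.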
Learn More, Read the Complete Announcement
About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne St. University Library and Information Science Program. From 2006-2009 he was Director of Online Information Services at Ask.com.