Tech Article: "Toward free and searchable historical census images"

September 23, 2011 by Gary Price

Title: “Toward free and searchable historical census images”

By:
Kenton McHenry, Luigi Marini, Mayank Kejriwal, Rob Kooper
National Center for Supercomputing Applications
University of Illinois at Urbana-Champaign
Urbana, IL

and

Peter Bajcsy
National Institute of Standards and Technology
Gaithersburg, MD

From the SPIE Newsroom:

In summary, our hybrid automation/crowd-sourcing approach aims to provide search capabilities over the image-based census data, potentially from the day the images are released. However, general difficulties in automating handwriting recognition will limit its accuracy. Incorporation of passive and active crowd-sourcing elements will improve the accuracy of our systems over time. We are currently working on a number of challenges, including further pre-processing of form cells to remove noise. Our next important stage will be to build an index of the ∼7 billion form cells, which is crucial for efficient access. However, of the word-spotting techniques we tested, the best results use a non-linear comparison that does not lend itself to indexing. We are currently investigating alternative methods that are indexable, as well as using high-performance computing resources to perform a one-time, large pre-processing step to hierarchically cluster the data (requiring 4.9×10¹⁹ comparisons). Finally, we will investigate how best to associate the passively crowd-sourced transcriptions with the results based on user behavior.
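The comparison count quoted above appears to follow from comparing every one of the roughly 7 billion form cells against every other cell. The short sketch below reproduces that arithmetic. The function names and the throughput figure are illustrative assumptions, not values from the article, and the authors' actual clustering procedure may count comparisons differently.

```python
# Back-of-the-envelope check of the pairwise comparison count.
# NUM_CELLS comes from the article's "~7 billion form cells";
# everything else here is a hypothetical illustration.

NUM_CELLS = 7_000_000_000


def all_ordered_pairs(n: int) -> int:
    """Comparisons if every cell is compared against every cell (n * n)."""
    return n * n


def unordered_pairs(n: int) -> int:
    """Comparisons if each distinct pair is compared only once (n choose 2)."""
    return n * (n - 1) // 2


def years_at_rate(comparisons: int, per_second: float) -> float:
    """Wall-clock years needed at a given (assumed) comparison throughput."""
    seconds_per_year = 365.25 * 24 * 3600
    return comparisons / per_second / seconds_per_year


if __name__ == "__main__":
    full = all_ordered_pairs(NUM_CELLS)
    half = unordered_pairs(NUM_CELLS)
    print(f"n * n comparisons:     {full:.2e}")
    print(f"n choose 2 comparisons: {half:.2e}")
    # Assumed throughput of one billion comparisons per second in total.
    print(f"years at 1e9/s (n * n): {years_at_rate(full, 1e9):.0f}")
```

The n × n count matches the 4.9×10¹⁹ figure in the summary; comparing each distinct pair once would roughly halve it. Either way, the scale suggests why the authors describe this as a one-time step for high-performance computing resources.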

Filed under: Data Files, News

Digitization U.S. Census

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.
