A team of volunteers from Accenture has built an artificial intelligence (AI)-based solution that helps extract information on victims of Nazi persecution from documents in the Arolsen Archives 40 times faster than previous efforts.
The Arolsen Archives preserve the world’s largest collection of documents on Nazi persecution (110 million documents and digital objects, some of which are part of UNESCO’s Memory of the World program) to keep alive the memory of the crimes of the German terror regime. An essential part of the Archives’ work is making these documents accessible to everyone who wishes to search for traces of Holocaust victims and survivors, the persecution of minorities, and forced labor.
Every document maintained in the archives needs to be reviewed and its information (e.g., the family name and birth date on a prisoner registration form) put into a database. To facilitate this process, the Arolsen Archives established “#everynamecounts,” a crowdsourcing project for volunteers to extract information from documents manually.
Translating, reading, transcribing, cataloging and validating these documents by hand could take decades. Each document is indexed independently by three volunteers and, if their entries don’t match, reviewed for accuracy by an Arolsen Archives employee. In effect, up to four people (three volunteers plus one reviewer) may be needed to index and validate just four documents in an hour.
Even though the AI does the heavy lifting, human oversight remains important, not only to ensure accuracy but also to keep the AI solution learning. By reviewing and correcting the extracted information, volunteers “teach” the solution to recognize the handwritten characters and abbreviations typical of the time. Thanks to their input, the AI’s precision in the “mother’s last name” form field has gradually improved by 10%, and for the “religion” field the AI now operates at 99% confidence.
Since Accenture implemented the AI solution in December 2021, it has indexed more than 160,000 names of victims of Nazi persecution, extracted information from more than 18,000 documents, and clustered more than 60,000 documents into groups of similar documents to improve identification and analysis.