May 25, 2022

Mass Digitization: Aptara Digitizing 730,000 Pages of Content for SAGE’s New eBook Platform

From a News Release:

Aptara announces the award of a 730,000-page digitization project from SAGE. Within 8 months, Aptara will transform a significant collection of legacy print titles into XML format for posting to SAGE’s eBook platform.

Converting to the flexible digital format of XML gives SAGE the ability to easily repurpose their content at any time, for any other delivery platform or device. Working from a mix of print and PDF source files, Aptara is using a customer-defined DTD spec for coding and generating the XML. All print files must first be converted to a digital format using a double-key and compare methodology in combination with OCR (optical character recognition) to ensure an exact replica.
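As an illustration of the double-key and compare idea mentioned above, here is a minimal sketch in Python. It assumes two operators have each typed the same page independently and that any disagreement between the two keyings is flagged for review. The function name `compare_keyings` and the sample strings are hypothetical, and the release does not describe how Aptara actually implements this step or how it combines it with OCR.

```python
from difflib import SequenceMatcher


def compare_keyings(first: str, second: str) -> list[tuple[int, str, str]]:
    """Compare two independent keyings of the same page.

    Returns one (offset in first keying, text in first, text in second)
    tuple for every span where the two keyings disagree.
    """
    # autojunk=False keeps the comparison exact for long pages.
    matcher = SequenceMatcher(None, first, second, autojunk=False)
    mismatches = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            mismatches.append((i1, first[i1:i2], second[j1:j2]))
    return mismatches


if __name__ == "__main__":
    # Hypothetical sample: the second operator typed "vv" instead of "w".
    keying_a = "The quick brown fox"
    keying_b = "The quick brovvn fox"
    for offset, text_a, text_b in compare_keyings(keying_a, keying_b):
        print(f"offset {offset}: {text_a!r} vs {text_b!r}")
```

Spans where the two keyings agree are accepted as-is, so only the flagged offsets need a human check against the source page.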

John Shaw, Executive Director of Publishing Technologies at SAGE, adds that Aptara is currently digitizing more than 3,000 pages a day.

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.