May 20, 2022

JSTOR Publishes White Paper on Digitization of Arabic-Language Materials


In 2017, JSTOR received a grant from the National Endowment for the Humanities to study processes for digitizing Arabic-language scholarly content. Our goal was to develop a workflow for scanning Arabic journals that is cost-efficient, feasible to implement at scale, and able to produce high-quality images, metadata, and fully searchable text.

In a recently released white paper, “Digitizing printed Arabic journals: is a scalable solution possible?,” ITHAKA’s Anne Ray, Senior Licensing Editor, and John Kiplinger, Director of Production, place JSTOR’s investigation in the broader landscape of digital scholarly journal literature in Arabic, document our approach and findings from this project, and identify areas for further research. Among its conclusions, the paper establishes that it is possible to digitize Arabic-language journals with a high degree of accuracy, and that costs could be reduced through continuous improvements to the optical character recognition (OCR) engine. This exploration was conducted in collaboration with the American University of Beirut and the Open Islamicate Texts Initiative.

Direct to Full Text White Paper
105 pages; PDF.

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.