May 18, 2022

Research Article: “What Library Digitization Leaves Out: Predicting the Availability of Digital Surrogates of English Novels” (Preprint)

The following research aticle (preprint) was recently posted on arXiv.


What Library Digitization Leaves Out: Predicting the Availability of Digital Surrogates of English Novels


Allen Riddell
Indiana University Bloomington

Troy J. Bassett
Purdue University Fort Wayne


via arXiv


Library digitization has made more than a hundred thousand 19th-century English-language books available to the public. Do the books which have been digitized reflect the population of published books? An affirmative answer would allow book and literary historians to use holdings of major digital libraries as proxies for the population of published works, sparing them the labor of collecting a representative sample. We address this question by taking advantage of exhaustive bibliographies of novels published for the first time in the British Isles in 1836 and 1838, identifying which of these novels have at least one digital surrogate in the Internet Archive, HathiTrust, Google Books, and the British Library. We find that digital surrogate availability is not random. Certain kinds of novels, notably novels written by men and novels published in multivolume format, have digital surrogates available at distinctly higher rates than other kinds of novels. As the processes leading to this outcome are unlikely to be isolated to the novel and the late 1830s, these findings suggest that similar patterns will likely be observed during adjacent decades and in other genres of publishing (e.g., non-fiction).

Direct to Full Text Article
19 pages; PDF.

About Gary Price

Gary Price ( is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.