New Resource: ProQuest, University of Michigan Library and Bodleian Libraries Provide 25,000 Early Modern Books as Open Access Text
In one word, WOW! Note that the entire collection can be downloaded for analysis.
From Today’s Announcement:
The texts of the first printed editions of Shakespeare, Chaucer, and Milton as well as lesser-known titles from the early modern era can now be freely read by anyone with an Internet connection. The University of Michigan Library, the University of Oxford’s Bodleian Libraries and ProQuest have made public more than 25,000 manually transcribed texts from the first 200 years of the printed book (1473-1700). These texts represent a significant portion of the estimated total output of English-language work published during the first two centuries of printing in England.
The release (via Creative Commons Public Domain Dedication) marks the completion of the first phase in the Early English Books Online-Text Creation Partnership (EEBO-TCP). An anticipated 40,000 additional texts are planned for release into the public domain by the end of the decade.
Full-text public access to the transcribed EEBO-TCP texts is hosted by the U-M Library at quod.lib.umich.edu/e/eebogroup.
The Bodleian offers individual text downloads in several formats, including ePUB files.
What’s Available?
The EEBO-TCP texts were transcribed from ProQuest’s Early English Books Online (EEBO), a subscriber database of facsimile images obtained from books in libraries all over the world, including the British Library, the Folger Shakespeare Library, and the Bodleian Library at Oxford. Among them are some of the first books printed in English, a body of work that includes early English literature as well as works of history, philosophy, politics, religion, music, mathematics, and science.
Highlights include several of William Caxton’s editions of the works of Chaucer, the first translations of Homer by the Elizabethan dramatist and classical scholar George Chapman, and Sir Isaac Newton’s Philosophiae naturalis principia mathematica. Possibly of even greater value are the thousands of less famous texts which offer unexplored avenues for discovery. Gardening manuals, cookery books, ballads, auction catalogues, dance instructions, and religious tracts detail the commonplace of the early modern period; books about witchcraft and sword fighting document its more exotic facets.
Many of these works have never before been available to the public online, and physical copies are rare and require special handling. The transcribed texts, as open data, are freely available for anyone to read, reuse, reproduce, repurpose and distribute. (ProQuest’s EEBO image database remains available only to subscribers.)
The Partnership That Made It Possible
At its inception in 1999, the aim of EEBO-TCP was to convert the extraordinary corpus EEBO represents into fully searchable digital texts. For modern printed works, such conversions rely upon optical character recognition (OCR), which can automatically produce searchable text from scanned images. But these first printed works use character sets and spelling that aren’t OCR-friendly. Age and print quality present additional hurdles to machine readability.
The conversion of EEBO texts requires painstaking manual keyboarding of the texts, markup in Extensible Markup Language (XML) to encode the structure of each text (chapter divisions, tables, lists, etc.), and a thorough editorial process to ensure accuracy. Getting it done required a transnational collaborative enterprise driven by the U-M Library; the Bodleian Digital Library at the University of Oxford; ProQuest; the Council on Library and Information Resources (CLIR); Jisc, the charity that provides digital solutions to UK education and research; and the support of more than 160 partner libraries.
EEBO-TCP has already provided key source material for scholars with institutional access, and has contributed to monographs, articles, essay collections, and scholarly editions as well as computer-aided linguistics. Our emphasis: The release into the public domain creates new opportunities for research around the globe, and for corpus-based textual analysis (the entire body of work can be downloaded by anyone via box.com).
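For readers wondering what corpus-based analysis of XML-encoded texts might look like in practice, here is a minimal sketch in Python. It is only an illustration: the sample document and its element names (text, div, head, p) are assumptions made up for this example, not the confirmed EEBO-TCP schema, and the function name word_counts is hypothetical. The idea is simply to drop the structural markup and count the words that remain.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample standing in for one transcribed text. The element
# names (text, div, head, p) are assumptions, not the real EEBO-TCP schema.
SAMPLE_XML = """<text>
  <div type="chapter">
    <head>Of the Garden</head>
    <p>The garden is a place of herbes, and the herbes are many.</p>
  </div>
</text>"""


def word_counts(xml_string):
    """Return a Counter of lowercase word tokens from all text in the XML."""
    root = ET.fromstring(xml_string)
    # itertext() yields the text inside every element, so the structural
    # markup is dropped and only the transcribed words remain.
    plain_text = " ".join(root.itertext())
    # Simple tokenizer: runs of ASCII letters only. Early modern texts
    # would likely need a more careful tokenizer than this.
    tokens = re.findall(r"[a-z]+", plain_text.lower())
    return Counter(tokens)


counts = word_counts(SAMPLE_XML)
print(counts.most_common(3))
```

With files from the downloaded corpus, the same function could be applied to each file's contents in a loop, though the actual markup would need to be checked first.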
“The open access release of the first group of EEBO-TCP texts marks an important milestone in an extraordinary international partnership between public and private entities,” notes Charles Watkinson, Associate University Librarian for Publishing at the University of Michigan. “The opportunity now exists for scholars both within and outside the academy to apply powerful digital scholarship tools to a huge body of material that is of central importance to world culture. The University of Michigan Library is proud to continue to support this landmark project.”
Direct to Early English Books Online (via UMich)
Direct to Bodleian Library to Download Material
Direct to Box.com to Download Complete Corpus
A news release from ProQuest is also available here.
Learn More About the Collection (Provided by ProQuest)

About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.