The Public Interest Corpus Releases Principles and Goals
From a Blog Post (via Authors Alliance):
Today, we are pleased to release The Public Interest Corpus Principles and Goals. This release builds on the recap of our final planning workshop and anticipates release of our final deliverable later this month.
[Clip]
The Public Interest Corpus works with a growing coalition of stakeholders to develop a service that advances the library community’s ability to support the responsible use of their collections for AI research and development and computational research more generally. The initial focus of the service is on a corpus development, discovery, and access solution for books data (digitized and/or born digital text with metadata) at scale. Some estimates suggest that ~162,000,000 books have been created globally, with ~2,200,000 new books published each year. Collectively, libraries steward the most comprehensive source of human inquiry recorded in book form.
[Clip]
What principles guide The Public Interest Corpus?
- The Public Interest Corpus … advances equitable access to books data for small, medium, and large organizations.
- The Public Interest Corpus … supports AI research and development and computational research that addresses public interest challenges (e.g., fighting misinformation, advancing understanding of the past and present, fostering a more informed citizenry).
- The Public Interest Corpus … addresses corpus limitations (e.g., linguistic bias, outmoded forms of knowledge present in the corpus, and data quality) through production of additional metadata in line with efforts like the Hugging Face Model Card and Data Nutrition Label.
- The Public Interest Corpus … commits to transparency with respect to corpus composition, modification, and agreements in order to increase public trust in research that makes use of the corpus.
- The Public Interest Corpus … values the labor of content creators and works to ensure that their work is recognized through promotion of credit and attribution practices.
- The Public Interest Corpus … adopts practices and infrastructure that aim to reduce the environmental impact of corpus development, discovery, and access.
- The Public Interest Corpus … forms partnerships that concretely address long-term collective needs of academic libraries and the communities they serve (e.g., maximizing access, reducing legal encumbrances).
- The Public Interest Corpus … is fundamentally guided by diverse stakeholders including but not limited to researchers, librarians, publishers, authors, and technologists.
What goals should The Public Interest Corpus work to achieve?
- Coordinate books data sourcing, discovery, and access across small, medium, and large organizations.
- Create cost efficiencies in access to books data.
- Minimize legal risk for those that seek to provide or make use of books data.
- Curate and provide access to fit-for-purpose books data that exceeds in quality and comprehensiveness what is otherwise available.
- Ensure consistent corpus growth and refinement over time in alignment with user community needs.
- Identify and adopt scalable author credit and attribution methods for authors and rights holders to track reuse.
- Deliver minimum viable solutions.
- Adopt a fit-for-purpose governance model.
- Develop a sustainability model that reduces barriers to books data access for small, medium, and large organizations on an ongoing basis.
Direct to Complete Blog Post
About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne St. University Library and Information Science Program. From 2006-2009 he was Director of Online Information Services at Ask.com.


