From an Internet Archive post:
The Andrew W. Mellon Foundation has awarded a research and development grant to the Internet Archive to address the critical need to preserve the “long tail” of open access scholarly communications. The project, Ensuring the Persistent Access of Long Tail Open Access Journal Literature, builds on prototype work that uses data from identifier services and registries to identify at-risk content held in web archives. The project also expands on earlier work in acquiring missing open access articles through customized web harvesting, improving discovery of and access to these materials within existing web archives, and developing machine learning approaches, training sets, and cost models to advance and scale the project’s work.
The project will explore how adding automation to the already highly automated systems for archiving the web at scale can help address the need to preserve at-risk open access scholarly outputs. Instead of specialized curation and ingest systems, the project will work to identify the scholarly content already collected in general web collections, both those of the Internet Archive and collaborating partners, and implement automated systems to ensure at-risk scholarly outputs on the web are well-collected and are associated with the appropriate metadata.
The proposal envisages two opposite but complementary approaches:
- A top-down approach involves taking journal metadata and open data sets from identifier and registry sources such as ISSN, DOAJ, Unpaywall, CrossRef, and others and examining the content of large-scale web archives to ask “is this journal being collected and preserved and, if not, how can collection be improved?”
- A bottom-up approach involves examining the content of general domain-scale and global-scale web archives to ask “is this content a journal and, if so, can it be associated with external identifier and metadata sources for enhanced discovery and access?”
See the complete blog post for more details.