May 17, 2022

OCLC Research Publishes Three New Reports on Descriptive Metadata for Web Archiving

UPDATE (March 27-29): Three OCLC Research “Hanging Together” Blog Posts About the Descriptive Metadata for Web Archiving Reports


The three reports linked below were just published by the OCLC Research Library Partnership Web Archiving Metadata Working Group (WAM).


The RLP Web Archiving Metadata Working Group, working with Jackie Dooley, RLP Program Officer, has written three publications focused on descriptive metadata for web archiving.

The work arose in part from two recent surveys—one of end users of archived web content and the other of web archiving practitioners—both of which showed that lack of a common approach to creating metadata was the most widely shared challenge across the web archiving community.

In response, OCLC Research established the Web Archiving Metadata Working Group to develop recommendations for descriptive metadata. Their approach is tailored to the unique characteristics of archived websites, with an eye to helping institutions improve the consistency and efficiency of their metadata practices in this emerging area.

The working group recognized the importance of gaining a clear understanding of the needs of users of archived web content, and took this into account throughout the project. The work was done in consultation with the International Internet Preservation Consortium, the Society of American Archivists Web Archiving Section, and the Internet Archive’s Archive-It program, and with much community input and feedback.

Report One


Descriptive Metadata for Web Archiving: Recommendations of the OCLC Research Library Partnership Web Archiving Metadata Working Group


Jackie Dooley and Kate Bowers


WAM’s overall objective was to develop practices for creating consistent metadata that address the unique characteristics of websites and collections. More specifically:

  • Develop community-neutral, standards-neutral practices for descriptive metadata for archived web content, taking into account the needs of end users and metadata practitioners.
  • Define a lean set of data elements with usage notes to guide the preparation of data content.
  • Ensure that the data elements can be used in concert with other standards that have far more granular data element sets.
  • Provide a bridge between bibliographic and archival approaches to description.
  • Use a scalable approach that requires neither in-depth description nor extensive changes to records over time.
  • Enable practitioners to have confidence that they are contributing to the application of consistent practice in this emerging area.

WAM’s recommended practices can be used by any institution or person with a need to describe web content. Some potential use cases:

  • Scholars building personal archives of websites for research purposes
  • Libraries and archives using RDA/MARC that seek specific guidance on the elements and content most pertinent to the description of web content
  • Archives and libraries that need to map their DACS-based MARC records and/or EAD-encoded finding aids to the simpler structure of a digital repository or a web tool such as Archive-It
  • Digital repositories encoding metadata for web content in MODS without reference to any content standard
  • Archive-It users seeking guidance on creating content for Dublin Core elements

Direct to Full Text Report (58 pages; PDF)

Report Two

Literature Review of User Needs


Jessica Venlet, Karen Stoll Farrell, Tammi Kim, Allison Jai O’Dell, and Jackie Dooley


The OCLC Research Library Partnership Web Archiving Metadata Working Group was formed to recommend descriptive metadata best practices for archived web content that would meet end-user needs, enhance discovery, and improve metadata consistency. To that end, the group conducted a literature review to inform their development of best practices.

They selected readings that included, at minimum, a substantive section related to metadata, though most covered a wider range of issues. This broader scope also taught them much about who the users of web archives are, the strategies they use, and the challenges they face.

The literature falls into two clear categories: the needs of end users and the needs of metadata practitioners. This review characterizes types of end users, their research methodologies, barriers to use, discovery interfaces, and the need for support services and outreach. The review of practitioner literature addresses the need for scalable practices, the standards and shared practices currently in use, the outcomes of a variety of case studies, and other approaches to metadata.

Direct to Full Text Report (50 pages; PDF)

Report Three

Review of Harvesting Tools


Mary Samouelian and Jackie Dooley


The OCLC Research Library Partnership Web Archiving Metadata Working Group (WAM) was formed to recommend descriptive metadata best practices for archived web content. When the group began its work early in 2016, we discovered that metadata practitioners had high hopes that it would be possible to extract descriptive metadata from harvested content.

This report offers our objective analysis of 11 tools in pursuit of an answer to whether that is possible. We reviewed selected web harvesting tools to determine their descriptive metadata functionalities. The question we sought to answer was this: Can web harvesting tools automatically generate descriptive metadata that supports the discoverability of archived web resources? Auto-generation of descriptive metadata for archived web resources could result in significant gains in the efficiency of data entry and thus help enable metadata production at scale.

Our intent was twofold: 1) provide the web archiving community with a description of each relevant tool’s overall purpose and metadata-related capabilities, and 2) inform WAM’s overarching objective of preparing best practice recommendations for web archiving descriptive metadata based on an understanding of user needs.

Direct to Full Text Report (26 pages; PDF)

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006 to 2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.