May 27, 2022

New Report From Tow Center For Digital Journalism: “A Public Record at Risk: The Dire State of News Archiving in the Digital Age”

“A Public Record at Risk: The Dire State of News Archiving in the Digital Age” was published today by the Tow Center for Digital Journalism at Columbia’s Graduate School of Journalism.

From the Executive Summary:

This research report explores archiving practices and policies across newspapers, magazines, wire services, and digital-only news producers, with the aim of identifying the current state of archiving and potential strategies for preserving content in an age of digital distribution. Between March 2018 and January 2019, we conducted interviews with 48 individuals from 30 news organizations and preservation initiatives.

What we found was that the majority of news outlets had not given any thought to even basic strategies for preserving their digital content, and not one was properly saving a holistic record of what it produces. Of the 21 news organizations in our study, 19 were not taking any protective steps at all to archive their web output. The remaining two lacked formal strategies to ensure that their current practices have the kind of longevity to outlast changes in technology.

Meanwhile, interviewees frequently (and mistakenly) equated digital backup and storage in Google Docs or content management systems as synonymous with archiving. (They are not the same; backup refers to making copies for data recovery in case of damage or loss, while archiving refers to long-term preservation, ensuring that records will still be available even as formatting and distribution technologies change in the future.)

Instead, news organizations have handed over their responsibilities as public stewards to third-party organizations such as the Internet Archive, Google, Ancestry, and ProQuest, which store and distribute copies of news content on remote servers. As such, the news cycle now includes reliance on proprietary organizations with increasing control over the public record. The Internet Archive aside, the larger issue is that these companies’ incentives are neither journalistic nor archival, and may conflict with both.

Key Findings

  • The majority of the news organizations that participated in this research (19 of 21) had no documented policies for the preservation of their content—nor did they have even informal or ad-hoc archival practices in place.
  • In addition to the failure to archive published stories from their own websites, none of the news organizations we interviewed were preserving their social media publications, including tweets and posts to Facebook, Instagram, or any other social media platform. Only one was taking the steps necessary to tackle the problem of archiving interactive and dynamic news applications.
  • Digital-only news organizations had even less awareness than print publications of the importance of preservation. A persistent confusion that backing up work on third-party, cloud servers is the same as archiving means that very little is currently being done to preserve news.
  • When we asked interviewees why they believe news organizations are not archiving content, they said repeatedly that journalism’s primary focus is on “what is new” and “happening now.” Journalists and their news organizations are more interested in preserving documentation of their reporting and what makes it accurate than preserving what ultimately gets published.
  • As a result, platforms and third-party vendors, which increasingly host news content on their closed servers, are in control of the pieces necessary for holistic preservation without the journalistic incentive to enact it.
  • Staff at news organizations often cited relying on the Internet Archive, a nonprofit digital library that maintains hundreds of billions of web captures, to preserve their own publications, even though web archiving has limitations around the formats it can capture and preserves only a fraction of what is published online.
  • News apps and interactives, in particular, are at high risk of being lost because often the new technologies they are built on become obsolete before anyone thinks to save them. Newsroom developers and emulation-based web archiving tools under development can be valuable allies in preserving these and other resources in jeopardy.
  • There exist a number of other archiving initiatives, by both individuals and nonprofits, from which news managers can learn or enlist services, including PastPages by Ben Welsh, NewsGrabber by Archive Team, and Archive-It by the Internet Archive. According to news organizations, for digital archiving efforts to succeed, the process must be made simple, both in terms of implementation and workflow.
  • Partnerships among archivists, technologists, memory institutions, and news organizations will be vital to establishing best practices and policies that assure future access to digitally distributed news content. Collaboration between all parties should begin with two questions: What should be preserved? Who should preserve it?
  • Creating robust digital archives will mean grappling with tough questions, like how often to capture a copy of an ever-updating home page, if personalized content and newsletters should be preserved, and what to do with reader comments and social media posts.
  • To enact lasting change, it will be key to find opinion leaders in the field to help introduce archiving ideas in a way that makes sense to staff, as well as to those in management positions who must ultimately be convinced of its advantages and compatibility with their priorities.

Direct to Full Text Report

About Gary Price

Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at, and is currently a contributing editor at Search Engine Land.