More than 50,000 government webpages have been affected, according to an evolving snapshot by the End of Term Archive, a nonpartisan project that began in 2008 to record government websites at the end of each presidential term, and the Internet Archive, a nonprofit digital library formed nearly 30 years ago in San Francisco.

“The number is probably quite larger than that,” said Mark Graham, director of the Internet Archive’s Wayback Machine, where copies of the pages — more than 2 petabytes of material — are being safehoused. “We really weren’t set up to do this level of in-depth analysis. … We’re getting notifications … on almost a daily basis.”

“If we don’t have the information of democracy, then where does that put your democracy?” asked James R. Jacobs, a longtime Stanford University librarian and early End of Term member.

EDGI cofounder Gretchen Gehrke, who leads the research collaborative’s monitoring program of roughly 6,000 federal URLs, said that the courts and Congress stopped Trump 1.0 from permanently deleting environmental data. Data preservationists say that most of what’s been taken down in the past few months can be restored, but this time Trump faces less institutional resistance: the Supreme Court and the Republican-controlled Congress are either hesitant or unwilling to check the executive branch, and Trump’s inner circle is moving faster and farther to scrub all kinds of research.

Pace is one reason University of Pennsylvania data librarian Lynda Kellam co-created the Data Rescue Project in February. The project started as a widely disseminated Google doc that a Ukrainian cultural preservation initiative helped turn into a website; it now coordinates an alphabet soup of data groups, including EDGI. The collective has fully or partially cloned more than 900 endangered or removed government URLs, mostly focused on the social sciences.