Crossref Asks, “What Do We Know About DOIs”
From an Interesting and Informative Crossref Blog Post by Martin Eve:
Crossref holds metadata for approximately 150 million scholarly artifacts. These range from peer-reviewed journal articles through to scholarly books through to scientific blog posts. In fact, amid such heterogeneity, the only factor that unites such items is that they have been assigned a digital object identifier (DOI), a unique identification string that can be used to resolve to a resource pertaining to said metadata (often, but not always, a copy of the work identified by the metadata).
What, though, do we actually know about the state of persistence of these links? How many DOIs resolve correctly? How many landing pages, at the other end of the DOI resolution, contain the information that is supposed to be there, including the title and the DOI itself? How can we find out?
[Clip]
Let’s talk about the resolution statistics. Other studies, looking at general links on the web, have found a link-rot rate of about 60%-70% over a ten-year period (Lessig, Zittrain, and Albert 2014; Stox 2022). The DOI resolution rate that we have, with 97% of links resolving (or a 3% link-rot rate), is far better and more robust than a web link in general.
Is 3% a good or a bad number? It’s more robust than the web in general, but it still means that for every 100 DOIs, just under 3 will fail to resolve. We also cannot tell whether these DOIs are resolving to the correct target, except by using the metadata detection metrics (whether the title and DOI appear on the landing page, which we could only detect at a far lower rate). It is entirely possible for a website to resolve with an HTTP 200 (OK) response, but for the page in question to be something very different to what the user expected, a phenomenon dubbed content drift. A good example is domain hijacking, where domain names expire and spam companies buy them up. These still resolve to a web page, but instead of an article on RNA, for a hypothetical example, the user gets adverts for rubber welding hose. That said, other studies are also prone to this and there is no guarantee that content drift doesn’t affect a huge proportion of supposedly good links in the other studies, too.
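To make the distinction between "resolves" and "resolves to the right thing" concrete, here is a minimal Python sketch of the kind of check the excerpt describes. It is not Crossref's actual method. The function names, the User-Agent string, and the sample DOI and title in the usage note are all made up for illustration, and a plain substring match is only a rough stand-in for real metadata detection.

```python
import html
import re
import urllib.error
import urllib.request


def normalize(text: str) -> str:
    """Unescape HTML entities, collapse whitespace, and lowercase."""
    return re.sub(r"\s+", " ", html.unescape(text)).strip().lower()


def landing_page_check(page_html: str, doi: str, title: str) -> dict:
    """Report whether the DOI and the title appear in the page source.

    DOIs are case-insensitive, so both sides are lowercased first.
    A naive substring match like this will miss pages that render
    their metadata with JavaScript or split the title across tags.
    """
    haystack = normalize(page_html)
    return {
        "doi_found": normalize(doi) in haystack,
        "title_found": normalize(title) in haystack,
    }


def fetch_landing_page(doi: str, timeout: float = 20.0):
    """Follow the doi.org redirects; return (status, final_url, html).

    An HTTP 200 here only shows that *something* answered. It says
    nothing about content drift, which is why landing_page_check exists.
    """
    request = urllib.request.Request(
        f"https://doi.org/{doi}",
        headers={"User-Agent": "doi-check-sketch/0.1"},  # hypothetical agent
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
            return response.status, response.geturl(), body
    except urllib.error.HTTPError as err:
        # The resolver or the publisher answered with an error status.
        return err.code, None, ""
    except OSError:
        # DNS failure, refused connection, timeout, and similar.
        return None, None, ""
```

Calling `landing_page_check(page, "10.1234/abc.5678", "RNA Folding & Function")` on a hijacked page full of welding-hose adverts would report both fields as not found, even though `fetch_landing_page` would have returned a 200 for it.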
Learn MUCH More, Read the Complete Post (Highly Recommended; about 1500 words)
UPDATES (March 4, 2023)
See Also: Millions of Research Papers at Risk of Disappearing From the Internet (via Nature)
See Also: Digital Scholarly Journals Are Poorly Preserved: A Study of 7 Million Articles (via JLSC)
About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.