April 19, 2013 by Gary Price

Full Text Paper: “Carbon Dating The Web: Estimating the Age of Web Resources”

The following paper will be presented at the TempWeb 2013 Workshop during the WWW 2013 Conference, which is scheduled to take place next month in Rio de Janeiro, Brazil.

Title

Carbon Dating The Web: Estimating the Age of Web Resources

Authors

Hany M. SalahEldeen
Old Dominion University 
Michael L. Nelson
Old Dominion University 

Source

via @arXiv

Abstract

In the course of web research it is often necessary to estimate the creation datetime for web resources (in the general case, this value can only be estimated). While it is feasible to manually establish likely datetime values for small numbers of resources, this becomes infeasible if the collection is large. We present “carbon date”, a simple web application that estimates the creation date for a URI by polling a number of sources of evidence and returning a machine-readable structure with their respective values. To establish a likely datetime, we poll bitly for the first time someone shortened the URI, topsy for the first time someone tweeted the URI, a Memento aggregator for the first time it appeared in a public web archive, Google’s time of last crawl, and the Last-Modified HTTP response header of the resource itself. We also examine the backlinks of the URI as reported by Google and apply the same techniques for the resources that link to the URI. We evaluated our tool on a gold-standard data set of 1200 URIs in which the creation date was manually verified. We were able to estimate a creation date for 75.90% of the resources, with 32.78% having the correct value. Given the different nature of the URIs, the union of the various methods produces the best results. While the Google last crawl date and topsy account for nearly 66% of the closest answers, eliminating the web archives or Last-Modified from the results produces the largest overall negative impact on the results. The carbon date application is available for download or use via a web API.
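
The polling approach described in the abstract can be illustrated with a small sketch. This is not the authors' code: the function name `estimate_creation`, the source labels, the output field names, and the sample datetimes are all hypothetical, and the sketch assumes that the earliest datetime reported by any source is taken as the estimated creation date. No network calls are made; each source's answer is passed in directly.

```python
import json
from datetime import datetime, timezone
from typing import Dict, Optional


def estimate_creation(uri: str, evidence: Dict[str, Optional[datetime]]) -> dict:
    """Combine per-source datetimes into one machine-readable structure.

    Assumption: the earliest datetime among the sources that answered
    is used as the estimate. A source that found nothing maps to None.
    """
    found = [dt for dt in evidence.values() if dt is not None]
    earliest = min(found) if found else None
    return {
        "uri": uri,
        "estimated_creation_date": earliest.isoformat() if earliest else None,
        # Keep every source's value so the caller can inspect the evidence.
        "sources": {
            name: (dt.isoformat() if dt is not None else None)
            for name, dt in evidence.items()
        },
    }


# Hypothetical evidence for one URI (made-up values, not from the paper).
utc = timezone.utc
sample = {
    "bitly_first_shortened": datetime(2012, 3, 1, 12, 0, tzinfo=utc),
    "first_tweet": datetime(2012, 2, 27, 9, 15, tzinfo=utc),
    "first_archived_memento": None,  # no public archive copy found
    "google_last_crawl": datetime(2013, 1, 5, 0, 0, tzinfo=utc),
    "last_modified_header": datetime(2012, 2, 26, 18, 45, tzinfo=utc),
}

print(json.dumps(estimate_creation("http://example.com/page", sample), indent=2))
```

Returning every source's value alongside the single estimate mirrors the abstract's point that the union of methods works best: a caller can see which sources answered and judge how much to trust the result.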

Direct to Full Text Paper (14 pages; PDF via arXiv)
Direct to Dataset (CSV via arXiv)

Related Papers by the Same Authors

  • How Much of the Web Is Archived? (10 pages; PDF via arXiv)
  • Losing My Revolution: How Many Resources Shared on Social Media Have Been Lost? (12 pages; PDF via arXiv)
  • A Plan For Curating “Obsolete Data or Resources” (4 pages; PDF via arXiv)
  • Using Web Page Titles to Rediscover Lost Web Pages (PDF via arXiv)

Filed under: Archives and Special Collections, Data Files, Journal Articles, News

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.


© 2023 Library Journal. All rights reserved.

