
May 31, 2019 by Gary Price

Web Archiving: “Using Micro-collections in Social Media to Generate Seeds for Web Archive Collections”

The item linked below is an extended version of a paper that will be presented at the ACM/IEEE Joint Conference on Digital Libraries (JCDL 2019) taking place next week in Champaign, IL.

Title

Using Micro-collections in Social Media to Generate Seeds for Web Archive Collections

Authors

Alexander C. Nwala
Old Dominion University

Michele C. Weigle
Old Dominion University

Michael L. Nelson
Old Dominion University

Source 

via arXiv

Abstract

In a Web plagued by disappearing resources, Web archive collections provide a valuable means of preserving Web resources important to the study of past events ranging from elections to disease outbreaks. These archived collections start with seed URIs (Uniform Resource Identifiers) hand-selected by curators. Curators produce high quality seeds by removing non-relevant URIs and adding URIs from credible and authoritative sources, but it is time consuming to collect these seeds. Two main strategies adopted by curators for discovering seeds include scraping Web (e.g., Google) Search Engine Result Pages (SERPs) and social media (e.g., Twitter) SERPs.

In this work, we studied three social media platforms in order to provide insight on the characteristics of seeds generated from different sources.

First, we developed a simple vocabulary for describing social media posts across different platforms.

Second, we introduced a novel source for generating seeds from URIs in the threaded conversations of social media posts created by single or multiple users. Users on social media sites routinely create and share posts about news events consisting of hand-selected URIs of news stories, tweets, videos, etc. In this work, we call these posts micro-collections, and we consider them as an important source for seeds because the effort taken to create micro-collections is an indication of editorial activity, and a demonstration of domain expertise.
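As a rough illustration of the idea of harvesting seeds from a threaded conversation, the sketch below pulls URIs out of the text of the posts in one thread and de-duplicates them. It is not the authors' implementation: the names `URI_PATTERN` and `extract_seeds`, the simplified regular expression, and the example posts are all assumptions made for illustration.

```python
import re

# Simplified pattern for http(s) URIs; an assumption for this sketch,
# not a full RFC 3986 parser.
URI_PATTERN = re.compile(r"https?://[^\s<>\"')\]]+")


def extract_seeds(thread_posts):
    """Return the unique URIs found in a thread's posts, in first-seen order.

    `thread_posts` is a list of post texts (hypothetical input format).
    """
    seeds = []
    seen = set()
    for post in thread_posts:
        for uri in URI_PATTERN.findall(post):
            # Drop sentence punctuation that the pattern may have swallowed.
            uri = uri.rstrip(".,;:!?")
            if uri not in seen:
                seen.add(uri)
                seeds.append(uri)
    return seeds


# Made-up thread: two posts sharing links about the same event.
thread = [
    "Flooding update: https://example.com/news/1, more soon",
    "Video https://example.org/v/2 and again https://example.com/news/1",
]
seeds = extract_seeds(thread)
```

A curator would still need to review the resulting list, since a link appearing in a thread does not by itself make it relevant.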

Third, we generated 23,112 seed collections with text and hashtag queries from 449,347 social media posts from Reddit, Twitter, and Scoop.it. We collected in total 120,444 URIs from the conventional scraped SERP posts and micro-collections. We characterized the resultant seed collections across multiple dimensions including the distribution of URIs, precision, ages, diversity of webpages, etc. We showed that seeds generated by scraping SERPs had a higher median probability (0.63) of producing relevant URIs than micro-collections (0.5). However, micro-collections were more likely to produce seeds with a higher precision than conventional SERP collections for Twitter collections generated with hashtags. Also, micro-collections were more likely to produce older webpages and more non-HTML documents.
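To make the comparison of collections more concrete, here is a minimal sketch of one common reading of precision: the fraction of URIs in a seed collection that are judged relevant. The abstract does not say how relevance was judged, so the sketch assumes the relevant URIs are already known; the function name `precision` and the sample data are hypothetical and do not reproduce the paper's figures.

```python
from statistics import median


def precision(seed_uris, relevant_uris):
    """Fraction of seed URIs that appear in the set of relevant URIs.

    Returns 0.0 for an empty collection (a choice made for this sketch).
    """
    if not seed_uris:
        return 0.0
    relevant = set(relevant_uris)
    hits = sum(1 for uri in seed_uris if uri in relevant)
    return hits / len(seed_uris)


# Made-up collections and relevance judgments, for illustration only.
relevant = {"a", "c"}
collections = [
    ["a", "b", "c", "d"],  # 2 of 4 relevant
    ["a", "c"],            # 2 of 2 relevant
    ["b", "d"],            # 0 of 2 relevant
]
scores = [precision(c, relevant) for c in collections]
median_score = median(scores)
```

Summarizing many collections by the median, as in the last line, keeps a few very good or very poor collections from dominating the comparison.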

Direct to Full Text
63 pages; PDF.

Filed under: Journal Articles, Libraries, News, Patrons and Users

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com. Gary is also the co-founder of infoDJ, an innovation research consultancy supporting corporate product and business model teams with just-in-time fact and insight finding.

© 2023 Library Journal. All rights reserved.

