November 20, 2017 by Gary Price

New Article: Web Robot Detection in Academic Publishing

The following paper (preprint) was recently made available on arXiv.
Title
Web Robot Detection in Academic Publishing
Authors
Athanasios Lagopoulos
Aristotle University of Thessaloniki
Grigorios Tsoumakas
Aristotle University of Thessaloniki
Georgios Papadopoulos
Atypon Systems
Source
via arXiv
Abstract

Recent industry reports assure the rise of web robots which comprise more than half of the total web traffic. They not only threaten the security, privacy and efficiency of the web but they also distort analytics and metrics, doubting the veracity of the information being promoted. In the academic publishing domain, this can cause articles to be faulty presented as prominent and influential. In this paper, we present our approach on detecting web robots in academic publishing websites. We use different supervised learning algorithms with a variety of characteristics deriving from both the log files of the server and the content served by the website. Our approach relies on the assumption that human users will be interested in specific domains or articles, while web robots crawl a web library incoherently. We experiment with features adopted in previous studies with the addition of novel semantic characteristics which derive after performing a semantic analysis using the Latent Dirichlet Allocation (LDA) algorithm. Our real-world case study shows promising results, pinpointing the significance of semantic features in the web robot detection problem.
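The abstract's central assumption, that humans browse coherently by topic while robots crawl incoherently, can be illustrated with a small sketch. This is not the authors' implementation. It assumes each article already has a topic distribution (in the paper this would come from LDA; the vectors below are made up for illustration), and it uses Shannon entropy and a count of dominant topics as two possible session-level semantic features. The names `PAGE_TOPICS`, `session_semantic_features`, and the example URLs are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Hypothetical per-article topic distributions over 3 topics.
# In practice these would be produced by a topic model such as LDA.
PAGE_TOPICS: Dict[str, List[float]] = {
    "/doi/a": [0.90, 0.05, 0.05],
    "/doi/b": [0.80, 0.10, 0.10],
    "/doi/c": [0.05, 0.90, 0.05],
    "/doi/d": [0.05, 0.05, 0.90],
}


def shannon_entropy(dist: Sequence[float]) -> float:
    """Entropy in bits; zero-probability entries contribute nothing."""
    return -sum(p * math.log2(p) for p in dist if p > 0)


def session_semantic_features(
    pages: Sequence[str], page_topics: Dict[str, List[float]]
) -> Dict[str, float]:
    """Summarize how topically focused a session's requests are."""
    if not pages:
        raise ValueError("session must contain at least one page")
    dists = [page_topics[p] for p in pages]
    k = len(dists[0])
    # Average the topic distributions of all pages requested in the session.
    mean = [sum(d[i] for d in dists) / len(dists) for i in range(k)]
    # Index of the strongest topic for each page, de-duplicated.
    dominant = {max(range(k), key=lambda i, d=d: d[i]) for d in dists}
    return {
        "topic_entropy": shannon_entropy(mean),
        "distinct_dominant_topics": len(dominant),
    }


# Assumed example sessions: a focused reader and a crawler-like visitor.
human_session = ["/doi/a", "/doi/b", "/doi/a"]
robot_session = ["/doi/a", "/doi/c", "/doi/d"]

human_features = session_semantic_features(human_session, PAGE_TOPICS)
robot_features = session_semantic_features(robot_session, PAGE_TOPICS)
```

In this toy data the crawler-like session spreads evenly over all topics and so yields a higher entropy than the focused one. Features like these could be appended to log-derived features before training a supervised classifier, but the specific features the paper uses should be taken from the full text.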

Direct to Full Text
7 pages; PDF.

Filed under: Journal Articles, Libraries, News, Patrons and Users, Publishing, Reports

About Gary Price

Gary Price (gprice@mediasourceinc.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. Before launching INFOdocket, Price and Shirl Kennedy were the founders and senior editors at ResourceShelf and DocuTicker for 10 years. From 2006-2009 he was Director of Online Information Services at Ask.com, and is currently a contributing editor at Search Engine Land.

© 2022 Library Journal. All rights reserved.

