May 23, 2016 by Gary Price

New Research on Privacy: “Online Tracking: A 1-Million-Site Measurement and Analysis”

From researchers at Princeton University, a new research paper (draft) titled “Online tracking: A 1-million-site measurement and analysis,” by Steven Englehardt and Arvind Narayanan.
From the Abstract

We present the largest and most detailed measurement of online tracking conducted to date, based on a crawl of the top 1 million websites. We make 15 types of measurements on each site, including stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and the exchange of tracking data between different sites (“cookie syncing”). Our findings include multiple sophisticated fingerprinting techniques never before measured in the wild.
This measurement is made possible by our web privacy measurement tool, OpenWPM, which uses an automated version of a full-fledged consumer browser. It supports parallelism for speed and scale, automatic recovery from failures of the underlying browser, and comprehensive browser instrumentation.
OpenWPM is open-source and has already been used as the basis of seven published studies on web privacy and security.

Direct to Full Text Article (24 pages; PDF)
The research paper is the latest from the Princeton Web Census.
Some Key Findings

The total number of third parties present on at least two first parties is over 81,000, but the prevalence quickly drops off. Only 123 of these 81,000 are present on more than 1% of sites. This suggests that the number of third parties that a regular user will encounter on a daily basis is relatively small. The effect is accentuated when we consider that different third parties may be owned by the same entity. All of the top 5 third parties, as well as 12 of the top 20, are Google-owned domains. In fact, Google, Facebook, and Twitter are the only third-party entities present on more than 10% of sites.
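The prevalence figures above can be illustrated with a minimal Python sketch. This is not the paper's OpenWPM code. The input format (a mapping from each first-party site to the set of third-party domains it loads), the function names, and the `.example` domains are all assumptions made for illustration.

```python
from collections import Counter

def third_party_site_counts(crawl):
    """Count, for each third party, how many first-party sites include it."""
    counts = Counter()
    for third_parties in crawl.values():
        counts.update(set(third_parties))  # count each third party once per site
    return counts

def present_on_at_least(counts, min_sites):
    """Third parties seen on at least `min_sites` first-party sites."""
    return sorted(tp for tp, n in counts.items() if n >= min_sites)

def present_on_more_than(counts, total_sites, fraction):
    """Third parties seen on more than `fraction` of all crawled sites."""
    return sorted(tp for tp, n in counts.items() if n > fraction * total_sites)

# Hypothetical toy crawl; the real study covers the top 1 million sites.
crawl = {
    "news.example": {"tracker-a.example", "tracker-b.example"},
    "shop.example": {"tracker-a.example"},
    "blog.example": {"tracker-a.example", "tracker-c.example"},
    "wiki.example": set(),
}
counts = third_party_site_counts(crawl)
widespread = present_on_at_least(counts, 2)
dominant = present_on_more_than(counts, len(crawl), 0.5)
```

On the real data, the first filter (at least two first parties) corresponds to the 81,000 figure and the second, with a 1% threshold, to the 123 figure.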

On HTTPS

Third parties are a major roadblock to HTTPS adoption; insecure third-party resources loaded on secure sites (i.e. mixed content on HTTPS sites) will either be blocked or cause the browser to display security warnings. We find that a large number of third parties (54%) are only ever loaded over HTTP. A significant fraction of HTTP-default sites (26%) embed resources from at least one of the HTTP-only third parties on their homepage. These sites would be unable to upgrade to HTTPS without browsers displaying mixed content errors to their users, the majority of which (92%) would contain active content which would be blocked.
Around 78,000 first-party sites currently support HTTPS by default on their home pages. Nearly 8% of these load with mixed content warnings, of which 12% are caused by third-party trackers.
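To show what "mixed content" means in practice, here is a rough Python sketch that scans the HTML of an HTTPS page for resources requested over plain HTTP. It is a simplified illustration, not the method used in the paper. The split into active tags (scripts, iframes, stylesheets) and passive tags (images, audio, video) is an assumption that only approximates what browsers do, and the class and function names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

ACTIVE_TAGS = {"script", "iframe", "link"}   # simplified: typically blocked by browsers
PASSIVE_TAGS = {"img", "audio", "video"}     # simplified: typically trigger a warning

class MixedContentFinder(HTMLParser):
    """Collect HTTP-only resource URLs referenced by a page."""

    def __init__(self, page_url):
        super().__init__()
        self.page_url = page_url
        self.active = []
        self.passive = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link":
            # Only stylesheet links are treated as active content here.
            if "stylesheet" not in (attrs.get("rel") or "").lower():
                return
            ref = attrs.get("href")
        else:
            ref = attrs.get("src")
        if not ref:
            return
        url = urljoin(self.page_url, ref)  # resolve relative and //host URLs
        if urlparse(url).scheme != "http":
            return
        if tag in ACTIVE_TAGS:
            self.active.append(url)
        elif tag in PASSIVE_TAGS:
            self.passive.append(url)

def find_mixed_content(page_url, html):
    """Return (active, passive) lists of insecure resources on an HTTPS page."""
    if urlparse(page_url).scheme != "https":
        return [], []  # mixed content only applies to HTTPS pages
    finder = MixedContentFinder(page_url)
    finder.feed(html)
    return finder.active, finder.passive

# Hypothetical page with one insecure script and one insecure image.
sample_html = (
    '<script src="http://tracker.example/t.js"></script>'
    '<img src="http://img.example/p.gif">'
    '<script src="https://ok.example/a.js"></script>'
    '<img src="/local.png">'
)
active, passive = find_mixed_content("https://site.example/", sample_html)
```

A site whose `active` list is non-empty would, under these assumptions, have that content blocked after moving to HTTPS, which is the situation the excerpt describes for sites that depend on HTTP-only third parties.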

Top Category of Sites For Tracking

News sites have the most trackers.

Tracking Protection

Firefox’s third-party cookie blocking is very effective, only 237 sites (0.4%) have any third-party cookies set from a domain other than the landing page of the site. Most of these are for benign reasons, such as redirecting to the U.S. version of a non-U.S. site. We did find a handful of exceptions, including 32 that contained ID cookies. These sites appeared to be deliberately redirecting the landing page to a separate domain before redirecting back to the initial domain. Ghostery was effective at reducing both the number of third parties and ID cookies. The average number of third-party includes went down from 17.7 to 3.3, of which just 0.3 had third-party cookies (0.1 with IDs).
Direct to complete summary, including info and findings re: device fingerprinting (which is of growing concern). It also includes links to previously published papers from the Princeton Web Census as well as data sets and software.

Direct to Full Text Article (Draft): “Online tracking: A 1-million-site measurement and analysis.”
UPDATE: Eric Hellman has just posted findings from tracking research he recently completed on ARL member library web sites. Must read!
Hellman’s post is titled, “97% of Research Library Searches Leak Privacy… and Other Disappointing Statistics”
Note From Gary Price, infoDOCKET Founder and Editor:
See Also: I discussed some of these issues in a video interview from the 2015 Charleston Conference.
See Also: I was part of a panel that included Peter Brantley, Marshall Breeding, and Eric Hellman at the Fall 2014 CNI Meeting where these issues were discussed.  Video and slides here.
One final thought: if digital privacy and library privacy are concerns of yours (personally, professionally, or both), take some time to learn what you can do to reduce the amount of data that you and your online work leak. This can be done while also supporting efforts to make changes for all.
Example?
Sure, here’s one. Make sure you and your users understand that when they borrow an ebook from OverDrive and place it on their Kindle, Amazon retains the borrow record and any notes made in the book UNLESS the user erases them.
This is hardly a new issue and it could be changed quickly with a disclaimer and a link about how to remove the data manually. Doing this is easy and fast.
Here’s something I wrote on this topic about three years ago.
The library community asks for transparency from others but we could do a better job of this ourselves.
Example 2: Instead of using Google Analytics consider using Piwik, a free, open source analytics package.

Filed under: Academic Libraries, Associations and Organizations, Data Files, Interviews, Journal Articles, Libraries, New Issue, News, Patrons and Users, Profiles, Resources

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne St. University Library and Information Science Program. From 2006-2009 he was Director of Online Information Services at Ask.com.

© 2023 Library Journal. All rights reserved.

