Research Article: “Measuring the Importance of User-Generated Content to Search Engines” (Preprint)
The preprint linked below was recently shared on arXiv.
Title
Measuring the Importance of User-Generated Content to Search Engines
Authors
Nicholas Vincent
Northwestern University
Isaac Johnson
Northwestern University
Patrick Sheehan
Northwestern University
Brent Hecht
Northwestern University
Source
via arXiv
Paper Accepted at ICWSM 2019
Abstract
Search engines are some of the most popular and profitable intelligent technologies in existence. Recent research, however, has suggested that search engines may be surprisingly dependent on user-created content like Wikipedia articles to address user information needs.
In this paper, we perform a rigorous audit of the extent to which Google leverages Wikipedia and other user-generated content to respond to queries. Analyzing results for six types of important queries (e.g. most popular, trending, expensive advertising), we observe that Wikipedia appears in over 80% of results pages for some query types and is by far the most prevalent individual content source across all query types.
More generally, our results provide empirical information to inform a nascent but rapidly-growing debate surrounding a highly-consequential question: Do users provide enough value to intelligent technologies that they should receive more of the economic benefits from intelligent technologies?
Direct to Full Text Article
13 pages; PDF.

About Gary Price
Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.