New Research Paper: “CensorSeeker: Generating a Large, Culture-Specific Blocklist”
The following research paper (preprint) was recently made available on arXiv.
CensorSeeker: Generating a Large, Culture-Specific Blocklist
June 4, 2018
Internet censorship measurements rely on lists of websites to be tested, or “block lists” that are curated by third parties. Unfortunately, many of these lists are not public, and those that are tend to focus on a small group of topics, leaving other types of sites and services untested. To increase and diversify the set of sites on existing block lists, we develop CensorSeeker, which uses search engines and natural language techniques to discover a much wider range of websites that are censored in China. Using this tool, we create a list of 821 websites outside the Alexa Top 1000 that cover Chinese politics, minority human rights organizations, and oppressed religions. Importantly, none of the sites we discover are present on the current largest block list. The list that we develop not only vastly expands the set of sites that current Internet measurement tools can test, but it also deepens our understanding of the nature of content that is censored in China. We have released both this new block list and the code for generating it.
7 pages; PDF.
About Gary Price
Gary Price is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C., metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.