The following white paper was published yesterday by the Computing Community Consortium (CCC).
Prepared for the Computing Community Consortium Committee of the Computing Research Association
From the Paper
Government statistical agencies collect enormously valuable data on the nation’s population and business activities. Wide access to these data enables evidence-based policy making, supports new research that improves society, facilitates training for students in data science, and provides resources for the public to better understand and participate in their society. These data also affect the private sector. For example, the Employment Situation in the United States, published by the Bureau of Labor Statistics, moves markets. Nonetheless, government agencies are under increasing pressure to limit access to data because of a growing understanding of the threats to data privacy and confidentiality.
“De-identification” — stripping obvious identifiers like names, addresses, and identification numbers — has been found inadequate in the face of modern computational and informational resources (Sweeney 2007; Narayanan and Shmatikov 2006; Narayanan and Shmatikov 2010; Sweeney 2013; see also the report of the President’s Council of Advisors on Science and Technology 2014).
Unfortunately, the problem extends even to the release of aggregate data statistics (Dinur and Nissim 2003; Dwork, McSherry, and Talwar 2007; Homer et al. 2008; Kasiviswanathan, Rudelson, Smith, and Ullman 2010; De 2012; Kasiviswanathan, Rudelson, and Smith 2013; Muthukrishnan and Nikolov 2012; Dwork et al. 2015). This counter-intuitive phenomenon has come to be known as the Fundamental Law of Information Recovery. It says that overly accurate estimates of too many statistics can completely destroy privacy. One may think of this as death by a thousand cuts.
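To make the "thousand cuts" idea concrete, here is a minimal sketch of a differencing attack on exact aggregate counts. It is not taken from the white paper. The data, the prefix-count queries, and the function names `prefix_counts` and `reconstruct` are all hypothetical, chosen only to illustrate how many accurate statistics can add up to full disclosure. It assumes the simplest case, in which every released count is exact; attacks on noisy answers, as studied in the cited work, are more involved.

```python
import random


def prefix_counts(secret_bits):
    """Release n aggregate statistics: for k = 1..n, how many of the
    first k people have the sensitive attribute (hypothetical queries)."""
    counts = []
    running = 0
    for bit in secret_bits:
        running += bit
        counts.append(running)
    return counts


def reconstruct(counts):
    """Recover every individual's bit by subtracting consecutive counts.
    Works only because the released counts are assumed to be exact."""
    recovered = []
    previous = 0
    for count in counts:
        recovered.append(count - previous)
        previous = count
    return recovered


# Hypothetical sensitive attribute for 20 people (1 = has it, 0 = does not).
rng = random.Random(0)
secret_bits = [rng.randint(0, 1) for _ in range(20)]

released = prefix_counts(secret_bits)   # what the agency publishes
recovered = reconstruct(released)       # what an attacker can compute

print("all records recovered:", recovered == secret_bits)
```

No single count here names anyone, yet the full set of counts pins down every record. That is the sense in which accuracy and the number of released statistics trade off against privacy.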
Direct to Full Text (7 pages; PDF)