Digital Privacy: “You Can Probably Be Identified From Your Anonymized Data”

July 29, 2019 by Gary Price

If you thought that removing identifying information from a database of sensitive personal records was enough to retain privacy, it’s time to think again. A study published this week asserts that it’s even easier to re-identify information than we first thought.

[Clip]

The study, released in Nature Communications, calls all that into question. Its authors at the Université catholique de Louvain (Belgium) and at Imperial College London (UK) say that it’s easy to re-identify a high percentage of people in de-identified data sets.

[Clip]

What this latest research proves is that it’s even easier than we thought to reconstruct people’s identities, even when only a tiny subset of the data is released. When it comes to de-identification, it suggests that it might be time to go back to the drawing board.

The researchers have created an online tool that lets you check to see how identifiable you might be given your own characteristics.

Learn More, Read the Complete Article

Direct to Research Article: Estimating The Success Of Re-Identifications In Incomplete Datasets Using Generative Models (via Nature Communications)

While rich medical, behavioral, and socio-demographic data are key to modern data-driven research, their collection and use raise legitimate privacy concerns. Anonymizing datasets through de-identification and sampling before sharing them has been the main tool used to address those concerns. We here propose a generative copula-based method that can accurately estimate the likelihood of a specific person to be correctly re-identified, even in a heavily incomplete dataset. On 210 populations, our method obtains AUC scores for predicting individual uniqueness ranging from 0.84 to 0.97, with low false-discovery rate. Using our model, we find that 99.98% of Americans would be correctly re-identified in any dataset using 15 demographic attributes. Our results suggest that even heavily sampled anonymized datasets are unlikely to satisfy the modern standards for anonymization set forth by GDPR and seriously challenge the technical and legal adequacy of the de-identification release-and-forget model.

Filed under: Data Files, News

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne St. University Library and Information Science Program. From 2006-2009 he was Director of Online Information Services at Ask.com.

Digital Privacy: “You Can Probably Be Identified From Your Anonymized Data”

About Gary Price

Archives

FOLLOW US ON X

Digital Privacy: “You Can Probably Be Identified From Your Anonymized Data”

About Gary Price

Archives

Related Infodocket Posts

FOLLOW US ON X