Report: “MIT Study Finds ‘Systematic’ Labeling Errors in Popular AI Benchmark Datasets”

March 29, 2021 by Gary Price

From VentureBeat:

The field of AI and machine learning is arguably built on the shoulders of a few hundred papers, many of which draw conclusions using data from a subset of public datasets. Large, labeled corpora have been critical to the success of AI in domains ranging from image classification to audio classification. That’s because their annotations expose comprehensible patterns to machine learning algorithms, in effect telling machines what to look for in future datasets so they’re able to make predictions.

But while labeled data is usually equated with ground truth, datasets can — and do — contain errors. The processes used to construct corpora often involve some degree of automatic annotation or crowdsourcing techniques that are inherently error-prone. This becomes especially problematic when these errors reach test sets, the subsets of datasets researchers use to compare progress and validate their findings. Labeling errors here could lead scientists to draw incorrect conclusions about which models perform best in the real world, potentially undermining the framework by which the community benchmarks machine learning systems.

A new paper and website published by researchers at MIT instill little confidence that popular test sets in machine learning are immune to labeling errors. In an analysis of 10 test sets from datasets that include ImageNet, an image database used to train countless computer vision algorithms, the coauthors found an average of 3.4% errors across all of the datasets. The quantities ranged from just over 2,900 errors in the ImageNet validation set to over 5 million errors in QuickDraw, a Google-maintained collection of 50 million drawings contributed by players of the game Quick, Draw!

Read the Complete Article

Additional Resources

Overview Blog Post by Researchers

Label Errors Website

Full Text Research Paper: Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks
16 pages; PDF.

Filed under: Data Files, Journal Articles, News

About Gary Price

Gary Price (gprice@gmail.com) is a librarian, writer, consultant, and frequent conference speaker based in the Washington, D.C. metro area. He earned his MLIS degree from Wayne State University in Detroit. Price has won several awards, including the SLA Innovations in Technology Award and Alumnus of the Year from the Wayne State University Library and Information Science Program. From 2006 to 2009 he was Director of Online Information Services at Ask.com.
