From VentureBeat:
The field of AI and machine learning is arguably built on the shoulders of a few hundred papers, many of which draw conclusions using data from a subset of public datasets. Large, labeled corpora have been critical to the success of AI in domains ranging from image classification to audio classification. That’s because their annotations expose comprehensible patterns to machine learning algorithms, in effect telling machines what to look for in future datasets so they’re able to make predictions.
But while labeled data is usually equated with ground truth, datasets can — and do — contain errors. The processes used to construct corpora often involve some degree of automatic annotation or crowdsourcing techniques that are inherently error-prone. This becomes especially problematic when these errors reach test sets, the subsets of datasets researchers use to compare progress and validate their findings. Labeling errors here could lead scientists to draw incorrect conclusions about which models perform best in the real world, potentially undermining the framework by which the community benchmarks machine learning systems.
A new paper and website published by researchers at MIT instill little confidence that popular test sets in machine learning are immune to labeling errors. In an analysis of 10 test sets from datasets that include ImageNet, an image database used to train countless computer vision algorithms, the coauthors found an average of 3.4% errors across all of the datasets. The quantities ranged from just over 2,900 errors in the ImageNet validation set to over 5 million errors in QuickDraw, a Google-maintained collection of 50 million drawings contributed by players of the game Quick, Draw!
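To put the raw counts quoted above in proportion, here is a minimal sketch that converts them into error rates. It uses the rounded figures from the excerpt ("just over 2,900" and "over 5 million"), so the results are approximate lower bounds. The ImageNet validation set size of 50,000 images is an assumption that the excerpt does not state, and using QuickDraw's 50 million drawings as the denominator is likewise an assumption. The `error_rate` helper is hypothetical, written only for this illustration.

```python
def error_rate(num_errors: int, num_examples: int) -> float:
    """Return the fraction of examples whose label is flagged as wrong."""
    if num_examples <= 0:
        raise ValueError("num_examples must be positive")
    return num_errors / num_examples


# (approximate error count, dataset size)
# Error counts are the rounded figures from the excerpt.
# 50,000 for ImageNet validation is an assumption, not stated in the excerpt.
datasets = {
    "ImageNet validation": (2_900, 50_000),
    "QuickDraw": (5_000_000, 50_000_000),
}

rates = {
    name: error_rate(errors, size)
    for name, (errors, size) in datasets.items()
}

for name, rate in rates.items():
    print(f"{name}: roughly {rate:.1%} of labels flagged as errors")
```

Under these assumptions both datasets come out above the 3.4% average reported across the 10 test sets, which suggests the error rate varies considerably from one dataset to another.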
Read the Complete Article
Full Text Research Paper: Pervasive Label Errors in Test Sets Destabilize Machine Learning Benchmarks
16 pages; PDF.