Learning From Noisy Singly-labeled Data

Authors: Ashish Khetan, Zachary C. Lipton, Animashree Anandkumar

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments conducted on both ImageNet (with simulated noisy workers) and MS-COCO (using the real crowdsourced labels) confirm our algorithm's benefits. Empirically, we verify our approach on several multi-class classification datasets: ImageNet and CIFAR-10 (with simulated noisy workers), and MS-COCO (using the real noisy annotator labels). Our experiments validate that when the cost of obtaining unlabeled examples is negligible and the total annotation budget is fixed, it is best to collect a single label per training example for as many examples as possible.
Researcher Affiliation | Collaboration | Ashish Khetan, University of Illinois at Urbana-Champaign, Urbana, IL 61801, khetan2@illinois.edu; Zachary C. Lipton, Amazon Web Services, Seattle, WA 98101, liptoz@amazon.com; Animashree Anandkumar, Amazon Web Services, Seattle, WA 98101, anima@amazon.com
Pseudocode | Yes | Algorithm 1, Model Bootstrapped EM (MBEM); a sketch of this loop appears after the table.
Open Source Code | Yes | A Python implementation of the MBEM algorithm is available for download at https://github.com/khetan2/MBEM.
Open Datasets | Yes | Experiments conducted on both ImageNet (with simulated noisy workers) and MS-COCO (using the real crowdsourced labels) confirm our algorithm's benefits. Empirically, we verify our approach on several multi-class classification datasets: ImageNet and CIFAR-10 (with simulated noisy workers), and MS-COCO (using the real noisy annotator labels).
Dataset Splits | Yes | The ImageNet-1K dataset contains 1.2M training examples and 50K validation examples. We divide the test set into two parts: 10K for validation and 40K for testing. We use 35K images for training the model, 1K for validation, and 4K for testing.
Hardware Specification | No | The paper mentions training with ResNet models but does not specify any hardware details such as GPU/CPU models, memory, or the computing environment used for the experiments.
Software Dependencies | No | The paper mentions a Python implementation of the MBEM algorithm but does not specify a Python version or any other software dependencies with version numbers.
Experiment Setup | No | The paper mentions using a 20-layer ResNet and running MBEM for T=2 rounds, but it does not provide specific hyperparameters such as learning rates, batch sizes, or optimizer settings.
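
The Pseudocode row above refers to Algorithm 1 (Model Bootstrapped EM). As a reading aid, here is a minimal NumPy sketch of that loop as we understand it from the paper: initialize soft labels by majority vote, fit a model on them, estimate per-worker confusion matrices using the model's predictions as a proxy for ground truth, and update the label posterior via Bayes' rule. The `train_model` placeholder, the -1 convention for missing worker labels, and the smoothing constant are our assumptions, not the authors' code; the reference implementation is in the linked repository.

```python
import numpy as np

def majority_vote(worker_labels, k):
    """Initial soft labels by majority vote.
    worker_labels: (n, m) int array; entry is the class reported by
    worker j on item i, or -1 if worker j did not label item i (our convention).
    Returns an (n, k) row-normalized soft-label matrix."""
    n, _ = worker_labels.shape
    posterior = np.zeros((n, k))
    for i in range(n):
        for y in worker_labels[i]:
            if y >= 0:
                posterior[i, y] += 1.0
    posterior[posterior.sum(1) == 0] = 1.0  # uniform fallback for unlabeled items
    return posterior / posterior.sum(1, keepdims=True)

def estimate_confusions(worker_labels, soft_labels, k, smoothing=1e-2):
    """Per-worker k x k confusion matrices (row = true class, col = reported
    class), treating soft_labels as a soft ground-truth estimate."""
    n, m = worker_labels.shape
    C = np.full((m, k, k), smoothing)  # smoothing is an assumed pseudo-count
    for i in range(n):
        for j, y in enumerate(worker_labels[i]):
            if y >= 0:
                C[j, :, y] += soft_labels[i]
    return C / C.sum(axis=2, keepdims=True)

def e_step(worker_labels, C, prior):
    """Posterior over true labels via Bayes' rule: model predictions as the
    prior, times the likelihood of each worker's report under C."""
    n, _ = worker_labels.shape
    log_post = np.log(prior + 1e-12)
    for i in range(n):
        for j, y in enumerate(worker_labels[i]):
            if y >= 0:
                log_post[i] += np.log(C[j, :, y] + 1e-12)
    post = np.exp(log_post - log_post.max(1, keepdims=True))
    return post / post.sum(1, keepdims=True)

def mbem(x, worker_labels, k, train_model, T=2):
    """Model Bootstrapped EM, following the structure of Algorithm 1.
    train_model(x, soft_labels) -> (n, k) predicted class probabilities;
    it is a hypothetical placeholder for fitting, e.g., a ResNet on
    examples weighted by the current soft labels."""
    posterior = majority_vote(worker_labels, k)
    model_probs = None
    for _ in range(T):
        model_probs = train_model(x, posterior)                 # fit on current soft labels
        C = estimate_confusions(worker_labels, model_probs, k)  # model output as label proxy
        posterior = e_step(worker_labels, C, model_probs)       # refresh the label posterior
    return model_probs, posterior
```

The T=2 default mirrors the setting quoted in the Experiment Setup row; everything else (optimizer, model family, weighting scheme) is left inside `train_model` precisely because the paper does not pin those details down.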