Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning From Noisy Singly-labeled Data

Authors: Ashish Khetan, Zachary C. Lipton, Animashree Anandkumar

ICLR 2018 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments conducted on both ImageNet (with simulated noisy workers) and MS-COCO (using the real crowdsourced labels) confirm our algorithm's benefits. Empirically, we verify our approach on several multi-class classification datasets: ImageNet and CIFAR10 (with simulated noisy workers), and MS-COCO (using the real noisy annotator labels). Our experiments validate that when the cost of obtaining unlabeled examples is negligible and the total annotation budget is fixed, it is best to collect a single label per training example for as many examples as possible.
Researcher Affiliation | Collaboration | Ashish Khetan, University of Illinois at Urbana-Champaign, Urbana, IL 61801 (EMAIL); Zachary C. Lipton, Amazon Web Services, Seattle, WA 98101 (EMAIL); Animashree Anandkumar, Amazon Web Services, Seattle, WA 98101 (EMAIL)
Pseudocode | Yes | Algorithm 1: Model Bootstrapped EM (MBEM)
Open Source Code | Yes | A Python implementation of the MBEM algorithm is available for download at https://github.com/khetan2/MBEM.
Open Datasets | Yes | Experiments conducted on both ImageNet (with simulated noisy workers) and MS-COCO (using the real crowdsourced labels) confirm our algorithm's benefits. Empirically, we verify our approach on several multi-class classification datasets: ImageNet and CIFAR10 (with simulated noisy workers), and MS-COCO (using the real noisy annotator labels).
Dataset Splits | Yes | The ImageNet-1K dataset contains 1.2M training examples and 50K validation examples. We divide the test set into two parts: 10K for validation and 40K for test. We use 35K images for training the model and 1K for validation and 4K for testing.
Hardware Specification | No | The paper mentions training with ResNet models but does not specify hardware details such as GPU/CPU models, memory, or the computing environment used for the experiments.
Software Dependencies | No | The paper mentions a Python implementation of the MBEM algorithm but does not specify a Python version or any other software dependencies with version numbers.
Experiment Setup | No | The paper mentions using a 20-layer ResNet and running MBEM for T = 2 rounds, but it does not provide specific hyperparameters such as learning rates, batch sizes, or optimizer settings.
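For context on the pseudocode finding above, a minimal NumPy sketch of a Model Bootstrapped EM loop in the spirit of Algorithm 1 might look like the following. This is an illustrative assumption, not the paper's implementation: the function name `mbem`, the majority-vote initialization, and the caller-supplied `train_model` callback (standing in for retraining the ResNet on soft labels) are all hypothetical.

```python
import numpy as np

def mbem(worker_labels, train_model, n_classes, n_rounds=2):
    """Hedged sketch of a Model Bootstrapped EM (MBEM) style loop.

    worker_labels: (n_examples, n_workers) int array; -1 marks "no label".
    train_model:   caller-supplied function mapping soft labels of shape
                   (n_examples, n_classes) to predicted class probabilities
                   of the same shape (stands in for training a network).
    Returns the final posterior over true labels, (n_examples, n_classes).
    """
    n, m = worker_labels.shape
    # Initialize the posterior over true labels by (soft) majority vote.
    post = np.zeros((n, n_classes))
    for i in range(n):
        for a in worker_labels[i]:
            if a >= 0:
                post[i, a] += 1.0
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_rounds):
        # M-step: estimate each worker's confusion matrix from the posterior.
        conf = np.full((m, n_classes, n_classes), 1e-6)
        for j in range(m):
            for i in range(n):
                a = worker_labels[i, j]
                if a >= 0:
                    conf[j, :, a] += post[i]
        conf /= conf.sum(axis=2, keepdims=True)

        # Bootstrap: retrain the model on current soft labels; its
        # predictions act as a prior in the next E-step.
        prior = train_model(post)

        # E-step: combine the model prior with worker likelihoods.
        log_post = np.log(prior + 1e-12)
        for j in range(m):
            for i in range(n):
                a = worker_labels[i, j]
                if a >= 0:
                    log_post[i] += np.log(conf[j, :, a])
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post
```

Usage with simulated noisy workers (as in the ImageNet/CIFAR10 experiments) would pass a real training routine as `train_model`; here even an identity function over the soft labels demonstrates the loop's mechanics.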