Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Learning From Noisy Singly-labeled Data

Authors: Ashish Khetan, Zachary C. Lipton, Animashree Anandkumar

ICLR 2018 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments conducted on both ImageNet (with simulated noisy workers) and MS-COCO (using the real crowdsourced labels) confirm our algorithm's benefits. Empirically, we verify our approach on several multi-class classification datasets: ImageNet and CIFAR10 (with simulated noisy workers), and MS-COCO (using the real noisy annotator labels). Our experiments validate that when the cost of obtaining unlabeled examples is negligible and the total annotation budget is fixed, it is best to collect a single label per training example for as many examples as possible.
Researcher Affiliation | Collaboration | Ashish Khetan, University of Illinois at Urbana-Champaign, Urbana, IL 61801 (EMAIL); Zachary C. Lipton, Amazon Web Services, Seattle, WA 98101 (EMAIL); Animashree Anandkumar, Amazon Web Services, Seattle, WA 98101 (EMAIL)
Pseudocode | Yes | Algorithm 1: Model Bootstrapped EM (MBEM)
Open Source Code | Yes | A Python implementation of the MBEM algorithm is available for download at https://github.com/khetan2/MBEM.
Open Datasets | Yes | Experiments conducted on both ImageNet (with simulated noisy workers) and MS-COCO (using the real crowdsourced labels) confirm our algorithm's benefits. Empirically, we verify our approach on several multi-class classification datasets: ImageNet and CIFAR10 (with simulated noisy workers), and MS-COCO (using the real noisy annotator labels).
Dataset Splits | Yes | The ImageNet-1K dataset contains 1.2M training examples and 50K validation examples. We divide the test set into two parts: 10K for validation and 40K for test. We use 35K images for training the model and 1K for validation and 4K for testing.
Hardware Specification | No | The paper mentions training with ResNet models but does not specify hardware details such as GPU/CPU models, memory, or the computing environment used for the experiments.
Software Dependencies | No | The paper mentions a Python implementation of the MBEM algorithm but does not specify a Python version or any other software dependencies with version numbers.
Experiment Setup | No | The paper mentions using a 20-layer ResNet and running MBEM for T = 2 rounds, but it does not provide specific hyperparameters such as learning rates, batch sizes, or optimizer settings.
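For context on the pseudocode finding above, a minimal NumPy sketch of a Model Bootstrapped EM loop in the spirit of Algorithm 1 might look like the following. This is an illustrative assumption, not the paper's implementation: the function name `mbem`, the majority-vote initialization, and the caller-supplied `train_model` callback (standing in for retraining the ResNet on soft labels) are all hypothetical.

```python
import numpy as np

def mbem(worker_labels, train_model, n_classes, n_rounds=2):
    """Hedged sketch of a Model Bootstrapped EM (MBEM) style loop.

    worker_labels: (n_examples, n_workers) int array; -1 marks "no label".
    train_model:   caller-supplied function mapping soft labels of shape
                   (n_examples, n_classes) to predicted class probabilities
                   of the same shape (stands in for training a network).
    Returns the final posterior over true labels, (n_examples, n_classes).
    """
    n, m = worker_labels.shape
    # Initialize the posterior over true labels by (soft) majority vote.
    post = np.zeros((n, n_classes))
    for i in range(n):
        for a in worker_labels[i]:
            if a >= 0:
                post[i, a] += 1.0
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_rounds):
        # M-step: estimate each worker's confusion matrix from the posterior.
        conf = np.full((m, n_classes, n_classes), 1e-6)
        for j in range(m):
            for i in range(n):
                a = worker_labels[i, j]
                if a >= 0:
                    conf[j, :, a] += post[i]
        conf /= conf.sum(axis=2, keepdims=True)

        # Bootstrap: retrain the model on current soft labels; its
        # predictions act as a prior in the next E-step.
        prior = train_model(post)

        # E-step: combine the model prior with worker likelihoods.
        log_post = np.log(prior + 1e-12)
        for j in range(m):
            for i in range(n):
                a = worker_labels[i, j]
                if a >= 0:
                    log_post[i] += np.log(conf[j, :, a])
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)
    return post
```

Usage with simulated noisy workers (as in the ImageNet/CIFAR10 experiments) would pass a real training routine as `train_model`; here even an identity function over the soft labels demonstrates the loop's mechanics.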