Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Learning From Noisy Singly-labeled Data
Authors: Ashish Khetan, Zachary C. Lipton, Animashree Anandkumar
ICLR 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments conducted on both ImageNet (with simulated noisy workers) and MS-COCO (using the real crowdsourced labels) confirm our algorithm's benefits. Empirically, we verify our approach on several multi-class classification datasets: ImageNet and CIFAR10 (with simulated noisy workers), and MS-COCO (using the real noisy annotator labels). Our experiments validate that when the cost of obtaining unlabeled examples is negligible and the total annotation budget is fixed, it is best to collect a single label per training example for as many examples as possible. |
| Researcher Affiliation | Collaboration | Ashish Khetan, University of Illinois at Urbana-Champaign, Urbana, IL 61801, EMAIL; Zachary C. Lipton, Amazon Web Services, Seattle, WA 98101, EMAIL; Animashree Anandkumar, Amazon Web Services, Seattle, WA 98101, EMAIL |
| Pseudocode | Yes | Algorithm 1 Model Bootstrapped EM (MBEM) |
| Open Source Code | Yes | A Python implementation of the MBEM algorithm is available for download at https://github.com/khetan2/MBEM. |
| Open Datasets | Yes | Experiments conducted on both ImageNet (with simulated noisy workers) and MS-COCO (using the real crowdsourced labels) confirm our algorithm's benefits. Empirically, we verify our approach on several multi-class classification datasets: ImageNet and CIFAR10 (with simulated noisy workers), and MS-COCO (using the real noisy annotator labels). |
| Dataset Splits | Yes | The ImageNet-1K dataset contains 1.2M training examples and 50K validation examples. We divide the test set into two parts: 10K for validation and 40K for test. We use 35K images for training the model, 1K for validation, and 4K for testing. |
| Hardware Specification | No | The paper mentions training with ResNet models but does not specify any hardware details such as GPU/CPU models, memory, or specific computing environments used for the experiments. |
| Software Dependencies | No | The paper mentions a Python implementation of the MBEM algorithm but does not specify Python version or any other software dependencies with version numbers. |
| Experiment Setup | No | The paper mentions using a 20-layer ResNet and running MBEM for T=2 rounds, but it does not provide specific hyperparameters such as learning rates, batch sizes, optimizers, or other detailed system-level training settings. |
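The Pseudocode row quotes Algorithm 1, Model Bootstrapped EM (MBEM). As a rough illustrative sketch, not the authors' implementation (all names below are hypothetical), the core idea for a singly-labeled example is to combine the model's predicted class distribution with the annotating worker's estimated confusion matrix to get a posterior over the true label:

```python
# Hypothetical sketch of an MBEM-style label-posterior update for one example
# annotated by a single worker. `prior` is the model's predicted distribution
# p(c | x); `confusion[c][y]` estimates P(worker reports y | true class c).

def posterior_label(prior, confusion, observed):
    """Posterior over the true class, proportional to prior[c] * confusion[c][observed]."""
    unnorm = [prior[c] * confusion[c][observed] for c in range(len(prior))]
    z = sum(unnorm)
    return [u / z for u in unnorm]

# Binary example: worker flips labels 20% of the time.
confusion = [[0.8, 0.2], [0.2, 0.8]]
# Model mildly favors class 0, but the worker reported class 1.
posterior = posterior_label([0.7, 0.3], confusion, observed=1)
print(posterior)  # ≈ [0.368, 0.632] — evidence from the worker outweighs the prior
```

MBEM alternates an update of this kind with retraining the model on the resulting soft labels and re-estimating worker confusion matrices; per the Experiment Setup row, the paper runs T=2 such rounds.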