Learning from Noisy Labels via Conditional Distributionally Robust Optimization

Authors: Hui Guo, Grace Y. Yi, Boyu Wang

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our experimental results on both synthetic and real-world datasets demonstrate the superiority of our method."
Researcher Affiliation | Academia | "Hui Guo, Department of Computer Science, University of Western Ontario, hguo288@uwo.ca; Grace Y. Yi, Department of Statistical and Actuarial Sciences and Department of Computer Science, University of Western Ontario, gyi5@uwo.ca; Boyu Wang, Department of Computer Science, University of Western Ontario, bwang@csd.uwo.ca"
Pseudocode | Yes | "Algorithm 1: Learning from Noisy Labels via Conditional Distributionally Robust True Label Posterior with an Adaptive Lagrange multiplier (AdaptCDRP)"
Open Source Code | Yes | "Code is available at https://github.com/hguo1728/AdaptCDRP."
Open Datasets | Yes | "We evaluate the performance of the proposed AdaptCDRP on two datasets, CIFAR-10 and CIFAR-100 [21], by generating synthetic noisy labels (details provided below), as well as four datasets, CIFAR-10N [22], CIFAR-100N [22], LabelMe [23, 24], and Animal10N [25], which contain human annotations." (See the label-noise injection sketch below the table.)
Dataset Splits | Yes | "For all datasets except LabelMe, we set aside 10% of the original data, together with the corresponding synthetic or human-annotated noisy labels, to validate the model selection procedure." (See the validation-split sketch below the table.)
Hardware Specification | Yes | "Training times are approximately 3 hours on CIFAR-10 and 5.5 hours on CIFAR-100 using an NVIDIA V100 GPU."
Software Dependencies | No | The paper mentions the Adam optimizer but does not specify software dependencies with version numbers (e.g., PyTorch, TensorFlow, or specific library versions).
Experiment Setup | Yes | "A batch size of 128 is maintained across all datasets. We use the Adam optimizer [43] with a weight decay of 5 × 10^-4 for CIFAR-10, CIFAR-100, CIFAR-10N, CIFAR-100N, and LabelMe datasets. The initial learning rate for CIFAR-10, CIFAR-100, CIFAR-10N, and CIFAR-100N is set to 10^-3, with the networks trained for 120, 150, 120, and 150 epochs, respectively. The first 30 epochs serve as a warm-up." (See the training-configuration sketch below the table.)
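
The Open Datasets row notes that synthetic noisy labels are generated for CIFAR-10 and CIFAR-100, but the paper's exact noise models are not quoted above. Below is a minimal sketch of one common construction, symmetric (uniform) label noise; the function name, the 40% noise rate, and the NumPy implementation are illustrative assumptions, not the authors' code.

```python
import numpy as np

def inject_symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip each label to a uniformly chosen *different* class with
    probability `noise_rate` (symmetric/uniform label noise).

    NOTE: the paper's actual noise models are not quoted in the report;
    this is only a common baseline construction.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels).copy()
    flip = rng.random(len(labels)) < noise_rate
    # Shift the flipped labels by 1..num_classes-1 so the corrupted label
    # is always different from the original and uniform over other classes.
    offsets = rng.integers(1, num_classes, size=flip.sum())
    labels[flip] = (labels[flip] + offsets) % num_classes
    return labels

# Example: 40% symmetric noise on CIFAR-10-style labels (10 classes).
clean = np.random.randint(0, 10, size=50_000)
noisy = inject_symmetric_noise(clean, noise_rate=0.4, num_classes=10)
print((noisy != clean).mean())  # roughly 0.4
```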
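The Dataset Splits row states that 10% of the original data, together with its noisy labels, is held out for model selection. A minimal sketch of such a 90/10 split follows, assuming a simple random partition; the authors' exact splitting procedure is not quoted above.

```python
import numpy as np

def holdout_split(num_samples, val_fraction=0.1, seed=0):
    """Return (train_idx, val_idx) for a random 90/10 split.

    Assumption: the 10% noisy-label validation set described in the
    report is a simple random subset of the training data.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_samples)
    n_val = int(round(val_fraction * num_samples))
    return perm[n_val:], perm[:n_val]

train_idx, val_idx = holdout_split(50_000)   # e.g. a CIFAR-10-sized training set
print(len(train_idx), len(val_idx))          # 45000 5000
```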
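The Experiment Setup row fixes the batch size (128), optimizer (Adam with weight decay 5 × 10^-4), initial learning rate (10^-3), epoch counts (120 or 150), and a 30-epoch warm-up. The sketch below wires those values into a training loop. PyTorch itself, the ResNet-34 backbone, and the plain cross-entropy warm-up loss are assumptions (the report notes that no framework or versions are specified), and the post-warm-up AdaptCDRP objective from Algorithm 1 is not reproduced here.

```python
import torch
from torch import nn, optim
from torchvision import models

# Hyperparameters taken from the Experiment Setup row; everything else
# in this sketch (framework, backbone, warm-up loss) is an assumption.
BATCH_SIZE = 128
TOTAL_EPOCHS = 120        # 120 for CIFAR-10/CIFAR-10N, 150 for CIFAR-100/CIFAR-100N
WARMUP_EPOCHS = 30

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet34(num_classes=10).to(device)
optimizer = optim.Adam(model.parameters(), lr=1e-3, weight_decay=5e-4)
warmup_criterion = nn.CrossEntropyLoss()

# Example loader (hypothetical train_set with noisy labels):
# loader = torch.utils.data.DataLoader(train_set, batch_size=BATCH_SIZE, shuffle=True)

def train_one_epoch(loader, epoch):
    model.train()
    for images, noisy_labels in loader:
        images, noisy_labels = images.to(device), noisy_labels.to(device)
        optimizer.zero_grad()
        logits = model(images)
        if epoch < WARMUP_EPOCHS:
            # Warm-up phase: standard training on the observed noisy labels.
            loss = warmup_criterion(logits, noisy_labels)
        else:
            # After warm-up the paper's AdaptCDRP objective (Algorithm 1)
            # would be used instead; a plain CE placeholder keeps this
            # sketch runnable.
            loss = warmup_criterion(logits, noisy_labels)
        loss.backward()
        optimizer.step()
```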