Pi-DUAL: Using privileged information to distinguish clean from noisy labels

Authors: Ke Wang, Guillermo Ortiz-Jimenez, Rodolphe Jenatton, Mark Collier, Efi Kokiopoulou, Pascal Frossard

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirically, Pi-DUAL achieves significant performance improvements on key PI benchmarks (e.g., +6.8% on ImageNet-PI), establishing a new state-of-the-art test set accuracy. We now validate the effectiveness of Pi-DUAL on several public noisy label benchmarks with PI and compare it extensively to other algorithms.
Researcher Affiliation | Collaboration | (1) École Polytechnique Fédérale de Lausanne (EPFL); (2) Google DeepMind; (3) Work done while at EPFL; (4) Bioptimus; (5) Work done while at Google DeepMind; (6) Google Research.
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | No | The paper states: 'All our experiments, including the reimplementation of other noisy label methods, are built on the open-source uncertainty_baselines codebase (Nado et al., 2021) and follow as much as possible the benchmarking practices of Ortiz-Jimenez et al. (2023).' However, it does not explicitly state that the code for the Pi-DUAL method itself is publicly released or provide a link to it.
Open Datasets | Yes | We use the following PI datasets to evaluate the performance of Pi-DUAL and other methods: CIFAR-10H (Peterson et al., 2019)... CIFAR-10N and CIFAR-100N (Wei et al., 2022)... ImageNet-PI (Ortiz-Jimenez et al., 2023) is a relabeled version of the ImageNet ILSVRC12 dataset (Deng et al., 2009).
Dataset Splits | Yes | We use a noisy validation set, held-out from the training set, to select the best hyperparameters and report results over the clean test set. We provide more details on our hyper-parameter tuning strategy and other experimental settings in Appendix B... On CIFAR-10H, we randomly select 4% of the samples; on CIFAR-10N and CIFAR-100N, 2%; and on ImageNet-PI, 1% of all the samples in the training set. (A minimal split sketch follows the table.)
Hardware Specification | Yes | In this paper, we used a TPU V3 with 8 cores for experiments on ImageNet-PI, and A100 (40G) for experiments on CIFAR.
Software Dependencies | No | The paper states that experiments are built on 'the open-source uncertainty_baselines codebase (Nado et al., 2021)', but it does not specify version numbers for key software components or libraries like Python, PyTorch/TensorFlow, or CUDA.
Experiment Setup | Yes | Our experimental settings follow the benchmarking practices laid out by Ortiz-Jimenez et al. (2023). In particular, we use the same architectures, training schedules and public codebase (Nado et al., 2021) to perform all our experiments... We train all models for 90 epochs, with the learning rate decaying multiplicatively by 0.2 after 36, 72 and 96 epochs. We use a batch size of 256 in all experiments, and train the models with an SGD optimizer with 0.9 Nesterov momentum. (A training-schedule sketch follows the table.)
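For concreteness, the noisy-validation split quoted under "Dataset Splits" can be sketched as below. The function name, the fixed seed, and the use of NumPy are illustrative assumptions; the paper performs this split within its own experimental pipeline.

```python
import numpy as np

def split_noisy_validation(num_train, val_fraction, seed=0):
    """Hold out a random fraction of the (noisy) training indices for validation."""
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_train)
    num_val = int(round(val_fraction * num_train))
    # Remaining indices stay in the training set; the held-out ones form the noisy validation set.
    return perm[num_val:], perm[:num_val]

# Fractions quoted above: 4% (CIFAR-10H), 2% (CIFAR-10N / CIFAR-100N), 1% (ImageNet-PI).
train_idx, val_idx = split_noisy_validation(num_train=50_000, val_fraction=0.02)
```

Hyperparameters are then selected on the held-out noisy validation indices, while the reported results come from the clean test set.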
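The quoted training recipe (90 epochs, batch size 256, SGD with 0.9 Nesterov momentum, learning rate decayed multiplicatively by 0.2 at epochs 36, 72 and 96) can be approximated as follows. This is a minimal PyTorch sketch, not the authors' uncertainty_baselines setup; the model choice and the base learning rate of 0.1 are placeholders not given in the quote.

```python
import torch
import torchvision

# Placeholder architecture; the paper reuses the architectures of Ortiz-Jimenez et al. (2023).
model = torchvision.models.resnet18(num_classes=10)

# SGD with 0.9 Nesterov momentum; the base learning rate here is an assumption.
optimizer = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9, nesterov=True)

# Multiplicative decay by 0.2 at the quoted epoch milestones.
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[36, 72, 96], gamma=0.2)

for epoch in range(90):
    # ... one training pass over the data with batch size 256 goes here ...
    scheduler.step()
```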