Early Stopping Against Label Noise Without Validation Data

Authors: Suqin Yuan, Lei Feng, Tongliang Liu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Through extensive experiments, we show both the effectiveness of the Label Wave method across various settings and its capability to enhance the performance of existing methods for learning with noisy labels.
Researcher Affiliation | Academia | 1 Sydney AI Centre, School of Computer Science, The University of Sydney; 2 School of Computer Science and Engineering, Nanyang Technological University
Pseudocode | Yes | Algorithm 1 Label Wave (see also the code sketch following the table):
  Let θ₀ be the initial parameters and v be the local minimum of PC. Let p be the Patience, representing the number of times a worsening PC is observed before halting.
  Initialize θ ← θ₀, t ← 0, i ← 0, v ← ∞
  1: while i < p do
  2:   Update θ by running the training for n steps, and t ← t + n.
  3:   PC_t ← compute prediction changes (PC) at step t.
  4:   PC̄_t ← moving average of PC over the most recent k steps.
  5:   if PC̄_t < v then
  6:     v ← PC̄_t; i ← 0, θ* ← θ, t* ← t   // Model stored at every new local minimum.
  7:   else
  8:     i ← i + 1   // Count Patience when PC̄_t is larger than the local minimum.
  9:   end if
  10: end while
  Best parameters are θ*, and the best number of training steps is t*.
Open Source Code | No | The paper does not contain any explicit statement about releasing code or a link to a code repository for the Label Wave method.
Open Datasets | Yes | These datasets comprise seven vision-oriented sets: CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), CIFAR-N (Wei et al., 2021), Clothing1M (Xiao et al., 2015), WebVision (Li et al., 2017), Food-101 (Bossard et al., 2014), and Tiny-ImageNet (Le & Yang, 2015), along with a text-oriented dataset: NEWS (Kiryo et al., 2017; Yu et al., 2019).
Dataset Splits | Yes | The CIFAR-10 dataset is accessible via the torchvision.datasets module; 20% of the training data is held out for validation during the training process.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | Yes | Framework: PyTorch, Version 1.11.0.
Experiment Setup | Yes | Batch Size: 128 [...] Learning Rate: Fixed at 0.01. [...] Optimizer: Employs optim.SGD with momentum = 0.9. [...] By adjusting the batch sizes to 64, 128, 256, learning rates to 0.01, 0.005, 0.001, random seeds to 1, 2, 3, 4, 5, and employing different optimizers such as SGD with momentum (Robbins & Monro, 1951; Polyak, 1964), RMSprop (Tieleman et al., 2012), and Adam (Kingma & Ba, 2014) [...] (a setup sketch consistent with these values follows the table).
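
The Pseudocode row quotes Algorithm 1 (Label Wave). Since the paper releases no code, the following is a minimal Python/PyTorch sketch of that early-stopping criterion under stated assumptions: the helper count_prediction_changes, the class LabelWaveMonitor, and the patience/window defaults are illustrative names and values, not the authors' implementation.

import copy
from collections import deque

import torch


def count_prediction_changes(model, loader, prev_preds, device="cpu"):
    # Count how many training examples changed their predicted label since the
    # previous evaluation. The loader must iterate in a fixed order
    # (shuffle=False) so predictions align across calls.
    model.eval()
    preds = []
    with torch.no_grad():
        for inputs, _ in loader:
            logits = model(inputs.to(device))
            preds.append(logits.argmax(dim=1).cpu())
    preds = torch.cat(preds)
    changes = None if prev_preds is None else int((preds != prev_preds).sum())
    return changes, preds


class LabelWaveMonitor:
    # Tracks the moving average of prediction changes (PC) and signals a stop
    # after `patience` consecutive checks without a new minimum, mirroring
    # Algorithm 1 as quoted above.

    def __init__(self, patience=10, window=3):  # patience/window values are illustrative
        self.patience = patience
        self.recent = deque(maxlen=window)  # last k PC values
        self.best = float("inf")            # v in Algorithm 1
        self.bad_checks = 0                 # i in Algorithm 1
        self.best_state = None              # θ* in Algorithm 1
        self.best_step = None               # t* in Algorithm 1

    def update(self, pc, model, step):
        # Returns True when training should stop.
        self.recent.append(pc)
        smoothed = sum(self.recent) / len(self.recent)
        if smoothed < self.best:
            self.best = smoothed
            self.bad_checks = 0
            self.best_state = copy.deepcopy(model.state_dict())
            self.best_step = step
        else:
            self.bad_checks += 1
        return self.bad_checks >= self.patience

In a training loop, one would call count_prediction_changes on the (noisy) training set every n steps, pass the resulting PC value to update, halt when it returns True, and restore θ* with model.load_state_dict(monitor.best_state).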
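
The Dataset Splits and Experiment Setup rows report CIFAR-10 loaded through torchvision.datasets, a 20% validation holdout, batch size 128, and SGD with learning rate 0.01 and momentum 0.9. A minimal setup consistent with those quoted values is sketched below; the ToTensor transform and the ResNet-18 backbone are assumptions, since the report does not quote them.

import torch
from torch import nn, optim
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, transforms, models

transform = transforms.ToTensor()  # assumed; the quoted setup does not specify transforms

full_train = datasets.CIFAR10(root="./data", train=True, download=True, transform=transform)

# 20% of the training data held out for validation, per the Dataset Splits row.
val_size = int(0.2 * len(full_train))
train_set, val_set = random_split(
    full_train,
    [len(full_train) - val_size, val_size],
    generator=torch.Generator().manual_seed(1),  # seed 1 is among the reported seeds
)

train_loader = DataLoader(train_set, batch_size=128, shuffle=True)   # reported batch size
eval_loader = DataLoader(train_set, batch_size=128, shuffle=False)   # fixed order for PC tracking
val_loader = DataLoader(val_set, batch_size=128, shuffle=False)

model = models.resnet18(num_classes=10)  # assumed backbone; not quoted in this report
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01, momentum=0.9)  # reported optimizer settings

The non-shuffled eval_loader over the training set is what the Label Wave monitor above would use to compute prediction changes, so the stopping criterion itself never consults the held-out validation split, in line with the paper's goal of early stopping without validation data.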