Understanding and Improving Early Stopping for Learning with Noisy Labels

Authors: Yingbin Bai, Erkun Yang, Bo Han, Yanhua Yang, Jiatong Li, Yinian Mao, Gang Niu, Tongliang Liu

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method on two synthetic datasets, CIFAR-10 and CIFAR-100 [12] with different levels of symmetric, pairflip, and instance-dependent label noise (abbreviated as instance label noise) and a real-world dataset Clothing-1M [32].
Researcher Affiliation | Collaboration | TML Lab, University of Sydney; Xidian University; Hong Kong Baptist University; Meituan-Dianping Group; RIKEN AIP
Pseudocode | Yes | Algorithm 1: Progressive Early Stopping with Semi-Supervised Learning (a hedged PyTorch sketch of the progressive training stages follows the table)
Open Source Code | Yes | The code is made public at https://github.com/tmllab/PES.
Open Datasets | Yes | We evaluate our method on two synthetic datasets, CIFAR-10 and CIFAR-100 [12] with different levels of symmetric, pairflip, and instance-dependent label noise (abbreviated as instance label noise) and a real-world dataset Clothing-1M [32]. (an illustrative noise-injection sketch follows the table)
Dataset Splits | Yes | For both of these two datasets, we leave 10% of data with noisy labels as noisy validation set.
Hardware Specification | Yes | All the experiments are conducted on a server with a single Nvidia V100 GPU.
Software Dependencies | Yes | Our method is implemented by PyTorch v1.6.
Experiment Setup | Yes | For experiments without semi-supervised learning, we follow [31], and use ResNet-18 [10] for CIFAR-10 and ResNet-34 for CIFAR-100. We split networks into three parts: the layers above block 4 as part 1, block 4 of ResNet as part 2, and the final layer as part 3. T1 is defined as 25 for CIFAR-10 and 30 for CIFAR-100, T2 as 7, and T3 as 5. The network is trained for 200 epochs and SGD with 0.9 momentum is used. The initial learning rate is set to 0.1 and decayed with a factor of 10 at the 100th and 150th epoch respectively, and the weight decay is set to 10^-4. For T2 and T3, we employ an Adam optimizer with a learning rate of 10^-4. (hedged sketches of the progressive stages and of the SGD schedule follow the table)
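
The pseudocode row refers to Algorithm 1, Progressive Early Stopping with Semi-Supervised Learning. Below is a minimal PyTorch sketch of the per-part training stages implied by the experiment-setup row: the whole network is trained for T1 epochs with SGD, after which the later parts are refreshed and retrained for T2 and T3 epochs with Adam at learning rate 10^-4. The helper names (train_for, reinit, progressive_early_stopping), the assumption of a torchvision-style ResNet exposing .layer4 and .fc, and the in-place re-initialization of the later parts are illustrative choices, not the authors' implementation; the released code at https://github.com/tmllab/PES is authoritative. The subsequent semi-supervised stage of Algorithm 1 is not shown here.

```python
import torch.nn as nn
import torch.optim as optim

def train_for(model, params, loader, epochs, make_optimizer, device):
    """Train on the noisy-labeled loader, updating only the given parameter group."""
    criterion = nn.CrossEntropyLoss()
    optimizer = make_optimizer(params)
    model.train()
    for _ in range(epochs):
        for images, noisy_targets in loader:
            images, noisy_targets = images.to(device), noisy_targets.to(device)
            optimizer.zero_grad()
            criterion(model(images), noisy_targets).backward()
            optimizer.step()

def reinit(module):
    """Re-initialize conv/BN/linear weights of a sub-network in place."""
    for m in module.modules():
        if hasattr(m, "reset_parameters"):
            m.reset_parameters()

def progressive_early_stopping(model, loader, device, T1=25, T2=7, T3=5):
    """Sketch of the three progressive stages; assumes model.layer4 and model.fc exist."""
    sgd = lambda p: optim.SGD(p, lr=0.1, momentum=0.9, weight_decay=1e-4)
    adam = lambda p: optim.Adam(p, lr=1e-4)
    # Stage 1: train the whole network, but only for T1 epochs (early stopping).
    train_for(model, model.parameters(), loader, T1, sgd, device)
    # Stage 2: refresh block 4 (part 2) and retrain only it for T2 epochs with Adam.
    reinit(model.layer4)
    train_for(model, model.layer4.parameters(), loader, T2, adam, device)
    # Stage 3: refresh the final layer (part 3) and retrain it for T3 epochs with Adam.
    reinit(model.fc)
    train_for(model, model.fc.parameters(), loader, T3, adam, device)
    return model
```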
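
For the synthetic-noise settings quoted above, the following sketch shows one common way symmetric and pairflip label noise are injected into CIFAR-style labels. It is illustrative only: the paper follows the noise-generation protocol of its references, conventions differ (for example, whether a symmetric flip may keep the true label), and instance-dependent noise is not shown. The function names are hypothetical.

```python
import numpy as np

def symmetric_noise(labels, noise_rate, num_classes, seed=0):
    """Flip each label to a uniformly chosen *other* class with probability noise_rate."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    flip = rng.random(len(noisy)) < noise_rate
    for i in np.where(flip)[0]:
        noisy[i] = rng.choice([c for c in range(num_classes) if c != noisy[i]])
    return noisy

def pairflip_noise(labels, noise_rate, num_classes, seed=0):
    """Flip class c to its paired class (c + 1) mod num_classes with probability noise_rate."""
    rng = np.random.default_rng(seed)
    noisy = np.asarray(labels).copy()
    flip = rng.random(len(noisy)) < noise_rate
    noisy[flip] = (noisy[flip] + 1) % num_classes
    return noisy

# Example: corrupt 50% of a toy CIFAR-10-style label vector symmetrically.
print(symmetric_noise(list(range(10)) * 4, noise_rate=0.5, num_classes=10))
```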
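
Finally, a self-contained sketch of the quoted 200-epoch SGD schedule: learning rate 0.1 decayed by a factor of 10 at the 100th and 150th epochs, momentum 0.9, and weight decay 10^-4, expressed with PyTorch's MultiStepLR. The placeholder linear model and the empty epoch body stand in for the actual CIFAR networks and training loop.

```python
import torch.nn as nn
import torch.optim as optim
from torch.optim.lr_scheduler import MultiStepLR

model = nn.Linear(3 * 32 * 32, 10)  # placeholder for ResNet-18/34 on CIFAR
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=1e-4)
# Decay the learning rate by a factor of 10 at the 100th and 150th epochs.
scheduler = MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)

for epoch in range(200):
    # ... forward/backward passes over the noisy-labeled training set go here ...
    optimizer.step()   # parameter update (a no-op here, since no gradients were computed)
    scheduler.step()   # advance the learning-rate schedule once per epoch
```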