Regroup Median Loss for Combating Label Noise

Authors: Fengpeng Li, Kemou Li, Jinyu Tian, Jiantao Zhou

AAAI 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Compared to state-of-the-art methods, for both the traditionally trained and semi-supervised models, RML achieves a significant improvement on synthetic and complex real-world datasets. The source code is available at https://github.com/Feng-peng-Li/Regroup-Loss-Median-to-Combat-Label-Noise. We perform experiments on synthetic datasets and real-world datasets. Tab. 1 shows the experimental comparisons on CIFAR-10 and CIFAR-100 without the semi-supervised strategy. RML increases the average test accuracy by about 1% on CIFAR-10 and by about 6% on CIFAR-100.
Researcher Affiliation Academia Fengpeng Li1, Kemou Li1, Jinyu Tian2, Jiantao Zhou1* 1State Key Laboratory of Internet of Things for Smart City, Department of Computer and Information Science, University of Macau 2Faculty of Innovation Engineering, Macau University of Science and Technology
Pseudocode Yes The pseudocode of the RML-based method is described in Alg. 1 of the Appendix. The detailed procedure can be found in Alg. 2 of the Appendix.
Open Source Code Yes The source code is available at https://github.com/Feng-peng-Li/Regroup-Loss-Median-to-Combat-Label-Noise.
Open Datasets Yes For experiments on synthetic datasets, we choose two commonly used datasets, CIFAR-10 and CIFAR-100, with different rates of symmetric label noise, pairflip label noise, and instance-dependent label noise (Xia et al. 2020). For the real-world datasets, we choose Clothing1M and WebVision.
Dataset Splits No The paper mentions training and testing but does not explicitly provide percentages or counts for a separate validation split, nor does it cite a standard validation split.
Hardware Specification Yes All our experiments are performed on Ubuntu 20.04.3 LTS workstations with Intel Xeon 5120 CPUs and 5 NVIDIA 3090 GPUs, using PyTorch.
Software Dependencies No The paper mentions the software used (e.g., PyTorch) but does not provide version numbers or a dependency list.
Experiment Setup Yes For the experiments on CIFAR-10, we set k to 60 for a symmetric label noise ratio of 0.8. For instance-dependent and symmetric label noise with 0.2 ratio, the k is 600. The remaining experiments on CIFAR-10 adopt k = 200. The experiments on CIFAR-100 use k = 50 when the noise rate is 0.2. For the experiments on CIFAR-100 with a noise rate of 0.8, k is 6. For the rest of the experiments, k is set to 20. In our experiments, λ is set to 0.999 according to (Tarvainen and Valpola 2017). To take into account the model performance and the convergence speed, we choose n = 6 on all datasets.
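The k schedule above is spread across several sentences, so as an illustrative aid it can be collected into a single lookup. This is a sketch, not the authors' code: the function name, interface, and the reading that "instance-dependent and symmetric label noise with 0.2 ratio" means either noise type at rate 0.2 are assumptions.

```python
def regroup_size_k(dataset: str, noise_type: str, noise_rate: float) -> int:
    """Return the regroup size k reported in the paper's experiment setup.

    noise_type is one of "symmetric", "pairflip", "instance".
    """
    if dataset == "CIFAR-10":
        if noise_type == "symmetric" and noise_rate == 0.8:
            return 60
        # Assumed reading: instance-dependent noise, or symmetric noise at
        # rate 0.2, both use k = 600.
        if noise_type == "instance" or (noise_type == "symmetric" and noise_rate == 0.2):
            return 600
        return 200  # all remaining CIFAR-10 experiments
    if dataset == "CIFAR-100":
        if noise_rate == 0.2:
            return 50
        if noise_rate == 0.8:
            return 6
        return 20  # all remaining CIFAR-100 experiments
    raise ValueError(f"no k reported for dataset: {dataset}")

# Fixed across all datasets per the paper:
EMA_LAMBDA = 0.999  # momentum, following Tarvainen and Valpola (2017)
N_GROUPS = 6        # n, chosen for performance vs. convergence speed
```

For example, `regroup_size_k("CIFAR-100", "symmetric", 0.8)` returns 6, matching the setup described for CIFAR-100 at a 0.8 noise rate.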