Mitigating Memorization of Noisy Labels by Clipping the Model Prediction
Authors: Hongxin Wei, Huiping Zhuang, Renchunzi Xie, Lei Feng, Gang Niu, Bo An, Yixuan Li
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To verify the effectiveness of our method, we conduct thorough empirical evaluations on both simulated and real-world noisy datasets, including CIFAR-10, CIFAR-100 (Krizhevsky et al., 2009), and WebVision (Li et al., 2017) datasets. |
| Researcher Affiliation | Academia | (1) Southern University of Science and Technology (work done while at UW-Madison as a visiting scholar); (2) South China University of Technology; (3) Nanyang Technological University; (4) RIKEN AIP; (5) University of Wisconsin-Madison. |
| Pseudocode | No | The paper does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not explicitly provide a link to its source code or state that it is open-source. |
| Open Datasets | Yes | To verify the efficacy of LogitClip, we comprehensively consider four different types of label noise, including (1) symmetric noise, (2) asymmetric noise (Zhang & Sabuncu, 2018), (3) instance-dependent noise (Chen et al., 2020), and (4) real-world noise on CIFAR-10/100 (Krizhevsky et al., 2009) and WebVision (Li et al., 2017) datasets. |
| Dataset Splits | Yes | We use 5k noisy samples as the validation dataset to tune the hyperparameter 1/τ in {0.1, 0.5, 1, 1.5, ..., 4.5, 5}, then train the model on the full training set and report the average test accuracy in the last 10 epochs. |
| Hardware Specification | Yes | We conduct all the experiments on NVIDIA GeForce RTX 3090, and implement all methods by PyTorch. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify a version number or other software dependencies with their versions. |
| Experiment Setup | Yes | In particular, we train the network for 200 epochs using SGD with a momentum of 0.9, a weight decay of 0.0005, and a batch size of 128. We set the initial learning rate as 0.1, and reduce it by a factor of 10 after 80 and 140 epochs. |
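The "Dataset Splits" row tunes a clipping hyperparameter 1/τ, which governs how strongly the model's logit vector is bounded before the loss is computed. Since the "Open Source Code" row is "No", the sketch below is not the paper's implementation; it is a minimal illustration assuming the clipping bounds the L2 norm of each logit vector at τ before the cross-entropy loss, and the function name `logit_clip` is ours.

```python
import torch
import torch.nn.functional as F

def logit_clip(logits: torch.Tensor, tau: float) -> torch.Tensor:
    """Illustrative sketch: bound the L2 norm of each logit vector by tau.

    Logit vectors whose norm is already below tau are left unchanged.
    """
    norms = logits.norm(p=2, dim=-1, keepdim=True)        # per-sample logit norm
    scale = torch.clamp(tau / (norms + 1e-12), max=1.0)   # shrink only when norm > tau
    return logits * scale

# Usage: clip the logits before the usual cross-entropy loss.
logits = torch.randn(128, 10)           # batch of 128, 10 classes (e.g. CIFAR-10)
targets = torch.randint(0, 10, (128,))  # (possibly noisy) labels
tau = 1.0                               # 1/tau would be tuned on the 5k validation split
loss = F.cross_entropy(logit_clip(logits, tau), targets)
```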
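The "Experiment Setup" row maps directly onto a standard PyTorch optimizer and scheduler configuration. The sketch below uses a placeholder model and synthetic data, since neither the backbone nor the data pipeline is specified in this table; only the quoted hyperparameters (SGD, momentum 0.9, weight decay 0.0005, batch size 128, initial learning rate 0.1 reduced by 10x after epochs 80 and 140, 200 epochs) come from the paper.

```python
import torch
from torch import nn, optim
from torch.utils.data import DataLoader, TensorDataset

# Placeholder model and data; the paper's backbone and noisy datasets are not given here.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
train_set = TensorDataset(torch.randn(512, 3, 32, 32), torch.randint(0, 10, (512,)))
train_loader = DataLoader(train_set, batch_size=128, shuffle=True)

# Hyperparameters quoted in the "Experiment Setup" row.
optimizer = optim.SGD(model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)
scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 140], gamma=0.1)
criterion = nn.CrossEntropyLoss()

for epoch in range(200):
    for inputs, targets in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(inputs), targets)  # logit clipping would be applied here
        loss.backward()
        optimizer.step()
    scheduler.step()  # decays the learning rate by 10x after epochs 80 and 140
```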