Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Label Noise-Robust Learning using a Confidence-Based Sieving Strategy
Authors: Reihaneh Torkzadehmahani, Reza Nasirigerdeh, Daniel Rueckert, Georgios Kaissis
TMLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Then, we experimentally illustrate the superior performance of our proposed approach compared to recent studies on various settings, such as synthetic and real-world label noise. Moreover, we show CONFES can be combined with other state-of-the-art approaches, such as Co-teaching and DivideMix to further improve model performance. |
| Researcher Affiliation | Academia | Reihaneh Torkzadehmahani EMAIL Technical University of Munich Reza Nasirigerdeh EMAIL Technical University of Munich Helmholtz Munich Daniel Rueckert EMAIL Technical University of Munich Imperial College London Georgios Kaissis EMAIL Technical University of Munich Helmholtz Munich |
| Pseudocode | Yes | Algorithm 1: Confidence error based sieving (CONFES) Algorithm 2: Instance-dependent Label Noise Generation taken from Xia et al. (2020) |
| Open Source Code | Yes | The code is available at: https://github.com/reihaneh-torkzadehmahani/confes |
| Open Datasets | Yes | We utilize the CIFAR-10/100 datasets (Krizhevsky et al., 2009) and make them noisy using different types of synthetic label noise. Furthermore, we incorporate the Clothing1M dataset (Xiao et al., 2015), a naturally noisy benchmark dataset widely employed in previous studies. |
| Dataset Splits | Yes | CIFAR-10/100 contain 50000 training samples and 10000 testing samples of shape 32×32 from 10/100 classes. For the CIFAR datasets, we perturb the training labels using symmetric, pairflip, and instance-dependent label noise introduced in Xia et al. (2020), but keep the test set clean. ... Clothing1M is a real-world dataset of 1 million images of size 224×224 with noisy labels (whose estimated noise level is approximately 38% (Wei et al., 2022; Song et al., 2019)) and 10k clean test images from 14 classes. |
| Hardware Specification | Yes | We conduct the experiments on a single GPU system equipped with an NVIDIA RTX A6000 graphic processor and 48GB of GPU memory. |
| Software Dependencies | Yes | Our method is implemented in PyTorch v1.9. |
| Experiment Setup | Yes | For all methods, we evaluate the average test accuracy on the last five epochs, and for co-teaching, we report the average of this metric for the two networks. Following previous works (Li et al., 2020; Bai et al., 2021), we train the PreActResNet-18 (He et al., 2016) model on CIFAR-10 and CIFAR-100 using the SGD optimizer with momentum of 0.9, weight decay of 5e-4, and batch size of 128. The initial learning rate is set to 0.02, which is decreased by 0.01 in 300 epochs using a cosine annealing scheduler (Loshchilov & Hutter, 2017). For the Clothing1M dataset, we adopt the setting from Li et al. (2020) and train the ResNet-50 model for 80 epochs. The optimizer is SGD with momentum of 0.9 and weight decay of 1e-3. The initial learning rate is 0.002, which is reduced by a factor of 10 at epoch 40. At each epoch, the model is trained on 1000 mini-batches of size 32. |
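The cosine annealing schedule cited in the CIFAR setup above can be sketched in pure Python. This is a hedged illustration of the schedule formula from Loshchilov & Hutter (2017), not code from the paper; the function name and the zero minimum learning rate are assumptions.

```python
import math

def cosine_annealing_lr(epoch, total_epochs=300, lr_max=0.02, lr_min=0.0):
    """Cosine annealing (Loshchilov & Hutter, 2017):
    lr(t) = lr_min + 0.5 * (lr_max - lr_min) * (1 + cos(pi * t / T)).
    lr_max=0.02 and total_epochs=300 match the reported CIFAR setting;
    lr_min=0.0 is an assumed floor."""
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * epoch / total_epochs))

print(cosine_annealing_lr(0))    # start of training: 0.02
print(cosine_annealing_lr(150))  # midpoint: 0.01
print(cosine_annealing_lr(300))  # end of training: 0.0 (up to float rounding)
```

In PyTorch this schedule corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=300)` wrapped around the SGD optimizer described above.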