NoiseGPT: Label Noise Detection and Rectification through Probability Curvature
Authors: Haoyu Wang, Zhuo Huang, Zhiwei Lin, Tongliang Liu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through extensive experiments, we carefully demonstrate the effectiveness of NoiseGPT on detecting and cleansing dataset noise; in particular, on ILSVRC12 the AUROC of NoiseGPT reached over 0.92. By integrating with existing methods, the classification performance can be significantly improved on noisy datasets, typically by 22.8% on 80% symmetric CIFAR-10 with M-correction. |
| Researcher Affiliation | Academia | Haoyu Wang, School of Automation, Beijing Institute of Technology (haoyu.wang@bit.edu.cn); Zhuo Huang, Sydney AI Centre, University of Sydney (zhuohuang.ai@gmail.com); Zhiwei Lin, School of Automation, Beijing Institute of Technology (linzhiwei@bit.edu.cn); Tongliang Liu, Sydney AI Centre, University of Sydney (tongliang.liu@sydney.edu.au) |
| Pseudocode | Yes | Algorithm 1 NoiseGPT: noise identification and rectification. |
| Open Source Code | Yes | Source code: https://github.com/drunkerWang/NoiseGPT |
| Open Datasets | Yes | Datasets: In our experiments, we leverage the re-annotated noisy datasets CIFAR-10N and CIFAR-100N [25], which contain real-world human annotation errors. We also generate noisy versions of CIFAR-10, CIFAR-100 [24], WebVision [3], and ImageNet ILSVRC2012 for our studies. |
| Dataset Splits | Yes | WebVision [3]: We utilize its validation subset, which contains 50,000 images, for the 40% symmetric noise condition. Moreover, to verify the capacity of NoiseGPT under the real-world circumstance where samples are collected without careful annotation, we utilize mini-WebVision, a subset of WebVision, for noise detection and rectification experiments and test the classification performance on the validation set of WebVision. ImageNet ILSVRC2012: We utilize the validation subset, which contains 50,000 images, and generate symmetric noise for 50% of its examples. |
| Hardware Specification | Yes | Our noise detection and rectification experiments with NoiseGPT are powered by a GeForce RTX 4090, taking up about 28.5 GiB of memory in total for the CIFAR datasets. |
| Software Dependencies | No | The paper mentions the use of CLIP models [23], MMICL [76], BLIP-2 [70], and FLAN-T5-XXL [79] as backbones, but does not provide specific version numbers for these software components or any other libraries used. |
| Experiment Setup | Yes | For exemplars used in the in-context learning process, we select 3 images per category to construct a tiny ground-truth support set {xe}, simulating the scarcity of examples in real-world conditions. Exemplars are also selected from this support set to generate perturbed sample features. Specifically, for each query sample, we construct n = 10 perturbed features with different perturbing resources from {xe}. For pseudo labels, we employ the top C = 3 predictions of CLIP to conduct the label rectification process. Table 6 (NoiseGPT hyperparameters): MoF weight 0.5; exemplars per class 3; perturbations per query 10; threshold 0.7; candidate labels 3. |
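The hyperparameters reported in Table 6 can be collected into a small config, paired with a toy illustration of how a threshold and top-C candidate labels might drive rectification. This is a hypothetical sketch for orientation only: the class, function names, and the simple "replace with top-1 candidate above threshold" rule are assumptions, not the authors' released implementation.

```python
from dataclasses import dataclass

@dataclass
class NoiseGPTConfig:
    # Values taken from Table 6 of the reproducibility report.
    mof_weight: float = 0.5            # MoF weight
    exemplars_per_class: int = 3       # tiny ground-truth support set {xe}
    perturbations_per_query: int = 10  # n perturbed features per query
    threshold: float = 0.7             # score threshold for flagging noise
    candidate_labels: int = 3          # top-C CLIP predictions considered

def rectify_label(noise_score, given_label, clip_candidates, cfg):
    """Toy rectification rule (assumption, not the paper's exact logic):
    if a sample's noise score exceeds the threshold, replace its label
    with the top CLIP candidate; otherwise keep the given label."""
    if noise_score > cfg.threshold:
        return clip_candidates[0]  # top-1 of the top-C candidates
    return given_label

cfg = NoiseGPTConfig()
print(rectify_label(0.9, 5, [2, 5, 7], cfg))  # flagged as noisy -> 2
print(rectify_label(0.4, 5, [2, 5, 7], cfg))  # kept as-is -> 5
```

The config mirrors the five Table 6 values one-for-one, so a reproduction attempt can pass it around instead of scattering magic numbers.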