Vision-Language Models are Strong Noisy Label Detectors

Authors: Tong Wei, Hao-Tian Li, Chun-Shu Li, Jiang-Xin Shi, Yu-Feng Li, Min-Ling Zhang

NeurIPS 2024

Reproducibility Variable Result LLM Response
Research Type Experimental Experimental results on seven synthetic and real-world noisy datasets validate the effectiveness of DeFT in both noisy label detection and image classification tasks.
Researcher Affiliation Academia Tong Wei^{1,2,3}, Hao-Tian Li^{1,2}, Chun-Shu Li^{1,2}, Jiang-Xin Shi^{3,4}, Yu-Feng Li^{3,4}, Min-Ling Zhang^{1,2} — 1. School of Computer Science and Engineering, Southeast University, Nanjing, China; 2. Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, China; 3. National Key Laboratory for Novel Software Technology, Nanjing University, China; 4. School of Artificial Intelligence, Nanjing University, China. {weit,liht}@seu.edu.cn
Pseudocode Yes Algorithm 1: The Proposed DEFT Framework
Open Source Code Yes Our source code is available at https://github.com/HotanLee/DeFT.
Open Datasets Yes We conduct experimental analyses on widely-used CIFAR-100 [23] and Tiny-ImageNet [42], as well as two fine-grained datasets Stanford-Cars [22] and CUB-200-2011 [34]... We further examine the performance of DeFT on three real-world noisy label datasets: 1) CIFAR-100N [39]... 2) Clothing1M [47]... 3) WebVision [26].
Dataset Splits No The paper reports 'training' and 'test accuracy' but does not define or use a separate 'validation' split for hyperparameter tuning or early stopping.
Hardware Specification Yes All experiments are conducted on a single NVIDIA GeForce RTX 3090.
Software Dependencies No The paper mentions models and optimizers (e.g., CLIP, ViT-B/16, SGD, AdamW) but does not provide specific version numbers for software libraries or dependencies (e.g., Python, PyTorch, CUDA versions).
Experiment Setup Yes We use the SGD optimizer with a momentum of 0.9, a weight decay of 5e-4, and a batch size of 64. We run 10 epochs for both the noisy label detection phase and the model adaptation phase with learning rates 3e-2 and 5e-4, respectively. In the noisy label detection phase, we employ VPT [18] and CoOp [53] to adapt the visual encoder and textual encoder respectively, and perform model warm-up for 1 epoch on all datasets.
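The hyperparameters quoted above can be collected into a single config, and the optimizer they specify is standard SGD with momentum and L2 weight decay. Below is a minimal plain-Python sketch of that update rule under the reported settings; the names (`CONFIG`, `sgd_momentum_step`) are illustrative and do not come from the paper's released code.

```python
# Hyperparameters as reported in the paper's experiment setup (see row above).
CONFIG = {
    "optimizer": "SGD",
    "momentum": 0.9,
    "weight_decay": 5e-4,
    "batch_size": 64,
    "epochs_per_phase": 10,   # both detection and adaptation phases
    "lr_detection": 3e-2,     # noisy label detection phase
    "lr_adaptation": 5e-4,    # model adaptation phase
    "warmup_epochs": 1,       # warm-up before detection, all datasets
}

def sgd_momentum_step(param, grad, velocity,
                      lr, momentum=0.9, weight_decay=5e-4):
    """One scalar SGD step with momentum, PyTorch-style:
    weight decay is folded into the gradient before the
    momentum buffer is updated."""
    grad = grad + weight_decay * param          # L2 regularization
    velocity = momentum * velocity + grad       # momentum buffer update
    param = param - lr * velocity               # parameter update
    return param, velocity

# One detection-phase step on a scalar parameter:
p, v = sgd_momentum_step(1.0, 0.1, 0.0, lr=CONFIG["lr_detection"])
```

In a real run this update would be applied by `torch.optim.SGD` over the VPT and CoOp prompt parameters rather than hand-rolled, but the arithmetic is the same.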