Noisy Label Learning with Instance-Dependent Outliers: Identifiability via Crowd Wisdom
Authors: Tri Nguyen, Shahana Ibrahim, Xiao Fu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments show that our learning scheme substantially improves outlier detection and the classifier's testing accuracy. We evaluate the proposed method over a number of real datasets that are annotated by machine and human annotators under various conditions and observed nontrivial improvements of testing accuracy. (Section 5, Experiments) |
| Researcher Affiliation | Academia | Tri Nguyen School of EECS Oregon State University Corvallis, Oregon, USA nguyetr9@oregonstate.edu Shahana Ibrahim Department of ECE University of Central Florida Orlando, Florida, USA shahana.ibrahim@ucf.edu Xiao Fu School of EECS Oregon State University Corvallis, Oregon, USA xiao.fu@oregonstate.edu |
| Pseudocode | No | The paper describes algorithms and implementation but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | Yes | We release the code and our acquired noisy annotations at https://github.com/ductri/COINNet. |
| Open Datasets | Yes | Dataset. We consider the CIFAR-10 [62] and the STL-10 datasets [63]; see Appendix G for details. CIFAR-10N. The first dataset that we use is the CIFAR-10N dataset [66]. LabelMe. We also test the algorithms over the LabelMe dataset [67,68]. ImageNet-15N. In addition to existing datasets, we also acquire noisy annotations by asking AMT workers to annotate some images from ImageNet. ... We release the code and our acquired noisy annotations at https://github.com/ductri/COINNet. |
| Dataset Splits | Yes | The CIFAR-10 dataset consists of 60,000 labeled color images... The images are split into training and testing sets with size 50,000 and 10,000, respectively. ... We randomly split the training set into 47,500 and 2,500 to use as train and validation set for all methods. The validation set comprises 500 images, while the remaining 1,188 images are reserved for testing. (for LabelMe) |
| Hardware Specification | Yes | All runs have been conducted using either Nvidia A40 or Nvidia DGX H100 GPU. |
| Software Dependencies | No | The paper mentions using 'Adam' as an optimizer and 'ResNet-34', 'ResNet-9', 'VGG-16', and 'CLIP' as architectures/models, but does not specify version numbers for any software libraries or frameworks (e.g., PyTorch, TensorFlow, scikit-learn) required for replication. |
| Experiment Setup | Yes | For our proposed approach COINNet, we fix ζ = 10⁻¹⁰, p = 0.4, and µ1 = µ2 = 0.01. Adam [65] is used as the optimizer with weight decay of 10⁻⁴, learning rate of 0.01, and batch size of 512. We train with batch size of 512, number of epochs 200, Adam optimizer with learning rate of 0.01 and learning rate scheduler OneCycleLR [86]. |
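The reported optimizer and scheduler settings (Adam, learning rate 0.01, weight decay 10⁻⁴, batch size 512, 200 epochs, OneCycleLR) can be sketched in PyTorch as follows. This is a minimal illustration of the quoted configuration only, not the authors' released code; the linear model and random input are placeholders, and the 47,500-image training size is taken from the CIFAR-10 split quoted above.

```python
import torch
from torch import nn
from torch.optim import Adam
from torch.optim.lr_scheduler import OneCycleLR

# Placeholder model: the paper uses ResNet/VGG/CLIP backbones; any nn.Module
# would slot in here.
model = nn.Linear(32 * 32 * 3, 10)

# Hyperparameters as reported in the setup row.
batch_size, epochs, train_size = 512, 200, 47_500
steps_per_epoch = (train_size + batch_size - 1) // batch_size  # 93 mini-batches

optimizer = Adam(model.parameters(), lr=0.01, weight_decay=1e-4)
scheduler = OneCycleLR(optimizer, max_lr=0.01,
                       epochs=epochs, steps_per_epoch=steps_per_epoch)

# One training step per mini-batch: forward, backward, then advance both the
# optimizer and the one-cycle schedule (dummy loss for illustration).
loss = model(torch.randn(batch_size, 32 * 32 * 3)).sum()
loss.backward()
optimizer.step()
scheduler.step()
```

Note that OneCycleLR is stepped once per mini-batch (hence `steps_per_epoch`), not once per epoch, which is the usual convention for this scheduler.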