Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026.
Collaborative Refining for Learning from Inaccurate Labels
Authors: Bin Han, Yi-Xuan Sun, Ya-Lin Zhang, Libang Zhang, Haoran Hu, Longfei Li, Jun Zhou, Guo Ye, Huimei He
NeurIPS 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments are conducted on benchmark and real-world datasets, which demonstrate the superiority of the proposed framework. |
| Researcher Affiliation | Industry | Bin Han, Yi-Xuan Sun, Ya-Lin Zhang, Libang Zhang, Haoran Hu, Longfei Li, Jun Zhou, Guo Ye, Huimei He; Ant Group; EMAIL |
| Pseudocode | Yes | Algorithm 1 Collaborative Refining for Learning from inaccurate labels (CRL). |
| Open Source Code | No | We will consider open-sourcing the code after the paper is accepted. |
| Open Datasets | Yes | Benchmark datasets. All the methods are evaluated on 13 benchmark datasets with two kinds of noise... Real-world datasets. Experiments are also conducted on two real-world datasets: CIFAR-10N and Sentiment. Both datasets were published on Amazon Mechanical Turk for annotation. Details of these datasets and labels can be found in Appendix B. (Appendix B then lists sources and citations, e.g., 'Diabetes dataset is sampled from a dataset on Kaggle¹', 'Sentiment: This dataset is the original one in the website²'). |
| Dataset Splits | Yes | For benchmark datasets, 70% of each dataset is utilized for training, 5% for validation, and 25% for testing. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. It only describes the model architecture and general training setup. |
| Software Dependencies | No | The paper describes the model architecture and training parameters but does not specify version numbers for programming languages, libraries, or other software dependencies. |
| Experiment Setup | Yes | For our method... For RUS, we set the proportion of selected samples p = 0.8, and take the 5th epoch and the latest epoch during training as the selected epochs in Eq. (8). In practice, LRD-generated labels are held constant after 5 training epochs to mitigate the over-fitting issue. For all of the methods, experiments are conducted with 0.001 learning rate, 100 training epochs, and 256 batch size on MLP with hidden dimension 128 for a fair comparison. |
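
For readers attempting a reproduction, the values quoted in the Dataset Splits and Experiment Setup rows can be gathered into a small configuration, sketched below. This is only an illustration under stated assumptions: the names `TrainConfig` and `split_indices`, the random shuffling, and the seed are hypothetical, since the quoted text does not say how the 70/5/25 split was drawn (for example, whether it was stratified), and the authors' code has not been released.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameters quoted in the Experiment Setup row (field names are hypothetical)."""
    learning_rate: float = 0.001
    epochs: int = 100
    batch_size: int = 256
    hidden_dim: int = 128        # hidden dimension of the MLP
    rus_proportion: float = 0.8  # p, the proportion of selected samples for RUS
    lrd_freeze_epoch: int = 5    # LRD-generated labels are held constant after this epoch


def split_indices(n, train_frac=0.70, val_frac=0.05, seed=0):
    """Shuffle range(n) and cut it into train/validation/test index lists.

    Assumption: a plain random split. The seed is arbitrary and not from the paper.
    """
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # the remainder, about 25% of the data
    return train, val, test


if __name__ == "__main__":
    cfg = TrainConfig()
    train, val, test = split_indices(1000)
    print(cfg)
    print(len(train), len(val), len(test))
```

Giving the test set whatever remains after the training and validation cuts keeps the three parts disjoint and exhaustive even when `n` is not divisible by 20.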