Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. [K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026].

Noisy Multi-Label Learning through Co-Occurrence-Aware Diffusion

Authors: Senyu Hou, Yuru Ren, Gaoxia Jiang, Wenjian Wang

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on both synthetic (Pascal-VOC, MS-COCO) and real-world (NUS-WIDE) noisy datasets demonstrate that our approach outperforms state-of-the-art methods.
Researcher Affiliation	Academia	Senyu Hou, School of Computer and Information Technology, Shanxi University; Yuru Ren, School of Computer and Information Technology, Shanxi University; Gaoxia Jiang, School of Computer and Information Technology, Shanxi University; Wenjian Wang, Key Laboratory of Data Intelligence and Cognitive Computing of Shanxi Province
Pseudocode	Yes	Algorithm 1 CAD Training Algorithm; C.1 CAD inference
Open Source Code	Yes	We have submitted the code and detailed documentation in the supplementary materials, including environment setup, execution scripts, and data preprocessing steps, to ensure the reproducibility and validity of our findings. In addition, all datasets used in the experiments are publicly available benchmark datasets.
Open Datasets	Yes	We validate the effectiveness of the proposed method on three multi-label synthetic noisy datasets: Pascal-VOC 2007 [48], Pascal-VOC 2012 [48] and MS-COCO [49], as well as on the real-world noisy dataset NUS-WIDE [50].
Dataset Splits	Yes	In the synthetic noisy datasets, we randomly retain 10% of the samples as a validation set and introduce simulated multi-label noise into the training set using a noise transition matrix $T$ [18–21]. Specifically, for any $i \neq j$, $T_{ij} = r(y_j \in Y, y_i \notin Y \mid y_j \notin Y, y_i \in Y)$ represents the probability $r$ of the i-th class label to be corrupted into the j-th class label. The Pascal-VOC 2007 dataset consists of 5,011 training images and 4,952 test images, while Pascal-VOC 2012 includes 11,540 training images and 10,991 test images. ... The MS-COCO dataset contains 82,081 training images and 40,137 test images... Our experiments follow the training/testing splits provided by the original dataset.
Hardware Specification	Yes	All experiments were conducted on NVIDIA A800 GPUs. The training efficiency analysis is provided in Appendix H.
Software Dependencies	No	The paper mentions using the Adam optimizer and models like ResNet50 and ViT-14/L but does not specify software or library versions like Python, PyTorch, TensorFlow, etc., with their corresponding version numbers.
Experiment Setup	Yes	We use the Adam optimizer for training 30 epochs with a batch size of 128. The initial learning rate is set to 5e-4, and a half-cycle cosine decay is employed. The images in three datasets resize to 224x224. The experimental results on the synthetic noisy datasets are averaged over ten independent random trials. Additionally, we used a range of K values from 1 to 100 on the validation set. Experimental results in Appendix G showed that the mAP remained relatively stable for K values between 30 and 60. Based on these results, we inferred that our CAD was relatively insensitive to variations within this range of K values and consequently set the default K value to 50.
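
Illustrative sketch (not from the paper): the Experiment Setup and Dataset Splits rows quote an initial learning rate of 5e-4 with half-cycle cosine decay over 30 epochs, and a random 10% validation hold-out. The Python below shows one common reading of those two settings. The function names, the absence of warmup, the floor learning rate of 0, the seed, and the assumption that the hold-out is drawn from the training images are all hypothetical choices made for illustration; the authors' supplementary code may differ.

```python
import math
import random


def half_cycle_cosine_lr(epoch, total_epochs=30, base_lr=5e-4, min_lr=0.0):
    """Learning rate at a (possibly fractional) epoch.

    Follows half of one cosine period: starts at base_lr and decays to
    min_lr at total_epochs. No warmup is applied (an assumption).
    """
    progress = min(max(epoch / total_epochs, 0.0), 1.0)
    return min_lr + (base_lr - min_lr) * 0.5 * (1.0 + math.cos(math.pi * progress))


def hold_out_validation(num_samples, val_fraction=0.10, seed=0):
    """Randomly split sample indices into (train, validation) lists."""
    rng = random.Random(seed)  # fixed seed so the split is repeatable
    indices = list(range(num_samples))
    rng.shuffle(indices)
    n_val = round(num_samples * val_fraction)
    return sorted(indices[n_val:]), sorted(indices[:n_val])


if __name__ == "__main__":
    for epoch in (0, 15, 30):
        print(epoch, half_cycle_cosine_lr(epoch))
    # 5,011 is the Pascal-VOC 2007 training set size quoted in the table.
    train_idx, val_idx = hold_out_validation(5011)
    print(len(train_idx), len(val_idx))
```

Under these assumptions the schedule starts at 5e-4, passes roughly half that value at epoch 15, and reaches 0 at epoch 30. Repeating the split with ten different seeds would be one way to mirror the ten independent trials mentioned in the setup.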