Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Reduction-based Pseudo-label Generation for Instance-dependent Partial Label Learning

Authors: Congyu Qiao, Ning Xu, Yihao Hu, Xin Geng

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we validate the effectiveness of our proposed RPLG by conducting it on manually corrupted benchmark datasets and real-world datasets and comparing its results against DNN-based PLL algorithms. Also, we explore RPLG through ablation study, sensitivity analysis, convergence analysis, and time consumption. The implementation is based on Pytorch [27] with the GPU model NVIDIA RTX 3090. The source code is available at https://github.com/palm-ml/rplg.
Researcher Affiliation	Academia	1 School of Computer Science and Engineering, Southeast University, Nanjing 210096, China; 2 Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China EMAIL
Pseudocode	Yes	The algorithmic description of RPLG is presented in Algorithm 1 in Appendix A.3.
Open Source Code	Yes	The source code is available at https://github.com/palm-ml/rplg.
Open Datasets	Yes	RPLG and compared DNN-based PLL algorithms are implemented on three widely used benchmark datasets in deep learning: FMNIST [41], KMNIST [4], CIFAR-10, CIFAR-100 [17] and Tiny Image Net [19]. Besides, since data augmentation cannot be performed on extracted features from audio and video data, our approach and data-augmentation-free PLL methods are also performed on five frequently used real-world datasets, which come from different practical application domains, including Lost [5], Bird Song [1], MSRCv2 [21], Soccer Player [50] and Yahoo!News [9].
Dataset Splits	Yes	As for benchmark datasets, we split 10% samples from the training datasets to form the validating datasets. As for real-world datasets, we conduct the algorithms with 80%/10%/10% train/validation/test split.
Hardware Specification	Yes	The implementation is based on Pytorch [27] with the GPU model NVIDIA RTX 3090. [...] All methods were run for 250 epochs with a batch size of 256 on a single NVIDIA RTX 3090 with AMD EPYC 7453 28-Core Processor.
Software Dependencies	No	The implementation is based on Pytorch [27] with the GPU model NVIDIA RTX 3090. The optimizer is stochastic gradient descent (SGD) [29] with momentum 0.9, batch size 256, and epoch 250.
Experiment Setup	Yes	To ensure fairness, we utilize the same network backbone, optimizer, and data augmentation strategy across all compared methods. We take the same backbone as [42, 38] on CIFAR-10, CIFAR-100 and all real-world datasets, and [20, 48] on Tiny Image Net. The optimizer is stochastic gradient descent (SGD) [29] with momentum 0.9, batch size 256, and epoch 250. [...] For hyper-parameters, we carefully select the most appropriate ones for each algorithm to ensure optimal model parameters based on their performances on the validation datasets. To mitigate overfitting, the training procedure of a model will be halted prematurely if its performance on the validation dataset fails to improve over 50 epochs.
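
The Dataset Splits and Experiment Setup rows describe a held-out validation split and an early-stopping rule with a 50-epoch patience. The sketch below illustrates one plausible reading of those two rules in plain Python. It is not the authors' code: the function names `split_indices` and `early_stopping_epoch`, the fixed seed, and the interpretation of "fails to improve" as "does not strictly exceed the best validation score so far" are all assumptions made for illustration.

```python
import random


def split_indices(n, val_frac=0.1, test_frac=0.0, seed=0):
    """Shuffle range(n) and split it into train/val/test index lists.

    Hypothetical helper: val_frac=0.1, test_frac=0.0 mirrors the 10%
    validation hold-out quoted for the benchmark datasets;
    val_frac=0.1, test_frac=0.1 mirrors the 80%/10%/10% split quoted
    for the real-world datasets. The seed is an assumption.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_val = round(n * val_frac)
    n_test = round(n * test_frac)
    val = idx[:n_val]
    test = idx[n_val:n_val + n_test]
    train = idx[n_val + n_test:]
    return train, val, test


def early_stopping_epoch(val_scores, patience=50):
    """Return the 1-based epoch at which training would halt.

    Training stops once the validation score has not strictly improved
    for `patience` consecutive epochs (assumed reading of the quoted
    rule). If that never happens, all epochs are run.
    """
    best = float("-inf")
    since_best = 0
    for epoch, score in enumerate(val_scores, start=1):
        if score > best:
            best = score
            since_best = 0
        else:
            since_best += 1
            if since_best >= patience:
                return epoch
    return len(val_scores)
```

For example, `split_indices(1000, 0.1, 0.1)` yields 800 training, 100 validation, and 100 test indices, and a run whose validation score peaks at epoch 1 and never improves again would stop at epoch 51 under this reading.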