Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Complementary Label Learning with Positive Label Guessing and Negative Label Enhancement
Authors: Yuhang Li, Zhuying Li, Yuheng Jia
ICLR 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate the superiority of PLNL over the state-of-the-art CLL methods, e.g., on STL-10, we increase the classification accuracy from 34.96% to 55.25%. The source code is available at https://github.com/yhli-ml/PLNL. |
| Researcher Affiliation | Academia | Yuhang Li (1), Zhuying Li (1), Yuheng Jia (1,2). (1) School of Computer Science and Engineering, Southeast University, Nanjing 210096, China. (2) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications (Southeast University), Ministry of Education, China. |
| Pseudocode | Yes | The overall framework of PLNL is shown in Fig.1, and the pseudo-code is presented in Appendix A. |
| Open Source Code | Yes | The source code is available at https://github.com/yhli-ml/PLNL. |
| Open Datasets | Yes | We use five commonly used benchmark datasets, STL-10 (Coates et al., 2011), Fashion MNIST (FMNIST) (Xiao et al., 2017), SVHN (Netzer et al., 2011), CIFAR-10 and CIFAR-100 (Krizhevsky and Hinton, 2009). |
| Dataset Splits | Yes | Tiny-ImageNet (Le and Yang, 2015) contains 100,000 images of 200 classes. Each class has 500 training images, 50 validation images and 50 test images. |
| Hardware Specification | Yes | All of our experiments are conducted with 8 NVIDIA 4090 GPUs. |
| Software Dependencies | No | All of the experiments are implemented based on PyTorch (Paszke et al., 2019) and all of our experiments are conducted with 8 NVIDIA 4090 GPUs. We employed faiss (Johnson et al., 2021) to compute k-NN instances in the output space, which is a library for efficient similarity search and clustering of dense vectors. The paper names its software (PyTorch, faiss) but does not specify version numbers. |
| Experiment Setup | Yes | Values of hyperparameters in PLNL are set as follows. The queue size t is selected from {2, 3, 4, 5}, the k-NN parameter k is selected from {100, 250, 500}. The α in the instance-aware self-adaptive threshold is selected from {0.1, 0.3, 0.5, 0.7, 0.9, 0.99}. For each method, we train the commonly used PreAct-ResNet18 (He et al., 2016) with 200 epochs (initial 20 epochs for warm-up), and use SGD as the optimizer with a momentum of 0.9 and a weight decay of 1e-4. We set the batch size from {64, 128}, the initial learning rate from {1e-1, 1e-2}, and we use cosine learning rate scheduling with a final learning rate of 1e-3. |
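The cosine learning-rate schedule described in the setup row can be sketched as below. This is a minimal illustration, not the authors' code: the concrete values (`lr_init=0.1` from the stated {1e-1, 1e-2} grid, 200 epochs, final rate 1e-3) are one choice consistent with the quoted ranges.

```python
import math

def cosine_lr(epoch, total_epochs=200, lr_init=0.1, lr_final=1e-3):
    """Cosine annealing from lr_init at epoch 0 down to lr_final at total_epochs.

    Standard cosine schedule: the cosine factor goes from 1 to 0 as the
    epoch advances, interpolating between the initial and final rates.
    """
    cos_factor = 0.5 * (1.0 + math.cos(math.pi * epoch / total_epochs))
    return lr_final + (lr_init - lr_final) * cos_factor

print(cosine_lr(0))    # initial learning rate, 0.1
print(cosine_lr(200))  # final learning rate, 0.001
```

In a PyTorch training loop this corresponds to `torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=200, eta_min=1e-3)` wrapped around an SGD optimizer with momentum 0.9 and weight decay 1e-4, as the paper reports.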