Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Balancing Positive and Negative Classification Error Rates in Positive-Unlabeled Learning

Authors: Ximing Li, Yuanchao Dai, Bing Wang, Changchun Li, Jianfeng Qu, Renchu Guan

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We conduct extensive experiments to demonstrate the effectiveness of DC-PU. We also empirically evaluate DC-PU on benchmark PU learning datasets. The results demonstrate that (1) compared with other risk estimators of PU learning, DC-PU achieves higher accuracy and converges more stably, and (2) compared with practical PU learning methods, DC-PU achieves competitive accuracy. We conduct experiments on two widely-adopted benchmark datasets: Fashion-MNIST (F-MNIST)[28] and CIFAR-10[29], along with a real-world medical dataset Alzheimer.
Researcher Affiliation	Academia	(1) College of Computer Science and Technology, Jilin University, China; (2) Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, China; (3) RIKEN Center for Advanced Intelligence Project; (4) School of Computer Science and Technology, Soochow University, China; (5) Suzhou Key Lab of Multi-modal Data Fusion and Intelligent Healthcare, Suzhou City University, China
Pseudocode	Yes	Algorithm 1 Training of DC-PU
Open Source Code	Yes	Additionally, we have submitted our code and datasets in the Supplementary Material.
Open Datasets	Yes	To comprehensively evaluate the proposed method, we conduct experiments on two widely-adopted benchmark datasets: Fashion-MNIST (F-MNIST)[28] and CIFAR-10[29], along with a real-world medical dataset Alzheimer. ... We have submitted our code and datasets in the Supplementary Material.
Dataset Splits	Yes	For each dataset, we apply nnPU to train a classifier g and report the scores of FNR and FPR measured on both the training and test sets. ... Table 1: Detailed characteristics of datasets. ... F-MNIST-1: 28 × 28, 60,000 training / 10,000 test ... Alzheimer: 3 × 224 × 224, 5,121 training / 1,279 test
Hardware Specification	Yes	All experiments are conducted on a server equipped with two Nvidia RTX4090 GPUs.
Software Dependencies	No	The paper mentions implementing methods and using external libraries but does not specify particular version numbers for these software components. For example, it mentions …
Experiment Setup	Yes	We conduct a comprehensive parameter sensitivity analysis with respect to the parameters τ and β, and the results are presented in Fig. 5. For the parameter τ, our experiments show optimal performance at $2 \times 10^{-3}$ across most datasets, which aligns with our theoretical analysis that moderate penalty parameters achieve balance between constraint strength and optimization stability. For the parameter β, we find that the range [0.4, 0.5] performs optimally across different datasets, which is significant because β controls the update speed of the dynamic lower bound. ... Input: PU learning dataset $\mathcal{D}_p \cup \mathcal{D}_u$; method parameters β, τ, γ; number of iterations T. ... In terms of ω, we update it with an exponential moving average trick: $\omega^{(t)} = \beta\,\omega^{(t-1)} + (1-\beta)(1-\pi)\,\widehat{R}^{+}_{p}(g; \mathcal{D}^{(t)}_{p})$,
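The Dataset Splits row reports FNR and FPR for a classifier g trained with nnPU. These are the standard confusion-matrix rates, FNR = FN / (TP + FN) and FPR = FP / (FP + TN). The Python sketch below computes both from hard predictions. It assumes that label 1 marks the positive class and that any other label (0 or -1) marks the negative class. The helper `fnr_fpr` is hypothetical and is not code from the paper.

```python
import math
from typing import Sequence, Tuple


def fnr_fpr(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (FNR, FPR); label 1 is positive, any other label is negative (assumption)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    tp = fn = fp = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1  # positive correctly predicted positive
        elif t == 1:
            fn += 1  # positive predicted negative
        elif p == 1:
            fp += 1  # negative predicted positive
        else:
            tn += 1  # negative correctly predicted negative
    # Each rate is undefined (NaN) when its class has no examples.
    fnr = fn / (tp + fn) if tp + fn else math.nan
    fpr = fp / (fp + tn) if fp + tn else math.nan
    return fnr, fpr


if __name__ == "__main__":
    print(fnr_fpr([1, 1, 1, 0, 0], [1, 0, 1, 1, 0]))
```

In the usage example, one of the three positives is predicted negative and one of the two negatives is predicted positive. That gives FNR = 1/3 and FPR = 1/2. The helper returns NaN when a class is absent, because reporting 0 in that case would be misleading.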
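The ω update quoted in the Experiment Setup row is an exponential moving average. The Python sketch below shows only that recurrence, under several stated assumptions. The function name `update_lower_bound` is hypothetical. The caller supplies the batch estimate $\widehat{R}^{+}_{p}(g; \mathcal{D}^{(t)}_{p})$ as a plain float, because the excerpt does not say how it is computed. The starting value ω^(0) = 0 is chosen only for illustration. This is not the authors' implementation.

```python
def update_lower_bound(omega_prev: float, risk_pos: float, beta: float, pi: float) -> float:
    """One EMA step: omega_t = beta * omega_{t-1} + (1 - beta) * (1 - pi) * risk_pos.

    Hypothetical helper. `risk_pos` stands in for the positive-risk estimate
    R^+_p(g; D_p^(t)) on the current batch and is assumed to be precomputed.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    if not 0.0 < pi < 1.0:
        raise ValueError("class prior pi must lie in (0, 1)")
    # beta weights the previous bound; (1 - beta) weights the new scaled estimate.
    return beta * omega_prev + (1.0 - beta) * (1.0 - pi) * risk_pos


if __name__ == "__main__":
    omega = 0.0  # assumed initial value omega^(0); not given in the excerpt
    beta, pi = 0.5, 0.4  # beta from the reported [0.4, 0.5] range; pi is illustrative
    for risk in [0.2, 0.2]:  # illustrative, constant batch estimates
        omega = update_lower_bound(omega, risk, beta, pi)
        print(round(omega, 4))
```

With a constant estimate r, the recurrence has the fixed point ω = (1 - π)·r, which is 0.6 × 0.2 = 0.12 here. A larger β moves ω toward that value more slowly, which is consistent with the row's description of β as controlling the update speed of the dynamic lower bound.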