Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

A Closer Look to Positive-Unlabeled Learning from Fine-grained Perspectives: An Empirical Study

Authors: Yuanchao Dai, Zhengzhang Hou, Changchun Li, Yuanbo Xu, En Wang, Ximing Li

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this paper, we conduct a comprehensive study to investigate the basic characteristics of current PU learning methods. We organize them into two fundamental families of PU learning: disambiguation-free empirical risks, which approximate the expected risk of supervised learning, and pseudo-labeling methods, which estimate pseudo-labels for unlabeled instances. First, we make an empirical analysis of disambiguation-free empirical risks such as uPU, nnPU, and Dist-PU, and suggest a novel risk-consistent set-aware empirical risk from the perspective of aggregate supervision. Second, we make an empirical analysis of pseudo-labeling methods to evaluate the potential of pseudo-label estimation techniques and widely applied generic tricks in PU learning. Finally, based on those empirical findings, we propose a general framework of PU learning that integrates the set-aware empirical risk with pseudo-labeling. Compared with existing PU learning methods, the proposed framework can serve as a practical benchmark in PU learning.
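The two classic disambiguation-free risks named above (uPU and nnPU) can be sketched as follows. This is a minimal NumPy illustration of the standard estimators from the PU literature (the non-negative variant is the one introduced by Kiryo et al.), assuming the sigmoid loss and a known class prior; it is not the paper's code, and it does not implement Dist-PU or the proposed set-aware risk, whose exact form is not given in this entry. The function names are hypothetical.

```python
import numpy as np

def sigmoid_loss(scores: np.ndarray, label: int) -> np.ndarray:
    """Sigmoid loss l(z, y) = 1 / (1 + exp(y * z)) for labels y in {+1, -1}."""
    return 1.0 / (1.0 + np.exp(label * scores))

def _risk_terms(scores_p: np.ndarray, scores_u: np.ndarray):
    # Empirical means over labeled positives (P) and unlabeled data (U).
    risk_p_pos = sigmoid_loss(scores_p, +1).mean()  # E_P[l(g(x), +1)]
    risk_p_neg = sigmoid_loss(scores_p, -1).mean()  # E_P[l(g(x), -1)]
    risk_u_neg = sigmoid_loss(scores_u, -1).mean()  # E_U[l(g(x), -1)]
    return risk_p_pos, risk_p_neg, risk_u_neg

def upu_risk(scores_p: np.ndarray, scores_u: np.ndarray, prior: float) -> float:
    """Unbiased PU risk: pi*E_P[l(+1)] + E_U[l(-1)] - pi*E_P[l(-1)]."""
    rp_pos, rp_neg, ru_neg = _risk_terms(scores_p, scores_u)
    return float(prior * rp_pos + (ru_neg - prior * rp_neg))

def nnpu_risk(scores_p: np.ndarray, scores_u: np.ndarray, prior: float) -> float:
    """Non-negative PU risk: clamps the estimated negative-class risk at zero."""
    rp_pos, rp_neg, ru_neg = _risk_terms(scores_p, scores_u)
    return float(prior * rp_pos + max(0.0, ru_neg - prior * rp_neg))

# Usage: an over-confident model can drive the uPU estimate below zero,
# which the nnPU clamp prevents.
scores_p = np.array([10.0, 10.0])
scores_u = np.array([-10.0, -10.0])
print(upu_risk(scores_p, scores_u, prior=0.5), nnpu_risk(scores_p, scores_u, prior=0.5))
```

This only evaluates the risk values; the original nnPU training procedure additionally takes a gradient step on the negative-class term when it falls below a small margin, which is omitted here for brevity.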
Researcher Affiliation	Academia	Yuanchao Dai (1, 2), Zhengzhang Hou (1, 2), Changchun Li (1, 2), Yuanbo Xu (1), En Wang (1), Ximing Li (1, 2, 3). (1) College of Computer Science and Technology, Jilin University, China; (2) Key Laboratory of Symbolic Computation and Knowledge Engineering, Jilin University, China; (3) RIKEN Center for Advanced Intelligence Project.
Pseudocode	No	The paper describes methods and techniques using mathematical formulations and descriptive text but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	We have elaborated on the implementation principles and details of the method to facilitate the reproduction of the main experimental results presented in our paper. Additionally, we have submitted our code and datasets in the Supplementary Material.
Open Datasets	Yes	We conduct empirical evaluations on 3 standard benchmark datasets, i.e. Fashion-MNIST (F-MNIST), CIFAR-10, and STL-10.
Dataset Splits	No	For all datasets, the number of positive labeled instances is fixed as n_p = 1,000. The details of datasets are summarized in Table 2. Explanation: The paper specifies the number of positive labeled instances (n_p = 1,000) within the PU learning setup and how classes are partitioned for binary classification, but it does not explicitly state the overall training, validation, and testing splits (e.g., percentages or exact counts) for the underlying benchmark datasets (F-MNIST, CIFAR-10, STL-10) from which these PU sets are derived.
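To make the missing detail concrete, one common way to build a PU training set with a fixed n_p is sketched below. This is a hypothetical illustration, not the paper's protocol: it assumes the positive/negative class partition (given in the paper's Table 2) has already been applied to produce binary labels, and it assumes the unlabeled set is every training instance not drawn as a labeled positive. Other PU works use the full training set as the unlabeled set instead, which is exactly the kind of choice this row flags as unspecified. The function name `make_pu_split` is made up for this example.

```python
import numpy as np

def make_pu_split(y_binary, n_p: int = 1000, seed: int = 0):
    """Hypothetical PU split: draw n_p labeled positives; everything else is unlabeled.

    Returns (labeled_idx, unlabeled_idx, prior), where prior is the fraction of
    positives remaining in the unlabeled set (assumed class prior).
    """
    rng = np.random.default_rng(seed)
    y_binary = np.asarray(y_binary)
    pos_idx = np.flatnonzero(y_binary == 1)
    if n_p > pos_idx.size:
        raise ValueError("n_p exceeds the number of positive instances")
    # Sample labeled positives without replacement.
    labeled_idx = rng.choice(pos_idx, size=n_p, replace=False)
    # Assumption: labeled positives are removed from the unlabeled pool.
    mask = np.ones(y_binary.size, dtype=bool)
    mask[labeled_idx] = False
    unlabeled_idx = np.flatnonzero(mask)
    prior = float(y_binary[unlabeled_idx].mean())
    return labeled_idx, unlabeled_idx, prior

# Usage with synthetic binary labels (3,000 positives, 7,000 negatives).
y = np.array([1] * 3000 + [0] * 7000)
labeled, unlabeled, prior = make_pu_split(y, n_p=1000, seed=0)
print(labeled.size, unlabeled.size, round(prior, 4))
```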
Hardware Specification	Yes	All experiments are conducted with five different random seeds on a server equipped with two Nvidia RTX4090 GPUs, and we report the mean and standard deviation of the results.
Software Dependencies	No	The paper does not explicitly mention any specific software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, CUDA versions).
Experiment Setup	Yes	For each PU learning method, we employ dataset-appropriate backbones: LeNet-5 for F-MNIST and a 7-layer CNN for CIFAR-10 and STL-10; an MLP layer is used as the classification layer across all datasets. The mini-batch size is fixed at 512, and the number of epochs is set to 100 for F-MNIST and 200 for the other datasets. Fixed thresholding treats the threshold value τ as a hyper-parameter and empirically sets it to a constant value; here, we fix τ to 0.95. For the warm-up stage, we train using only SAPU for 20 epochs before introducing the pseudo-labeling component.
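The reported settings and the fixed-threshold pseudo-labeling schedule can be summarized in a short sketch. The numeric values (batch size 512, epochs, τ = 0.95, 20 warm-up epochs) come from the row above; everything else is an assumption for illustration. In particular, the confidence rule max(p, 1 − p) ≥ τ is a common FixMatch-style choice and not confirmed as the paper's exact criterion. Epochs are also assumed to be 0-indexed, and `CONFIG` and `select_pseudo_labels` are hypothetical names. The SAPU loss itself is not implemented here.

```python
import numpy as np

# Settings reported in the Experiment Setup row.
BATCH_SIZE = 512
CONFIG = {
    "F-MNIST": {"backbone": "LeNet-5", "epochs": 100},
    "CIFAR-10": {"backbone": "7-layer CNN", "epochs": 200},
    "STL-10": {"backbone": "7-layer CNN", "epochs": 200},
}
TAU = 0.95          # fixed confidence threshold
WARMUP_EPOCHS = 20  # epochs trained with SAPU only

def select_pseudo_labels(pos_probs, epoch: int, tau: float = TAU,
                         warmup_epochs: int = WARMUP_EPOCHS):
    """Hypothetical fixed-threshold pseudo-labeling for unlabeled instances.

    pos_probs: predicted probability that each unlabeled instance is positive.
    Returns (selected_idx, pseudo_labels) with labels in {1, 0}.
    """
    pos_probs = np.asarray(pos_probs, dtype=float)
    if epoch < warmup_epochs:
        # Warm-up stage: no pseudo-labels; only the set-aware risk would be trained.
        return np.array([], dtype=int), np.array([], dtype=int)
    # Assumed confidence: probability of the predicted class.
    confidence = np.maximum(pos_probs, 1.0 - pos_probs)
    selected_idx = np.flatnonzero(confidence >= tau)
    pseudo_labels = (pos_probs[selected_idx] >= 0.5).astype(int)
    return selected_idx, pseudo_labels

# Usage: during warm-up nothing is selected; afterwards only confident instances are kept.
probs = [0.99, 0.5, 0.02, 0.96, 0.9]
print(select_pseudo_labels(probs, epoch=5))
print(select_pseudo_labels(probs, epoch=20))
```

Keeping τ fixed makes the selection rule a simple mask over predicted probabilities; adaptive thresholding schemes would replace `tau` with a per-epoch or per-class value.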