Positive and Unlabeled Learning with Controlled Probability Boundary Fence
Authors: Changchun Li, Yuanchao Dai, Lei Feng, Ximing Li, Bing Wang, Jihong Ouyang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical results demonstrate that PUL-CPBF can achieve competitive performance compared with the existing PU learning baselines. We conduct extensive experiments to evaluate PUL-CPBF on benchmark datasets. |
| Researcher Affiliation | Academia | 1College of Computer Science and Technology, Jilin University, China 2Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, China 3Information Systems Technology and Design Pillar, Singapore University of Technology and Design, Singapore. |
| Pseudocode | No | The paper describes the proposed method in prose but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper mentions implementing 'in-house code' for PUL-CPBF but does not provide any link or explicit statement about making their own code open-source or publicly available. It only provides links to other baseline methods' code. |
| Open Datasets | Yes | In the experiments, we employ 3 prevalent benchmark datasets, including Fashion MNIST (F-MNIST) (Xiao et al., 2017), CIFAR-10 (Krizhevsky, 2016), and STL-10 (Coates et al., 2011), and a real-world dataset on Alzheimer diagnosis (Alzheimer). ... F-MNIST: https://github.com/zalandoresearch/fashion-mnist ... CIFAR-10: http://www.cs.toronto.edu/~kriz/cifar.html ... STL-10: https://cs.stanford.edu/~acoates/stl10 ... Alzheimer: Dubey, S. Alzheimer's Dataset. Available online: https://www.kaggle.com/tourist55/alzheimers-dataset-4-class-of-images |
| Dataset Splits | Yes | For the training set of each dataset, we form its PU version(s) composed of a few labeled positive instances drawn from its positive dataset, a certain number of validation instances drawn from the full dataset, and unlabeled instances, i.e., the remaining instances with their labels removed. The statistics of those PU training sets are described in Table 2. ... Table 2: #Valid (e.g., F-MNIST-1: 500, Alzheimer: 1,279). A sketch of this PU-split construction appears after the table. |
| Hardware Specification | Yes | All experiments are performed on a server with one Nvidia RTX4090 GPU. |
| Software Dependencies | No | The paper mentions using 'Pytorch' and 'Scikit-Learn tool' but does not specify their version numbers, which is required for reproducible software dependencies. |
| Experiment Setup | Yes | We employ the stochastic gradient descent optimizer and select the learning rate from {0.001, 0.0015, 0.002, 0.0025, 0.003} and weight decay from {5e-5, 1e-4, 5e-4, 1e-3, 5e-3}. The probability boundary range is set to α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}. The epoch numbers of the first and second stages of PUL-CPBF are both set to 25. The batch sizes of the first and second stages are set to 32 and 16, respectively. We also clamp the logits between -10 and 10 to avoid the potential NaN error in Eqs. (11) and (13) following (Zhao et al., 2022). See the training-setup sketch after the table. |
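The Dataset Splits row quotes how each PU training set is formed: a few labeled positives, a held-out validation set, and the rest kept as unlabeled data. Below is a minimal sketch of that construction under stated assumptions; the function name, arguments, and the example sizes (other than the 500 validation instances quoted from Table 2 for F-MNIST-1) are illustrative, not the authors' code.

```python
import numpy as np

def make_pu_split(labels, positive_classes, n_labeled_pos, n_valid, seed=0):
    """Split indices into labeled-positive, validation, and unlabeled pools.

    The unlabeled pool keeps both positives and negatives; their labels
    are simply discarded downstream. All names here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(np.isin(labels, positive_classes))

    # A few labeled positives drawn from the positive subset.
    labeled_pos = rng.choice(pos_idx, size=n_labeled_pos, replace=False)
    remaining = np.setdiff1d(np.arange(len(labels)), labeled_pos)

    # Validation instances drawn from the full remaining dataset.
    valid = rng.choice(remaining, size=n_valid, replace=False)

    # Everything else becomes the unlabeled set.
    unlabeled = np.setdiff1d(remaining, valid)
    return labeled_pos, valid, unlabeled

# Example mirroring F-MNIST-1 in Table 2 (#Valid = 500); the class split
# and number of labeled positives are placeholders.
labels = np.random.randint(0, 10, size=60_000)
lp, va, un = make_pu_split(labels, positive_classes=[0, 2, 4],
                           n_labeled_pos=100, n_valid=500)
```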
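The Experiment Setup row lists the optimizer, hyperparameter grids, epoch counts, batch sizes, and the logit clamp used to avoid NaN errors in Eqs. (11) and (13). The following hedged PyTorch sketch wires those quoted settings together; the two-layer network, BCE loss, and random batch are placeholders, since the paper's architecture and objective are not reproduced here.

```python
import torch
from torch import nn

# Placeholder model and loss, not the authors' architecture.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128),
                      nn.ReLU(), nn.Linear(128, 1))
criterion = nn.BCEWithLogitsLoss()

# Search grids quoted in the Experiment Setup row.
lr_grid = [0.001, 0.0015, 0.002, 0.0025, 0.003]
wd_grid = [5e-5, 1e-4, 5e-4, 1e-3, 5e-3]
alpha_grid = [0.1, 0.3, 0.5, 0.7, 0.9]  # probability boundary range

optimizer = torch.optim.SGD(model.parameters(),
                            lr=lr_grid[0], weight_decay=wd_grid[0])

# Dummy stage-1 batch (batch size 32; stage 2 uses 16).
x = torch.randn(32, 1, 28, 28)
y = torch.randint(0, 2, (32, 1)).float()

for epoch in range(25):  # each of the two stages runs 25 epochs
    logits = model(x)
    # Clamp logits to [-10, 10] to avoid NaN, per the quoted setup.
    logits = torch.clamp(logits, min=-10.0, max=10.0)
    loss = criterion(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice the learning rate, weight decay, and α would be selected over these grids using the validation split described in the Dataset Splits row.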