Positive and Unlabeled Learning with Controlled Probability Boundary Fence
Authors: Changchun Li, Yuanchao Dai, Lei Feng, Ximing Li, Bing Wang, Jihong Ouyang
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive empirical results demonstrate that PUL-CPBF can achieve competitive performance compared with the existing PU learning baselines. We conduct extensive experiments to evaluate PUL-CPBF on benchmark datasets. |
| Researcher Affiliation | Academia | 1College of Computer Science and Technology, Jilin University, China 2Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education, Jilin University, China 3Information Systems Technology and Design Pillar, Singapore University of Technology and Design, Singapore. |
| Pseudocode | No | The paper describes the proposed method in prose but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | No | The paper mentions implementing 'in-house code' for PUL-CPBF but does not provide any link or explicit statement about making their own code open-source or publicly available. It only provides links to other baseline methods' code. |
| Open Datasets | Yes | In the experiments, we employ 3 prevalent benchmark datasets, including Fashion MNIST (F-MNIST) (Xiao et al., 2017), CIFAR-10 (Krizhevsky, 2016), and STL-10 (Coates et al., 2011), and a real-world dataset on Alzheimer diagnosis (Alzheimer). ... F-MNIST: https://github.com/zalandoresearch/fashion-mnist ... CIFAR-10: http://www.cs.toronto.edu/~kriz/cifar.html ... STL-10: https://cs.stanford.edu/~acoates/stl10 ... Alzheimer: Dubey, S. Alzheimer's Dataset. Available online: https://www.kaggle.com/tourist55/alzheimers-dataset-4-class-of-images |
| Dataset Splits | Yes | For the training set of each dataset, we form its PU version(s) composed of a few labeled positive instances drawn from its positive dataset, a certain number of validation instances drawn from the full dataset, and unlabeled instances, i.e., the remaining instances with their labels removed. The statistics of those PU training sets are described in Table 2. ... Table 2: #Valid (e.g., F-MNIST-1: 500, Alzheimer: 1,279). A sketch of this PU-split construction appears after the table. |
| Hardware Specification | Yes | All experiments are performed on a server with one Nvidia RTX4090 GPU. |
| Software Dependencies | No | The paper mentions using 'Pytorch' and 'Scikit-Learn tool' but does not specify their version numbers, which is required for reproducible software dependencies. |
| Experiment Setup | Yes | We employ the stochastic gradient descent optimizer and select the learning rate from {0.001, 0.0015, 0.002, 0.0025, 0.003} and weight decay from {5e-5, 1e-4, 5e-4, 1e-3, 5e-3}. The probability boundary range is set to α ∈ {0.1, 0.3, 0.5, 0.7, 0.9}. The epoch numbers of the first and second stages of PUL-CPBF are both set to 25. The batch sizes of the first and second stages are set to 32 and 16, respectively. We also clamp the logits between -10 and 10 to avoid the potential NaN error in Eqs. (11) and (13) following (Zhao et al., 2022). See the training-setup sketch after the table. |
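The Dataset Splits row quotes how each PU training set is formed: a few labeled positives, a held-out validation set, and the rest kept as unlabeled data. Below is a minimal sketch of that construction under stated assumptions; the function name, arguments, and the example sizes (other than the 500 validation instances quoted from Table 2 for F-MNIST-1) are illustrative, not the authors' code.

```python
import numpy as np

def make_pu_split(labels, positive_classes, n_labeled_pos, n_valid, seed=0):
    """Split indices into labeled-positive, validation, and unlabeled pools.

    The unlabeled pool keeps both positives and negatives; their labels
    are simply discarded downstream. All names here are hypothetical.
    """
    rng = np.random.default_rng(seed)
    pos_idx = np.flatnonzero(np.isin(labels, positive_classes))

    # A few labeled positives drawn from the positive subset.
    labeled_pos = rng.choice(pos_idx, size=n_labeled_pos, replace=False)
    remaining = np.setdiff1d(np.arange(len(labels)), labeled_pos)

    # Validation instances drawn from the full remaining dataset.
    valid = rng.choice(remaining, size=n_valid, replace=False)

    # Everything else becomes the unlabeled set.
    unlabeled = np.setdiff1d(remaining, valid)
    return labeled_pos, valid, unlabeled

# Example mirroring F-MNIST-1 in Table 2 (#Valid = 500); the class split
# and number of labeled positives are placeholders.
labels = np.random.randint(0, 10, size=60_000)
lp, va, un = make_pu_split(labels, positive_classes=[0, 2, 4],
                           n_labeled_pos=100, n_valid=500)
```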
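The Experiment Setup row lists the optimizer, hyperparameter grids, epoch counts, batch sizes, and the logit clamp used to avoid NaN errors in Eqs. (11) and (13). The following hedged PyTorch sketch wires those quoted settings together; the two-layer network, BCE loss, and random batch are placeholders, since the paper's architecture and objective are not reproduced here.

```python
import torch
from torch import nn

# Placeholder model and loss, not the authors' architecture.
model = nn.Sequential(nn.Flatten(), nn.Linear(28 * 28, 128),
                      nn.ReLU(), nn.Linear(128, 1))
criterion = nn.BCEWithLogitsLoss()

# Search grids quoted in the Experiment Setup row.
lr_grid = [0.001, 0.0015, 0.002, 0.0025, 0.003]
wd_grid = [5e-5, 1e-4, 5e-4, 1e-3, 5e-3]
alpha_grid = [0.1, 0.3, 0.5, 0.7, 0.9]  # probability boundary range

optimizer = torch.optim.SGD(model.parameters(),
                            lr=lr_grid[0], weight_decay=wd_grid[0])

# Dummy stage-1 batch (batch size 32; stage 2 uses 16).
x = torch.randn(32, 1, 28, 28)
y = torch.randint(0, 2, (32, 1)).float()

for epoch in range(25):  # each of the two stages runs 25 epochs
    logits = model(x)
    # Clamp logits to [-10, 10] to avoid NaN, per the quoted setup.
    logits = torch.clamp(logits, min=-10.0, max=10.0)
    loss = criterion(logits, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

In practice the learning rate, weight decay, and α would be selected over these grids using the validation split described in the Dataset Splits row.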