Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Positive-Unlabeled Learning using Random Forests via Recursive Greedy Risk Minimization
Authors: Jonathan Wilton, Abigail Koay, Ryan Ko, Miao Xu, Nan Ye
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we compare PU Extra Trees with several other PU learning methods. Datasets: We consider a selection of common datasets for classification from LIBSVM [5], as well as MNIST digits [19], the intrusion detection dataset UNSW-NB15 [25] and CIFAR-10 [17] to demonstrate the versatility of our method. |
| Researcher Affiliation | Academia | (1) School of Mathematics and Physics, The University of Queensland; (2) School of Information Technology and Electrical Engineering, The University of Queensland; (3) RIKEN, Japan |
| Pseudocode | Yes | Algorithm 1: LearnDT(κ, S) |
| Open Source Code | Yes | Our code is available at https://github.com/puetpaper/PUExtraTrees. |
| Open Datasets | Yes | Datasets: We consider a selection of common datasets for classification from LIBSVM [5], as well as MNIST digits [19], the intrusion detection dataset UNSW-NB15 [25] and CIFAR-10 [17] to demonstrate the versatility of our method. |
| Dataset Splits | Yes | A random 80%-20% train-test split was used, as no train-test splits were provided. |
| Hardware Specification | Yes | In particular, random forests were trained using 32GB RAM and one of Intel i7-10700, Intel i7-11700 or AMD Epyc 7702p CPU. Neural networks were trained on one of NVIDIA RTX A4000 or NVIDIA RTX A6000 GPU due to the lack of identical devices. |
| Software Dependencies | No | The paper mentions general software categories such as "neural networks" and refers to scikit-learn only implicitly through a citation. It does not provide specific version numbers for any libraries, frameworks, or programming languages used (e.g., Python, PyTorch, TensorFlow, or scikit-learn versions). |
| Experiment Setup | Yes | Following common practice [26, 13], the default hyperparameters for PU ET are: 100 trees, no explicit restriction on the maximum tree depth, sample F = d features out of a total of d features and sample T = 1 threshold value when computing an optimal split. The architectures for the neural networks used in uPU, nnPU and Self-PU were copied from [16] for the 20News, epsilon, MNIST and CIFAR-10 datasets. A 6 layer MLP with ReLU was used for MNIST, Covtype, Mushroom and UNSW-NB15... For each dataset the neural networks were trained for 200 epochs. The batch size, learning rate, use of batch-norm..., weight decay and choice of optimiser were tuned for each dataset. |
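
The Dataset Splits and Experiment Setup rows can be made concrete with a small sketch. The Python below is illustrative only and is not the authors' code: the names `ForestConfig` and `train_test_split_80_20` and the `seed` parameter are hypothetical, and the paper does not state how its random split was seeded or implemented. It records the quoted defaults for tree count, depth limit and threshold count, and performs a random 80%-20% split using only the standard library.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple, TypeVar

Row = TypeVar("Row")


@dataclass(frozen=True)
class ForestConfig:
    """Hypothetical container for the defaults quoted in the table."""
    n_trees: int = 100                # "100 trees"
    max_depth: Optional[int] = None   # no explicit restriction on depth
    n_thresholds: int = 1             # T = 1 threshold sampled per split


def train_test_split_80_20(
    rows: Sequence[Row], seed: int = 0
) -> Tuple[List[Row], List[Row]]:
    """Randomly split rows into 80% train and 20% test.

    The seed is an assumption added for repeatability; the source
    only says that a random 80%-20% split was used.
    """
    rng = random.Random(seed)
    indices = list(range(len(rows)))
    rng.shuffle(indices)
    cut = (len(rows) * 4) // 5        # integer arithmetic for the 80% cut
    train = [rows[i] for i in indices[:cut]]
    test = [rows[i] for i in indices[cut:]]
    return train, test


if __name__ == "__main__":
    data = list(range(100))           # placeholder rows, not a real dataset
    train, test = train_test_split_80_20(data, seed=1)
    print(len(train), len(test), ForestConfig())
```

Fixing the seed makes the split repeatable across runs, which is the kind of detail a reproducibility report would ideally record alongside the split ratio.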