Positive-Unlabeled Learning using Random Forests via Recursive Greedy Risk Minimization
Authors: Jonathan Wilton, Abigail Koay, Ryan Ko, Miao Xu, Nan Ye
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section we compare PU Extra Trees with several other PU learning methods. Datasets: We consider a selection of common datasets for classification from LIBSVM [5], as well as MNIST digits [19], the intrusion detection dataset UNSW-NB15 [25] and CIFAR-10 [17] to demonstrate the versatility of our method. |
| Researcher Affiliation | Academia | ¹School of Mathematics and Physics, The University of Queensland; ²School of Information Technology and Electrical Engineering, The University of Queensland; ³RIKEN, Japan |
| Pseudocode | Yes | Algorithm 1: LearnDT(κ, S) (a hedged sketch of this style of recursive tree learner appears after this table) |
| Open Source Code | Yes | Our code is available at https://github.com/puetpaper/PUExtraTrees. |
| Open Datasets | Yes | Datasets: We consider a selection of common datasets for classification from LIBSVM [5], as well as MNIST digits [19], the intrusion detection dataset UNSW-NB15 [25] and CIFAR-10 [17] to demonstrate the versatility of our method. |
| Dataset Splits | Yes | A random 80%-20% train-test split was used as no train-test splits were provided. (An illustrative split is sketched after this table.) |
| Hardware Specification | Yes | In particular, random forests were trained using 32GB RAM and one of Intel i7-10700, Intel i7-11700 or AMD Epyc 7702p CPU. Neural networks were trained on one of NVIDIA RTX A4000 or NVIDIA RTX A6000 GPU due to the lack of identical devices. |
| Software Dependencies | No | The paper mentions general software categories like "neural networks" and implicitly references scikit-learn through a citation, but it does not provide specific version numbers for any libraries, frameworks, or programming languages used (e.g., Python, PyTorch, TensorFlow, scikit-learn). |
| Experiment Setup | Yes | Following common practice [26, 13], the default hyperparameters for PU ET are: 100 trees, no explicit restriction on the maximum tree depth, sample F = √d features out of a total of d features and sample T = 1 threshold value when computing an optimal split. The architectures for the neural networks used in uPU, nnPU and Self-PU were copied from [16] for the 20News, epsilon, MNIST and CIFAR-10 datasets. A 6-layer MLP with ReLU was used for MNIST, Covtype, Mushroom and UNSW-NB15... For each dataset the neural networks were trained for 200 epochs. The batch size, learning rate, use of batch-norm..., weight decay and choice of optimiser were tuned for each dataset. (An illustrative configuration with comparable defaults appears after this table.) |
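
The pseudocode row refers to the paper's Algorithm 1, LearnDT(κ, S). The paper's PU risk estimators are not reproduced here; the minimal Python sketch below (hypothetical names, with Gini impurity standing in for the splitting criterion) only illustrates the general shape of such a recursive greedy tree learner in the extra-trees style described in the experiment setup: sample F candidate features and T random thresholds per feature, keep the candidate split with the lowest estimated criterion, and recurse.

```python
import numpy as np

def gini(y):
    """Gini impurity of a binary label vector (stand-in for the paper's PU risk estimate)."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)
    return 2.0 * p * (1.0 - p)

def learn_dt(X, y, max_depth=None, n_features=None, n_thresholds=1, depth=0, rng=None):
    """Recursively grow a tree by greedily choosing the candidate split that
    most reduces the criterion (extra-trees style: F sampled features,
    T random thresholds per feature)."""
    rng = np.random.default_rng() if rng is None else rng
    n, d = X.shape
    # Stop if the node is pure or the depth limit is reached.
    if len(np.unique(y)) <= 1 or (max_depth is not None and depth >= max_depth):
        return {"leaf": True, "prediction": float(np.mean(y))}

    F = n_features or max(1, int(np.sqrt(d)))            # default: sqrt(d) features
    best = None
    for j in rng.choice(d, size=F, replace=False):       # candidate features
        lo, hi = X[:, j].min(), X[:, j].max()
        if lo == hi:
            continue
        for t in rng.uniform(lo, hi, size=n_thresholds):  # T random thresholds
            mask = X[:, j] <= t
            if mask.all() or (~mask).all():
                continue
            score = (mask.mean() * gini(y[mask])
                     + (~mask).mean() * gini(y[~mask]))   # weighted child impurity
            if best is None or score < best[0]:
                best = (score, j, t, mask)

    if best is None:                                      # no valid split found
        return {"leaf": True, "prediction": float(np.mean(y))}

    _, j, t, mask = best
    return {"leaf": False, "feature": int(j), "threshold": float(t),
            "left": learn_dt(X[mask], y[mask], max_depth, n_features,
                             n_thresholds, depth + 1, rng),
            "right": learn_dt(X[~mask], y[~mask], max_depth, n_features,
                              n_thresholds, depth + 1, rng)}
```

A call like `learn_dt(X_train, y_train)` returns a nested dict of split and leaf nodes; the paper's method instead minimizes a PU risk estimate at each split and aggregates 100 such trees into a forest.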
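For the dataset-splits row, a minimal sketch of the reported 80%-20% train-test split, assuming scikit-learn's `train_test_split` and synthetic stand-in data (the report only states the proportions, not the splitting procedure):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; the reported 80%-20% split applies to datasets
# that ship without a predefined train-test partition.
X = np.random.randn(1000, 20)
y = np.random.randint(0, 2, size=1000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0  # random_state chosen for illustration
)
print(X_train.shape, X_test.shape)  # (800, 20) (200, 20)
```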
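For the experiment-setup row, the reported PU ET defaults correspond to standard extremely randomized trees hyperparameters. A minimal sketch using scikit-learn's `ExtraTreesClassifier` as a stand-in (this is not the authors' PU implementation, which lives in the PUExtraTrees repository) shows the equivalent configuration:

```python
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Analogue of the reported PU ET defaults using scikit-learn's (fully
# supervised, non-PU) ExtraTreesClassifier -- only an illustration of the
# hyperparameter choices: 100 trees, no depth restriction, sqrt(d) candidate
# features per split.  Standard extra trees already draw one random
# threshold per candidate feature, which matches T = 1.
clf = ExtraTreesClassifier(
    n_estimators=100,     # 100 trees
    max_depth=None,       # no explicit restriction on maximum tree depth
    max_features="sqrt",  # F = sqrt(d) features sampled at each split
)

# Tiny synthetic check that the configuration runs end to end.
X = np.random.randn(200, 16)
y = np.random.randint(0, 2, size=200)
clf.fit(X, y)
```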