Positive unlabeled learning via wrapper-based adaptive sampling

Authors: Pengyi Yang, Wei Liu, Jean Yang

IJCAI 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical studies suggest that AdaSampling requires very few iterations to accurately distinguish unlabeled positive and negative instances, even with a very high positive-to-negative instance ratio in the unlabeled data. We next compared AdaSampling-based single and ensemble models with the state-of-the-art bias-based approach and the bootstrap sampling approach, using Support Vector Machine (SVM) and k-Nearest Neighbours (kNN) classifiers and a panel of evaluation metrics on several real-world datasets with different ratios of unlabeled positive instances. Our experimental results demonstrate that AdaSampling significantly improves classification for both SVM and kNN, and their performance compares favourably to state-of-the-art methods.
Researcher Affiliation | Academia | (1) Charles Perkins Centre, School of Mathematics and Statistics, University of Sydney, Australia; (2) Advanced Analytics Institute, University of Technology Sydney, Australia
Pseudocode | Yes | Algorithm 1: AdaSampling for a single model; Algorithm 2: AdaSampling for an ensemble of models (see the AdaSampling sketch after this table).
Open Source Code | Yes | All the data and code are available from the project repository: https://github.com/PengyiYang/AdaSampling
Open Datasets | Yes | All these datasets were obtained from the UC Irvine Machine Learning Repository [Lichman, 2013].
Dataset Splits | Yes | We used a multi-layered repetitive 5-fold cross-validation (CV) procedure to evaluate the performance of each method. Specifically, label information for instances from the positive class was randomly removed. This was repeated 5 times, each with a different set of selected instances, comprising the first layer of randomisation. Subsequently, the data was split for 5-fold CV, and this was repeated 10 times, each with a different split point (see the cross-validation sketch after this table).
Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., CPU or GPU models, memory, or cloud instance types).
Software Dependencies | No | The paper mentions using Support Vector Machine (SVM) and k-nearest neighbour (kNN) classification algorithms and specifies some parameters, such as 'An SVM with radial basis function kernel (C=1) and a kNN with k=3'. However, it does not provide specific version numbers for any software packages or libraries used (e.g., Python version, scikit-learn version, specific SVM library version).
Experiment Setup | Yes | An SVM with radial basis function kernel (C=1) and a kNN with k=3 were used across all positive unlabeled methods as well as the baseline... We set ε to 0.01, requiring a smaller than 1% change in mean prediction probabilities across all instances for the process to terminate (see the configuration sketch below).
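
The paper's Algorithm 1 wraps a base classifier in an iterative loop: train on the known positives plus a pseudo-negative set sampled from the unlabeled data, re-estimate each unlabeled instance's probability of being negative, and resample accordingly until predictions stabilise. Below is a minimal single-model sketch, assuming a scikit-learn style estimator; the function name, the proportional-resampling scheme, and the iteration cap are illustrative assumptions, not details taken from the paper.

```python
import numpy as np
from sklearn.svm import SVC

def ada_sampling_single(X_pos, X_unl, base_estimator=None, n_iter=20, eps=0.01):
    """Single-model AdaSampling sketch for positive-unlabeled learning.

    X_pos: labelled positive instances; X_unl: unlabeled instances
    (a mixture of hidden positives and negatives).
    """
    if base_estimator is None:
        # RBF-kernel SVM with C=1, matching the paper's reported setup.
        base_estimator = SVC(kernel="rbf", C=1.0, probability=True)
    rng = np.random.default_rng(0)
    # Start with every unlabeled instance equally likely to be negative.
    p_neg = np.full(len(X_unl), 0.5)
    prev = p_neg.copy()
    for _ in range(n_iter):
        # Sample a pseudo-negative set in proportion to the current
        # negative-class probabilities (illustrative sampling scheme).
        idx = rng.choice(len(X_unl), size=len(X_unl), p=p_neg / p_neg.sum())
        X_train = np.vstack([X_pos, X_unl[idx]])
        y_train = np.concatenate([np.ones(len(X_pos)), np.zeros(len(idx))])
        model = base_estimator.fit(X_train, y_train)
        # Re-estimate each unlabeled instance's probability of being negative.
        p_neg = model.predict_proba(X_unl)[:, 0]
        # Terminate once mean prediction probabilities change by less
        # than eps (the paper uses a 1% threshold).
        if np.mean(np.abs(p_neg - prev)) < eps:
            break
        prev = p_neg.copy()
    return model, p_neg
```

Algorithm 2, the ensemble variant, would repeat the sampling step to train several models and aggregate their predictions.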
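The multi-layered CV procedure from the Dataset Splits row can be expressed as two nested loops. A sketch under stated assumptions: the `hide_ratio` value (the fraction of positives whose labels are hidden) and the `evaluate` callback signature are hypothetical, since the excerpt does not specify them.

```python
import numpy as np
from sklearn.model_selection import KFold

def layered_cv(X, y, evaluate, hide_ratio=0.5, n_label_repeats=5, n_cv_repeats=10):
    """Multi-layered repetitive 5-fold CV sketch.

    evaluate(X_tr, y_tr_pu, X_te, y_te) -> score for one fold (assumed signature).
    """
    rng = np.random.default_rng(0)
    scores = []
    # Layer 1: repeat 5 times, hiding a different random subset of positives.
    for _ in range(n_label_repeats):
        y_pu = y.copy()
        pos = np.flatnonzero(y == 1)
        hidden = rng.choice(pos, size=int(hide_ratio * len(pos)), replace=False)
        y_pu[hidden] = 0  # hidden positives become "unlabeled"
        # Layer 2: 5-fold CV repeated 10 times, each with a different split.
        for rep in range(n_cv_repeats):
            for tr, te in KFold(n_splits=5, shuffle=True, random_state=rep).split(X):
                scores.append(evaluate(X[tr], y_pu[tr], X[te], y[te]))
    return np.mean(scores), np.std(scores)
```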
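The reported experiment settings map directly onto common library defaults. A minimal configuration sketch follows; scikit-learn is an assumption, since the paper names no specific software.

```python
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

# Base classifiers as reported: RBF-kernel SVM with C=1 and kNN with k=3.
svm = SVC(kernel="rbf", C=1.0, probability=True)
knn = KNeighborsClassifier(n_neighbors=3)

# Termination threshold: stop when mean prediction probabilities
# change by less than 1% between iterations.
EPSILON = 0.01
```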