Positive unlabeled learning via wrapper-based adaptive sampling
Authors: Pengyi Yang, Wei Liu, Jean Yang
IJCAI 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical studies suggest that AdaSampling requires very few iterations to accurately distinguish unlabeled positive and negative instances even with a very high positive-to-negative instance ratio in the unlabeled data. We next compared AdaSampling-based single and ensemble models with the state-of-the-art bias-based approach and bootstrap sampling approach using Support Vector Machine (SVM) and k-Nearest Neighbours (kNN) and a panel of evaluation metrics on several real-world datasets with different ratios of unlabeled positive instances. Our experimental results demonstrate that AdaSampling significantly improves classification for both SVM and kNN, and their performance compares favourably to state-of-the-art methods. |
| Researcher Affiliation | Academia | 1Charles Perkins Centre, School of Mathematics and Statistics, University of Sydney, Australia 2Advanced Analytics Institute, University of Technology Sydney, Australia |
| Pseudocode | Yes | Algorithm 1: AdaSampling for a single model; Algorithm 2: AdaSampling for an ensemble of models (a minimal, illustrative sketch of the single-model loop is given after the table). |
| Open Source Code | Yes | All the data and code are available from the project repository: https://github.com/PengyiYang/AdaSampling |
| Open Datasets | Yes | All these datasets were obtained from UC Irvine Machine Learning Repository [Lichman, 2013] |
| Dataset Splits | Yes | We used a multi-layered repetitive 5-fold cross-validation (CV) procedure to evaluate the performance of each method. Specifically, label information of instances from the positive class was randomly removed. This is repeated 5 times, each with a different set of selected instances, and comprises the first layer of randomisation. Subsequently, the data is split for 5-fold CV and this is repeated 10 times, each with a different split point (see the evaluation sketch after the table). |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments (e.g., CPU or GPU models, memory, or cloud instance types). |
| Software Dependencies | No | The paper mentions using Support Vector Machine (SVM) and k-nearest neighbour (kNN) classification algorithms and specifies some parameters, such as 'an SVM with radial basis function kernel (C=1) and a kNN with k=3'. However, it does not provide specific version numbers for any software packages or libraries used (e.g., Python version, scikit-learn version, or specific SVM library version). |
| Experiment Setup | Yes | An SVM with radial basis function kernel (C=1) and a kNN with k=3 were used across all positive unlabeled methods as well as the baseline... We set ε to be 0.01, requiring less than a 1% change in the mean prediction probabilities of all instances for the process to terminate. |
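
As a reading aid, here is a minimal, illustrative Python sketch of the single-model AdaSampling loop described in the Pseudocode and Experiment Setup rows: unlabeled instances are resampled as pseudo-negatives in proportion to their current predicted probability of being negative, and the loop stops once the mean prediction changes by less than ε = 0.01. The function name `ada_sampling`, the scikit-learn `SVC` base learner, and the sampling details are assumptions for illustration, not the authors' implementation (which is available from the repository linked above).

```python
# Illustrative sketch of the AdaSampling wrapper loop (single-model case).
# Names (ada_sampling, X_pos, X_unl, eps) are hypothetical; the base
# learner shown is scikit-learn's SVC as an example of the SVM (C=1, RBF
# kernel) used in the paper's experiments.
import numpy as np
from sklearn.svm import SVC

def ada_sampling(X_pos, X_unl, eps=0.01, max_iter=50, random_state=0):
    """Iteratively re-sample pseudo-negatives from the unlabeled set."""
    rng = np.random.default_rng(random_state)
    n_unl = X_unl.shape[0]
    # Start by treating every unlabeled instance as equally likely negative.
    p_neg = np.full(n_unl, 0.5)
    prev_mean = p_neg.mean()
    model = None
    for _ in range(max_iter):
        # Sample pseudo-negatives with probability proportional to p_neg.
        keep = rng.random(n_unl) < p_neg
        if not keep.any():
            keep[rng.integers(n_unl)] = True
        X_train = np.vstack([X_pos, X_unl[keep]])
        y_train = np.concatenate([np.ones(len(X_pos)), np.zeros(keep.sum())])
        model = SVC(kernel="rbf", C=1.0, probability=True).fit(X_train, y_train)
        # Update each unlabeled instance's probability of being negative.
        p_neg = model.predict_proba(X_unl)[:, list(model.classes_).index(0.0)]
        # Stop when the mean prediction changes by less than eps (1%).
        if abs(p_neg.mean() - prev_mean) < eps:
            break
        prev_mean = p_neg.mean()
    return model, p_neg
```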
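
Similarly, a hedged sketch of the multi-layered repetitive 5-fold CV protocol quoted in the Dataset Splits row: an outer layer that repeatedly hides a random subset of positive labels, and an inner layer of repeated 5-fold CV. The helper names (`hide_positive_labels`, `evaluate_fold`), the default hidden-positive ratio, and the use of scikit-learn's `StratifiedKFold` are illustrative assumptions, not taken from the paper or its code.

```python
# Illustrative sketch of the multi-layered evaluation protocol:
# layer 1 repeatedly relabels a fraction of positives as unlabeled,
# layer 2 runs repeated 5-fold CV on each resulting PU dataset.
import numpy as np
from sklearn.model_selection import StratifiedKFold

def hide_positive_labels(y, ratio, rng):
    """Relabel a random fraction of positives as unlabeled (encoded 0)."""
    y_pu = y.copy()
    pos_idx = np.flatnonzero(y == 1)
    hidden = rng.choice(pos_idx, size=int(ratio * len(pos_idx)), replace=False)
    y_pu[hidden] = 0
    return y_pu

def multilayer_cv(X, y, evaluate_fold, ratio=0.4,
                  n_label_repeats=5, n_cv_repeats=10, seed=0):
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_label_repeats):          # layer 1: random label removal
        y_pu = hide_positive_labels(y, ratio, rng)
        for rep in range(n_cv_repeats):       # layer 2: repeated 5-fold CV
            skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=rep)
            for train_idx, test_idx in skf.split(X, y):
                # evaluate_fold is a user-supplied callback that trains a PU
                # method on the (partially unlabeled) training fold and scores
                # it against the true labels of the test fold.
                scores.append(evaluate_fold(X[train_idx], y_pu[train_idx],
                                            X[test_idx], y[test_idx]))
    return float(np.mean(scores)), float(np.std(scores))
```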