Margin Based PU Learning

Authors: Tieliang Gong, Guangtao Wang, Jieping Ye, Zongben Xu, Ming Lin

Venue: AAAI 2018

Reproducibility assessment. Each entry lists the variable, the assessed result, and the LLM response (supporting evidence quoted from the paper, or an explanation where no evidence was found).

Research Type: Experimental
"Extensive experiments on real-world datasets verify our theory and the state-of-the-art performance of the proposed PU learning algorithm."

Researcher Affiliation: Academia
"Tieliang Gong (1), Guangtao Wang (2), Jieping Ye (2), Zongben Xu (1), Ming Lin (2). (1) School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, P. R. China; (2) Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA"

Pseudocode: Yes
"Algorithm 1: Positive Margin Based PU Learning (PMPU)"

Open Source Code: No
The paper does not provide any specific links or explicit statements about the availability of open-source code for the described methodology.

Open Datasets: Yes
"We evaluated the proposed PMPU on 6 real-world classification datasets, including WAVEFORM, COVERTYPE, MNIST, RCV1, CIFAR-10, CIFAR-100. These datasets cover a range of application domains such as text, handwritten digits and images. Table 1 summarizes the dataset statistics. The numbers of training samples of the 6 datasets vary from 2500 to 60000, the numbers of testing samples vary from 2500 to 565892, and the feature dimensions vary from 21 to 47236. The training set and testing set are prespecified for all datasets."

Dataset Splits: Yes
"The regularization parameter C for each experiment is selected from the set {0} ∪ {10^-6, 10^-5, ..., 10^5, 10^6} by 5-fold cross validation." (A hedged sketch of this grid search appears at the end of this section.)

Hardware Specification: No
The paper does not provide any specific hardware details such as GPU or CPU models, memory, or cloud computing instance types used for running the experiments.

Software Dependencies: No
The paper states the method was "implemented by LIBSVM (Chang and Lin 2011)" but does not provide a specific version number for LIBSVM or for any other software dependencies.

Experiment Setup: Yes
"The regularization parameter C for each experiment is selected from the set {0} ∪ {10^-6, 10^-5, ..., 10^5, 10^6} by 5-fold cross validation. [...] In our experiments, the large positive margin oracle τ is determined according to the distribution of decision values generated by the initial model. We set τ to be the 75% quantile of positive decision values predicted by the initial model for all datasets. We also set |XQ| = (3/4)|XU|, where |XQ| denotes the number of re-sampled instances and |XU| the number of unlabeled instances. At the same time, the number of PU iterations is set to 30 for all experiments. [...] For RF and GBT, the number of trees is fixed to 50 for all experiments." (A hedged sketch of the PMPU loop under these settings appears below.)
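
The quoted model-selection protocol is easy to reproduce. Below is a minimal sketch of the 5-fold cross-validation over the C grid, under stated assumptions: scikit-learn's LinearSVC stands in for the LIBSVM implementation the paper reports, and the C = 0 grid point is skipped because it is not a legal value for this solver (the authors presumably treated it as a special unregularized case).

```python
# Sketch of selecting C from {0} ∪ {10^-6, ..., 10^5, 10^6} by 5-fold CV.
# LinearSVC is a stand-in for LIBSVM; C = 0 is omitted (not legal here).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

c_grid = [10.0 ** k for k in range(-6, 7)]  # 10^-6 ... 10^6

def select_C(X_train, y_train):
    """Return the C value with the best 5-fold cross-validation accuracy."""
    search = GridSearchCV(
        LinearSVC(), param_grid={"C": c_grid}, cv=5, scoring="accuracy"
    )
    search.fit(X_train, y_train)
    return search.best_params_["C"]
```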
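The quoted hyperparameters also outline the shape of the PMPU loop (Algorithm 1). The following is a minimal sketch under stated assumptions, not the authors' implementation: the initialization (positives versus unlabeled points treated as negatives) and the pseudo-labeling rule (threshold the current decision value at τ) are illustrative assumptions, and `pmpu_sketch` is a hypothetical helper; Algorithm 1 in the paper defines the exact update. Only the quantile for τ, the resample size |XQ| = (3/4)|XU|, and the 30 iterations come from the quoted text.

```python
# Minimal sketch of a PMPU-style loop using the quoted hyperparameters.
# The pseudo-labeling rule and initialization below are assumptions for
# illustration; see Algorithm 1 in the paper for the exact procedure.
import numpy as np
from sklearn.svm import LinearSVC

def pmpu_sketch(X_pos, X_unl, n_iters=30, seed=0):
    rng = np.random.default_rng(seed)
    n_q = int(0.75 * len(X_unl))  # |XQ| = (3/4)|XU| as in the paper

    # Initial model: positives vs. unlabeled treated as negatives
    # (an assumed initialization; the paper's may differ).
    X0 = np.vstack([X_pos, X_unl])
    y0 = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_unl))])
    clf = LinearSVC().fit(X0, y0)

    # tau: 75% quantile of the initial model's decision values on the
    # labeled positives (one reading of "positive decision values").
    tau = np.quantile(clf.decision_function(X_pos), 0.75)

    for _ in range(n_iters):  # 30 PU iterations for all experiments
        # Re-sample |XQ| unlabeled instances and pseudo-label them
        # by thresholding the current decision value at tau (assumed rule).
        idx = rng.choice(len(X_unl), size=n_q, replace=False)
        X_q = X_unl[idx]
        y_q = np.where(clf.decision_function(X_q) >= tau, 1.0, -1.0)

        # Retrain on the labeled positives plus the pseudo-labeled sample.
        clf = LinearSVC().fit(
            np.vstack([X_pos, X_q]),
            np.concatenate([np.ones(len(X_pos)), y_q]),
        )
    return clf
```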