Margin Based PU Learning

Authors: Tieliang Gong, Guangtao Wang, Jieping Ye, Zongben Xu, Ming Lin

Venue: AAAI 2018

Reproducibility assessment. Each entry lists the variable, the assessed result, and the LLM response (supporting evidence quoted from the paper, or an explanation where no evidence was found).

Research Type: Experimental
"Extensive experiments on real-world datasets verify our theory and the state-of-the-art performance of the proposed PU learning algorithm."

Researcher Affiliation: Academia
"Tieliang Gong (1), Guangtao Wang (2), Jieping Ye (2), Zongben Xu (1), Ming Lin (2). (1) School of Mathematics and Statistics, Xi'an Jiaotong University, Xi'an 710049, Shaanxi, P. R. China; (2) Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, Michigan 48109, USA"

Pseudocode: Yes
"Algorithm 1: Positive Margin Based PU Learning (PMPU)"

Open Source Code: No
The paper does not provide any specific links or explicit statements about the availability of open-source code for the described methodology.

Open Datasets: Yes
"We evaluated the proposed PMPU on 6 real-world classification datasets, including WAVEFORM, COVERTYPE, MNIST, RCV1, CIFAR-10, CIFAR-100. These datasets cover a range of application domains such as text, handwritten digits and images. Table 1 summarizes the dataset statistics. The numbers of training samples of the 6 datasets vary from 2500 to 60000, the numbers of testing samples vary from 2500 to 565892, and the feature dimensions vary from 21 to 47236. The training set and testing set are prespecified for all datasets."

Dataset Splits: Yes
"The regularization parameter C for each experiment is selected from the set {0} ∪ {10^-6, 10^-5, ..., 10^5, 10^6} by 5-fold cross validation." (A hedged sketch of this grid search appears at the end of this section.)

Hardware Specification: No
The paper does not provide any specific hardware details such as GPU or CPU models, memory, or cloud computing instance types used for running the experiments.

Software Dependencies: No
The paper states the method was "implemented by LIBSVM (Chang and Lin 2011)" but does not provide a specific version number for LIBSVM or for any other software dependencies.

Experiment Setup: Yes
"The regularization parameter C for each experiment is selected from the set {0} ∪ {10^-6, 10^-5, ..., 10^5, 10^6} by 5-fold cross validation. [...] In our experiments, the large positive margin oracle τ is determined according to the distribution of decision values generated by the initial model. We set τ to be the 75% quantile of positive decision values predicted by the initial model for all datasets. We also set |XQ| = (3/4)|XU|, where |XQ| denotes the number of re-sampled instances and |XU| the number of unlabeled instances. At the same time, the number of PU iterations is set to 30 for all experiments. [...] For RF and GBT, the number of trees is fixed to 50 for all experiments." (A hedged sketch of the PMPU loop under these settings appears below.)
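
The quoted model-selection protocol is easy to reproduce. Below is a minimal sketch of the 5-fold cross-validation over the C grid, under stated assumptions: scikit-learn's LinearSVC stands in for the LIBSVM implementation the paper reports, and the C = 0 grid point is skipped because it is not a legal value for this solver (the authors presumably treated it as a special unregularized case).

```python
# Sketch of selecting C from {0} ∪ {10^-6, ..., 10^5, 10^6} by 5-fold CV.
# LinearSVC is a stand-in for LIBSVM; C = 0 is omitted (not legal here).
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import LinearSVC

c_grid = [10.0 ** k for k in range(-6, 7)]  # 10^-6 ... 10^6

def select_C(X_train, y_train):
    """Return the C value with the best 5-fold cross-validation accuracy."""
    search = GridSearchCV(
        LinearSVC(), param_grid={"C": c_grid}, cv=5, scoring="accuracy"
    )
    search.fit(X_train, y_train)
    return search.best_params_["C"]
```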
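The quoted hyperparameters also outline the shape of the PMPU loop (Algorithm 1). The following is a minimal sketch under stated assumptions, not the authors' implementation: the initialization (positives versus unlabeled points treated as negatives) and the pseudo-labeling rule (threshold the current decision value at τ) are illustrative assumptions, and `pmpu_sketch` is a hypothetical helper; Algorithm 1 in the paper defines the exact update. Only the quantile for τ, the resample size |XQ| = (3/4)|XU|, and the 30 iterations come from the quoted text.

```python
# Minimal sketch of a PMPU-style loop using the quoted hyperparameters.
# The pseudo-labeling rule and initialization below are assumptions for
# illustration; see Algorithm 1 in the paper for the exact procedure.
import numpy as np
from sklearn.svm import LinearSVC

def pmpu_sketch(X_pos, X_unl, n_iters=30, seed=0):
    rng = np.random.default_rng(seed)
    n_q = int(0.75 * len(X_unl))  # |XQ| = (3/4)|XU| as in the paper

    # Initial model: positives vs. unlabeled treated as negatives
    # (an assumed initialization; the paper's may differ).
    X0 = np.vstack([X_pos, X_unl])
    y0 = np.concatenate([np.ones(len(X_pos)), -np.ones(len(X_unl))])
    clf = LinearSVC().fit(X0, y0)

    # tau: 75% quantile of the initial model's decision values on the
    # labeled positives (one reading of "positive decision values").
    tau = np.quantile(clf.decision_function(X_pos), 0.75)

    for _ in range(n_iters):  # 30 PU iterations for all experiments
        # Re-sample |XQ| unlabeled instances and pseudo-label them
        # by thresholding the current decision value at tau (assumed rule).
        idx = rng.choice(len(X_unl), size=n_q, replace=False)
        X_q = X_unl[idx]
        y_q = np.where(clf.decision_function(X_q) >= tau, 1.0, -1.0)

        # Retrain on the labeled positives plus the pseudo-labeled sample.
        clf = LinearSVC().fit(
            np.vstack([X_pos, X_q]),
            np.concatenate([np.ones(len(X_pos)), y_q]),
        )
    return clf
```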