Positive-Unlabeled Learning with Non-Negative Risk Estimator

Authors: Ryuichi Kiryo, Gang Niu, Marthinus C. du Plessis, Masashi Sugiyama

NeurIPS 2017

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments demonstrate that our risk estimator fixes the overfitting problem of its unbiased counterparts. In this section, we compare PN, unbiased PU (uPU) and non-negative PU (nnPU) learning experimentally. The experimental results are reported in Figure 2, where means and standard deviations of training and test risks based on the same 10 random samplings are shown."
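For context, these are the two estimators under comparison, reconstructed in LaTeX from the paper's definitions (πp is the positive class prior; R̂p⁺, R̂p⁻, and R̂u⁻ are empirical risks over the positive and unlabeled samples):

```latex
% Unbiased PU risk (uPU); the last two terms can jointly go negative,
% which is what lets flexible models overfit:
\hat{R}_{\mathrm{pu}}(g) = \pi_p \hat{R}_p^+(g) - \pi_p \hat{R}_p^-(g) + \hat{R}_u^-(g)

% Non-negative PU risk (nnPU); the max keeps the estimated risk non-negative:
\tilde{R}_{\mathrm{pu}}(g) = \pi_p \hat{R}_p^+(g)
    + \max\bigl\{ 0,\; \hat{R}_u^-(g) - \pi_p \hat{R}_p^-(g) \bigr\}
```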
Researcher Affiliation | Academia | "Ryuichi Kiryo (1,2); Gang Niu (1,2); Marthinus C. du Plessis; Masashi Sugiyama (2,1). Affiliations: (1) The University of Tokyo, 7-3-1 Hongo, Tokyo 113-0033, Japan; (2) RIKEN, 1-4-1 Nihonbashi, Tokyo 103-0027, Japan. Emails: {kiryo@ms., gang@ms., sugi@}k.u-tokyo.ac.jp"
Pseudocode | Yes | "Algorithm 1: Large-scale PU learning based on stochastic optimization"
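The paper's Algorithm 1 is not reproduced in this report, so the following is a minimal PyTorch-style sketch of one stochastic update, written from the algorithm's description; `nnpu_update` and `sigmoid_loss` are hypothetical names, and the authors' actual implementation is the Chainer code linked in the next row.

```python
import torch

def sigmoid_loss(z, t):
    # Sigmoid surrogate loss l_sig(z, t) = 1 / (1 + exp(t * z)), as used in the experiments.
    return torch.sigmoid(-t * z)

def nnpu_update(model, optimizer, x_p, x_u, pi_p, beta=0.0, gamma=1.0):
    """One stochastic step in the spirit of Algorithm 1 (illustrative sketch)."""
    g_p = model(x_p).flatten()  # scores on the positive mini-batch
    g_u = model(x_u).flatten()  # scores on the unlabeled mini-batch
    risk_p_pos = sigmoid_loss(g_p, +1.0).mean()  # empirical R_p^+
    risk_p_neg = sigmoid_loss(g_p, -1.0).mean()  # empirical R_p^-
    risk_u_neg = sigmoid_loss(g_u, -1.0).mean()  # empirical R_u^-
    neg_risk = risk_u_neg - pi_p * risk_p_neg    # goes negative when overfitting sets in

    optimizer.zero_grad()
    if neg_risk.item() < -beta:
        # Violation: ascend the offending term (descend its negation), discounted by gamma.
        (-gamma * neg_risk).backward()
    else:
        # Ordinary descent step on the nnPU risk.
        (pi_p * risk_p_pos + neg_risk).backward()
    optimizer.step()
```

Note that scaling the loss by gamma reproduces the algorithm's discounted learning rate exactly only for plain SGD; for adaptive optimizers such as Adam it is an approximation.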
Open Source Code | Yes | "All the experiments were done with Chainer [45], and our implementation based on it is available at https://github.com/kiryor/nnPUlearning."
Open Datasets | Yes | "Table 2: Specification of benchmark datasets, models, and optimization algorithms."

  Name | # Train | # Test | # Feature | πp | Model g(x; θ) | Opt. alg. A
  MNIST [29] | 60,000 | 10,000 | 784 | 0.49 | 6-layer MLP with ReLU | Adam [20]
  epsilon [37] | 400,000 | 100,000 | 2,000 | 0.50 | 6-layer MLP with Softsign | Adam [20]
  20News [38] | 11,314 | 7,532 | 61,188 | 0.44 | 5-layer MLP with Softsign | AdaGrad [31]
  CIFAR-10 [39] | 50,000 | 10,000 | 3,072 | 0.40 | 13-layer CNN with ReLU | Adam [20]

  Dataset sources: http://yann.lecun.com/exdb/mnist/ (MNIST); https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html (epsilon); http://qwone.com/~jason/20Newsgroups/ (20Newsgroups); https://www.cs.toronto.edu/~kriz/cifar.html (CIFAR-10).
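Each benchmark is relabeled into a binary PU problem with the class prior πp above. As an illustration of how such a prior arises, the sketch below treats 4 of CIFAR-10's 10 balanced classes as positive, which yields πp = 0.40 as in the table; the particular class split is our assumption, not taken from the paper.

```python
import numpy as np

# Stand-in labels: uniform over 10 classes, like CIFAR-10's balanced training set.
rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=50_000)

# Hypothetical positive split: 4 of the 10 classes (e.g., the vehicle classes).
positive_classes = np.array([0, 1, 8, 9])  # airplane, automobile, ship, truck
t = np.where(np.isin(labels, positive_classes), 1, -1)

pi_p = (t == 1).mean()
print(pi_p)  # ~0.40, matching the class prior reported for CIFAR-10
```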
Dataset Splits | No | "(A) for PN, np = 1,000 and nn = (πn/(2πp))² np; (B) for uPU, np = 1,000 and nu is the total number of training data; (C) for nnPU, np and nu are exactly the same as for uPU."
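As a concrete instance of the PN baseline's sample-size rule (our arithmetic; whether the paper floors or rounds the result is an assumption), take MNIST with πp = 0.49:

```python
from math import floor

pi_p = 0.49          # MNIST class prior from Table 2
pi_n = 1.0 - pi_p
n_p = 1000
n_n = floor((pi_n / (2 * pi_p)) ** 2 * n_p)
print(n_n)  # 270: the PN baseline pairs 1,000 positives with roughly 270 negatives
```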
Hardware Specification | No | The paper states "All the experiments were done with Chainer [45]" but does not provide hardware details such as GPU/CPU models, memory, or cloud instance types used for the experiments.
Software Dependencies | No | "All the experiments were done with Chainer [45]." The paper names the software but does not specify a version number.
Experiment Setup | Yes | "The model for MNIST was a 6-layer multilayer perceptron (MLP) with ReLU [40] (more specifically, d-300-300-300-300-1). For epsilon, the model was similar while the activation was replaced with Softsign [41] for better performance... Furthermore, the sigmoid loss ℓ_sig was used as the surrogate loss and an ℓ2-regularization was also added. The resulting objectives were minimized by Adam [20] on MNIST, epsilon and CIFAR-10, and by AdaGrad [31] on 20News; we fixed β = 0 and γ = 1 for simplicity."
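For concreteness, a PyTorch-style reconstruction of the MNIST setup implied by this description (layer sizes read off "d-300-300-300-300-1"; the weight-decay strength is a placeholder, since the paper only says an ℓ2-regularization was added):

```python
import torch.nn as nn
from torch.optim import Adam

d = 784  # MNIST input dimension (Table 2)

# "6-layer MLP with ReLU", specified as d-300-300-300-300-1.
model = nn.Sequential(
    nn.Linear(d, 300), nn.ReLU(),
    nn.Linear(300, 300), nn.ReLU(),
    nn.Linear(300, 300), nn.ReLU(),
    nn.Linear(300, 300), nn.ReLU(),
    nn.Linear(300, 1),
)

# Adam with l2-regularization expressed as weight decay; 5e-4 is a placeholder value.
optimizer = Adam(model.parameters(), weight_decay=5e-4)
```

With β = 0 and γ = 1 as fixed in the paper, the update sketched after Algorithm 1 above reduces to: ascend whenever the estimated negative-risk term drops below zero, otherwise descend on the full risk.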