Positive-Unlabeled Learning with Non-Negative Risk Estimator
Authors: Ryuichi Kiryo, Gang Niu, Marthinus C. du Plessis, Masashi Sugiyama
NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments demonstrate that our risk estimator fixes the overfitting problem of its unbiased counterparts. In this section, we compare PN, unbiased PU (uPU) and non-negative PU (nnPU) learning experimentally. The experimental results are reported in Figure 2, where means and standard deviations of training and test risks based on the same 10 random samplings are shown. (A sketch of the two PU risk estimators being compared follows the table.) |
| Researcher Affiliation | Academia | Ryuichi Kiryo (1,2), Gang Niu (1,2), Marthinus C. du Plessis, Masashi Sugiyama (2,1); (1) The University of Tokyo, 7-3-1 Hongo, Tokyo 113-0033, Japan; (2) RIKEN, 1-4-1 Nihonbashi, Tokyo 103-0027, Japan; { kiryo@ms., gang@ms., sugi@ }k.u-tokyo.ac.jp |
| Pseudocode | Yes | Algorithm 1: Large-scale PU learning based on stochastic optimization. (A sketch of one stochastic update of this algorithm follows the table.) |
| Open Source Code | Yes | All the experiments were done with Chainer [45], and our implementation based on it is available at https://github.com/kiryor/nnPUlearning. |
| Open Datasets | Yes | Table 2: Specification of benchmark datasets, models, and optimization algorithms. MNIST [29]: 60,000 train / 10,000 test, 784 features, πp = 0.49, 6-layer MLP with ReLU, Adam [20]. epsilon [37]: 400,000 train / 100,000 test, 2,000 features, πp = 0.50, 6-layer MLP with Softsign, Adam [20]. 20News [38]: 11,314 train / 7,532 test, 61,188 features, πp = 0.44, 5-layer MLP with Softsign, AdaGrad [31]. CIFAR-10 [39]: 50,000 train / 10,000 test, 3,072 features, πp = 0.40, 13-layer CNN with ReLU, Adam [20]. See http://yann.lecun.com/exdb/mnist/ for MNIST, https://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/binary.html for epsilon, http://qwone.com/~jason/20Newsgroups/ for 20Newsgroups, and https://www.cs.toronto.edu/~kriz/cifar.html for CIFAR-10. |
| Dataset Splits | No | (A) For PN, np = 1,000 and nn = (πn/2πp)²np; (B) for uPU, np = 1,000 and nu is the total number of training data; (C) for nnPU, np and nu are exactly the same as for uPU. (A worked example of the PN split size follows the table.) |
| Hardware Specification | No | The paper states "All the experiments were done with Chainer [45]" but does not provide specific hardware details like GPU/CPU models, memory, or cloud instance types used for running the experiments. |
| Software Dependencies | No | All the experiments were done with Chainer [45]. The paper names the software but does not specify a version number. |
| Experiment Setup | Yes | The model for MNIST was a 6-layer multilayer perceptron (MLP) with ReLU [40] (more specifically, d-300-300-300-300-1). For epsilon, the model was similar while the activation was replaced with Softsign [41] for better performance... Furthermore, the sigmoid loss ℓsig was used as the surrogate loss and an ℓ2-regularization was also added. The resulting objectives were minimized by Adam [20] on MNIST, epsilon and CIFAR-10, and by AdaGrad [31] on 20News; we fixed β = 0 and γ = 1 for simplicity. (A sketch of this setup follows the table.) |
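
To make the comparison in the Research Type row concrete, here is a minimal NumPy sketch of the two empirical risk estimators the paper compares: the unbiased uPU risk and its non-negative nnPU correction. This is not the authors' Chainer implementation; the function and variable names (`pu_risks`, `g_p`, `g_u`) and the toy data are illustrative assumptions, and the sigmoid surrogate loss matches the one quoted in the Experiment Setup row.

```python
import numpy as np

def sigmoid_loss(margin):
    # Sigmoid surrogate loss l_sig(z) = 1 / (1 + exp(z)), evaluated at the margin z = y * g(x).
    return 1.0 / (1.0 + np.exp(margin))

def pu_risks(g_p, g_u, pi_p, loss=sigmoid_loss):
    """Empirical PU risks from decision values g(x) on positive (g_p) and unlabeled (g_u) data.

    Returns the unbiased estimator (uPU) and its non-negative correction (nnPU):
        uPU  = pi_p * R_p^+ + (R_u^- - pi_p * R_p^-)
        nnPU = pi_p * R_p^+ + max(0, R_u^- - pi_p * R_p^-)
    """
    r_p_pos = loss(+g_p).mean()      # R_p^+: positives treated as positive
    r_p_neg = loss(-g_p).mean()      # R_p^-: positives treated as negative
    r_u_neg = loss(-g_u).mean()      # R_u^-: unlabeled treated as negative
    negative_part = r_u_neg - pi_p * r_p_neg
    upu = pi_p * r_p_pos + negative_part
    nnpu = pi_p * r_p_pos + max(0.0, negative_part)
    return upu, nnpu

# Toy usage with random decision values and the MNIST class prior pi_p = 0.49 from Table 2.
rng = np.random.default_rng(0)
print(pu_risks(rng.normal(1, 1, 1000), rng.normal(0, 1, 10000), pi_p=0.49))
```

The uPU estimator can go negative when the model overfits; nnPU clips that part at zero, which is the fix the experiments evaluate.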
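The Pseudocode row refers to Algorithm 1, which follows the ordinary stochastic gradient of the risk while the "negative" component stays above −β, and otherwise steps along that component's reversed gradient with a discounted weight γ. Below is a hedged PyTorch sketch of a single stochastic update under the β = 0, γ = 1 setting quoted in the Experiment Setup row; the function name `nnpu_minibatch_step` and the use of PyTorch instead of the original Chainer are assumptions, and scaling the loss by γ stands in for the discounted step size γη in the paper's Algorithm 1 (exactly equivalent only for plain SGD).

```python
import torch

def sigmoid_loss(margin):
    # l_sig(z) = 1 / (1 + exp(z)); torch.sigmoid(-z) is a numerically stable equivalent.
    return torch.sigmoid(-margin)

def nnpu_minibatch_step(model, optimizer, x_p, x_u, pi_p, beta=0.0, gamma=1.0):
    """One stochastic update in the spirit of Algorithm 1 (sketch, not the original code):
    descend on the PU risk while its negative part is >= -beta, otherwise step back along
    the overfitted direction with weight gamma."""
    optimizer.zero_grad()
    g_p = model(x_p).squeeze(-1)          # decision values on the positive mini-batch
    g_u = model(x_u).squeeze(-1)          # decision values on the unlabeled mini-batch
    r_p_pos = sigmoid_loss(g_p).mean()    # R_p^+
    r_p_neg = sigmoid_loss(-g_p).mean()   # R_p^-
    r_u_neg = sigmoid_loss(-g_u).mean()   # R_u^-
    negative_part = r_u_neg - pi_p * r_p_neg
    if negative_part.item() >= -beta:
        loss = pi_p * r_p_pos + negative_part   # ordinary update (equals the nnPU risk when beta = 0)
        loss.backward()
    else:
        loss = -gamma * negative_part           # corrective step against overfitting
        loss.backward()
    optimizer.step()
    return loss.item()
```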
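The Dataset Splits row quotes the rule used to size the negative set for the PN baseline. As a worked illustration (not stated in the paper), plugging in np = 1,000 and the MNIST class prior πp = 0.49 from Table 2, so that πn = 0.51, gives:

```latex
n_n \;=\; \Big(\frac{\pi_n}{2\pi_p}\Big)^2 n_p
    \;=\; \Big(\frac{0.51}{2 \times 0.49}\Big)^2 \times 1000
    \;\approx\; 0.271 \times 1000
    \;\approx\; 271 .
```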
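Finally, the Experiment Setup row pins down the MNIST architecture and objective. The sketch below re-creates that configuration in PyTorch purely for illustration; the original code is in Chainer, and the `weight_decay` value standing in for the ℓ2-regularization is a placeholder rather than a reported hyperparameter.

```python
import torch
import torch.nn as nn

def make_mnist_mlp(d=784):
    # d-300-300-300-300-1 MLP with ReLU, matching the architecture quoted above.
    return nn.Sequential(
        nn.Linear(d, 300), nn.ReLU(),
        nn.Linear(300, 300), nn.ReLU(),
        nn.Linear(300, 300), nn.ReLU(),
        nn.Linear(300, 300), nn.ReLU(),
        nn.Linear(300, 1),
    )

model = make_mnist_mlp()
# Adam was the optimizer for MNIST; weight_decay stands in for the l2-regularization term
# (the actual regularization strength is not quoted here, so 5e-3 is only a placeholder).
optimizer = torch.optim.Adam(model.parameters(), weight_decay=5e-3)
```

Together with the `nnpu_minibatch_step` sketch above, this model and optimizer can be passed directly into one training iteration on a positive and an unlabeled mini-batch.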