Learning from Positive and Unlabeled Data with Arbitrary Positive Shift

Authors: Zayd Hammoudeh, Daniel Lowd

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results demonstrate our methods' effectiveness across numerous real-world datasets and forms of positive bias, including disjoint positive class-conditional supports. Additionally, we propose a general, simplified approach to address PU risk estimation overfitting.
Researcher Affiliation | Academia | Zayd Hammoudeh, Daniel Lowd; Department of Computer & Information Science, University of Oregon, Eugene, OR, USA; {zayd, lowd}@cs.uoregon.edu
Pseudocode | Yes | Algorithm 1 Two-step unlabeled-unlabeled aPU learning. Input: Labeled-positive set Xp and unlabeled sets Xtr-u, Xte-u. Output: g's model parameters θ
Open Source Code | Yes | Our implementation is publicly available at: https://github.com/ZaydH/arbitrary_pu.
Open Datasets | Yes | We empirically studied the effectiveness of our methods PURR, PU2wUU, and PU2aPNU using synthetic and real-world data. Limited space allows us to discuss only two experiment sets here. Suppl. Section E details experiments on: synthetic data, 10 LIBSVM datasets [30] under a totally different positive-bias condition, and a study of our methods' robustness to negative-class shift. ... Datasets: Section 7.2 considers the MNIST [31], CIFAR10 [32], and 20 Newsgroups [33] datasets with binary classes formed by partitioning each dataset's labels. Section 7.3 uses two different TREC [34] spam email datasets to demonstrate our methods' performance under real-world adversarial concept drift.
Dataset Splits | Yes | All methods saw identical training/test data splits and, where applicable, used the same initial weights. ... Table 1: Mean inductive misclassification rate (%) over 100 trials... Each dataset has four experimental conditions... (1) Ptrain = Ptest... (2 & 3 resp.) partially disjoint positive supports without and with prior shift, and (4) disjoint positive class definitions. ... Figure 2: Mean inductive misclassification rate over 100 trials...
Hardware Specification | No | The paper mentions 'University of Oregon high performance computer, Talapas' but does not specify any particular CPU models, GPU models, or other detailed hardware specifications for the experimental setup.
Software Dependencies | No | The paper mentions software like 'AdamW [35] with AMSGrad [36]', 'DenseNet-121 [39]', 'ELMo [37]', and 'PyTorch [43]', but it does not provide specific version numbers for these software components, which are necessary for reproducibility.
Experiment Setup | Yes | Our only individually tuned hyperparameters are learning rate and weight decay. We assume the worst case of no a priori knowledge about the positive shift, so the midpoint value ρ = 0.5 was used. ... Probabilistic classifier, σ̂, used our abs-PU risk estimator with logistic loss. All other learners used sigmoid loss for ℓ. ... stochastic optimization (i.e., AdamW [35] with AMSGrad [36]). ... NNs to at most three fully-connected layers of 300 neurons.
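The two-step aPU procedure quoted in the Pseudocode row (learn a probabilistic classifier from the labeled positives and training-distribution unlabeled set, then train the final classifier from the two unlabeled sets) can be sketched as a minimal skeleton. This is an illustration only, not the paper's implementation: `fit_pu` and `fit_weighted_uu` are hypothetical trainer callables standing in for the PU and weighted unlabeled-unlabeled learning steps.

```python
def two_step_apu(x_p, x_tr_u, x_te_u, prior, fit_pu, fit_weighted_uu):
    """Hypothetical skeleton of two-step aPU learning.

    x_p      -- labeled-positive examples
    x_tr_u   -- unlabeled examples from the training distribution
    x_te_u   -- unlabeled examples from the test distribution
    prior    -- assumed positive class prior
    """
    # Step 1: learn a probabilistic classifier sigma_hat via ordinary PU
    # learning on the labeled positives and training-distribution unlabeled set.
    sigma_hat = fit_pu(x_p, x_tr_u, prior)

    # Step 2: use sigma_hat's posteriors to weight x_tr_u, then train the
    # final classifier g from the two unlabeled sets (weighted UU learning).
    weights = [sigma_hat(x) for x in x_tr_u]
    theta = fit_weighted_uu(x_tr_u, weights, x_te_u)
    return theta
```

The split matters because only Step 1 touches the (possibly shifted) labeled positives; Step 2 relies solely on unlabeled data, which is what allows arbitrary positive shift between training and test.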
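The Experiment Setup row mentions the paper's abs-PU risk estimator, its "simplified approach to address PU risk estimation overfitting." A minimal sketch follows, assuming the estimator replaces the max(0, ·) correction of non-negative PU risk with an absolute value, i.e. R = π·R_p⁺ + |R_u⁻ − π·R_p⁻|, with the sigmoid surrogate loss; the function names here are illustrative, not the paper's API.

```python
import math

def sigmoid_loss(z, y):
    # Sigmoid surrogate loss: l(z, y) = 1 / (1 + exp(y * z)),
    # where y in {+1, -1} and z is the classifier's real-valued score.
    return 1.0 / (1.0 + math.exp(y * z))

def mean_loss(scores, y):
    # Empirical risk of labeling every example in `scores` as class y.
    return sum(sigmoid_loss(z, y) for z in scores) / len(scores)

def abs_pu_risk(pos_scores, unl_scores, prior):
    """Sketch of an abs-PU risk estimate (assumed form, see lead-in).

    The unbiased negative-class risk (r_u_neg - prior * r_p_neg) can go
    negative on finite samples, which drives overfitting; taking its
    absolute value keeps the total risk estimate non-negative.
    """
    r_p_pos = mean_loss(pos_scores, +1)   # positive risk on labeled positives
    r_p_neg = mean_loss(pos_scores, -1)   # negative risk on labeled positives
    r_u_neg = mean_loss(unl_scores, -1)   # negative risk on unlabeled data
    return prior * r_p_pos + abs(r_u_neg - prior * r_p_neg)
```

Compared with clamping at zero, the absolute value still penalizes the model when the corrected negative risk dips below zero, rather than giving it a flat (zero-gradient) region to exploit.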