Learning from Positive and Unlabeled Data with Arbitrary Positive Shift
Authors: Zayd Hammoudeh, Daniel Lowd
NeurIPS 2020 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results demonstrate our methods' effectiveness across numerous real-world datasets and forms of positive bias, including disjoint positive class-conditional supports. Additionally, we propose a general, simplified approach to address PU risk estimation overfitting. |
| Researcher Affiliation | Academia | Zayd Hammoudeh, Daniel Lowd; Department of Computer & Information Science, University of Oregon, Eugene, OR, USA; {zayd, lowd}@cs.uoregon.edu |
| Pseudocode | Yes | Algorithm 1: Two-step unlabeled-unlabeled aPU learning. Input: labeled-positive set X_p and unlabeled sets X_tr-u, X_te-u. Output: g's model parameters θ. (A hedged, runnable sketch of this two-step pipeline follows the table.) |
| Open Source Code | Yes | Our implementation is publicly available at: https://github.com/ZaydH/arbitrary_pu |
| Open Datasets | Yes | We empirically studied the effectiveness of our methods PURR, PU2wUU, and PU2aPNU using synthetic and real-world data. Limited space allows us to discuss only two experiment sets here. Suppl. Section E details experiments on: synthetic data, 10 LIBSVM datasets [30] under a totally different positive-bias condition, and a study of our methods' robustness to negative-class shift. ... Datasets: Section 7.2 considers the MNIST [31], CIFAR10 [32], and 20 Newsgroups [33] datasets with binary classes formed by partitioning each dataset's labels. Section 7.3 uses two different TREC [34] spam email datasets to demonstrate our methods' performance under real-world adversarial concept drift. (A sketch of one such binary PU split follows the table.) |
| Dataset Splits | Yes | All methods saw identical training/test data splits and, where applicable, used the same initial weights. ... Table 1: Mean inductive misclassification rate (%) over 100 trials... Each dataset has four experimental conditions: (1) P_train = P_test, (2 & 3 resp.) partially disjoint positive supports without and with prior shift, and (4) disjoint positive class definitions. ... Figure 2: Mean inductive misclassification rate over 100 trials... |
| Hardware Specification | No | The paper mentions 'University of Oregon high performance computer, Talapas' but does not specify any particular CPU models, GPU models, or other detailed hardware specifications for the experimental setup. |
| Software Dependencies | No | The paper mentions software like 'AdamW [35] with AMSGrad [36]', 'DenseNet-121 [39]', 'ELMo [37]', and 'PyTorch [43]', but it does not provide the specific version numbers for these components that reproducibility requires. |
| Experiment Setup | Yes | Our only individually tuned hyperparameters are learning rate and weight decay. We assume the worst case of no a priori knowledge about the positive shift, so the midpoint value ρ = 0.5 was used. ... The probabilistic classifier, σ̂, used our abs-PU risk estimator with logistic loss. All other learners used sigmoid loss for ℓ. ... stochastic optimization (i.e., AdamW [35] with AMSGrad [36]). ... NNs limited to at most three fully-connected layers of 300 neurons. (A sketch of the abs-PU correction follows the table.) |
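
To make Algorithm 1's two-step structure concrete, below is a minimal PyTorch sketch, not the authors' implementation (see the repository above for that). As stand-ins, step 1 uses the well-known nnPU estimator (Kiryo et al., 2017) rather than the paper's abs-PU estimator, step 2 uses one plausible reading of a soft-negative-weighted unlabeled-unlabeled risk with a non-negativity correction, and the priors `pi_tr`/`pi_te`, epoch counts, and learning rate are illustrative placeholders.

```python
import torch
import torch.nn as nn

def mlp(in_dim):
    # Matches the paper's stated cap: at most three fully-connected
    # layers of 300 neurons each.
    return nn.Sequential(nn.Linear(in_dim, 300), nn.ReLU(),
                         nn.Linear(300, 300), nn.ReLU(),
                         nn.Linear(300, 1))

def sigmoid_loss(margin):
    # l_sig(m) = 1 / (1 + exp(m)), evaluated at the margin m = y * g(x).
    return torch.sigmoid(-margin)

def nn_pu_risk(g, x_p, x_u, prior):
    # Non-negative PU risk (nnPU); stands in here for the paper's abs-PU.
    r_p_pos = sigmoid_loss(g(x_p)).mean()   # positives scored as +1
    r_p_neg = sigmoid_loss(-g(x_p)).mean()  # positives scored as -1
    r_u_neg = sigmoid_loss(-g(x_u)).mean()  # unlabeled scored as -1
    return prior * r_p_pos + torch.clamp(r_u_neg - prior * r_p_neg, min=0.0)

def two_step_apu(x_p, x_u_tr, x_u_te, pi_tr, pi_te, epochs=100, lr=1e-3):
    # Step 1: PU-learn sigma_hat on the *training* distribution, then
    # soft-weight each unlabeled training example toward the negatives.
    sigma_hat = mlp(x_p.shape[1])
    opt = torch.optim.AdamW(sigma_hat.parameters(), lr=lr, amsgrad=True)
    for _ in range(epochs):
        opt.zero_grad()
        nn_pu_risk(sigma_hat, x_p, x_u_tr, pi_tr).backward()
        opt.step()
    with torch.no_grad():
        w = 1.0 - torch.sigmoid(sigma_hat(x_u_tr)).squeeze(1)  # soft negatives

    # Step 2: train g from the weighted soft negatives plus the unlabeled
    # *test*-distribution sample; the shifted labeled positives are unused.
    g = mlp(x_p.shape[1])
    opt = torch.optim.AdamW(g.parameters(), lr=lr, amsgrad=True)
    for _ in range(epochs):
        opt.zero_grad()
        r_n_neg = (w * sigmoid_loss(-g(x_u_tr)).squeeze(1)).mean() / w.mean()
        r_n_pos = (w * sigmoid_loss(g(x_u_tr)).squeeze(1)).mean() / w.mean()
        r_u_pos = sigmoid_loss(g(x_u_te)).mean()
        # Positive-risk term recovered by subtraction, clamped at zero.
        risk = ((1 - pi_te) * r_n_neg
                + torch.clamp(r_u_pos - (1 - pi_te) * r_n_pos, min=0.0))
        risk.backward()
        opt.step()
    return g
```

The property the sketch tries to surface is that step 2 never consumes the labeled positives directly: only the negative class, which the aPU setting assumes is fixed across train and test, carries information between the two distributions.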
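
The binary PU tasks are described only at a high level in the quoted text ("binary classes formed by partitioning each dataset's labels"). Purely as an illustration, here is one hypothetical way to build such a split on MNIST; the label partition (evens vs. odds), the biased positive support, and all sample sizes are invented for the sketch, not taken from the paper.

```python
import torch
from torchvision import datasets, transforms

POS_LABELS = {0, 2, 4, 6, 8}    # hypothetical binary partition of the labels
TRAIN_POS_LABELS = {0, 2, 4}    # hypothetical bias: partially disjoint supports

mnist = datasets.MNIST(root="data", train=True, download=True,
                       transform=transforms.ToTensor())
x = mnist.data.reshape(len(mnist), -1).float() / 255.0
y = mnist.targets

def sample(mask, n):
    # Draw n feature vectors uniformly from the rows selected by mask.
    idx = torch.nonzero(mask).squeeze(1)
    return x[idx[torch.randperm(len(idx))[:n]]]

is_pos    = torch.tensor([int(t) in POS_LABELS for t in y])
is_tr_pos = torch.tensor([int(t) in TRAIN_POS_LABELS for t in y])

x_p    = sample(is_tr_pos, 1000)              # labeled positives (biased)
x_u_tr = torch.cat([sample(is_tr_pos, 2500),  # unlabeled, train distribution
                    sample(~is_pos, 2500)])
x_u_te = torch.cat([sample(is_pos, 2500),     # unlabeled, test distribution
                    sample(~is_pos, 2500)])
```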
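
Finally, the abs-PU estimator named in the setup row: one plausible reading of the paper's "general, simplified approach to address PU risk estimation overfitting" is that it replaces nnPU's max(0, ·) correction of the recovered negative-risk term with an absolute value. The sketch below encodes that reading with the logistic loss the setup row attributes to σ̂; the exact form is an assumption, so defer to the paper and repository.

```python
import torch
import torch.nn.functional as F

def logistic_loss(margin):
    # l_log(m) = ln(1 + exp(-m)), evaluated at the margin m = y * g(x).
    return F.softplus(-margin)

def abs_pu_risk(g, x_p, x_u, prior):
    # Assumed form: the standard PU risk decomposition, but with |.|
    # (rather than max(0, .)) preventing the recovered negative-risk
    # term from going negative and driving overfitting.
    r_p_pos = logistic_loss(g(x_p)).mean()   # positives scored as +1
    r_p_neg = logistic_loss(-g(x_p)).mean()  # positives scored as -1
    r_u_neg = logistic_loss(-g(x_u)).mean()  # unlabeled scored as -1
    return prior * r_p_pos + torch.abs(r_u_neg - prior * r_p_neg)
```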