Mixture Proportion Estimation and PU Learning: A Modern Approach
Authors: Saurabh Garg, Yifan Wu, Alexander J. Smola, Sivaraman Balakrishnan, Zachary Lipton
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Both methods dominate previous approaches empirically, and for BBE, we establish formal guarantees that hold whenever we can train a model to cleanly separate out a small subset of positive examples. Our final algorithm, (TED)^n, alternates between the two procedures, significantly improving both our mixture proportion estimator and classifier. We conduct a battery of experiments both to empirically validate our claim that BBE's assumptions are mild and frequently hold in practice, and to establish the outperformance of BBE, CVIR, and (TED)^n over the previous state of the art. We then conduct extensive experiments on semi-synthetic data, adapting a variety of binary classification datasets to the PU learning setup and demonstrating the superior performance of BBE and PU-learning with the CVIR objective. |
| Researcher Affiliation | Collaboration | Saurabh Garg¹, Yifan Wu¹, Alex Smola², Sivaraman Balakrishnan¹, Zachary C. Lipton¹ (¹Carnegie Mellon University, ²Amazon Web Services) |
| Pseudocode | Yes | Algorithm 1, Best Bin Estimation (BBE). Input: validation positive ($X_p$) and unlabeled ($X_u$) samples; blackbox classifier $\hat{f} : \mathcal{X} \to [0, 1]$; hyperparameters $0 < \delta, \gamma < 1$. Step 1: $Z_p, Z_u \leftarrow \hat{f}(X_p), \hat{f}(X_u)$. Step 2: $\hat{q}_p(z) = \frac{1}{n_p} \sum_{z_i \in Z_p} \mathbb{1}[z_i \geq z]$ and $\hat{q}_u(z) = \frac{1}{n_u} \sum_{z_i \in Z_u} \mathbb{1}[z_i \geq z]$ for all $z \in [0, 1]$. Step 3: estimate $\hat{c} = \arg\min_{c \in [0,1]} \left( \frac{\hat{q}_u(c)}{\hat{q}_p(c)} + \frac{1+\gamma}{\hat{q}_p(c)} \left( \sqrt{\frac{\log(4/\delta)}{2 n_u}} + \sqrt{\frac{\log(4/\delta)}{2 n_p}} \right) \right)$. Output: $\hat{\alpha} = \hat{q}_u(\hat{c}) / \hat{q}_p(\hat{c})$ (a runnable sketch appears after this table). |
| Open Source Code | Yes | Code is available at https://github.com/acmi-lab/PU_learning |
| Open Datasets | Yes | We simulate PU tasks on CIFAR-10 [24], MNIST [25], and IMDb sentiment analysis [32] datasets. We consider binarized versions of CIFAR-10 and MNIST. |
| Dataset Splits | Yes | For MPE, we use a held-out PU validation set: randomly split the positive and unlabeled data into training sets $(X_p^1, X_u^1)$ and hold-out sets $(X_p^2, X_u^2)$. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory, or cloud instance types) used for running the experiments. It mentions model architectures like ResNet and BERT but not the underlying hardware. |
| Software Dependencies | No | The paper mentions software like TensorFlow, PyTorch, Adam (optimizer), and BERT, but it does not specify the version numbers for these software components. For example, it cites the PyTorch paper but does not explicitly state the version used in the experiments. |
| Experiment Setup | No | The paper states: 'We did not tune hyperparameters or the optimization algorithm; instead, we use the same benchmarked hyperparameters and optimization algorithm for each dataset. For our method, we use cross-entropy loss. For uPU and nnPU, we use Adam [22] with sigmoid loss.' While it mentions the loss function and optimizer, it lacks specific numerical hyperparameter values (e.g., learning rate, batch size, number of epochs) and detailed training configurations required for reproducibility. |
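
For concreteness, below is a minimal NumPy sketch of the BBE procedure quoted in the Pseudocode row. It is an illustration under stated assumptions, not the authors' implementation (which lives in the linked repository): the function name `bbe_estimate`, the 1001-point threshold grid, the default `delta`/`gamma` values, and the synthetic Beta-distributed scores in the demo are all hypothetical choices.

```python
import numpy as np

def bbe_estimate(scores_pos, scores_unl, delta=0.1, gamma=0.01):
    """Sketch of Best Bin Estimation (BBE).

    scores_pos, scores_unl: blackbox classifier outputs f(x) in [0, 1] on the
    held-out positive (X_p^2) and unlabeled (X_u^2) validation samples.
    Returns an estimate of alpha, the fraction of positives in the unlabeled set.
    """
    scores_pos, scores_unl = np.asarray(scores_pos), np.asarray(scores_unl)
    n_p, n_u = len(scores_pos), len(scores_unl)

    # Step 2: empirical upper-tail CDFs q_p(c), q_u(c) on a threshold grid
    # (the paper minimizes over all c in [0, 1]; a fine grid approximates this).
    grid = np.linspace(0.0, 1.0, 1001)
    q_p = (scores_pos[None, :] >= grid[:, None]).mean(axis=1)
    q_u = (scores_unl[None, :] >= grid[:, None]).mean(axis=1)

    # Step 3: minimize the tail-mass ratio plus a confidence-width penalty.
    valid = q_p > 0  # avoid division by zero at very high thresholds
    slack = (np.sqrt(np.log(4 / delta) / (2 * n_u))
             + np.sqrt(np.log(4 / delta) / (2 * n_p)))
    objective = q_u[valid] / q_p[valid] + (1 + gamma) * slack / q_p[valid]
    c_hat = grid[valid][np.argmin(objective)]

    # Output: alpha-hat is the ratio of tail masses at the best threshold.
    return (scores_unl >= c_hat).mean() / (scores_pos >= c_hat).mean()

# Demo on synthetic scores: the unlabeled set is a 30/70 mixture of
# positive-like and negative-like scores, so the estimate should land near
# 0.3 (up to sampling noise and the estimator's slight upward bias).
rng = np.random.default_rng(0)
scores_pos = rng.beta(5, 2, size=2000)                    # stand-in for f(X_p^2)
scores_unl = np.concatenate([rng.beta(5, 2, size=1500),   # positives among unlabeled
                             rng.beta(2, 5, size=3500)])  # negatives among unlabeled
print(f"alpha-hat = {bbe_estimate(scores_pos, scores_unl):.3f}")
```

The vectorized tail-CDF computation mirrors Step 2 of the pseudocode, and the penalty term reflects the $\sqrt{\log(4/\delta)/2n}$ confidence widths in Step 3; for a faithful reproduction, defer to the authors' code at https://github.com/acmi-lab/PU_learning.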