Fast Nonparametric Estimation of Class Proportions in the Positive-Unlabeled Classification Setting
Authors: Daniel Zeiberg, Shantanu Jain, Predrag Radivojac (pp. 6729-6736)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our distance-based algorithm is evaluated on forty datasets and compared to all currently available methods. We provide evidence that this new approach results in the most accurate performance and can be readily used on large datasets. |
| Researcher Affiliation | Academia | Daniel Zeiberg, Shantanu Jain, Predrag Radivojac Khoury College of Computer Sciences Northeastern University, Boston, MA, U.S.A. |
| Pseudocode | Yes | Algorithm 1: DistCurve algorithm for class prior estimation. |
| Open Source Code | Yes | Code Availability: github.ccs.neu.edu/dzeiberg/ClassPriorEstimation |
| Open Datasets | Yes | Most datasets were downloaded from the UCI Machine Learning Repository (Dua and Graff 2017) |
| Dataset Splits | Yes | Early stopping was used in the training process, monitoring the loss on 200,000 synthetic instances held out as the validation set. |
| Hardware Specification | No | The paper states 'All algorithms were run on identical datasets and computers with similar hardware,' but does not provide any specific details about the CPU, GPU, or other hardware components used for the experiments. |
| Software Dependencies | No | The paper mentions several algorithms and frameworks (e.g., 'linear SVMs', 'neural network ensembles'), but does not provide specific version numbers for any software dependencies, programming languages, or libraries used in the experiments. |
| Experiment Setup | Yes | The network was trained as a regression model. It contained 100 input nodes, three hidden layers, with sizes 2048, 1024, and 512, respectively, with each layer followed by a rectified linear unit activation layer, batch normalization layer and dropout layer with probability 0.5. The output was constrained to the range [0, 1]. The model was trained for 100 epochs using batch size 32, minimizing the mean absolute error (MAE) on the class prior prediction. Early stopping was used in the training process, monitoring the loss on 200,000 synthetic instances held out as the validation set. |
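The architecture quoted above (100 inputs, hidden layers of 2048, 1024, and 512 units with ReLU activations and dropout 0.5, and a single output constrained to [0, 1]) can be sketched as a plain NumPy forward pass. This is an illustrative reconstruction, not the authors' code: the weight initializer is an assumption, batch normalization is omitted for brevity, and the [0, 1] output constraint is realized here with a sigmoid, which the paper does not specify.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer widths from the paper: 100 inputs, hidden layers of 2048/1024/512,
# and a single output constrained to [0, 1].
sizes = [100, 2048, 1024, 512, 1]

# He-style initialization for ReLU layers (an assumption; the paper does
# not state the initializer used).
weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def forward(x, train=False, dropout_p=0.5):
    """Forward pass: ReLU hidden layers with dropout, sigmoid output.

    The paper's batch normalization layers are omitted in this sketch.
    """
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(h @ W + b, 0.0)           # ReLU activation
        if train:                                # inverted dropout, p = 0.5
            mask = rng.random(h.shape) >= dropout_p
            h = h * mask / (1.0 - dropout_p)
    z = h @ weights[-1] + biases[-1]
    return 1.0 / (1.0 + np.exp(-z))              # squash into [0, 1]

# One batch of the paper's stated size (32); training would minimize the
# mean absolute error between these predictions and the true class priors.
batch = rng.normal(size=(32, 100))
preds = forward(batch)
print(preds.shape)                               # (32, 1)
```

At train time the paper minimizes MAE, i.e. `np.abs(preds - targets).mean()`, for 100 epochs with early stopping monitored on 200,000 held-out synthetic instances.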