Fast Nonparametric Estimation of Class Proportions in the Positive-Unlabeled Classification Setting

Authors: Daniel Zeiberg, Shantanu Jain, Predrag Radivojac

AAAI 2020, pp. 6729-6736

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Our distance-based algorithm is evaluated on forty datasets and compared to all currently available methods. We provide evidence that this new approach results in the most accurate performance and can be readily used on large datasets."
Researcher Affiliation | Academia | Daniel Zeiberg, Shantanu Jain, Predrag Radivojac, Khoury College of Computer Sciences, Northeastern University, Boston, MA, U.S.A.
Pseudocode | Yes | Algorithm 1: DistCurve algorithm for class prior estimation (a hedged sketch of the distance-curve construction appears after the table).
Open Source Code | Yes | Code availability: github.ccs.neu.edu/dzeiberg/ClassPriorEstimation
Open Datasets | Yes | Most datasets were downloaded from the UCI Machine Learning Repository (Dua and Graff 2017).
Dataset Splits | Yes | Early stopping was used in the training process, monitoring the loss on 200,000 synthetic instances held out as the validation set.
Hardware Specification | No | The paper states "All algorithms were run on identical datasets and computers with similar hardware," but gives no specifics about the CPUs, GPUs, or memory used for the experiments.
Software Dependencies | No | The paper names several algorithms and model classes (e.g., "linear SVMs", "neural network ensembles") but provides no version numbers for the programming languages or libraries used in the experiments.
Experiment Setup | Yes | The network was trained as a regression model. It contained 100 input nodes and three hidden layers of sizes 2048, 1024, and 512, each followed by a rectified linear unit activation, batch normalization, and dropout with probability 0.5. The output was constrained to the range [0, 1]. The model was trained for 100 epochs with batch size 32, minimizing the mean absolute error (MAE) of the class prior prediction. Early stopping monitored the loss on 200,000 synthetic instances held out as the validation set. (A hedged PyTorch sketch of this setup appears after the table.)
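
To make the Pseudocode row concrete, below is a minimal sketch of a distance-curve construction consistent with how Algorithm 1 (DistCurve) is described: at each step a random positive point is drawn, its nearest remaining unlabeled point is matched and removed, and the matching distance is recorded. The function name `distance_curve`, the brute-force `cdist` search, and the sampling details are our assumptions, not the paper's reference implementation (see the linked repository for that).

```python
import numpy as np
from scipy.spatial.distance import cdist


def distance_curve(positives, unlabeled, n_steps, rng=None):
    """Record distances while greedily depleting the unlabeled sample.

    Each step draws a random positive point, finds its nearest neighbor
    among the unlabeled points that remain, records that distance, and
    removes the matched unlabeled point. The resulting curve is the
    feature that the regression network consumes.
    """
    rng = np.random.default_rng(rng)
    positives = np.asarray(positives, dtype=float)
    remaining = np.asarray(unlabeled, dtype=float).copy()
    curve = np.empty(min(n_steps, len(remaining)))
    for t in range(len(curve)):
        anchor = positives[rng.integers(len(positives))]
        dists = cdist(anchor[None, :], remaining).ravel()
        j = int(np.argmin(dists))                    # nearest unlabeled point
        curve[t] = dists[j]
        remaining = np.delete(remaining, j, axis=0)  # match and remove it
    return curve
```

Since the network described in the Experiment Setup row takes 100 inputs, the raw curve would still need to be reduced to a fixed length of 100 (for example, by averaging over bins); the paper and repository specify that step, and it is omitted here.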
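
The Experiment Setup row likewise maps onto a small feed-forward regressor. Here is a minimal PyTorch sketch, assuming a sigmoid for the [0, 1] output constraint; the optimizer and the early-stopping patience are not given in the excerpt, so the Adam choice below is a placeholder.

```python
import torch
import torch.nn as nn


class PriorRegressor(nn.Module):
    """100 inputs -> 2048 -> 1024 -> 512 -> 1, as quoted in the setup row."""

    def __init__(self):
        super().__init__()
        blocks, in_dim = [], 100
        for width in (2048, 1024, 512):
            # Each hidden layer is followed by ReLU, batch normalization,
            # and dropout with probability 0.5, per the quoted description.
            blocks += [nn.Linear(in_dim, width), nn.ReLU(),
                       nn.BatchNorm1d(width), nn.Dropout(p=0.5)]
            in_dim = width
        blocks.append(nn.Linear(in_dim, 1))
        self.net = nn.Sequential(*blocks)

    def forward(self, x):
        # Sigmoid is our assumption for constraining the output to [0, 1].
        return torch.sigmoid(self.net(x)).squeeze(-1)


model = PriorRegressor()
loss_fn = nn.L1Loss()                             # MAE on the class prior
optimizer = torch.optim.Adam(model.parameters())  # optimizer is an assumption
# Training, per the excerpt: up to 100 epochs at batch size 32, with early
# stopping that monitors the loss on the 200,000 held-out synthetic instances.
```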