Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fast Nonparametric Estimation of Class Proportions in the Positive-Unlabeled Classification Setting
Authors: Daniel Zeiberg, Shantanu Jain, Predrag Radivojac
AAAI 2020, pp. 6729-6736 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our distance-based algorithm is evaluated on forty datasets and compared to all currently available methods. We provide evidence that this new approach results in the most accurate performance and can be readily used on large datasets. |
| Researcher Affiliation | Academia | Daniel Zeiberg, Shantanu Jain, Predrag Radivojac Khoury College of Computer Sciences Northeastern University, Boston, MA, U.S.A. |
| Pseudocode | Yes | Algorithm 1: DistCurve algorithm for class prior estimation. |
| Open Source Code | Yes | Code Availability: github.ccs.neu.edu/dzeiberg/ClassPriorEstimation. |
| Open Datasets | Yes | Most datasets were downloaded from the UCI Machine Learning Repository (Dua and Graff 2017) |
| Dataset Splits | Yes | Early stopping was used in the training process, monitoring the loss on 200,000 synthetic instances held out as the validation set. |
| Hardware Specification | No | The paper states 'All algorithms were run on identical datasets and computers with similar hardware,' but does not provide any specific details about the CPU, GPU, or other hardware components used for the experiments. |
| Software Dependencies | No | The paper mentions several algorithms and frameworks (e.g., 'linear SVMs', 'neural network ensembles'), but does not provide specific version numbers for any software dependencies, programming languages, or libraries used in the experiments. |
| Experiment Setup | Yes | The network was trained as a regression model. It contained 100 input nodes and three hidden layers, with sizes 2048, 1024, and 512, respectively, each layer followed by a rectified linear unit activation layer, a batch normalization layer, and a dropout layer with probability 0.5. The output was constrained to the range [0, 1]. The model was trained for 100 epochs using batch size 32, minimizing the mean absolute error (MAE) on the class prior prediction. Early stopping was used in the training process, monitoring the loss on 200,000 synthetic instances held out as the validation set. |
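For concreteness, the architecture described in the Experiment Setup row can be sketched as follows. This is a minimal illustration assuming PyTorch, not the authors' code: the layer sizes (2048, 1024, 512), ReLU, batch normalization, dropout of 0.5, and the [0, 1] output range come from the quoted text, while the function name and the use of a sigmoid to enforce the output range are our assumptions (the paper does not say how the constraint was implemented).

```python
import torch
import torch.nn as nn


def make_prior_regressor(n_inputs: int = 100) -> nn.Sequential:
    """Sketch of the class-prior regression network described in the paper.

    Hidden sizes, activations, batch norm, and dropout follow the quoted
    setup; the sigmoid output layer is an assumed way to constrain the
    prediction to [0, 1].
    """
    sizes = [n_inputs, 2048, 1024, 512]
    layers = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        layers += [
            nn.Linear(d_in, d_out),
            nn.ReLU(),
            nn.BatchNorm1d(d_out),
            nn.Dropout(p=0.5),
        ]
    # Single regression output, constrained to [0, 1] (assumed sigmoid).
    layers += [nn.Linear(sizes[-1], 1), nn.Sigmoid()]
    return nn.Sequential(*layers)


model = make_prior_regressor()
model.eval()  # freeze dropout and batch-norm statistics for a deterministic pass
with torch.no_grad():
    priors = model(torch.randn(32, 100))  # one batch of 32 feature vectors
assert priors.shape == (32, 1)
assert 0.0 <= float(priors.min()) and float(priors.max()) <= 1.0
```

Training as described would then minimize `nn.L1Loss()` (MAE) with batch size 32 for up to 100 epochs, with early stopping on the held-out synthetic validation loss.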