Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fast Nonparametric Estimation of Class Proportions in the Positive-Unlabeled Classification Setting
Authors: Daniel Zeiberg, Shantanu Jain, Predrag Radivojac
AAAI 2020, pp. 6729-6736 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our distance-based algorithm is evaluated on forty datasets and compared to all currently available methods. We provide evidence that this new approach results in the most accurate performance and can be readily used on large datasets. |
| Researcher Affiliation | Academia | Daniel Zeiberg, Shantanu Jain, Predrag Radivojac Khoury College of Computer Sciences Northeastern University, Boston, MA, U.S.A. |
| Pseudocode | Yes | Algorithm 1: DistCurve algorithm for class prior estimation. |
| Open Source Code | Yes | Code Availability: github.ccs.neu.edu/dzeiberg/ClassPriorEstimation. |
| Open Datasets | Yes | Most datasets were downloaded from the UCI Machine Learning Repository (Dua and Graff 2017) |
| Dataset Splits | Yes | Early stopping was used in the training process, monitoring the loss on 200,000 synthetic instances held out as the validation set. |
| Hardware Specification | No | The paper states 'All algorithms were run on identical datasets and computers with similar hardware,' but does not provide any specific details about the CPU, GPU, or other hardware components used for the experiments. |
| Software Dependencies | No | The paper mentions several algorithms and frameworks (e.g., 'linear SVMs', 'neural network ensembles'), but does not provide specific version numbers for any software dependencies, programming languages, or libraries used in the experiments. |
| Experiment Setup | Yes | The network was trained as a regression model. It contained 100 input nodes and three hidden layers, with sizes 2048, 1024, and 512, respectively, each layer followed by a rectified linear unit activation layer, a batch normalization layer, and a dropout layer with probability 0.5. The output was constrained to the range [0, 1]. The model was trained for 100 epochs using batch size 32, minimizing the mean absolute error (MAE) on the class prior prediction. Early stopping was used in the training process, monitoring the loss on 200,000 synthetic instances held out as the validation set. |
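For concreteness, the architecture described in the Experiment Setup row can be sketched as follows. This is a minimal illustration assuming PyTorch, not the authors' code: the layer sizes (2048, 1024, 512), ReLU, batch normalization, dropout of 0.5, and the [0, 1] output range come from the quoted text, while the function name and the use of a sigmoid to enforce the output range are our assumptions (the paper does not say how the constraint was implemented).

```python
import torch
import torch.nn as nn


def make_prior_regressor(n_inputs: int = 100) -> nn.Sequential:
    """Sketch of the class-prior regression network described in the paper.

    Hidden sizes, activations, batch norm, and dropout follow the quoted
    setup; the sigmoid output layer is an assumed way to constrain the
    prediction to [0, 1].
    """
    sizes = [n_inputs, 2048, 1024, 512]
    layers = []
    for d_in, d_out in zip(sizes[:-1], sizes[1:]):
        layers += [
            nn.Linear(d_in, d_out),
            nn.ReLU(),
            nn.BatchNorm1d(d_out),
            nn.Dropout(p=0.5),
        ]
    # Single regression output, constrained to [0, 1] (assumed sigmoid).
    layers += [nn.Linear(sizes[-1], 1), nn.Sigmoid()]
    return nn.Sequential(*layers)


model = make_prior_regressor()
model.eval()  # freeze dropout and batch-norm statistics for a deterministic pass
with torch.no_grad():
    priors = model(torch.randn(32, 100))  # one batch of 32 feature vectors
assert priors.shape == (32, 1)
assert 0.0 <= float(priors.min()) and float(priors.max()) <= 1.0
```

Training as described would then minimize `nn.L1Loss()` (MAE) with batch size 32 for up to 100 epochs, with early stopping on the held-out synthetic validation loss.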