Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Class Prior Estimation in Active Positive and Unlabeled Learning
Authors: Lorenzo Perini, Vincent Vercruyssen, Jesse Davis
IJCAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that our approach accurately recovers the true class prior on a benchmark of anomaly detection datasets and that it does so more accurately than existing methods. ... (Section 5, Experiments) We empirically evaluate the effectiveness of CAPE to recover the true class prior in the context of anomaly detection because it matches our setting: a handful of normal (positive) labels are acquired through an active learning strategy, the remaining examples are unlabeled. |
| Researcher Affiliation | Academia | Lorenzo Perini, Vincent Vercruyssen and Jesse Davis, DTAI Research group, KU Leuven, Belgium |
| Pseudocode | No | The paper describes methods using prose and mathematical equations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code: https://github.com/Lorenzo-Perini/Active PU Learning |
| Open Datasets | Yes | Data. The benchmark consists of 9 standard anomaly detection datasets from [Campos et al., 2016]. The datasets are listed in Table 1. They contain more normals than anomalies, with normal class priors varying between 0.64 and 0.99. ... Data: www.dbs.ifi.lmu.de/research/outlier-evaluation |
| Dataset Splits | Yes | First, the dataset is split into training and test sets using a stratified 5-fold split. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU models, CPU types, or memory used for running the experiments. |
| Software Dependencies | No | SSDO with its default parameters is used as the semi-supervised anomaly detector [Vercruyssen et al., 2018]. ... We use ISOLATION FOREST [Liu et al., 2008] as its unsupervised prior. ... We use uncertainty sampling as active learning strategy [Settles, 2012]. We model the user's uncertainty using the kernel density estimate as implemented in SCIKIT-LEARN. |
| Experiment Setup | Yes | SSDO with its default parameters is used as the semi-supervised anomaly detector [Vercruyssen et al., 2018]. ... The parameters of TICE, KM1, and KM2 are set to the values recommended in the original papers. ... CAPE has only one hyperparameter: the range of cardinalities m in the outer loop, which is minimally 1 and maximally n (the cardinality of the dataset). In the experiments, we set the range to n · {0.02, 0.04, 0.06, ..., 0.4, 0.5, ..., 0.9}. ... The process stops when 150 examples are labeled. |
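The cardinality range quoted in the Experiment Setup row can be made concrete with a short sketch. It assumes that each listed fraction is multiplied by the dataset size n, rounded to the nearest integer, and clamped to the stated bounds [1, n]; the rounding and deduplication choices, and the function name `cape_cardinality_grid`, are hypothetical and not taken from the authors' repository.

```python
def cape_cardinality_grid(n: int) -> list[int]:
    """Hypothetical helper: candidate cardinalities m for CAPE's outer loop.

    Assumes m = round(f * n) for each reported fraction f, clamped to [1, n].
    """
    # Fine-grained fractions 0.02, 0.04, ..., 0.40 (step 0.02), as quoted.
    fine = [round(0.02 * k, 2) for k in range(1, 21)]
    # Coarser fractions 0.5, 0.6, ..., 0.9 (step 0.1), as quoted.
    coarse = [round(0.1 * k, 1) for k in range(5, 10)]
    fractions = fine + coarse

    grid: list[int] = []
    for f in fractions:
        # Assumption: nearest-integer rounding, then clamp to the bounds 1..n.
        m = min(max(1, round(f * n)), n)
        # Small datasets can map several fractions to the same m; keep one copy.
        if m not in grid:
            grid.append(m)
    return grid


if __name__ == "__main__":
    print(cape_cardinality_grid(1000))
    print(cape_cardinality_grid(10))
```

For n = 1000 this yields 25 candidates (20, 40, ..., 400, 500, ..., 900); for very small datasets, neighbouring fractions collapse onto the same integer, which is why duplicates are dropped under this sketch's assumptions.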