reproducibilityindex.ai

Estimating the Contamination Factor’s Distribution in Unsupervised Anomaly Detection

Authors: Lorenzo Perini, Paul-Christian Bürkner, Arto Klami

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically on 22 datasets, we show that the estimated distribution is well-calibrated and that setting the threshold using the posterior mean improves the detectors' performance over several alternative methods.
Researcher Affiliation	Academia	(1) DTAI lab & Leuven.AI, Department of Computer Science, KU Leuven, Belgium; (2) Cluster of Excellence SimTech, University of Stuttgart, Germany; (3) Department of Computer Science, University of Helsinki, Finland.
Pseudocode	No	The paper describes the proposed method in detail across multiple subsections (3.1, 3.2, 3.3, 3.4) but does not include any explicit pseudocode or algorithm blocks.
Open Source Code	Yes	Code and online Supplement are available at: https://github.com/Lorenzo-Perini/GammaGMM
Open Datasets	Yes	We carry out our study on 20 commonly used benchmark datasets and additionally 2 (proprietary) real tasks. The benchmark datasets contain semantically useful anomalies widely used in the literature (Campos et al., 2016).
Dataset Splits	No	In the experiments we assume a transductive setting (Campos et al., 2016; Scott & Blanchard, 2008; Toron et al., 2022), where a dataset D is used both for training and testing.
Hardware Specification	No	The paper does not provide specific hardware details such as GPU/CPU models, memory, or processor types used for running the experiments.
Software Dependencies	No	All these methods are implemented in the Python library PyOD (Zhao et al., 2019b). The threshold estimators are implemented in PyThresh with default hyperparameters. Finally, the DPGMM is implemented in scikit-learn. No version numbers are provided for these libraries.
Experiment Setup	Yes	Our method introduces two new hyperparameters: p_0 and p_high. We set both of them to 0.01 by default because extremely high contamination, as well as the complete absence of anomalies, are unlikely events. We use 10 anomaly detectors with different inductive biases (Soenen et al., 2021)... We set the prior on the means to 0 and the prior on the covariance matrices to identity matrices of appropriate dimension.
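To illustrate the thresholding step mentioned under Research Type, here is a minimal sketch that assumes posterior draws of the contamination factor gamma are already available. The procedure that produces those draws in the paper is not reproduced here. Given anomaly scores from any detector, the sketch sets the threshold at the (1 - gamma_hat) empirical quantile of the scores, where gamma_hat is the posterior mean, so roughly a fraction gamma_hat of the points are flagged. The helper name threshold_from_posterior_mean, the synthetic scores, and the Beta(5, 95) stand-in posterior are hypothetical illustrations, not the authors' code or data.

```python
import numpy as np


def threshold_from_posterior_mean(scores, gamma_samples):
    """Hypothetical helper: threshold anomaly scores using the posterior mean
    of the contamination factor gamma.

    scores        -- 1-D anomaly scores, higher means more anomalous
    gamma_samples -- draws from a posterior over gamma (assumed to be given)
    """
    scores = np.asarray(scores, dtype=float)
    gamma_hat = float(np.mean(gamma_samples))  # posterior mean of gamma
    # Points scoring above the (1 - gamma_hat) empirical quantile are labeled
    # anomalies, so roughly a fraction gamma_hat of the data is flagged.
    tau = float(np.quantile(scores, 1.0 - gamma_hat))
    labels = (scores > tau).astype(int)  # 1 = anomaly, 0 = normal
    return gamma_hat, tau, labels


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Stand-in scores: 950 "normal" and 50 "anomalous" points (illustrative only).
    scores = np.concatenate([rng.normal(0.0, 1.0, 950), rng.normal(4.0, 1.0, 50)])
    # Stand-in posterior draws for gamma; Beta(5, 95) has mean 0.05.
    gamma_samples = rng.beta(5.0, 95.0, size=2000)
    gamma_hat, tau, labels = threshold_from_posterior_mean(scores, gamma_samples)
    print(f"posterior mean gamma = {gamma_hat:.3f}, threshold = {tau:.3f}, "
          f"flagged = {labels.sum()} of {labels.size}")
```

Using the posterior mean collapses the estimated distribution of gamma into a single point estimate. The full set of posterior draws could still be used separately to report uncertainty about how many points should be flagged.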