Estimating the Contamination Factor’s Distribution in Unsupervised Anomaly Detection
Authors: Lorenzo Perini, Paul-Christian Bürkner, Arto Klami
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically on 22 datasets, we show that the estimated distribution is well-calibrated and that setting the threshold using the posterior mean improves the detectors' performance over several alternative methods. |
| Researcher Affiliation | Academia | DTAI lab & Leuven.AI, Department of Computer Science, KU Leuven, Belgium; Cluster of Excellence SimTech, University of Stuttgart, Germany; Department of Computer Science, University of Helsinki, Finland. |
| Pseudocode | No | The paper describes the proposed method in detail across multiple subsections (3.1, 3.2, 3.3, 3.4) but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and online Supplement are available at: https://github.com/Lorenzo-Perini/GammaGMM |
| Open Datasets | Yes | We carry out our study on 20 commonly used benchmark datasets and additionally 2 (proprietary) real tasks. The benchmark datasets contain semantically useful anomalies widely used in the literature (Campos et al., 2016). |
| Dataset Splits | No | In the experiments we assume a transductive setting (Campos et al., 2016; Scott & Blanchard, 2008; Toron et al., 2022), where a dataset D is used both for training and testing. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, memory, or processor types used for running the experiments. |
| Software Dependencies | No | All these methods are implemented in the Python library PyOD (Zhao et al., 2019b). The threshold estimators are implemented in PyThresh with default hyperparameters. Finally, the DPGMM is implemented in scikit-learn. No version numbers are provided for these libraries. |
| Experiment Setup | Yes | Our method introduces two new hyperparameters: p_0 and p_high. We set both of them to 0.01 by default, because both extremely high contamination and a complete absence of anomalies are unlikely events. We use 10 anomaly detectors with different inductive biases (Soenen et al., 2021)... We set the means prior to 0 and the covariance matrices prior to identities of appropriate dimension. |
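The setup above names two concrete ingredients: a Dirichlet-process GMM (DPGMM) from scikit-learn with the means prior set to 0 and the covariance prior set to the identity, and a decision threshold derived from the posterior mean of the contamination factor. A minimal sketch of just those two pieces follows, assuming scikit-learn's `BayesianGaussianMixture` with a Dirichlet-process weight prior is the DPGMM meant. The helper names `make_dpgmm` and `threshold_at_posterior_mean`, the truncation level `n_components=10`, the synthetic two-detector scores, and the `gamma_samples` values are all hypothetical. The sketch does not reproduce how the authors obtain the contamination posterior from the fitted mixture.

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture


def make_dpgmm(n_features, n_components=10, seed=0):
    """Hypothetical helper: DPGMM with mean prior 0 and identity covariance prior.

    n_components is only a truncation level for the Dirichlet process;
    surplus components end up with near-zero weight. 10 is an assumed value.
    """
    return BayesianGaussianMixture(
        n_components=n_components,
        covariance_type="full",
        weight_concentration_prior_type="dirichlet_process",
        mean_prior=np.zeros(n_features),      # means prior set to 0
        covariance_prior=np.eye(n_features),  # covariance prior set to the identity
        max_iter=500,
        random_state=seed,
    )


def threshold_at_posterior_mean(scores, gamma_samples):
    """Hypothetical helper: flag scores above the (1 - mean(gamma)) quantile.

    scores: 1-D anomaly scores, higher meaning more anomalous.
    gamma_samples: draws of the contamination factor (assumed given here).
    """
    gamma_hat = float(np.mean(gamma_samples))  # posterior-mean estimate
    threshold = np.quantile(scores, 1.0 - gamma_hat)
    return scores > threshold, gamma_hat


# Usage on synthetic scores from two hypothetical detectors:
# 950 low-scoring (normal) points and 50 high-scoring (anomalous) ones.
rng = np.random.default_rng(0)
scores = np.vstack([
    rng.normal(0.0, 1.0, size=(950, 2)),
    rng.normal(4.0, 1.0, size=(50, 2)),
])
dpgmm = make_dpgmm(n_features=scores.shape[1]).fit(scores)
print(np.round(dpgmm.weights_, 3))  # fitted mixture weights (they sum to 1)

combined = scores.mean(axis=1)  # average the two detectors' scores
flags, gamma_hat = threshold_at_posterior_mean(combined, [0.04, 0.05, 0.06])
print(gamma_hat, int(flags.sum()))  # flags 50 of the 1000 points
```

Averaging the contamination draws first and then taking a single quantile keeps the thresholding step independent of how the posterior was produced; any procedure that yields samples of the contamination factor can feed it.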