Understanding and Mitigating Data Contamination in Deep Anomaly Detection: A Kernel-based Approach

Authors: Shuang Wu, Jingyu Zhao, Guangjian Tian

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on public datasets show that our approach significantly improves anomaly detection in the presence of contamination and outperforms some recently proposed detectors." "We conduct empirical studies on synthetic data and publicly available datasets to validate our theoretic and methodological contributions."
Researcher Affiliation | Collaboration | ¹Huawei Noah's Ark Lab, ²The University of Hong Kong; wushuang.noah@huawei.com, gladsy17@connect.hku.hk, Tian.Guangjian@huawei.com
Pseudocode | Yes | "Algorithm 1: Contradicting training for one mini-batch"
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the method's code is open-sourced.
Open Datasets | Yes | "We use two public datasets, the ECG5000 electrocardiogram dataset³ and the HAR (Human Activity Recognition) dataset⁴, to evaluate anomaly detection on time-series data. ... We use the MNIST dataset⁵ and the Fashion-MNIST⁶ dataset to evaluate anomaly detection on image data." ³https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ ⁴https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones ⁵http://yann.lecun.com/exdb/mnist/ ⁶https://github.com/zalandoresearch/fashion-mnist
Dataset Splits | No | "Table 1 summarizes the training/test data split for each dataset." The table gives train/test splits, but no explicit validation split with quantities or percentages is mentioned.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types) used to conduct the experiments.
Software Dependencies | No | The paper does not provide version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | "The performance of our approach depends on the hyperparameter λ and the number of labeled anomalies. ... We evaluate the AUPRCs by varying λ and the number of additionally labeled anomalies. Figure 2 shows the mean and standard deviations (indicated by the height of the error bars) of the AUPRCs." Figure 1 reports results for "ours, λ = 20" and "ramp, λ = 1", explicitly stating the hyperparameter values used in the experiments.
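For context on the AUPRC metric used in these evaluations: AUPRC is commonly computed as average precision over a ranking of anomaly scores. A minimal pure-Python sketch follows; the labels and scores are illustrative placeholders, not data from the paper.

```python
def average_precision(labels, scores):
    """Approximate AUPRC via average precision: the mean of
    precision@k over the ranks k at which a true anomaly
    (label 1) appears, scanning scores in descending order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total, ap = 0, sum(labels), 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank  # precision at this rank
    return ap / total if total else 0.0

# Illustrative toy example: 2 anomalies among 4 samples.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6]
print(average_precision(labels, scores))  # (1/1 + 2/3) / 2 = 0.8333...
```

Libraries such as scikit-learn provide an equivalent `average_precision_score`; the hand-rolled version above just makes the computation behind the reported AUPRC numbers explicit.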