Understanding and Mitigating Data Contamination in Deep Anomaly Detection: A Kernel-based Approach

Authors: Shuang Wu, Jingyu Zhao, Guangjian Tian

IJCAI 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Experiments on public datasets show that our approach significantly improves anomaly detection in the presence of contamination and outperforms some recently proposed detectors." "We conduct empirical studies on synthetic data and publicly available datasets to validate our theoretic and methodological contributions."
Researcher Affiliation | Collaboration | ¹Huawei Noah's Ark Lab, ²The University of Hong Kong; wushuang.noah@huawei.com, gladsy17@connect.hku.hk, Tian.Guangjian@huawei.com
Pseudocode | Yes | "Algorithm 1: Contradicting training for one mini-batch"
Open Source Code | No | The paper does not provide an explicit statement or link indicating that the method's code is open-sourced.
Open Datasets | Yes | "We use two public datasets, the ECG5000 electrocardiogram dataset³ and the HAR (Human Activity Recognition) dataset⁴, to evaluate anomaly detection on time-series data. ... We use the MNIST dataset⁵ and the Fashion-MNIST⁶ dataset to evaluate anomaly detection on image data." ³https://www.cs.ucr.edu/~eamonn/time_series_data_2018/ ⁴https://archive.ics.uci.edu/ml/datasets/human+activity+recognition+using+smartphones ⁵http://yann.lecun.com/exdb/mnist/ ⁶https://github.com/zalandoresearch/fashion-mnist
Dataset Splits | No | "Table 1 summarizes the training/test data split for each dataset." The table gives train/test splits, but no explicit validation split with quantities or percentages is mentioned.
Hardware Specification | No | The paper does not specify the hardware (e.g., GPU models, CPU types) used to conduct the experiments.
Software Dependencies | No | The paper does not provide version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | "The performance of our approach depends on the hyperparameter λ and the number of labeled anomalies. ... We evaluate the AUPRCs by varying λ and the number of additionally labeled anomalies. Figure 2 shows the mean and standard deviations (indicated by the height of the error bars) of the AUPRCs." Figure 1 reports results for "ours, λ = 20" and "ramp, λ = 1", explicitly stating the hyperparameter values used in the experiments.
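For context on the AUPRC metric used in these evaluations: AUPRC is commonly computed as average precision over a ranking of anomaly scores. A minimal pure-Python sketch follows; the labels and scores are illustrative placeholders, not data from the paper.

```python
def average_precision(labels, scores):
    """Approximate AUPRC via average precision: the mean of
    precision@k over the ranks k at which a true anomaly
    (label 1) appears, scanning scores in descending order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total, ap = 0, sum(labels), 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            ap += hits / rank  # precision at this rank
    return ap / total if total else 0.0

# Illustrative toy example: 2 anomalies among 4 samples.
labels = [1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6]
print(average_precision(labels, scores))  # (1/1 + 2/3) / 2 = 0.8333...
```

Libraries such as scikit-learn provide an equivalent `average_precision_score`; the hand-rolled version above just makes the computation behind the reported AUPRC numbers explicit.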