Position: Quo Vadis, Unsupervised Time Series Anomaly Detection?

Authors: M. Saquib Sarfraz, Mei-Yen Chen, Lukas Layer, Kunyu Peng, Marios Koulakis

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our paper presents a critical analysis of the status quo in TAD, revealing the misleading track of current research and highlighting problematic methods and evaluation practices. Our findings demonstrate the need for rigorous evaluation protocols and simple baselines, and reveal that state-of-the-art deep anomaly detection models effectively learn linear mappings. These findings suggest the need for more exploration and development of simple and interpretable TAD methods. The increased complexity of state-of-the-art deep-learning-based models unfortunately offers very little improvement. We offer insights and suggestions for the field to move forward.
Researcher Affiliation | Collaboration | 1) Mercedes-Benz Tech Innovation, Ulm, Germany; 2) Karlsruhe Institute of Technology, Karlsruhe, Germany. Correspondence to: M. Saquib Sarfraz <saquibsarfraz@gmail.com>.
Pseudocode | No | The paper describes the proposed simple baselines and neural network blocks verbally and with diagrams, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Our code is available on GitHub to easily run the baselines and benchmarks: https://github.com/ssarfraz/QuoVadisTAD
Open Datasets | Yes | Overall, we used six commonly used benchmark datasets in our study. Here, we report the details (Table 1) and results from three multivariate datasets (SWaT, WADI, and SMD) and four univariate datasets (UCR/Internal Bleeding). (Guillame-Bert & Dubrawski, 2017; Mathur & Tippenhauer, 2016; Ahmed et al., 2017; Su et al., 2019c)
Dataset Splits | Yes | We used a 90/10 split to make the train and validation sets. The validation set is only used for early stopping to avoid over-fitting; the Adam optimizer with a learning rate of 0.001 and a batch size of 512 was used.
Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions the use of the Adam optimizer but does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or scikit-learn versions) required for reproducibility.
Experiment Setup | Yes | In this section we summarize our data preprocessing steps and the hyperparameters used to train the models. The features were scaled to the interval [0, 1] on the training dataset, and the learned scaling parameters were used to scale the testing dataset. For all of our NN baselines, when trained in forecasting mode, we used a time window of size 5. We used a 90/10 split to make the train and the validation set. ... the Adam optimizer with learning rate 0.001 and a batch size of 512 were used. PCA reconstruction error: ... first 30 principal components... 1-layer Linear MLP: a hidden layer of size 32 is used. The single-block MLP-Mixer and single Transformer block both use an embedding of size 128 for the hidden layer. 1-layer GCN-LSTM block: the dimension of the GCN output nodes is set to 10 and the LSTM layer to 64 units.
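
To make the quoted setup concrete, below is a minimal sketch of the PCA reconstruction-error baseline as described in the excerpt: features are min-max scaled to [0, 1] with parameters learned on the training set, and the anomaly score is the per-timestep reconstruction error from the first 30 principal components. The function name and the use of scikit-learn are assumptions for illustration; this is not the authors' released implementation.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import MinMaxScaler


def pca_reconstruction_scores(train_data, test_data, n_components=30):
    """Hypothetical sketch of the PCA reconstruction-error baseline.

    `train_data` / `test_data` are assumed to be (num_timesteps, num_features)
    NumPy arrays; names and return convention are illustrative.
    """
    # Scale features to [0, 1] on the training set and reuse the learned
    # scaling parameters for the test set, as stated in the experiment setup.
    scaler = MinMaxScaler().fit(train_data)
    train_scaled = scaler.transform(train_data)
    test_scaled = scaler.transform(test_data)

    # Fit PCA on the training data and keep the first 30 principal components.
    pca = PCA(n_components=n_components).fit(train_scaled)

    # Anomaly score: mean squared reconstruction error per test timestep.
    reconstructed = pca.inverse_transform(pca.transform(test_scaled))
    return np.square(test_scaled - reconstructed).mean(axis=1)
```

Similarly, the training configuration quoted above (window of 5, 90/10 train/validation split, Adam with learning rate 0.001, batch size 512, early stopping on the validation loss) could look roughly like the following sketch for the 1-layer linear MLP forecasting baseline. The use of PyTorch, the epoch and patience values, and all names are illustrative assumptions, not the authors' code.

```python
import numpy as np
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


def make_windows(series, window=5):
    # Forecasting pairs: the previous `window` timesteps predict the next one.
    # `series` is assumed to be a (num_timesteps, num_features) float array.
    xs = np.stack([series[i:i + window] for i in range(len(series) - window)])
    ys = series[window:]
    return (torch.tensor(xs, dtype=torch.float32).flatten(1),
            torch.tensor(ys, dtype=torch.float32))


def train_linear_mlp(train_scaled, window=5, hidden=32, max_epochs=100, patience=5):
    x, y = make_windows(train_scaled, window)
    split = int(0.9 * len(x))  # 90/10 train/validation split
    train_loader = DataLoader(TensorDataset(x[:split], y[:split]),
                              batch_size=512, shuffle=True)
    x_val, y_val = x[split:], y[split:]

    # "1-layer Linear MLP" with a hidden layer of size 32 is read here as
    # input -> 32 -> output without a non-linearity (an assumption).
    model = nn.Sequential(nn.Linear(x.shape[1], hidden),
                          nn.Linear(hidden, y.shape[1]))
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()

    best_val, epochs_without_improvement = float("inf"), 0
    for _ in range(max_epochs):  # max_epochs and patience are illustrative
        model.train()
        for xb, yb in train_loader:
            optimizer.zero_grad()
            loss_fn(model(xb), yb).backward()
            optimizer.step()

        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(x_val), y_val).item()

        # Early stopping: the validation split is used only to stop training.
        if val_loss < best_val:
            best_val, epochs_without_improvement = val_loss, 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break
    return model
```

At test time, the per-timestep forecasting error on identically scaled test windows would serve as the anomaly score, mirroring the reconstruction-error scoring in the first sketch.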