Anomaly Transformer: Time Series Anomaly Detection with Association Discrepancy

Authors: Jiehui Xu, Haixu Wu, Jianmin Wang, Mingsheng Long

ICLR 2022

Each reproducibility variable is listed below with its extracted result and the supporting LLM response.

Research Type: Experimental
Anomaly Transformer achieves strong results on six benchmarks, covering three real applications.

Researcher Affiliation: Academia
Jiehui Xu, Haixu Wu, Jianmin Wang, Mingsheng Long. School of Software, BNRist, Tsinghua University, Beijing 100084, China. {xjh20,whx20}@mails.tsinghua.edu.cn, {jimwang,mingsheng}@tsinghua.edu.cn

Pseudocode: Yes
The paper presents the pseudo-code of Anomaly-Attention in Algorithm 1, the multi-head calculation of the association discrepancy AssDis(P, S; X) in Algorithm 2, and the calculation of the association-based criterion AnomalyScore(X) in Algorithm 3.

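The pseudo-code itself is in the paper; as a rough illustration of what Algorithms 2 and 3 compute, here is a minimal PyTorch sketch, assuming the prior-associations P and series-associations S are already available as per-layer, row-stochastic attention maps (all names here are illustrative, not the authors' code):

```python
import torch
import torch.nn.functional as F

def association_discrepancy(priors, series, eps=1e-8):
    # priors, series: per-layer (batch, length, length) attention maps,
    # assumed row-stochastic and already averaged over heads.
    # Symmetrized KL divergence between P and S, averaged over layers,
    # yielding one discrepancy value per time point.
    total = 0.0
    for p, s in zip(priors, series):
        kl_ps = (p * (torch.log(p + eps) - torch.log(s + eps))).sum(-1)
        kl_sp = (s * (torch.log(s + eps) - torch.log(p + eps))).sum(-1)
        total = total + kl_ps + kl_sp
    return total / len(priors)                         # (batch, length)

def anomaly_score(x, x_rec, priors, series):
    # Association-based criterion: softmax over time of the negated
    # discrepancy, scaled point-wise by the reconstruction error.
    ass_dis = association_discrepancy(priors, series)  # (batch, length)
    rec_err = ((x - x_rec) ** 2).mean(dim=-1)          # (batch, length)
    return F.softmax(-ass_dis, dim=-1) * rec_err
```
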
Open Source Code: No
The paper states that "All the experiments are implemented in Pytorch (Paszke et al., 2019) with a single NVIDIA TITAN RTX 24GB GPU", but it does not provide a direct link to the source code or an explicit statement about its public availability.

Open Datasets: Yes
The paper describes the six experiment datasets: (1) SMD (Server Machine Dataset, Su et al. (2019)) is a 5-week-long dataset collected from a large Internet company with 38 dimensions. (2) PSM (Pooled Server Metrics, Abdulaal et al. (2021)) is collected internally from multiple application server nodes at eBay with 26 dimensions. (3) Both MSL (Mars Science Laboratory rover) and SMAP (Soil Moisture Active Passive satellite) are public datasets from NASA (Hundman et al., 2018) with 55 and 25 dimensions respectively, which contain the telemetry anomaly data derived from the Incident Surprise Anomaly (ISA) reports of spacecraft monitoring systems. (4) SWaT (Secure Water Treatment, Mathur & Tippenhauer (2016)) is obtained from 51 sensors of the critical infrastructure system under continuous operation. (5) NeurIPS-TS (NeurIPS 2021 Time Series Benchmark) is a dataset proposed by Lai et al. (2021)...

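Since these multivariate series feed a window-based model, a hypothetical helper like the following could cut one of them (e.g. an SMD-like series with 38 channels) into fixed-size windows; the window length of 100 comes from the experiment setup quoted further down:

```python
import numpy as np

def sliding_windows(series, window=100, stride=1):
    # series: (time, channels) array; returns (n_windows, window, channels).
    # Hypothetical helper; the paper only states that a fixed window
    # size of 100 is used for all datasets.
    n = (len(series) - window) // stride + 1
    return np.stack([series[i * stride : i * stride + window] for i in range(n)])

# Stand-in for an SMD-like series: 5 weeks at one sample per minute, 38 channels.
x = np.random.randn(5 * 7 * 24 * 60, 38).astype(np.float32)
windows = sliding_windows(x, window=100)   # shape (50301, 100, 38)
```
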
Dataset Splits: Yes
Each dataset includes training, validation and testing subsets. The threshold δ is determined so that a proportion r of time points of the validation dataset are labeled as anomalies. Table 13 details the benchmarks, where AR denotes the ground-truth anomaly proportion of each whole dataset.

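The quoted thresholding rule maps directly to a quantile over validation scores. A minimal sketch, assuming per-point anomaly scores have already been computed (r is dataset-specific and not fixed by this snippet):

```python
import numpy as np

def select_threshold(val_scores, r):
    # Choose delta so that a proportion r of validation time points
    # score above it, i.e. get labeled as anomalies.
    return np.quantile(val_scores, 1.0 - r)

val_scores = np.random.rand(10_000)          # stand-in anomaly scores
delta = select_threshold(val_scores, r=0.01)
predictions = val_scores > delta             # ~1% of points flagged
```
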
Hardware Specification: Yes
"All the experiments are implemented in Pytorch (Paszke et al., 2019) with a single NVIDIA TITAN RTX 24GB GPU."

Software Dependencies: No
The paper mentions "Pytorch (Paszke et al., 2019)" but does not specify a version number for PyTorch or any other critical software dependency.

Experiment Setup: Yes
The sliding window has a fixed size of 100 for all datasets. The Anomaly Transformer contains 3 layers, with the channel number of hidden states d_model set to 512 and the number of heads h set to 8. The hyperparameter λ (Equation 4) is set to 3 for all datasets to trade off the loss terms. Training uses the ADAM optimizer (Kingma & Ba, 2015) with an initial learning rate of 10^-4; the process is early stopped within 10 epochs with a batch size of 32.
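
Taken together, these hyperparameters pin down most of a training configuration. The sketch below uses a plain nn.TransformerEncoder as a stand-in for the unreleased Anomaly Transformer and shows only the reconstruction term of the loss (the λ-weighted association term from Equation 4 is omitted), so treat it as a configuration sketch rather than the authors' setup:

```python
import torch
import torch.nn as nn

# Hyperparameters quoted above: 3 layers, d_model=512, 8 heads,
# window size 100, lr=1e-4, batch size 32, lambda=3 (Equation 4).
d_model, n_heads, n_layers, win_size, channels = 512, 8, 3, 100, 38
lambda_ = 3.0  # loss trade-off; unused here since only reconstruction is shown

embed = nn.Linear(channels, d_model)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True),
    num_layers=n_layers,
)
head = nn.Linear(d_model, channels)

params = list(embed.parameters()) + list(encoder.parameters()) + list(head.parameters())
optimizer = torch.optim.Adam(params, lr=1e-4)

x = torch.randn(32, win_size, channels)    # one batch of 32 windows
x_rec = head(encoder(embed(x)))            # point-wise reconstruction
loss = ((x - x_rec) ** 2).mean()           # reconstruction term only
optimizer.zero_grad()
loss.backward()
optimizer.step()
```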