Towards a Rigorous Evaluation of Time-Series Anomaly Detection
Authors: Siwon Kim, Kukjin Choi, Hyun-Soo Choi, Byunghan Lee, Sungroh Yoon
AAAI 2022, pp. 7194-7201 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we theoretically and experimentally reveal that the PA protocol can greatly overestimate detection performance; even a random anomaly score can easily turn into a state-of-the-art TAD method. Therefore, comparing TAD methods after applying the PA protocol can lead to misguided rankings. Furthermore, we question the potential of existing TAD methods by showing that an untrained model obtains detection performance comparable to existing methods even when PA is forbidden. Based on our findings, we propose a new baseline and an evaluation protocol. We expect that our study will support rigorous evaluation of TAD and lead to further improvement in future research. (See the PA sketch after the table.) |
| Researcher Affiliation | Collaboration | (1) Data Science and AI Laboratory, Seoul National University, Korea; (2) DIT Center, Samsung Electronics, Korea; (3) Ziovision, Korea; (4) Department of CSE and Education Research Team for Medical Big-data Convergence, Kangwon National University, Korea; (5) Department of Electronic and IT Media Engineering, Seoul National University of Science and Technology, Korea; (6) Department of ECE and Interdisciplinary Program in AI, Seoul National University, Korea; (7) AIIS, ASRI, and INMC, Seoul National University, Korea |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide any explicit statements or links indicating that the source code for the methodology described in this paper is publicly available. |
| Open Datasets | Yes | In this section, we introduce a list of the five most widely used TAD benchmark datasets as follows: Secure water treatment (SWaT) (Goh et al. 2016): ... Water distribution testbed (WADI) (Ahmed, Palleti, and Mathur 2017): ... Server Machine Dataset (SMD) (Su et al. 2019): ... Mars Science Laboratory (MSL) and Soil Moisture Active Passive (SMAP) (Hundman et al. 2018): |
| Dataset Splits | No | The paper does not explicitly provide details of a dedicated validation split, such as percentages or sample counts. It states that 'All thresholds were obtained from those that yielded the best score', which implies some form of validation, but formal split details are not specified. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used to run its experiments; it discusses the experimental setup only at a high level. |
| Software Dependencies | No | The paper mentions model components such as 'LSTM' and 'autoencoder' and refers to Python, but it does not specify version numbers for any key software libraries, frameworks, or solvers required for replication. |
| Experiment Setup | Yes | The parameters were fixed after being initialized from a Gaussian distribution N(0, 0.02). The window size τ for Case 2 and 3 was set to 120. For experiments that included randomness, such as Case 1 and 3, we repeated them with five different seeds and reported the average values. All thresholds were obtained from those that yielded the best score. (See the baseline sketch after the table.) |
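
The Research Type row quotes the paper's central claim: the point-adjustment (PA) protocol can make even a random anomaly score look state-of-the-art. Below is a minimal sketch of that effect, assuming the standard PA definition (if any point inside a ground-truth anomaly segment is flagged, the whole segment counts as detected). The segment lengths, threshold, and names here are illustrative, not taken from the paper.

```python
import numpy as np
from sklearn.metrics import f1_score

def point_adjust(pred: np.ndarray, label: np.ndarray) -> np.ndarray:
    """PA protocol: if any point inside a ground-truth anomaly segment
    is predicted anomalous, mark the entire segment as detected."""
    adjusted = pred.copy()
    i, n = 0, len(label)
    while i < n:
        if label[i] == 1:
            j = i
            while j < n and label[j] == 1:  # find the end of this segment
                j += 1
            if adjusted[i:j].any():         # one hit inflates the whole segment
                adjusted[i:j] = 1
            i = j
        else:
            i += 1
    return adjusted

rng = np.random.default_rng(0)
label = np.zeros(10_000, dtype=int)
label[2_000:2_500] = 1                 # a long anomaly segment
label[7_000:7_050] = 1                 # a short anomaly segment
score = rng.random(10_000)             # purely random anomaly score
pred = (score > 0.99).astype(int)      # ~1% of points flagged at random

print("F1 before PA:", f1_score(label, pred))
print("F1 after  PA:", f1_score(label, point_adjust(pred, label)))
```

On this toy series the random detector's F1 should jump by roughly an order of magnitude once PA is applied, since a single lucky hit in the long segment converts hundreds of points into true positives. This is the overestimation the paper warns about.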
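To make the Experiment Setup row concrete, here is a minimal sketch of the described random baseline, assuming PyTorch and an untrained LSTM: parameters drawn from N(0, 0.02) (read here as a standard deviation of 0.02, our assumption), five seeds averaged, and the threshold swept to the best F1. The synthetic series, model sizes, and names are illustrative; the paper releases no code, and the reported window size τ = 120 is noted but not exercised in this toy example.

```python
import numpy as np
import torch
import torch.nn as nn
from sklearn.metrics import f1_score

TAU = 120  # window size reported for Case 2 and 3 (unused in this toy example)

def init_gaussian(module: nn.Module) -> None:
    # The paper initializes parameters from N(0, 0.02); interpreting
    # 0.02 as the standard deviation is our assumption.
    for p in module.parameters():
        nn.init.normal_(p, mean=0.0, std=0.02)

def best_f1(score: np.ndarray, label: np.ndarray) -> float:
    # "All thresholds were obtained from those that yielded the best score":
    # sweep candidate thresholds over the score quantiles, keep the max F1.
    return max(f1_score(label, (score >= t).astype(int))
               for t in np.quantile(score, np.linspace(0.01, 0.99, 99)))

# Synthetic stand-in series with two labeled anomaly segments.
T = 5_000
series = np.sin(np.linspace(0, 200, T)).astype(np.float32)
label = np.zeros(T, dtype=int)
label[1_000:1_100] = 1
label[3_000:3_020] = 1
series[label == 1] += 3.0  # injected anomalies

results = []
for seed in range(5):  # five seeds, averaged, as in the paper
    torch.manual_seed(seed)
    lstm = nn.LSTM(input_size=1, hidden_size=32, batch_first=True)
    head = nn.Linear(32, 1)
    init_gaussian(lstm)
    init_gaussian(head)
    x = torch.from_numpy(series).reshape(1, T, 1)
    with torch.no_grad():             # untrained: parameters stay fixed after init
        out, _ = lstm(x)
        recon = head(out).squeeze().numpy()
    score = np.abs(recon - series)    # reconstruction error as the anomaly score
    results.append(best_f1(score, label))

print("mean best-F1 over 5 seeds:", float(np.mean(results)))
```

Even this untrained model can post a respectable best-threshold F1 on the toy data, which mirrors the paper's finding that an untrained network is a surprisingly strong baseline when PA is forbidden.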