Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Structured Temporal Causality for Interpretable Multivariate Time Series Anomaly Detection

Authors: Dongchan Cho, Jiho Han, Keumyeong Kang, Minsang Kim, Honggyu Ryu, Namsoon Jung

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	OracleAD achieves state-of-the-art results across multiple real-world datasets and evaluation protocols, while remaining interpretable through SLS. Experimental results show consistent improvements in detection accuracy, anomaly localization, and robustness under diverse evaluation protocols. We evaluate OracleAD on three widely adopted benchmark datasets: SMD [37], PSM [1], and SWaT [28]. We conduct an ablation study to evaluate the contributions of reconstruction loss and anomaly scoring strategies.
Researcher Affiliation	Industry	Industrial AI Lab, SimPlatform Co. Ltd. Affiliate Research Institute EMAIL
Pseudocode	No	The paper describes the methodology in prose and through diagrams (Figure 1), but does not contain a dedicated 'Pseudocode' or 'Algorithm' block with structured, step-by-step instructions.
Open Source Code	No	While our code is not yet released due to internal procedures, we plan to release the official implementation to support reproducibility.
Open Datasets	Yes	We use three widely adopted, publicly available MTSAD datasets. Each dataset is accessible for academic research, subject to the terms listed below: PSM (Pooled Server Metrics) [1]... available at https://github.com/eBay/RANSynCoders. SMD (Server Machine Dataset) [37]... hosted at https://github.com/NetManAIOps/OmniAnomaly under the MIT License. SWaT (Secure Water Treatment) [28]... requires a formal access request via https://itrust.sutd.edu.sg/.
Dataset Splits	Yes	Table S1: Statistics of the benchmark datasets used for multivariate time-series anomaly detection. Dataset: PSM, Train: 132,481, Test: 87,841. SMD: first 5 days containing only normal behavior (training set) and the last 5 days including injected anomalies (testing set).
Hardware Specification	Yes	All experiments were conducted on a single workstation equipped with an NVIDIA GeForce RTX 5090 GPU (32GB) and an Intel Core Ultra 7 265K CPU (20 cores). The system had 96GB of DDR5 RAM and ran Ubuntu 24.04 LTS (64-bit).
Software Dependencies	Yes	The software environment included Python 3.12, PyTorch 2.7.0, and CUDA Toolkit 12.8.
Experiment Setup	Yes	We trained all models using the AdamW [26] optimizer with default hyperparameters, as it provided stable convergence across datasets. ... we set the learning rate to 5e-5 for PSM, while a higher value of 5e-4 was used for both SMD and SWaT. ... The default configuration used across all datasets fixes the window length to 10, the batch size to 1024, the deviation weight to 3, and the number of layers to 2.
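
For readers attempting a reimplementation, the settings quoted in the Experiment Setup row can be collected into a small configuration object. The sketch below is illustrative only: the official code is not released, so the class name `ExperimentConfig`, the helper `config_for`, and all field names are hypothetical. Only the numeric values, the optimizer name, and the dataset names come from the row above.

```python
from dataclasses import dataclass

# Per-dataset learning rates as quoted in the Experiment Setup row.
LEARNING_RATES = {"PSM": 5e-5, "SMD": 5e-4, "SWaT": 5e-4}


@dataclass(frozen=True)
class ExperimentConfig:
    """Hypothetical container; field names are not taken from the paper."""

    dataset: str
    learning_rate: float
    window_length: int = 10      # reported default for all datasets
    batch_size: int = 1024       # reported default for all datasets
    deviation_weight: float = 3  # reported default for all datasets
    num_layers: int = 2          # reported default for all datasets
    optimizer: str = "AdamW"     # used with its default hyperparameters


def config_for(dataset: str) -> ExperimentConfig:
    """Return the reported settings for one of the three benchmark datasets."""
    if dataset not in LEARNING_RATES:
        raise KeyError(f"No reported settings for dataset {dataset!r}")
    return ExperimentConfig(dataset=dataset, learning_rate=LEARNING_RATES[dataset])


if __name__ == "__main__":
    for name in LEARNING_RATES:
        print(config_for(name))
```

Settings not quoted in the row, such as the number of training epochs or the random seed, are deliberately left out rather than guessed.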