Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Fortifying Time Series: DTW-Certified Robust Anomaly Detection

Authors: Shijie Liu, Tansu Alpcan, Christopher Leckie, Sarah Erfani

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across various datasets and models validate the effectiveness and practicality of our theoretical approach. Results demonstrate significantly improved performance, e.g., up to 18.7% in F1-score under DTW-based adversarial attacks compared to traditional certified models.
Researcher Affiliation	Academia	(1) Department of Electrical and Electronic Engineering, University of Melbourne, Melbourne, Australia; (2) School of Computing and Information Systems, University of Melbourne, Melbourne, Australia
Pseudocode	No	The paper describes the approach and its implementation but does not present a distinct pseudocode or algorithm block. Figure 2 illustrates the process but is not pseudocode.
Open Source Code	Yes	The code and environment file are provided in the supplemental material.
Open Datasets	Yes	Our empirical evaluation of the DTW-certified defense spans seven widely used benchmark datasets, including SMAP [48], MSL [27], SMD [60], NIPS-TS-SWAN, NIPS-TS-CREDITCARD, NIPS-TS-WATER [30], UCR-1 and UCR-2 [68], encompassing both univariate and multivariate time-series data.
Dataset Splits	Yes	Table 4: Statistics of the benchmark datasets for time-series anomaly detection. SMAP 25 135,183 427,617 13.13%; MSL 55 58,317 73,729 10.72%; SMD 25 708,405 708,420 4.16%; NIPS-TS-SWAN 38 60,000 60,000 32.60%; NIPS-TS-CREDITCARD 29 284,807 284,807 0.17%; NIPS-TS-WATER 9 69,260 69,260 1.05%; UCR-1 1 35,000 44,795 1.38%; UCR-2 1 35,000 45,000 0.67%
Hardware Specification	Yes	All experiments are implemented using PyTorch and executed on a Linux server equipped with Intel(R) Xeon(R) Gold 6326 CPUs and NVIDIA A100 GPUs with 80 GB of memory.
Software Dependencies	No	All experiments are implemented using PyTorch and executed on a Linux server equipped with Intel(R) Xeon(R) Gold 6326 CPUs and NVIDIA A100 GPUs with 80 GB of memory.
Experiment Setup	Yes	We use the following default hyperparameters across all experiments unless otherwise specified: sequence length T = 50, DTW wrapping window size w = 4, number of noisy samples n = 1,000, smoothing noise level σ = 0.5 in N(0, σ²I), and percentile p = 0.5 in the percentile-smoothed function h_p.
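
The defaults quoted in the Experiment Setup row can be read as the configuration of a sampling-based smoothing step. The following is a minimal sketch, assuming that the percentile-smoothed function h_p is the empirical p-th percentile of a base anomaly score over n Gaussian-perturbed copies of the input window; this page does not reproduce the paper's exact definition, and the sketch does not cover the DTW certification itself. The names `DEFAULTS`, `percentile_smooth`, and `toy_score` are hypothetical and do not come from the authors' code.

```python
import math
import random

# Defaults quoted in the Experiment Setup row; the key names are illustrative.
DEFAULTS = {"seq_len": 50, "dtw_window": 4, "n_samples": 1000,
            "sigma": 0.5, "percentile": 0.5}


def toy_score(window):
    """Hypothetical base anomaly score: largest absolute value in the window."""
    return max(abs(v) for v in window)


def percentile_smooth(score_fn, x, n_samples=1000, sigma=0.5, p=0.5, seed=0):
    """Empirical p-th percentile of score_fn over Gaussian perturbations of x.

    Assumption: each sample adds i.i.d. N(0, sigma^2) noise to every time step.
    """
    rng = random.Random(seed)
    scores = []
    for _ in range(n_samples):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        scores.append(score_fn(noisy))
    scores.sort()
    # Nearest-rank percentile: the smallest score with at least p * n samples
    # at or below it.
    rank = max(1, math.ceil(p * n_samples))
    return scores[rank - 1]


if __name__ == "__main__":
    window = [0.1] * DEFAULTS["seq_len"]
    smoothed = percentile_smooth(
        toy_score,
        window,
        n_samples=DEFAULTS["n_samples"],
        sigma=DEFAULTS["sigma"],
        p=DEFAULTS["percentile"],
    )
    print(f"smoothed score: {smoothed:.3f}")
```

With p = 0.5 the smoothed score is the median over the noisy samples, which is less sensitive to a few extreme perturbations than the mean would be.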