Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

PhysDiff: A Physically-Guided Diffusion Model for Multivariate Time Series Anomaly Detection

Authors: Long Li, Wencheng Zhang, Shi Yuan, Hongle Guo, Wanghu Chen

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments on five benchmark datasets and two NeurIPS-TS scenarios demonstrate that PhysDiff outperforms 18 state-of-the-art baselines, with average F1 score improvements on both standard and challenging datasets. Experimental results validate the advantages of combining principled signal decomposition with diffusion-based reconstruction for robust, interpretable anomaly detection in complex dynamic systems.
Researcher Affiliation	Academia	Long Li (1), Wencheng Zhang (1), Shi Yuan (1), Hongle Guo (2), Wanghu Chen (1). (1) College of Computer Science & Engineering, Northwest Normal University; (2) School of Management, Northwest Normal University
Pseudocode	Yes	Algorithm 1 Physically-Guided Diffusion Process
Open Source Code	Yes	Code is available at https://anonymous.4open.science/r/PhysDiff-4726.
Open Datasets	Yes	We evaluated our approach using five widely recognized benchmark datasets: SMD [19], MSL [7], SMAP [7], SWaT [20], and PSM [21], plus the NeurIPS-TS dataset comprising Creditcard and GECCO subsets as detailed by Lai et al. (2021) [1]. Data labeled as normal were partitioned with 80% allocated for training and 20% for validation, ensuring the model is properly optimized on typical behavior. These datasets represent diverse domains including spacecraft telemetry, water treatment systems, and financial transactions, providing a comprehensive evaluation landscape for anomaly detection methods.
Dataset Splits	Yes	Data labeled as normal were partitioned with 80% allocated for training and 20% for validation, ensuring the model is properly optimized on typical behavior. These datasets represent diverse domains including spacecraft telemetry, water treatment systems, and financial transactions, providing a comprehensive evaluation landscape for anomaly detection methods.
Hardware Specification	Yes	Implementation Environment: Experiments were conducted using PyTorch 2.1.2 on NVIDIA GTX 2080Ti with 22GB memory.
Software Dependencies	Yes	Implementation Environment: Experiments were conducted using PyTorch 2.1.2 on NVIDIA GTX 2080Ti with 22GB memory. Our implementation includes optimizations: (1) CUDA-accelerated MAFD calculations using nvmath when available, (2) efficient time series embeddings with convolutional layers of kernel size 1, (3) optimized batch matrix multiplications for routing attention, and (4) reconstruction head with two-layer MLP, GELU activation, and dropout rate 0.2.
Experiment Setup	Yes	In our experiments, we implement PhysDiff with careful attention to model architecture, training procedure, and anomaly detection strategies, with all key hyperparameters summarized in Table 7.