Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

CrossAD: Time Series Anomaly Detection with Cross-scale Associations and Cross-window Modeling

Authors: Beibu Li, Qichao Shentu, Yang Shu, Hui Zhang, Ming Li, Ning Jin, Bin Yang, Chenjuan Guo

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments conducted on multiple real-world datasets using nine evaluation metrics validate the effectiveness of CrossAD, demonstrating state-of-the-art performance in anomaly detection.
Researcher Affiliation	Collaboration	East China Normal University; Shandong Inspur Database Technology Co., Ltd.
Pseudocode	Yes	We present the procedures for updating the global multi-scale context in Algorithm 1. Algorithm 1 Global Multi-scale Context Update
Open Source Code	Yes	The code is made available at https://github.com/decisionintelligence/CrossAD.
Open Datasets	Yes	We evaluate our model on various datasets. Here is the description of these datasets: (1) SMD (Server Machine Dataset) captures resource utilization data from computer clusters belonging to an Internet company [43]. (2) MSL (Mars Science Laboratory Dataset), collected by NASA, includes telemetry data that reflects the operational status of sensors and actuators on the Martian rover [45]. (3) SMAP (Soil Moisture Active Passive Dataset), also gathered by NASA, provides soil moisture data obtained from spacecraft monitoring systems [45]. (4) SWaT (Secure Water Treatment) contains sensor data from a continuously operating water treatment infrastructure [46]. (5) PSM (Pooled Server Metrics Dataset) is sourced from eBay server machines, capturing metrics related to their performance [47]. (6) NeurIPS-TS (NeurIPS 2021 Time Series Benchmark) is a dataset introduced by [48], and we utilize the sub-datasets GECCO and SWAN, which encompass a variety of anomaly scenarios. For the MSL and SMAP datasets, only the first continuous dimension is retained [31, 32], as discrete variables inherently lack the smooth and structured latent space required for effective reconstruction. The statistical details about the datasets are available in Appendix B.
Dataset Splits	Yes	The statistical details about the datasets are available in Appendix B. Table 5: Statistics of the datasets. AR (anomaly ratio) represents the abnormal proportion of the whole dataset. Columns: Dataset | Domain | Dimension | Window | Training | Validation | Test (labeled) | AR (%). Rows: MSL | Spacecraft | 1 | 96 | 46,653 | 11,664 | 73,729 | 10.5; PSM | Server Machine | 25 | 192 | 105,984 | 26,497 | 87,841 | 27.8; SMAP | Spacecraft | 1 | 192 | 108,146 | 27,037 | 427,617 | 12.8; SMD | Server Machine | 38 | 192 | 566,724 | 141,681 | 708,420 | 4.2; SWaT | Water treatment | 31 | 192 | 396,000 | 99,000 | 449,919 | 12.1; GECCO | Water treatment | 9 | 128 | 55,408 | 13,852 | 69,261 | 1.25; SWAN | Space Weather | 38 | 192 | 48,000 | 12,000 | 60,000 | 23.8; UCR | Natural | 1 | (no window listed) | 1,790,680 | 447,670 | 6,143,541 | 0.6.
Hardware Specification	Yes	We conduct all of our experiments using Pytorch with an NVIDIA Tesla-A800-80GB GPU.
Software Dependencies	No	We conduct all of our experiments using Pytorch with an NVIDIA Tesla-A800-80GB GPU.
Experiment Setup	Yes	In our experiments, we set the dimension of hidden states as 128, the dimension of attention as 32, the number of attention heads as 4, the dropout as 0.1, the encoder layer as 2, the decoder layer as 2, the number of sub-series queries as 5, and the size of global multi-scale context as 32. We run a sliding window to process the series and conduct the anomaly detection using non-overlapping windows. The various window sizes for different datasets can be seen in Table 5. The average pooling kernel sizes for multi-scale generation are selected from {32, 16, 8, 4, 2}. We use the Adam optimizer with an initial learning rate of 10^-4 and set the batch size to 128.
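
For readers who want to try the quoted setup, the following is a minimal sketch of how the reported hyperparameters, the non-overlapping windowing, and the average-pooling multi-scale generation might be written down in plain Python. It is not taken from the CrossAD repository. All field and function names are hypothetical, and two details are assumptions rather than statements from the paper: the pooling stride is taken equal to the kernel size, and a trailing partial window is dropped.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class ExperimentConfig:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    d_model: int = 128                # dimension of hidden states
    d_attention: int = 32             # dimension of attention
    n_heads: int = 4                  # number of attention heads
    dropout: float = 0.1
    encoder_layers: int = 2
    decoder_layers: int = 2
    n_subseries_queries: int = 5      # number of sub-series queries
    global_context_size: int = 32     # size of global multi-scale context
    # Candidate set from which kernel sizes are selected.
    pooling_kernel_candidates: Tuple[int, ...] = (32, 16, 8, 4, 2)
    learning_rate: float = 1e-4       # initial learning rate for Adam
    batch_size: int = 128


def non_overlapping_windows(series: Sequence[float], window: int) -> List[List[float]]:
    """Split a series into consecutive windows with stride equal to the window size.

    Assumption: a trailing remainder shorter than `window` is dropped.
    """
    n_full = len(series) // window
    return [list(series[i * window:(i + 1) * window]) for i in range(n_full)]


def average_pool(window: Sequence[float], kernel: int) -> List[float]:
    """Average pooling over one window. Assumption: stride equals kernel size."""
    return [
        sum(window[i:i + kernel]) / kernel
        for i in range(0, len(window) - kernel + 1, kernel)
    ]


def multi_scale_views(window: Sequence[float], kernels: Sequence[int]) -> Dict[int, List[float]]:
    """Build one coarser view of the window per pooling kernel size."""
    return {k: average_pool(window, k) for k in kernels}


if __name__ == "__main__":
    cfg = ExperimentConfig()
    toy_series = [float(t) for t in range(400)]        # synthetic data, illustration only
    windows = non_overlapping_windows(toy_series, 192)  # 192 is the window size listed for most datasets
    views = multi_scale_views(windows[0], cfg.pooling_kernel_candidates)
    for kernel, view in views.items():
        print(kernel, len(view))
```

With a window of 192 and the full candidate set, the coarser views have lengths 6, 12, 24, 48, and 96, which gives a feel for the range of scales the quoted kernel sizes could produce under the stride assumption above.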