Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

STRAP: Spatio-Temporal Pattern Retrieval for Out-of-Distribution Generalization

Authors: Haoyu Zhang, Wentao Zhang, Hao Miao, Xinke Jiang, Yuchen Fang, Yifan Zhang

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across multiple real-world streaming graph datasets show that STRAP consistently outperforms state-of-the-art STGNN baselines on STOOD tasks, demonstrating its robustness, adaptability, and strong generalization capability without task-specific fine-tuning.
Researcher Affiliation	Collaboration	Haoyu Zhang, Wentao Zhang, Hao Miao, Xinke Jiang, Yuchen Fang, Yifan Zhang. City University of Hong Kong, Hong Kong, China; City University of Hong Kong (Dongguan), Guangdong, China; Northeastern University, Shenyang, China; The Hong Kong Polytechnic University, Hong Kong, China; University of Electronic Science and Technology of China, Chengdu, China; SLAI, Shenzhen, China.
Pseudocode	Yes	For enhanced clarity, the Spatio-Temporal Pattern Library Construction is outlined in Algorithm 1 (cf. Appendix B.2) and the Training and Inference with Toy Graphs Retrieval are detailed in Algorithm 2 (cf. Appendix B.2).
Open Source Code	Yes	Code is anonymously available at https://anonymous.4open.science/r/STRAP/.
Open Datasets	Yes	We evaluate STRAP on three real-world streaming spatio-temporal graph datasets: AIR-Stream [9], PEMS-Stream [10], and ENERGY-Stream [9]. Detailed dataset statistics, experimental settings and evaluation are provided in Table 8 in Appendix C.3 and C.1 in Appendix C. ... Our experiments use real-world natural streaming datasets, and detailed statistics for each dataset are shown in Table 8 below: ... All datasets were used in accordance with their usage terms and conditions.
Dataset Splits	Yes	We establish a data split ratio of training/validation/test = 6/2/2 for all experiments.
Hardware Specification	Yes	All experiments were performed using NVIDIA A100 80G GPUs to ensure consistency in computational resources and reproducibility.
Software Dependencies	No	The paper mentions various backbone architectures (STGNN, ASTGNN, DCRNN, TGCN) and baselines, but does not list specific versions of software libraries or programming languages used for implementation, such as PyTorch or TensorFlow versions, or Python versions.
Experiment Setup	Yes	For fair comparisons, we set the learning rate to either 0.03 or 0.01 based on the specific situation of each dataset and model requirements (Table 7). The parameters of baselines were set based on their original papers and any accompanying code. We utilized either their default parameters or the best-reported parameters from their reported publications. ... Table 7: Hyperparameter Settings for All Methods Across Different Datasets
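
The 6/2/2 training/validation/test ratio reported under Dataset Splits can be illustrated with a minimal sketch. The entry does not say how the paper orders or partitions samples, so the contiguous, in-order split below is an assumption, and the function name split_indices is hypothetical rather than taken from the STRAP code.

```python
def split_indices(n_samples, ratios=(6, 2, 2)):
    """Split range(n_samples) into contiguous train/val/test index ranges.

    Assumption: samples are kept in their original order (no shuffling).
    The source only states the 6/2/2 ratio, not the partitioning scheme.
    """
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    # The test split takes whatever remains, so no sample is dropped.
    test = range(n_train + n_val, n_samples)
    return train, val, test


train, val, test = split_indices(100)
print(len(train), len(val), len(test))
```

Because the test range absorbs any rounding remainder, the three ranges always cover every sample exactly once, even when the sample count is not divisible by 10.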