Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Rethink GraphODE Generalization within Coupled Dynamical System

Authors: Guancheng Wan, Zijie Huang, Wanjia Zhao, Xiao Luo, Yizhou Sun, Wei Wang

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Extensive experiments across diverse dynamical systems demonstrate that ours outperforms state-of-the-art methods within both in-distribution and out-of-distribution.
Researcher Affiliation	Academia	¹University of California, Los Angeles; ²Stanford University. Correspondence to: Guancheng Wan <EMAIL>.
Pseudocode	No	The paper describes the methodology using text and mathematical equations, but does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code	Yes	The code is available at https://github.com/GuanchengWan/GREAT.
Open Datasets	No	We evaluate GREAT using three coupled dynamical systems datasets: SPRING, CHARGED, and PENDULUM, which model the dynamics of physical systems with complex interdependencies. For evaluation, we adopt two metrics: RMSE (Root Mean Square Error) and MAPE (Mean Absolute Percentage Error). We assess GREAT under both in-distribution (ID) and out-of-distribution (OOD) settings, where the OOD setting modifies the test dataset's initial conditions (e.g., velocity, position) to test the model's ability to generalize to unseen scenarios. Further details are provided in Appendix A. The datasets are generated using a physics-based simulation framework, where the system parameters are sampled from the specified ranges.
Dataset Splits	Yes	The datasets are split into training, validation, and test sets, with additional out-of-distribution (OOD) test sets to evaluate model generalization. Further details are provided in Appendix A. ... The training, validation, and test sets are split as follows: 5000 samples for training, 1000 samples for validation, 1000 samples for testing, and 1000 samples for OOD testing.
Hardware Specification	Yes	The experiments are conducted using NVIDIA GeForce RTX 3090 GPUs as the hardware platform, coupled with Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz.
Software Dependencies	Yes	The deep learning framework employed was PyTorch, version 1.11.0, alongside CUDA version 11.3.
Experiment Setup	Yes	The hidden layer size was set to 32 for each dataset. For optimization, the Adam optimizer (Kingma & Ba, 2014) was chosen, with a learning rate of 1e-5 and a weight decay of 1e-3 during the training process.
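
For readers who want the reported setup in machine-readable form, the sketch below collects the values quoted in the Dataset Splits and Experiment Setup rows into a small Python record. It is an illustration, not code from the GREAT repository: the class name ReportedSetup and all field and method names are hypothetical, and the learning rate and weight decay are read as 1e-5 and 1e-3, which is an assumption about the extracted text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReportedSetup:
    """Values quoted in the table above; all names here are hypothetical."""
    hidden_size: int = 32
    optimizer: str = "Adam"
    learning_rate: float = 1e-5  # assumed reading of the reported value
    weight_decay: float = 1e-3   # assumed reading of the reported value
    n_train: int = 5000
    n_val: int = 1000
    n_test: int = 1000
    n_ood_test: int = 1000

    def split_sizes(self) -> dict:
        """Return the reported number of samples per split."""
        return {
            "train": self.n_train,
            "val": self.n_val,
            "test": self.n_test,
            "ood_test": self.n_ood_test,
        }

    def total_samples(self) -> int:
        """Sum the sample counts over all four splits."""
        return sum(self.split_sizes().values())

    def split_fractions(self) -> dict:
        """Express each split as a fraction of the total sample count."""
        total = self.total_samples()
        return {name: size / total for name, size in self.split_sizes().items()}


setup = ReportedSetup()
print(setup.total_samples())
print(setup.split_fractions())
```

The record only restates reported numbers. It does not generate the SPRING, CHARGED, or PENDULUM data, and it makes no claim about how the OOD test set is sampled beyond its reported size.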