Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Towards Stability of Autoregressive Neural Operators
Authors: Michael McCabe, Peter Harrington, Shashank Subramanian, Jed Brown
TMLR 2023 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present results on several scientific systems that include Navier-Stokes fluid flow, rotating shallow water, and a high-resolution global weather forecasting system. We demonstrate that applying our design principles to neural operators leads to significantly lower errors for long-term forecasts as well as longer time horizons without qualitative signs of divergence compared to the original models for these systems. |
| Researcher Affiliation | Academia | Michael McCabe EMAIL Department of Computer Science, University of Colorado Boulder; Peter Harrington EMAIL Lawrence Berkeley National Laboratory; Shashank Subramanian EMAIL Lawrence Berkeley National Laboratory; Jed Brown EMAIL Department of Computer Science, University of Colorado Boulder |
| Pseudocode | Yes | `def find_center(t):`<br>`    time_of_day = t / day`<br>`    time_of_year = t / year`<br>`    max_declination = .4  # Truncated from estimate of earth's solar declination`<br>`    lon_center = time_of_day * 2 * np.pi`<br>`    # Rescale sin to 0-1 then scale to np.pi`<br>`    lat_center = np.sin(time_of_year * 2 * np.pi) * max_declination`<br>`    lon_anti = np.pi + lon_center  # 2 * np.pi * ((np.sin(time_of_day * 2 * np.pi) + 1) / 2)`<br>`    return lon_center, lat_center, lon_anti, lat_center`<br>`def season_day_forcing(phi, theta, t, h_f0):`<br>`    phi_c, theta_c, phi_a, theta_a = find_center(t)`<br>`    sigma = np.pi / 2`<br>`    coefficients = np.cos(phi - phi_c) * np.exp(-(theta - theta_c)**2 / sigma**2)`<br>`    forcing = h_f0 * coefficients`<br>`    return forcing` |
| Open Source Code | Yes | We open-source our code for reproducibility. |
| Open Datasets | Yes | We use the public dataset ERA5 (Hersbach et al., 2020), provided by ECMWF (European Center for Medium-Range Weather Forecasting), which consists of hourly predictions of several crucial atmospheric variables (such as wind velocities and geopotential heights) at a spatial resolution of 0.25° (corresponding to a 720 × 1440 lat-lon grid, or a 25 km spatial resolution) from the year 1979 to the present day. |
| Dataset Splits | Yes | We use 25 initial conditions for training data and 3 and 2 initial conditions for validation and test data, respectively. Full simulation settings can be found in Appendix C.2. In total, the training set consisted of 25 trajectories of length 3024 for a total of 75,600 samples. Validation and test used an additional 2 and 3 trajectories respectively. |
| Hardware Specification | Yes | All models were trained for four hours or 40 epochs on 4x NVIDIA A100 GPUs. Our revised model trained for 16 hours across 16 nodes with 4x NVIDIA A100 GPUs per node. |
| Software Dependencies | No | The paper mentions software like "PyTorch (Paszke et al., 2019)", "TS2Kit (Mitchel et al., 2022)", "Nvidia Apex implementation of LAMB (You et al., 2020)", and "Adam (Kingma & Ba, 2017)" but does not specify explicit version numbers for these software packages or libraries. |
| Experiment Setup | Yes | All models were trained using identical settings. To minimize the impact of hyperparameter tuning, we used the automated learning rate search provided by Defazio & Mishchenko (2023) code for Adan (Xie et al., 2023). During each training step, one input snapshot was provided to the model for the purposes of predicting the output at t = t0 + Δt. These were optimized using mean-squared error rescaled by the mean norm of the dataset to reduce the impact of precision issues, for 20 epochs at a batch size of 128 per run. The ReDFNO utilized four blocks structured as described in Figure 3. Each block is defined with embedding dimension 192. The model was trained using the Nvidia Apex implementation of LAMB (You et al., 2020) following the schedule of FourCastNet. We found that the gradient rescaling in LAMB acted similarly to a trust region and avoided the training instability we experienced with Adam in this case. However, we were able to obtain similar partial training performance using a version of Adam with added step size constraints, but chose to stick with the known optimizer for this work. |
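The pseudocode quoted in the table can be expressed as a runnable sketch. The period constants `day` and `year` are assumptions here (the quoted excerpt does not define them); the operator placement is reconstructed from context and may differ from the paper's source:

```python
import numpy as np

DAY = 1.0     # assumed period constants, not given in the quoted excerpt
YEAR = 365.0

def find_center(t):
    """Return the (lon, lat) center of solar forcing and its antipode at time t."""
    time_of_day = t / DAY
    time_of_year = t / YEAR
    max_declination = 0.4  # truncated estimate of Earth's solar declination
    lon_center = time_of_day * 2 * np.pi
    # Seasonal oscillation of the latitude center, scaled by max declination
    lat_center = np.sin(time_of_year * 2 * np.pi) * max_declination
    lon_anti = np.pi + lon_center  # antipodal longitude
    return lon_center, lat_center, lon_anti, lat_center

def season_day_forcing(phi, theta, t, h_f0):
    """Gaussian-in-latitude, cosine-in-longitude forcing centered on the sun."""
    phi_c, theta_c, phi_a, theta_a = find_center(t)
    sigma = np.pi / 2
    coefficients = np.cos(phi - phi_c) * np.exp(-(theta - theta_c) ** 2 / sigma ** 2)
    return h_f0 * coefficients
```

At `t = 0` the forcing is maximal at the origin: `season_day_forcing(0.0, 0.0, 0.0, h_f0)` evaluates to `h_f0`, falling off with longitude via the cosine and with latitude via the Gaussian envelope.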
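The "mean-squared error rescaled by the mean norm of the dataset" in the Experiment Setup row can be sketched as follows. This is a minimal illustration under the assumption that the normalization constant is precomputed over the training set; the paper's exact definition may differ:

```python
import numpy as np

def rescaled_mse(pred, target, dataset_mean_norm):
    """MSE divided by a precomputed dataset mean norm, reducing
    the impact of precision issues for small-magnitude fields."""
    return np.mean((pred - target) ** 2) / dataset_mean_norm
```

Dividing by a fixed dataset-level constant keeps gradients well scaled without changing the minimizer, since the normalization does not depend on the model's prediction.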