Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

GyroSwin: 5D Surrogates for Gyrokinetic Plasma Turbulence Simulations

Authors: Fabian Paischer, Gianluca Galletti, William Hornsby, Paul Setinek, Lorenzo Zanisi, Naomi Carey, Stanislas Pamela, Johannes Brandstetter

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate that GyroSwin outperforms widely used reduced numerics on heat flux prediction, captures the turbulent energy cascade, and reduces the cost of fully resolved nonlinear gyrokinetics by three orders of magnitude while remaining physically verifiable. In this section we elaborate on our experimental setup, ranging from data generation to baselines and our evaluation setup.
Researcher Affiliation	Academia	(1) ELLIS Unit, Institute for Machine Learning, JKU Linz; (2) United Kingdom Atomic Energy Authority, Culham campus; (3) EMMI AI, Linz
Pseudocode	No	The paper describes the methodology in narrative text and figures (e.g., Figure 2 for architecture), but does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks, nor does it present structured steps in a code-like format.
Open Source Code	Yes	ml-jku/neural-gyrokinetics, ml-jku/gyroswin. Code is added as supplementary material; instructions for data generation are provided in Appendix C.
Open Datasets	No	We run nonlinear simulations using the numerical code GKW (Peeters et al., 2009), varying noise amplitude of initial conditions and four operating parameters... The entire dataset comprises 255 simulations, based on which we assemble two training subsets... The data generation pipeline is reproducible, the numerical code is open source, and trained models and snapshots for evaluation are publicly available on the Hugging Face Hub. The paper describes how the dataset was generated and states that snapshots for evaluation are available, but it does not explicitly provide access to the full dataset used for training.
Dataset Splits	Yes	The entire dataset comprises 255 simulations, based on which we assemble two training subsets, one comprising 48 simulations and another one comprising 241 simulations... We set aside 14 of the 255 simulations that we generate in total to evaluate for in-distribution (ID) and out-of-distribution (OOD) generalization. In total, we select six simulations for the ID set and five simulations for the OOD set. The remaining three simulations are used as a validation set during training.
Hardware Specification	Yes	To assess scalability of neural surrogates, we report inference speed, memory consumption, and number of parameters on a single NVIDIA H100 80GB HBM3 in Table 3. GyroSwin is trained on four H100 GPUs with 80GB VRAM using PyTorch's Distributed Data Parallel (DDP) for approximately 120 hours. For the larger training set (241 simulations), we train GyroSwin for 500 epochs on 16 H100 GPUs with 80GB VRAM.
Software Dependencies	No	GyroSwin is trained on four H100 GPUs with 80GB VRAM using PyTorch's Distributed Data Parallel (DDP) for approximately 120 hours. We use the Adam optimizer (Kingma & Ba, 2015)... While PyTorch is mentioned, its specific version number is not provided, nor are specific versions for other libraries like neuraloperator or SIMSHIFT.
Experiment Setup	Yes	We use the Adam optimizer (Kingma & Ba, 2015) with a weight decay of 1e-5 and a cosine learning rate scheduler with linear warmup and a peak at 3e-4, decayed to 0. During training we employ automatic mixed precision and cast to bfloat16 with gradient clipping to a magnitude of 1. For the smaller training set (48 simulations), we train our model for 200 epochs and evaluate every 20 epochs... For the larger training set (241 simulations), we train GyroSwin for 500 epochs on 16 H100 GPUs... We employ a scheduler for the different loss terms mentioned in Equation (8)...
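
The learning rate schedule quoted in the Experiment Setup row (linear warmup to a peak of 3e-4, then cosine decay to 0) can be illustrated with a minimal sketch. This is not the authors' code. The function name lr_at and the step counts in the usage loop are hypothetical, and the warmup length is an assumption because the quoted text does not state it. Only the peak value and the decay target come from the row above.

```python
import math

PEAK_LR = 3e-4  # peak learning rate quoted in the Experiment Setup row


def lr_at(step, total_steps, warmup_steps, peak_lr=PEAK_LR):
    """Learning rate at a given optimizer step.

    Linear warmup from 0 to peak_lr over warmup_steps, followed by
    cosine decay from peak_lr to 0 at total_steps.
    warmup_steps and total_steps are illustrative; the source does not
    give them.
    """
    if total_steps <= 0 or not 0 <= warmup_steps < total_steps:
        raise ValueError("need 0 <= warmup_steps < total_steps")
    # Clamp so that out-of-range steps map to the schedule endpoints.
    step = min(max(step, 0), total_steps)
    if step < warmup_steps:
        # Linear warmup phase.
        return peak_lr * step / warmup_steps
    # Cosine decay phase: progress runs from 0 (peak) to 1 (end).
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return 0.5 * peak_lr * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # Hypothetical run length: 1000 steps with 100 warmup steps.
    for s in (0, 50, 100, 550, 1000):
        print(s, lr_at(s, total_steps=1000, warmup_steps=100))
```

In a PyTorch training loop, a function of this shape could be divided by the peak value and passed to a multiplicative scheduler such as torch.optim.lr_scheduler.LambdaLR. Whether the authors step the schedule per iteration or per epoch is not stated in the quoted text.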