Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

FNOPE: Simulation-based inference on function spaces with Fourier Neural Operators

Authors: Guy Moss, Leah Muhle, Reinhard Drews, Jakob H Macke, Cornelius Schröder

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate the effectiveness of our approach on several benchmark tasks and a challenging spatial inference task from glaciology. FNOPE extends the applicability of SBI methods to new scientific domains by enabling the inference of function-valued parameters. We apply FNOPE to four simulators: a Gaussian linear toy example, the SIRD model from epidemiology, the Darcy flow inverse problem and a real-world application from glaciology (details in Appendix S5). We perform ablation experiments (Appendix S7.1), and observe that the performance of FNOPE is dependent on using sufficiently many Fourier modes in the FNO blocks (Fig. S1a,b).
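The ablation finding that FNOPE needs sufficiently many Fourier modes can be made concrete with a small, generic experiment: truncate a discretized function to its lowest k Fourier modes and measure what is lost. This is a plain-Python sketch, not the FNO layer used by FNOPE; the test signal, the 32-point grid, and the helper names (`dft`, `truncate_to_modes`, `max_abs_error`) are illustrative assumptions.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def truncate_to_modes(x, n_modes):
    """Keep only the lowest `n_modes` frequencies of a real signal (n_modes <= len(x)).

    Frequency k and its mirror n - k are kept together, so the retained
    spectrum stays Hermitian-symmetric and the reconstruction stays real.
    """
    n = len(x)
    coeffs = dft(x)
    kept = [0j] * n
    for k in range(n_modes):
        kept[k] = coeffs[k]
        if k > 0:
            kept[n - k] = coeffs[n - k]
    # Inverse DFT; any imaginary part is floating-point rounding noise.
    return [(sum(kept[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def max_abs_error(a, b):
    """Largest pointwise absolute difference between two equal-length signals."""
    return max(abs(u - v) for u, v in zip(a, b))


# Hypothetical test function on a 32-point periodic grid: frequencies 1, 3 and 7.
n = 32
grid = [t / n for t in range(n)]
signal = [math.sin(2 * math.pi * s)
          + 0.5 * math.sin(2 * math.pi * 3 * s)
          + 0.25 * math.cos(2 * math.pi * 7 * s) for s in grid]

for n_modes in (2, 4, 8):
    err = max_abs_error(signal, truncate_to_modes(signal, n_modes))
    print(f"{n_modes} modes: max abs error {err:.3g}")
```

With 8 modes all three frequencies survive and the reconstruction is exact up to rounding; with 4 modes the frequency-7 component (amplitude 0.25) is dropped, and with 2 modes the frequency-3 component is lost as well. In a standard FNO block the learned spectral weights act only on the retained modes, so structure above the cutoff cannot pass through that spectral path.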
Researcher Affiliation	Academia	(1) Machine Learning in Science, University of Tübingen, Tübingen, Germany; (2) Tübingen AI Center, Tübingen, Germany; (3) Department of Geosciences, University of Tübingen, Tübingen, Germany; (4) Department Empirical Inference, Max Planck Institute for Intelligent Systems, Tübingen, Germany. Joint supervision. {firstname.secondname}@uni-tuebingen.de
Pseudocode	No	The paper describes its methodology and architecture (e.g., FNOPE architecture in Fig. 2) but does not include any clearly labeled 'Pseudocode' or 'Algorithm' blocks with structured, code-like steps.
Open Source Code	Yes	Code available at https://github.com/mackelab/fnope
Open Datasets	Yes	Finally, we turn to a real-world task from glaciology: Inference of snow accumulation and basal melt rates of Antarctic ice shelves from radar internal reflection horizons (IRHs) [39-41]. ... Data collection was supported by Alfred Wegener Institute through logistic grants AWI_ANT_18. ... We then test the performance of all methods on real data (as in [39]).
Dataset Splits	Yes	We evaluate on a held-out test set $\{(\theta^o_j, l^\theta_j, \eta^o_j, x^o_j, l^x_j)\}_{j=1}^{J_{\text{test}}}$, where $J_{\text{test}}$ is the number of test simulations. ... The results for Fig. 3 are based on 100 observations and 1000 posterior samples for each observation. ... All metrics are calculated over a test set of 10 observations (Fig. 5b-d). ... For SBC EoD, as well as the predictive MSE on synthetic test simulations, we use 100 test observations and draw 10 posterior samples for each observation and for each method. The real test data consists of one field observation (shown in Fig. 6a-b), and the posterior predictive was estimated using 1000 posterior samples.
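The evaluation protocol above draws a small number of posterior samples (10) per test observation for the SBC-based metric. The sketch below shows only the rank statistic that simulation-based calibration (SBC) is built on, for a scalar parameter; it does not reproduce the paper's exact metric. The conjugate Gaussian toy model, the deliberately overconfident posterior, the use of 2000 observations (instead of 100, to make the check stable), and all function names are assumptions for illustration.

```python
import random


def sbc_ranks(simulate_prior, simulate_data, sample_posterior, n_obs, n_post, rng):
    """SBC ranks for a scalar parameter.

    For each test observation: draw theta from the prior, simulate x, draw
    n_post posterior samples given x, and count how many fall below theta.
    For a calibrated posterior the ranks are uniform on {0, ..., n_post}.
    """
    ranks = []
    for _ in range(n_obs):
        theta = simulate_prior(rng)
        x = simulate_data(theta, rng)
        samples = [sample_posterior(x, rng) for _ in range(n_post)]
        ranks.append(sum(s < theta for s in samples))
    return ranks


def prior(rng):
    return rng.gauss(0.0, 1.0)                # theta ~ N(0, 1)


def simulator(theta, rng):
    return rng.gauss(theta, 1.0)              # x | theta ~ N(theta, 1)


def exact_posterior(x, rng):
    # Conjugate result for this toy model: theta | x ~ N(x / 2, 1 / 2).
    return rng.gauss(x / 2.0, 0.5 ** 0.5)


def overconfident_posterior(x, rng):
    # Miscalibrated on purpose: correct mean, standard deviation 10x too small.
    return rng.gauss(x / 2.0, 0.1 * 0.5 ** 0.5)


def extreme_rank_fraction(ranks, n_post):
    """Fraction of ranks at either end; about 2 / (n_post + 1) when calibrated."""
    return sum(r in (0, n_post) for r in ranks) / len(ranks)


rng = random.Random(0)
good = sbc_ranks(prior, simulator, exact_posterior, n_obs=2000, n_post=10, rng=rng)
bad = sbc_ranks(prior, simulator, overconfident_posterior, n_obs=2000, n_post=10, rng=rng)
print(extreme_rank_fraction(good, 10), extreme_rank_fraction(bad, 10))
```

An overconfident posterior piles ranks up at 0 and 10, which is the kind of miscalibration SBC-based metrics are designed to expose.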
Hardware Specification	Yes	For the Linear Gaussian and SIRD experiments, we perform our experiments on NVIDIA RTX 2080 Ti GPU nodes. ... The Darcy flow experiment required GPUs with higher VRAM to accommodate the large (~16k-dimensional) parameters and observations. We performed these experiments on NVIDIA A100 GPUs. For the Antarctic Ice experiment, we perform training and evaluation on CPU, namely Intel Xeon Gold (16 cores, 2.9 GHz).
Software Dependencies	No	For all baseline SBI methods, we use the sbi toolbox [29]; for the Simformer baseline, we use the publicly available code from Gloeckler et al. [30]. We use an optimized solver to solve the Darcy flow PDE [38].
Experiment Setup	Yes	For all the baseline methods, we train the networks using an Adam optimizer with a learning rate of 0.0001 and a batch size of 200. For NPE/FMPE (spectral), we use 50 modes, leading to 100 parameters to learn, and a pad width of 20 for the spectral preprocessing (Appendix S4.1). For NPE (spectral), the density estimator is a Neural Spline Flow (NSF) with 5 transforms, 2 residual blocks with 50 hidden dimensions each, and ReLU activations. For FNOPE and FNOPE (fix), we use 50 Fourier modes for the FNO blocks. We use 5 FNO blocks with 16 channels, while the context is embedded into 8 channels. We train for a maximum of 500 epochs with an early-stopping patience of 50. We used a training batch size of 512 and a learning rate of 0.001.
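The hyperparameters above can be collected into explicit configuration objects, which makes the quoted settings easy to check against a rerun. The sketch below is hypothetical: `BaselineConfig`, `FnopeConfig` and `epochs_until_stop` are our names, not from the fnope repository. Two readings are assumptions: that "50 modes, leading to 100 parameters" means one complex coefficient (two real numbers) per mode, and that "early-stopping patience" counts consecutive epochs without a new best validation loss.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BaselineConfig:
    """Baseline settings as quoted (hypothetical container)."""
    optimizer: str = "adam"
    learning_rate: float = 1e-4
    batch_size: int = 200
    spectral_modes: int = 50      # NPE/FMPE (spectral)
    pad_width: int = 20           # spectral preprocessing padding
    nsf_transforms: int = 5       # NPE (spectral) density estimator
    nsf_residual_blocks: int = 2
    nsf_hidden_dims: int = 50

    @property
    def spectral_parameters(self) -> int:
        # Assumption: each retained mode is one complex coefficient = 2 reals.
        return 2 * self.spectral_modes


@dataclass(frozen=True)
class FnopeConfig:
    """FNOPE / FNOPE (fix) settings as quoted (hypothetical container)."""
    fourier_modes: int = 50
    fno_blocks: int = 5
    channels: int = 16
    context_channels: int = 8
    max_epochs: int = 500
    patience: int = 50
    batch_size: int = 512
    learning_rate: float = 1e-3


def epochs_until_stop(val_losses, max_epochs, patience):
    """Number of epochs run under patience-based early stopping.

    Stops once `patience` consecutive epochs pass without a new best
    validation loss, or after `max_epochs`, whichever comes first.
    """
    best = float("inf")
    since_best = 0
    for epoch, loss in enumerate(val_losses[:max_epochs], start=1):
        if loss < best:
            best, since_best = loss, 0
        else:
            since_best += 1
        if since_best >= patience:
            return epoch
    return min(len(val_losses), max_epochs)


# Usage: the loss improves for 10 epochs, then plateaus; with patience 50,
# training stops 50 epochs after the last improvement (epoch 60).
cfg = FnopeConfig()
losses = [1.0 - 0.01 * i for i in range(10)] + [0.95] * 100
print(epochs_until_stop(losses, cfg.max_epochs, cfg.patience))
```

Keeping baseline and FNOPE settings in separate frozen dataclasses mirrors the two distinct training regimes quoted above (learning rate 0.0001 with batch size 200 versus 0.001 with batch size 512) and prevents them from being mixed up silently.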