Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

In-Context Learning of Stochastic Differential Equations with Foundation Inference Models

Authors: Patrick Seifner, Kostadin Cvejoski, David Berghaus, César Ali Ojeda Marin, Ramsés J. Sánchez

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We demonstrate that FIM-SDE achieves robust in-context function estimation across a wide range of synthetic and real-world processes from canonical SDE systems (e.g. double-well dynamics or weakly perturbed Lorenz attractors) to stock price recordings and oil-price and wind-speed fluctuations while matching the performance of symbolic, Gaussian process and Neural SDE baselines trained on the target datasets. When finetuned to the target processes, we show that FIM-SDE consistently outperforms all these baselines. Our pretrained model, repository and tutorials are available online¹. [...] Section 5 (Experiments): What follows outlines the experimental setup, including pretraining details, datasets, evaluation metrics, and baselines.
Researcher Affiliation	Collaboration	Lamarr Institute¹, University of Bonn², Fraunhofer IAIS³ & University of Potsdam⁴
Pseudocode	Yes	More precisely, our algorithm for sampling an $n$-variate polynomial $f$ is: 1. Sample the size of the set of degrees $N_{\deg} \sim \mathcal{U}[1, \dots, m_{\max}]$ with terms of non-zero coefficients in the polynomial. 2. Sample⁷ $\{m_1, \dots, m_{N_{\deg}}\} \sim \mathcal{U}[\mathcal{P}_{N_{\deg}}[\{0, \dots, m_{\max}\}]]$, the set of monomial degrees with terms of non-zero coefficients in the polynomial. 3. For each $i \in \{1, \dots, N_{\deg}\}$, sample $N^i_{\mathrm{mon}} \sim \mathcal{U}[1, \dots, \binom{m_i + n - 1}{n - 1}]$, the number of monomials of degree $m_i$ with non-zero coefficients in the polynomial. 4. For each $i \in \{1, \dots, N_{\deg}\}$, sample $\{\alpha^i_1, \dots, \alpha^i_{N^i_{\mathrm{mon}}}\} \sim \mathcal{U}[\mathcal{P}_{N^i_{\mathrm{mon}}}[\{\alpha \in \mathbb{N}^n \mid |\alpha| = m_i\}]]$, the $n$-variate exponents of monomials with non-zero coefficients in the polynomial. 5. For each $i \in \{1, \dots, N_{\deg}\}$ and $j \in \{1, \dots, N^i_{\mathrm{mon}}\}$, sample a coefficient $c_{\alpha^i_j} \sim \mathcal{N}(0, 1)$ of the $n$-variate monomial $x^{\alpha^i_j}$. 6. The $n$-variate polynomial $f$ is then defined as $f(x) = \sum_{i=1}^{N_{\deg}} \sum_{j=1}^{N^i_{\mathrm{mon}}} c_{\alpha^i_j} x^{\alpha^i_j}$.
Open Source Code	Yes	Our pretrained model, repository and tutorials are available online¹. ¹ https://fim4science.github.io/OpenFIM/intro.html
Open Datasets	Yes	The set of four empirical datasets, representing complex real-world phenomena, were collected and studied by Wang et al. (2022). They were released alongside the open-source implementation of BISDE¹¹.
Dataset Splits	Yes	We partition each dataset into five folds, train BISDE and Latent SDE on four of these, and compute the MMD on the held-out fold, repeating this process across all folds and averaging the results (see Appendix G.3 for details).
Hardware Specification	Yes	Memory requirements during training are quite large, as we use up to 12800 observations (inputs) per instance in a batch. We utilize four A100 40GB GPUs to train with a batch size of 64 for 1.3M optimization steps over roughly 6 days.
Software Dependencies	No	The paper mentions using specific implementations like "BISDE" and "torchsde" and provides links to their GitHub repositories, but does not specify exact version numbers for these or other underlying software libraries (e.g., Python, PyTorch, CUDA).
Experiment Setup	Yes	We train FIM-SDE with AdamW (Loshchilov and Hutter, 2017), using learning rate 1e-5 and weight decay 1e-4. In each batch, we sample the total number of observations passed to the model from U[128, 12800], so the model is exposed to widely varying context sizes. For each equation in a batch, we randomly sample 32 locations to compute the loss L from (10).
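
The polynomial-sampling steps quoted in the Pseudocode row can be sketched in a few lines of Python. This is only an illustrative reading of the quoted text, not the authors' implementation: the function names (`exponents_of_degree`, `sample_polynomial`, `evaluate`) are hypothetical, and the notation P_N[S] is assumed here to mean "the subsets of S with exactly N elements", since the paper's footnote 7 is not part of the excerpt.

```python
import math
import random
from itertools import combinations_with_replacement


def exponents_of_degree(n, m):
    """Enumerate all exponent vectors alpha in N^n with |alpha| = m."""
    result = []
    # Each multiset of m variable indices corresponds to exactly one exponent vector.
    for combo in combinations_with_replacement(range(n), m):
        alpha = [0] * n
        for var in combo:
            alpha[var] += 1
        result.append(tuple(alpha))
    return result


def sample_polynomial(n, m_max, rng):
    """Return a dict {alpha: coefficient} describing a random n-variate polynomial.

    Assumes m_max >= 1 and that P_N[S] denotes size-N subsets of S.
    """
    # Step 1: number of distinct degrees with non-zero terms.
    n_deg = rng.randint(1, m_max)
    # Step 2: a uniformly random subset of n_deg degrees from {0, ..., m_max}.
    degrees = rng.sample(range(m_max + 1), n_deg)
    poly = {}
    for m in degrees:
        candidates = exponents_of_degree(n, m)  # binom(m + n - 1, n - 1) vectors
        # Step 3: number of monomials of degree m.
        n_mon = rng.randint(1, len(candidates))
        # Step 4: which monomials of degree m appear.
        for alpha in rng.sample(candidates, n_mon):
            # Step 5: standard-normal coefficient for each chosen monomial.
            poly[alpha] = rng.gauss(0.0, 1.0)
    return poly


def evaluate(poly, x):
    """Step 6: f(x) = sum over monomials of c_alpha * x^alpha."""
    return sum(
        c * math.prod(xi ** a for xi, a in zip(x, alpha))
        for alpha, c in poly.items()
    )


if __name__ == "__main__":
    rng = random.Random(0)
    f = sample_polynomial(n=3, m_max=4, rng=rng)
    print(len(f), "monomials; f(1, 2, 3) =", evaluate(f, (1.0, 2.0, 3.0)))
```

Enumerating every exponent vector of a given degree keeps the sketch short, but it grows combinatorially in `n` and `m_max`, so it is only practical for small values.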
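
The five-fold protocol quoted in the Dataset Splits row (train on four folds, score on the held-out fold, average over all five) can be sketched as follows. The helper names and the `fit` and `score` callables are hypothetical placeholders; in the paper the score is the MMD and the models are BISDE and Latent SDE. Shuffling before splitting is an assumption, as the excerpt does not say how the folds are formed.

```python
import random


def five_fold_splits(n_items, n_folds=5, seed=0):
    """Yield (train_indices, held_out_indices) for each of n_folds folds."""
    indices = list(range(n_items))
    # Assumption: items are shuffled before being dealt into folds.
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        held_out = folds[k]
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        yield train, held_out


def cross_validated_score(n_items, fit, score):
    """Average a held-out score over all folds.

    `fit(train_indices)` returns a model; `score(model, held_out_indices)`
    returns a number. Both are hypothetical stand-ins for training a
    baseline and computing a metric on the held-out fold.
    """
    scores = []
    for train, held_out in five_fold_splits(n_items):
        model = fit(train)
        scores.append(score(model, held_out))
    return sum(scores) / len(scores)
```

Passing the model and metric as callables keeps the split logic independent of any particular baseline.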