Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Stochastic Process Learning via Operator Flow Matching

Authors: Yaozhong Shi, Zachary Ross, Domniki Asimaki, Kamyar Azizzadenesheli

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we demonstrate the superior regression performance compared to several baselines across a variety of function datasets, including both Gaussian and highly non-Gaussian Processes. As baselines, we employ standard GP Regression [8], Deep GPs [9, 10], Conditional models [5–7], and OPFLOW [1]. For our function datasets, we analyze: (1) Gaussian and non-Gaussian with known posterior, including 1D GPs, 2D GPs, and 1D Truncated GPs (TGP). (2) Highly non-GPs, datasets with unknown posterior, such as those derived from Navier-Stokes equations [14], black hole dataset from expensive Monte Carlo simulation, and 2D Signed Distance Functions extracted from MNIST digits (MNIST-SDF) [43]. During regression, we assume that the prior Gθ is always successfully trained and remains frozen. Details about the learning process for priors and experimental setup for regression are provided in the Appendix M, O.
Researcher Affiliation	Collaboration	Yaozhong Shi (California Institute of Technology), Zachary E. Ross (California Institute of Technology), Domniki Asimaki (California Institute of Technology), Kamyar Azizzadenesheli (NVIDIA Corporation)
Pseudocode	Yes	Algorithm 1 Posterior sampling with SGLD Algorithm 2 Learning a prior
Open Source Code	Yes	Python code available at https://github.com/yzshi5/SPL_OFM
Open Datasets	Yes	2D Signed Distance Functions extracted from MNIST digits (MNIST-SDF) [43].
Dataset Splits	Yes	We choose l = 0.3 and ζ = 1.5 and generate 20,000 training samples on domain [0, 1] with a fixed resolution of 256. ... We average out SMSE and MSLL over a test dataset containing 1000 true GP posterior samples for all models.
Hardware Specification	Yes	All times reported in the subsequent tables are based on computations performed using a single NVIDIA RTX A6000 (48 GB) graphics card.
Software Dependencies	No	The paper mentions "neuraloperator library [14]" and "dopri5 ODE solver provided by torchdiffeq Chen et al. [52]", but does not provide specific version numbers for these software components or any other key libraries.
Experiment Setup	Yes	Table 3 details the parameters used for training the prior. For instance, in the 1D GP prior learning experiment, the dataset consists of 20,000 samples, each with a co-domain dimension (or channel) of one. The batch size is set at 1024, and the model is trained over 500 epochs. The total training time is about 0.76 hours, and the size of the trained model is 37.1 megabytes. Tables 4, 5, and 6 detail the parameters for SGLD sampling as described in Algorithm 1. For example, in the 1D GP regression experiment, the regression takes 40,000 iterations with a burn-in phase of 3,000 iterations. Posterior samples are collected every 10 iterations. The temperature for the injected noise during the gradient update is set at 1, and the learning rate decays exponentially from 0.005 to 0.004 (defined in Algorithm 1).
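
The SGLD settings quoted in the Experiment Setup row (40,000 iterations, 3,000 burn-in iterations, one sample kept every 10 iterations, noise temperature 1, learning rate decaying exponentially from 0.005 to 0.004) can be illustrated with a minimal sketch. This is not the paper's Algorithm 1: the toy Gaussian target, the function names (lr_schedule, sgld_sample, toy_grad_log_post), the exact form of the exponential decay, and the 1/2 factor on the drift term are all assumptions made for illustration. The paper instead samples under a learned, frozen prior Gθ.

```python
import numpy as np

def lr_schedule(t, n_iters, lr_start=0.005, lr_end=0.004):
    """Assumed schedule: geometric interpolation from lr_start (t=0) to lr_end (t=n_iters)."""
    return lr_start * (lr_end / lr_start) ** (t / n_iters)

def sgld_sample(grad_log_post, x0, n_iters=40_000, burn_in=3_000,
                thin=10, temperature=1.0, seed=0):
    """Generic Langevin sampler using the quoted iteration counts.

    grad_log_post: callable returning the gradient of the log posterior at x
    (hypothetical stand-in for the gradient used in the paper's Algorithm 1).
    """
    rng = np.random.default_rng(seed)
    x = np.array(x0, dtype=float)
    samples = []
    for t in range(n_iters):
        eta = lr_schedule(t, n_iters)
        noise = rng.standard_normal(x.shape)
        # Gradient step plus injected Gaussian noise scaled by the temperature.
        x = x + 0.5 * eta * grad_log_post(x) + np.sqrt(eta * temperature) * noise
        # Discard burn-in, then keep every `thin`-th iterate.
        if t >= burn_in and (t - burn_in) % thin == 0:
            samples.append(x.copy())
    return np.stack(samples)

def toy_grad_log_post(x, mu=1.0, sigma=0.5):
    """Toy target (assumption): gradient of log N(mu, sigma^2)."""
    return -(x - mu) / sigma**2

samples = sgld_sample(toy_grad_log_post, x0=np.zeros(1))
print(samples.shape)
```

With these settings the sampler keeps (40,000 - 3,000) / 10 = 3,700 posterior samples per run.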