Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).

Neural Stochastic Flows: Solver-Free Modelling and Inference for SDE Solutions

Authors: Naoki Kiyohara, Edward Johns, Yingzhen Li

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Empirically, across diverse benchmarks such as stochastic Lorenz attractor, CMU Motion Capture, and Stochastic Moving MNIST, our approach maintains distributional accuracy comparable to or better than numerical solver methods, while delivering up to two orders of magnitude faster predictions over arbitrary time intervals, with the largest gains on long-interval forecasts.
Researcher Affiliation	Collaboration	Naoki Kiyohara (1, 2), Edward Johns (1), Yingzhen Li (1); (1) Imperial College London, (2) Canon Inc.
Pseudocode	Yes	Algorithm 1: Optimisation procedure for (Latent) Neural Stochastic Flows
Open Source Code	No	For now, we include the related references for the sources of the datasets. We will release the code and detailed running instructions upon publication.
Open Datasets	Yes	Empirically, across diverse benchmarks such as stochastic Lorenz attractor, CMU Motion Capture, and Stochastic Moving MNIST, our approach maintains distributional accuracy comparable to or better than numerical solver methods, while delivering up to two orders of magnitude faster predictions over arbitrary time intervals, with the largest gains on long-interval forecasts. We evaluate Latent NSF on the CMU Motion Capture walking dataset [15]. Finally, we test Latent NSF on high-dimensional video using a Stochastic Moving MNIST [11] variant
Dataset Splits	Yes	The 23-sequence walking subset is down-sampled to 300 time steps and split 16/3/4 for train/validation/test, matching prior work [51, 54]. Dataset size: Training: 60,000 sequences (using original MNIST training set); Test: 10,000 sequences (using MNIST test set)
Hardware Specification	Yes	All experiments were conducted on a single NVIDIA RTX 3090 GPU (24 GB); typical training times are under a few days per dataset as reported in Appendix E.
Software Dependencies	No	Frameworks: Our (Latent) NSF models: JAX [5] with Equinox library [24]; PyTorch baselines: torchsde [33]; JAX baselines: Diffrax [23]
Experiment Setup	Yes	E.2.2 Configurations for Neural Stochastic Flows. Architecture: State dimension d_state = 3; conditioning dimension d_cond = 4 (state + time). Gaussian parameter network: Input(d_cond) -> 2 * [Linear(64) -> SiLU()] -> Linear(2 d_state); splits into (mean, std) with std via Softplus(). Scale-shift networks (in conditional flow): Input(d_state/2 + d_cond) -> 2 * [Linear(64) -> SiLU()] -> Linear(d_state); splits into (scale, shift) of Eq. (7) in the main text. Conditional flow (affine coupling; Eq. (5), (6)): 4 layers with alternating masking. Parameters: NSF: 25,130; Bridge model: 26,410. Data conversion: Time series data is converted into pairs of states (x_{t_i}, x_{t_j}) where t_j > t_i, to predict p(x_{t_j} | x_{t_i}). Epochs: 1000. Batch size: 256. Optimiser: AdamW [27, 37]. Learning rate: 0.001. Weight decay: 10^-5. Loss function: Combined negative log-likelihood and flow loss (Eq. (11)), with λ = 0.4 (0.2 for data component, 0.2 for sampled component) and λ_{1-to-2} = λ_{2-to-1} = 1.0. Time triplets (t_i, t_j, t_k) for L_flow: Sampled as described in Appendix B with H_train = 1. Auxiliary updates: bridge model b_ξ trained concurrently with K = 5 inner optimisation steps per main model update.
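
The data conversion step quoted under Experiment Setup, turning a time series into pairs of states with a strictly later target time, can be illustrated with a minimal Python sketch. The function name `make_state_pairs` and the toy data are hypothetical, and the sketch assumes every valid pair is enumerated; the quoted excerpt does not say whether the authors use all pairs or a sampled subset.

```python
from itertools import combinations


def make_state_pairs(times, states):
    """Return ((t_i, x_i), (t_j, x_j)) for every pair with t_j > t_i.

    Hypothetical helper for illustration only; not taken from the paper's code.
    """
    if len(times) != len(states):
        raise ValueError("times and states must have the same length")
    # Sort by time so combinations() yields pairs in non-decreasing time order.
    ordered = sorted(zip(times, states), key=lambda p: p[0])
    # Keep only strictly increasing time pairs (drops duplicate timestamps).
    return [(a, b) for a, b in combinations(ordered, 2) if b[0] > a[0]]


# Toy 3-dimensional trajectory (d_state = 3), made up for this example.
times = [0.0, 0.5, 1.0]
states = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.7, 0.3, 0.1)]
pairs = make_state_pairs(times, states)
```

Each returned pair supplies a conditioning input (x_{t_i}, t_i, t_j) and a target x_{t_j} for fitting p(x_{t_j} | x_{t_i}). Enumerating all pairs grows quadratically with sequence length, so sampling pairs per batch may be preferable for long sequences.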