Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Curly Flow Matching for Learning Non-gradient Field Dynamics

Authors: Katarina Petrović, Lazar Atanackovic, Viggo Moro, Kacper Kapusniak, Ismail Ilkan Ceylan, Michael Bronstein, Joey Bose, Alexander Tong

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We investigate the application of CURLY-FM on multiple applications which exhibit non-gradient field dynamics including a simple toy example, an ocean currents modeling application, a computational fluid mechanics dataset, and an application to single-cell trajectory inference. We benchmark CURLY-FM against both simulation-free flow matching approaches... We evaluate CURLY-FM using metrics both on held out samples (2-Wasserstein (W_2)) as well as metrics which directly measure how well the learned drift f_θ field matches the reference drift (Cosine distance and L2 cost).
Researcher Affiliation	Collaboration	(1) University of Oxford, (2) Broad Institute of MIT and Harvard, (3) University of Toronto, (4) Vector Institute, (5) TU Wien, (6) AITHYRA, (7) Mila Quebec AI Institute, (8) Université de Montréal
Pseudocode	Yes	Algorithm 1: Training algorithm for neural path interpolant network; Algorithm 2: Marginal Score and Flow Matching
Open Source Code	Yes	Our code repository is accessible at: https://github.com/kpetrovicc/curly-flow-matching.git.
Open Datasets	Yes	We model ocean currents in the Gulf of Mexico using a resolution of 1 km of bathymetry data from HYbrid Coordinate Ocean Model (HYCOM)... human cell fibroblasts [Riba et al., 2022] and erythroblast development in mouse [Pijuan-Sala et al., 2019]... Lagrange Bench [Toshev et al., 2023], specifically the two-dimensional decaying Taylor-Green vortex (2DTGV) dataset
Dataset Splits	Yes	We considered a dataset split of [80%, 20%] across the train and test sets, respectively. For the 2000 particles, this resulted in 1600 being used for training and 400 for testing.
Hardware Specification	Yes	All experiments were conducted using a mixture of CPUs and A10 GPUs.
Software Dependencies	No	The paper provides Python code snippets (Listing 1 and 2) but does not specify version numbers for Python itself or any libraries/frameworks like PyTorch.
Experiment Setup	Yes	All CURLY-FM and baseline experiments are run using lr = 10^-4 learning rate and Adam optimizer with default β_1, β_2, and ϵ values across three seeds and with 1,000 epochs split into 500 epochs to train φ_{t,η} followed by 500 epochs to train v_{t,θ}. Baselines. Trajectory Net was run with 250 epochs with the Euler integrator with 20 timesteps per timepoint... We use a batch size of 256 samples. We use a Dormand-Prince 4-5 (dopri5) adaptive step size ODE solver to sample trajectories with absolute and relative tolerances of 10^-4.
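
Illustrative note: the Dataset Splits and Experiment Setup rows can be read together as one small configuration. The sketch below is not taken from the paper or its repository. It collects the quoted values in a hypothetical `ExperimentSetup` dataclass and checks the split arithmetic (2000 particles at 80%/20% giving 1600/400). The field names, the `split_indices` helper, the seeded shuffle, and the choice to keep the last partial batch are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class ExperimentSetup:
    """Values quoted in the table above; the field names are hypothetical."""
    learning_rate: float = 1e-4      # Adam, default betas and epsilon
    batch_size: int = 256
    interpolant_epochs: int = 500    # first stage: train phi_{t,eta}
    velocity_epochs: int = 500       # second stage: train v_{t,theta}
    num_seeds: int = 3
    ode_solver: str = "dopri5"       # Dormand-Prince 4-5, adaptive step size
    atol: float = 1e-4
    rtol: float = 1e-4
    train_fraction: float = 0.8


def split_indices(n_samples, train_fraction, seed):
    """Shuffle indices with a fixed seed, then cut into train and test parts.

    Assumption: the table does not say how the split is drawn, so a seeded
    shuffle is used here purely as a placeholder.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_fraction)
    return indices[:n_train], indices[n_train:]


setup = ExperimentSetup()
train_idx, test_idx = split_indices(2000, setup.train_fraction, seed=0)

# Ceiling division: assumes the last, smaller batch is kept rather than dropped.
batches_per_epoch = -(-len(train_idx) // setup.batch_size)

print(len(train_idx), len(test_idx), batches_per_epoch)
```

Under these assumptions the split sizes match the 1600/400 figures quoted in the Dataset Splits row, and one pass over the training particles at batch size 256 takes 7 batches. The library versions needed to reproduce the actual training run are not given in the paper, as the Software Dependencies row notes.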