Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Path Gradients after Flow Matching

Authors: Lorenz Vaitl, Leon Klein

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments show that this hybrid approach yields up to a threefold increase in sampling efficiency for molecular systems, all while using the same model, a similar computational budget and without the need for additional sampling. Furthermore, by measuring the length of the flow trajectories during fine-tuning, we show that Path Gradients largely preserve the learned structure of the flow.
Researcher Affiliation	Academia	Lorenz Vaitl EMAIL Leon Klein Freie Universität Berlin EMAIL
Pseudocode	Yes	A.12 Pseudocode for forward KL path gradients via augmented adjoint. Algorithm 1: Augmented Adjoint Dynamics. 1: function FORWARD(t, xt, log q) 2: x, div black_box_dynamics(t, xt) 3: log q gradientx log q x div 4: return x, log q, div 5: end function. Algorithm 2: Pathwise Gradient Estimator. 1: function PATHWISEGRADIENTESTIMATE(x1, prior, target, flow) 2: log p1 target.energy(x1) 3: log p1 gradientx1(log p1) (Integrate using Augmented Adjoint state method) 4: x0, log p0,θ, log \| det J\| flow.integrate Aug Adjoint((x1, log p1), inverse=True) 5: log q0 prior.energy(x0) 6: log q0 gradientx0(log q0) (Compute gradient of loss w.r.t. sample x0) 7: x0L 1 N ( log p0,θ log q0) (Backpropagate using standard Adjoint state method) 8: path gradients gradientθ (x0 detach( x0L)]) 9: end function
Open Source Code	Yes	We published code for replicating our experiments. [Footnote 2 links to github.com/lenz3000/path-grads-after-fm]
Open Datasets	Yes	The authors of Klein et al. (2023b) (CC BY 4.0) made the datasets available here: https://osf.io/srqg7/?view_only=28deeba0845546fb96d1b2f355db0da5. ... The classical alanine dipeptide dataset ... is available as part of the public bgmol (MIT licence) repository here: https://github.com/noegroup/bgmol. ... The authors of Klein et al. (2023b) made the relaxed alanine dipeptide with the semi-empirical force field available here (CC BY 4.0): https://osf.io/srqg7/?view_only=28deeba0845546fb96d1b2f355db0da5. ... The dipeptide dataset was introduced by Klein et al. (2023a) and is available at https://huggingface.co/datasets/microsoft/timewarp.
Dataset Splits	Yes	We use the same training and test splits as defined in Klein et al. (2023b); Klein & Noé (2024). ... For computing the metrics we use the following number of test samples: LJ13: ESSq: 5 × 10^5; ESSp, NLL: 5 × 10^5. LJ55: ESSq: 1 × 10^5; ESSp, NLL: 1 × 10^5. ... For computing the metrics we use the following number of test samples: ESSq: 2 × 10^5; ESSp, NLL: 1 × 10^5.
Hardware Specification	Yes	All experiments are run on a CPU and complete in roughly one minute. ... For our experiments we used A100 GPUs. ... Finetuning on the 13-particle Lennard-Jones system used an A100 for 10 epochs (152 min/run) starting from the Klein et al. (2023b) checkpoints, with training and evaluation on 5 × 10^5 samples.
Software Dependencies	No	We primarily use the following code libraries: PyTorch (BSD-3) (Paszke et al., 2019), bgflow (MIT license) (Noé et al., 2019; Köhler et al., 2020), torchdyn (Apache License 2.0) (Poli et al., 2021). Additionally, we use the code from (Garcia Satorras et al., 2021) (MIT license) for EGNNs, as well as the code from (Klein et al., 2023b) (MIT license) and (Klein & Noé, 2024) (MIT license) for models and dataset evaluations.
Experiment Setup	Yes	For Flow Matching, we use the standard formulation proposed in Lipman et al. (2023). The dynamics vθ is modeled using a four-layer fully connected neural network with 64 units per layer and ELU activations. ... We used Adam with default parameters and lr=1e-2 for pure FM/PG and pre-training with FM and lr=5e-3 for finetuning with PG. ... We used Adam with a learning rate of 1e-4 for PG and only the training set provided. ... For LJ13, we used a batch-size of 64 instead of 256... For fine-tuning we used 2 epochs for Path Gradients... For LJ55, we use the same batch-sizes... We fine-tuned with Path Gradients for 1 epoch... To fight these, we employed gradient clipping to a norm of 1 and gradient accumulation to a batch-size of around 1000 in batches of 50.
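
The Experiment Setup entry mentions gradient clipping to a norm of 1 and gradient accumulation to a batch size of around 1000 in batches of 50. The following is a minimal, framework-free sketch of that combination, not the authors' implementation. It assumes equally sized micro-batches and that clipping is applied once to the accumulated gradient; the entry does not state the order. The function names and the toy gradient values are hypothetical.

```python
import math


def accumulate_gradients(micro_batch_grads, micro_batch_size):
    """Average per-micro-batch mean gradients into one effective-batch gradient.

    Assumes every micro-batch holds the same number of samples, so a plain
    average of the micro-batch means equals the mean over all samples.
    """
    n = len(micro_batch_grads)
    dim = len(micro_batch_grads[0])
    total = [0.0] * dim
    for grad in micro_batch_grads:
        for i, g in enumerate(grad):
            total[i] += g / n
    return total, n * micro_batch_size


def clip_by_global_norm(grad, max_norm=1.0):
    """Rescale the gradient so that its L2 norm does not exceed max_norm."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]


# 20 micro-batches of 50 samples give an effective batch size of 1000.
MICRO_BATCH_SIZE = 50
NUM_MICRO_BATCHES = 20

# Hypothetical per-micro-batch gradients for a model with two parameters.
toy_grads = [[3.0, 4.0] for _ in range(NUM_MICRO_BATCHES)]

accumulated, effective_batch_size = accumulate_gradients(toy_grads, MICRO_BATCH_SIZE)
clipped = clip_by_global_norm(accumulated, max_norm=1.0)
```

In a PyTorch training loop, the same effect is usually obtained by calling `backward()` on each micro-batch loss divided by the number of micro-batches, then applying `torch.nn.utils.clip_grad_norm_` once before the optimizer step.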