Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Amortized Sampling with Transferable Normalizing Flows

Authors: Charlie B. Tan, Majdi Hassan, Leon Klein, Saifuddin Syed, Dominique Beaini, Michael Bronstein, Alexander Tong, Kirill Neklyudov

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Through extensive empirical evaluation we demonstrate the efficacy of PROSE as a proposal for a variety of sampling algorithms, finding a simple importance sampling-based fine-tuning procedure to achieve performance competitive with established methods such as sequential Monte Carlo.
Researcher Affiliation	Collaboration	University of Oxford; Université de Montréal; Mila Quebec AI Institute; Freie Universität Berlin; Valence Labs; AITHYRA; Institut Courtois
Pseudocode	No	The paper describes algorithms and mathematical formulations but does not contain a clearly labeled pseudocode block or algorithm presented in a structured, code-like format.
Open Source Code	Yes	We open source our codebase https://github.com/transferable-samplers/transferable-samplers, Many Peptides MD dataset https://huggingface.co/datasets/transferable-samplers/many-peptides-md and model weights https://huggingface.co/transferable-samplers/model-weights.
Open Datasets	Yes	We introduce Many Peptides MD, a novel dataset of peptide MD trajectories for sequences ranging from 2 to 8 residues in length [footnote 1]. ... Footnote 1: Available at https://huggingface.co/datasets/transferable-samplers/many-peptides-md
Dataset Splits	Yes	For training, a total of 21,700 uniformly sampled sequences are simulated for 200 ns. For evaluation, 30 sequences each of lengths 2, 4, and 8 are randomly sampled such that all amino acids are represented equally, and simulated for 5 µs. Further details on dataset collection and MD configuration are provided in Appendix B. Table 1 (number of sequences used per peptide length for training and evaluation): training: length 2: 200, length 3: 1,000, length 4: 1,500, length 5: 2,000, length 6: 3,000, length 7: 4,000, length 8: 10,000; evaluation: length 2: 30, length 4: 30, length 8: 30.
Hardware Specification	Yes	All training experiments are run on NVIDIA H100 GPUs using distributed data parallelism. ... All evaluation timings are recorded using NVIDIA L40S GPUs.
Software Dependencies	No	The paper mentions software components like 'AdamW optimizer', 'Dormand-Prince 5 (dopri5) adaptive solver', and 'POT [Flamary et al., 2021] linear optimal transport solver', but does not provide specific version numbers for these or other key software libraries (e.g., Python or PyTorch versions).
Experiment Setup	Yes	All models are trained for 5 · 10⁵ iterations using a batch size of 512 with the AdamW optimizer [Loshchilov and Hutter, 2018]. We employ a cosine learning rate schedule in which the initial and final learning rates are the maximal value reduced by a factor of 500, as well as an exponential moving average with a decay of 0.999. ... Continuous Normalizing Flows. We use the ECNF++ training recipe defined by Tan et al. [2025]; this entails a learning rate of 5 · 10⁻⁴ and a weight decay of 1 · 10⁻², with the default AdamW hyperparameters (β1, β2) = (0.9, 0.999).
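The Research Type row describes using the flow as a proposal for importance sampling. As a rough illustration only, the sketch below shows self-normalized importance sampling with a Kish effective sample size (ESS) on a toy 1D problem. The Gaussian proposal and the unnormalized Gaussian target are hypothetical stand-ins for a trained flow and a Boltzmann density, and the function names are illustrative; this is not the authors' fine-tuning procedure.

```python
import math
import random


def log_normal_pdf(x, mean, std):
    # Log density of a 1D Gaussian N(mean, std^2); plays the role of log q(x).
    return -0.5 * ((x - mean) / std) ** 2 - math.log(std * math.sqrt(2.0 * math.pi))


def log_target_unnormalized(x):
    # Hypothetical unnormalized target: N(1, 1) with its constant dropped,
    # standing in for a Boltzmann density exp(-U(x) / kT).
    return -0.5 * (x - 1.0) ** 2


def self_normalized_is(num_samples, proposal_mean=0.0, proposal_std=2.0, seed=0):
    rng = random.Random(seed)
    xs = [rng.gauss(proposal_mean, proposal_std) for _ in range(num_samples)]
    # Log importance weights: log p~(x) - log q(x).
    log_w = [log_target_unnormalized(x) - log_normal_pdf(x, proposal_mean, proposal_std)
             for x in xs]
    # Subtract the max before exponentiating (log-sum-exp trick) for stability.
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w_bar = [wi / total for wi in w]
    # Kish effective sample size: 1 / sum of squared normalized weights.
    ess = 1.0 / sum(wi * wi for wi in w_bar)
    # Self-normalized estimate of E_p[x]; the target's normalizer cancels.
    mean_estimate = sum(wi * x for wi, x in zip(w_bar, xs))
    return mean_estimate, ess


mean_est, ess = self_normalized_is(20_000)
print(f"E[x] estimate: {mean_est:.3f}, ESS: {ess:.0f} / 20000")
```

Here the proposal is broader than the target, so the weights stay well behaved and the estimate lands near the true mean of 1; a proposal narrower than the target tends to inflate the weight variance and lower the ESS.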
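The Dataset Splits row flattens Table 1 into a single line. A short sketch (hypothetical names) that transcribes the table and checks it against the prose: the per-length training counts should sum to the stated 21,700 sequences. The simulated-time totals assume each sequence is simulated once for the stated duration (200 ns per training sequence, 5 µs per evaluation sequence).

```python
# Sequence counts per peptide length, transcribed from Table 1.
TRAIN_COUNTS = {2: 200, 3: 1_000, 4: 1_500, 5: 2_000, 6: 3_000, 7: 4_000, 8: 10_000}
EVAL_COUNTS = {2: 30, 4: 30, 8: 30}

TRAIN_SIMULATION_NS = 200   # simulation length per training sequence
EVAL_SIMULATION_NS = 5_000  # 5 µs per evaluation sequence, expressed in ns


def summarize():
    n_train = sum(TRAIN_COUNTS.values())
    n_eval = sum(EVAL_COUNTS.values())
    # Aggregate simulated time in microseconds (1 µs = 1,000 ns),
    # assuming one trajectory per sequence.
    train_us = n_train * TRAIN_SIMULATION_NS / 1_000
    eval_us = n_eval * EVAL_SIMULATION_NS / 1_000
    return n_train, n_eval, train_us, eval_us


n_train, n_eval, train_us, eval_us = summarize()
print(n_train, n_eval, train_us, eval_us)
```

Under that assumption the training set totals 21,700 × 200 ns = 4,340 µs of simulated time and the evaluation set 90 × 5 µs = 450 µs.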
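The Experiment Setup row describes the learning rate schedule only in words. One reading, sketched below under stated assumptions, is a warmup from the peak learning rate divided by 500 up to the peak, followed by cosine decay back to peak/500 over the 5 · 10⁵ iterations, plus an exponential moving average (EMA) with decay 0.999, presumably over model parameters. The linear warmup shape, the hypothetical `warmup_steps=10_000`, and the use of the CNF peak rate 5 · 10⁻⁴ as the default are assumptions rather than values from the source; `lr_at` and `ema_update` are illustrative names.

```python
import math


def lr_at(step, total_steps=500_000, peak_lr=5e-4, warmup_steps=10_000, div_factor=500.0):
    """Hypothetical warmup + cosine schedule.

    Starts at peak_lr / div_factor, rises linearly to peak_lr over warmup_steps
    (warmup length and shape are assumptions), then follows a half-cosine back
    down to peak_lr / div_factor at total_steps.
    """
    floor_lr = peak_lr / div_factor
    if step < warmup_steps:
        frac = step / warmup_steps
        return floor_lr + frac * (peak_lr - floor_lr)
    # Fraction of the decay phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / max(1, total_steps - warmup_steps))
    return floor_lr + 0.5 * (peak_lr - floor_lr) * (1.0 + math.cos(math.pi * progress))


def ema_update(ema_params, params, decay=0.999):
    """One EMA step per parameter: ema <- decay * ema + (1 - decay) * param."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]


for s in (0, 10_000, 250_000, 500_000):
    print(s, lr_at(s))
```

In a PyTorch run, a function like this could be passed to `torch.optim.lr_scheduler.LambdaLR` as `lr_lambda=lambda step: lr_at(step) / 5e-4`, with the optimizer's base learning rate set to the peak value, since `LambdaLR` multiplies the base rate by the returned factor.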