Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Flow Density Control: Generative Optimization Beyond Entropy-Regularized Fine-Tuning

Authors: Riccardo De Santi, Marin Vlastelica, Ya-Ping Hsieh, Zebang Shen, Niao He, Andreas Krause

NeurIPS 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Finally, we validate our method on illustrative settings, text-to-image, and molecular design tasks, showing that it can steer pre-trained generative models to optimize objectives and solve practically relevant tasks beyond the reach of current fine-tuning schemes.
Researcher Affiliation	Academia	Riccardo De Santi (ETH Zurich, ETH AI Center); Marin Vlastelica (ETH Zurich, ETH AI Center); Ya-Ping Hsieh (ETH Zurich); Zebang Shen (ETH Zurich); Niao He (ETH Zurich, ETH AI Center); Andreas Krause (ETH Zurich, ETH AI Center)
Pseudocode	Yes	Algorithm 1: Flow Density Control (FDC); Algorithm 2: EntropyRegularizedControlSolver (Adjoint Matching [14])-based implementation
Open Source Code	No	Currently we do not provide access to data and code. We are, however, preparing the release of a public version and are available if needed.
Open Datasets	Yes	Molecular design for single-point energy minimization. We fine-tune FlowMol [15], pre-trained on QM9 [46], to discover molecules minimizing the single-point total energy computed via extended tight-binding at the GFN1-xTB level of theory [18].
Dataset Splits	No	The paper discusses various experimental settings and how data is sampled or used for evaluation (e.g., the top 0.2% of samples, a sample of 100 images), but it does not explicitly provide training/test/validation dataset splits or specific percentages for reproducing the data partitioning used in its experiments.
Hardware Specification	Yes	E.1 Used computational resources: We run all experiments on a single Nvidia H100 GPU.
Software Dependencies	No	The paper mentions software components like 'Adam optimizer' and 'torch quantile method' but does not provide specific version numbers for these or other key software dependencies.
Experiment Setup	Yes	Risk-averse reward maximization for better worst-case validity or safety. In this experiment, we execute FDC for K = 2 iterations with a total of 1000 gradient steps within each iteration, AM solver (within the FDC scheme) with learning rate of 2e-2, α = 109, and η = 10. Novelty-seeking reward maximization for discovery. We run FDC for K = 2 iterations with a total of 1000 gradient steps within each iteration, AM solver (within the FDC scheme) with learning rate of 3e-6, α = 105, and η = 0.625, and 8000 samples are used to estimate the first variation gradient as explained in Appendix A. Reward maximization regularized via optimal transport distance. Both FDC-A and FDC-B have been run for K = 6 iterations of FDC, with α = 0.1, AM oracle learning rate of 1e-6, and η = 6.666. Both their discriminators... MLP architecture with 800 gradient steps, by enforcing the 1-Lipschitz condition via the standard gradient penalty technique with regularization strength of λ_GP = 10.0 and learning rate of 1e-4. Conservative manifold exploration. We ran FDC for K = 50 iterations and 2500 gradient steps in total with η = 10 and α ∈ {0.0, 0.01, 0.1, 0.5, 1.0}. We set the AM learning rate to 2e-4 and sample trajectories of length 400 for computing the AM loss. Molecular design for single-point energy minimization. In this experiment, FDC is run for K = 10 iterations, with merely 2 gradient steps at each iteration... AM learning rate of 1e-4, η = 0.01, and α = 0. Meanwhile, the AM baseline is run for 240 gradient steps with α = 0.0045. Text-to-image bridge designs conservative exploration. For this experiment, we ran FDC on a single Nvidia H100 GPU, with K = 2, η = 200, α = 0.001, and 100 gradient steps in total.
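The risk-averse experiment, together with the "torch quantile method" listed under Software Dependencies, points to a quantile-based tail estimate of the reward distribution. The sketch below is illustrative only and assumes the risk measure is a lower-tail conditional value-at-risk (CVaR). It uses NumPy's `np.quantile` as a stand-in for `torch.quantile`. It also builds a top-fraction mask of the kind implied by "top 0.2% samples" under Dataset Splits. The helper names `lower_tail_cvar` and `top_fraction_mask` are hypothetical and are not taken from the paper, whose code is not released.

```python
import numpy as np


def lower_tail_cvar(rewards, beta):
    """Estimate CVaR_beta: the mean reward over the worst beta-fraction of samples.

    Assumption: "risk-averse" is read as lower-tail CVaR; the paper's exact
    risk measure and estimator may differ.
    """
    rewards = np.asarray(rewards, dtype=float)
    if not 0.0 < beta <= 1.0:
        raise ValueError("beta must be in (0, 1]")
    # Value-at-risk threshold: the beta-quantile of the sampled rewards
    # (np.quantile plays the role torch.quantile would in a PyTorch pipeline).
    var = np.quantile(rewards, beta)
    # Average only the samples at or below the threshold, i.e. the worst tail.
    return rewards[rewards <= var].mean()


def top_fraction_mask(rewards, frac):
    """Boolean mask selecting the top `frac` of samples by reward (frac=0.002 is the top 0.2%)."""
    rewards = np.asarray(rewards, dtype=float)
    threshold = np.quantile(rewards, 1.0 - frac)
    return rewards >= threshold
```

For example, `lower_tail_cvar(np.arange(1, 11), 0.2)` averages the two lowest rewards, and `top_fraction_mask(np.arange(1000), 0.002)` keeps the two highest of 1000 samples. Because `np.quantile` interpolates linearly by default, the threshold can fall between two sampled values. The result therefore depends slightly on the quantile `method` chosen.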
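The Experiment Setup row scatters hyperparameters across six settings, and the code is not public. For anyone attempting a reproduction, it may help to collect these settings into one structure. The container below (`FDCRun`, `RUNS`, and `total_gradient_steps`) is hypothetical and is not the authors' code. It holds only values stated in that row. Learning rates are written with negative exponents (e.g. 2e-2). α is left out for the risk-averse and novelty-seeking runs, because the extracted values (109 and 105) may have lost exponent formatting. No FDC step count is stated for the optimal-transport runs.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class FDCRun:
    """Reported FDC settings for one experiment (hypothetical container, not the authors' code)."""
    name: str
    K: int                                # outer FDC iterations
    am_lr: Optional[float] = None         # Adjoint Matching (AM) solver learning rate
    eta: Optional[float] = None           # eta as reported
    alphas: Tuple[float, ...] = ()        # alpha value(s); empty where the extracted value is uncertain
    steps_per_iter: Optional[int] = None  # gradient steps within each FDC iteration
    total_steps: Optional[int] = None     # gradient steps in total, when reported that way

    def total_gradient_steps(self) -> Optional[int]:
        # Prefer an explicitly reported total; otherwise multiply per-iteration steps by K.
        if self.total_steps is not None:
            return self.total_steps
        if self.steps_per_iter is not None:
            return self.K * self.steps_per_iter
        return None


RUNS = [
    FDCRun("risk_averse", K=2, am_lr=2e-2, eta=10, steps_per_iter=1000),
    FDCRun("novelty_seeking", K=2, am_lr=3e-6, eta=0.625, steps_per_iter=1000),
    FDCRun("optimal_transport", K=6, am_lr=1e-6, eta=6.666, alphas=(0.1,)),
    FDCRun("manifold_exploration", K=50, am_lr=2e-4, eta=10,
           alphas=(0.0, 0.01, 0.1, 0.5, 1.0), total_steps=2500),
    FDCRun("molecular_energy", K=10, am_lr=1e-4, eta=0.01, alphas=(0.0,), steps_per_iter=2),
    FDCRun("text_to_image_bridges", K=2, eta=200, alphas=(0.001,), total_steps=100),
]
```

Calling `total_gradient_steps()` on each entry gives 2000 for the risk-averse run (2 × 1000) and 20 for molecular design (10 × 2). It returns `None` for the optimal-transport runs, where the row reports only the 800 discriminator steps.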