reproducibilityindex.ai

Fisher Flow Matching for Generative Modeling over Discrete Data

Authors: Oscar Davis, Samuel Kessler, Mircea Petrache, Ismail Ceylan, Michael Bronstein, Joey Bose

NeurIPS 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We evaluate FISHER-FLOW on an array of synthetic and diverse real-world benchmarks, including designing DNA promoter and DNA enhancer sequences. Empirically, we find that FISHER-FLOW improves over prior diffusion and flow-matching models on these benchmarks.
Researcher Affiliation	Collaboration	(1) University of Oxford, (2) Pontificia Universidad Católica de Chile, (3) Aithyra
Pseudocode	Yes	We detail our method for training FISHER-FLOW in Algorithm 1 in F.2.
Open Source Code	Yes	Our code is available at https://github.com/olsdavis/fisher-flow.
Open Datasets	Yes	We train our model over the QM9 dataset [61, 60].
Dataset Splits	Yes	We use the same train/val/test splits as Stark et al. [68] of size 88,470/3,933/7,497.
Hardware Specification	Yes	All experiments are run on a single NVIDIA A10 or RTX A6000 GPU.
Software Dependencies	No	All of our code is implemented in Python, using PyTorch. For the implementation of the manifold functions (such as log, exp, geodesic distance, etc.), we have tried two different versions. The first one was a direct port of Manifolds.jl [10], originally written in Julia; the second one used the geoopt library [46] as a back-end. The latter performed noticeably better, probably because of the better numerical stability of the provided functions. As for the optimal transport part, it is essentially an adaptation of that of FoldFlow [18], which itself relies on the POT library [31].
Experiment Setup	Yes	We train our generative models for 200,000 steps with a batch size of 256. We cache the best checkpoint over the course of training according to the validation MSE between the true promoter signal and the signal from the Sei model conditioned on the generated promoter DNA sequences.
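
The checkpointing rule in the Experiment Setup row (keep the checkpoint with the lowest validation MSE) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' training loop: the `BestCheckpoint` class and the `mse` helper are hypothetical, and the plain lists stand in for the true promoter signal and the signal that the Sei model would predict from generated sequences.

```python
import copy

def mse(pred, target):
    """Mean squared error between two equal-length sequences of floats."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

class BestCheckpoint:
    """Keeps a copy of the model state with the lowest validation metric seen so far."""

    def __init__(self):
        self.best_metric = float("inf")
        self.best_step = None
        self.best_state = None

    def update(self, step, metric, state):
        """Store `state` if `metric` improves on the best so far; return True if stored."""
        if metric < self.best_metric:
            self.best_metric = metric
            self.best_step = step
            # Deep copy so later training steps cannot mutate the cached state.
            self.best_state = copy.deepcopy(state)
            return True
        return False

# Usage with stand-in numbers: three validation rounds against a fixed target signal.
target_signal = [1.0, 0.0, 2.0]
tracker = BestCheckpoint()
rounds = [
    (1000, [0.0, 0.0, 0.0], {"w": [0.1]}),
    (2000, [1.0, 0.0, 1.5], {"w": [0.2]}),
    (3000, [1.0, 1.0, 1.0], {"w": [0.3]}),
]
for step, predicted_signal, state in rounds:
    tracker.update(step, mse(predicted_signal, target_signal), state)
```

In a real run the state would be the model's parameters and the loop would span the 200,000 training steps reported above; how often validation is performed is not stated in the source.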