Fisher Flow Matching for Generative Modeling over Discrete Data

Authors: Oscar Davis, Samuel Kessler, Mircea Petrache, Ismail Ceylan, Michael Bronstein, Joey Bose

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate FISHER-FLOW on an array of synthetic and diverse real-world benchmarks, including designing DNA promoter and DNA enhancer sequences. Empirically, we find that FISHER-FLOW improves over prior diffusion and flow-matching models on these benchmarks.
Researcher Affiliation Collaboration 1University of Oxford, 2Pontificia Universidad Católica de Chile, 3Aithyra
Pseudocode Yes We detail our method for training FISHER-FLOW in Algorithm 1 in F.2.
Open Source Code Yes Our code is available at https://github.com/olsdavis/fisher-flow.
Open Datasets Yes We train our model over the QM9 dataset [61, 60].
Dataset Splits Yes We use the same train/val/test splits as Stark et al. [68] of size 88,470/3,933/7,497.
Hardware Specification Yes All experiments are run on a single Nvidia A10 or RTX A6000 GPU.
Software Dependencies No All of our code is implemented in Python, using PyTorch. For the implementation of the manifold functions (such as log, exp, geodesic distance, etc.), we have tried two different versions. The first one was a direct port of Manifolds.jl [10], originally written in Julia; the second one used the geoopt library [46] as a back-end. The latter performed noticeably better, likely owing to the better numerical stability of the provided functions. As for the optimal transport part, it is essentially an adaptation of that of FoldFlow [18], which itself relies on the POT library [31].
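To make the manifold-functions remark concrete: Fisher flow matching operates on categorical distributions mapped via the square root onto the positive orthant of the unit sphere, where the geodesic distance recovers the Fisher-Rao distance. The sketch below is illustrative only (plain Python, not the authors' geoopt-backed implementation); the function name is our own.

```python
import math

def fisher_rao_distance(p, q):
    """Fisher-Rao distance between two categorical distributions.

    Mapping p -> sqrt(p) places a distribution on the positive orthant
    of the unit sphere; the spherical geodesic distance between sqrt(p)
    and sqrt(q) is arccos of their inner product (the Bhattacharyya
    coefficient), and the Fisher-Rao distance is twice that.
    """
    bc = sum(math.sqrt(pi * qi) for pi, qi in zip(p, q))
    # Clamp to guard against floating-point overshoot above 1.0.
    return 2.0 * math.acos(min(1.0, bc))

# Identical distributions sit at distance 0; disjoint one-hot
# distributions sit at the maximal distance, pi.
print(fisher_rao_distance([1.0, 0.0], [1.0, 0.0]))  # 0.0
print(fisher_rao_distance([1.0, 0.0], [0.0, 1.0]))  # pi
```

In practice a library such as geoopt supplies these operations (exp/log maps, geodesic distance) with better numerical stability, which matches the authors' observation above.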
Experiment Setup Yes We train our generative models for 200,000 steps with a batch size of 256. We cache the best checkpoint over the course of training according to the validation MSE between the true promoter signal and the signal from the Sei model conditioned on the generated promoter DNA sequences.
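The checkpoint-caching strategy described above (keep the checkpoint with the lowest validation MSE seen during training) can be sketched as a small tracker. This is a generic illustration of the pattern, not the authors' code; the class name is our own, and in a real training loop the cached state would be a copy of the model weights.

```python
class BestCheckpointCache:
    """Cache the best model state seen so far, keyed on validation MSE
    (lower is better)."""

    def __init__(self):
        self.best_mse = float("inf")
        self.best_state = None

    def update(self, val_mse, state):
        """Record `state` if `val_mse` improves on the best so far.

        Returns True when the cached checkpoint was replaced.
        """
        if val_mse < self.best_mse:
            self.best_mse = val_mse
            # In practice: deep-copy or serialize the model weights here.
            self.best_state = state
            return True
        return False

cache = BestCheckpointCache()
cache.update(0.42, "ckpt_step_50k")   # improves: cached
cache.update(0.57, "ckpt_step_100k")  # worse: ignored
print(cache.best_state)  # ckpt_step_50k
```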