Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
Fisher Flow Matching for Generative Modeling over Discrete Data
Authors: Oscar Davis, Samuel Kessler, Mircea Petrache, Ismail Ceylan, Michael Bronstein, Joey Bose
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate FISHER-FLOW on an array of synthetic and diverse real-world benchmarks, including designing DNA promoter and DNA enhancer sequences. Empirically, we find that FISHER-FLOW improves over prior diffusion and flow-matching models on these benchmarks. |
| Researcher Affiliation | Collaboration | 1University of Oxford, 2Pontificia Universidad Católica de Chile, 3Aithyra |
| Pseudocode | Yes | We detail our method for training FISHER-FLOW in Algorithm 1 in F.2. |
| Open Source Code | Yes | Our code is available at https://github.com/olsdavis/fisher-flow. |
| Open Datasets | Yes | We train our model over the QM9 dataset [61, 60]. |
| Dataset Splits | Yes | We use the same train/val/test splits as Stark et al. [68] of size 88,470/3,933/7,497. |
| Hardware Specification | Yes | All experiments are run on a single Nvidia A10 or RTX A6000 GPU. |
| Software Dependencies | No | All of our code is implemented in Python, using PyTorch. For the implementation of the manifold functions (such as log, exp, geodesic distance, etc.), we tried two different versions. The first was a direct port of Manifolds.jl [10], originally written in Julia; the second used the geoopt library [46] as a back-end. The latter performed noticeably better, the underlying reason probably being the better numerical stability of the provided functions. As for the optimal transport part, it is essentially an adaptation of that of FoldFlow [18], which itself relies on the POT library [31]. |
| Experiment Setup | Yes | We train our generative models for 200,000 steps with a batch size of 256. We cache the best checkpoint over the course of training according to the validation MSE between the true promoter signal and the signal from the Sei model conditioned on the generated promoter DNA sequences. |
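The manifold primitives named in the Software Dependencies row (log, exp, geodesic distance) can be sketched in plain NumPy for the unit sphere, which is the geometry relevant to Fisher-Flow's Fisher-Rao treatment of the simplex. This is an illustrative stand-in written from the standard closed-form sphere formulas, not the paper's geoopt-backed implementation.

```python
import numpy as np

def sphere_dist(x, y):
    """Geodesic distance on the unit sphere: arc length = arccos of the inner product."""
    return np.arccos(np.clip(np.dot(x, y), -1.0, 1.0))

def sphere_exp(x, v):
    """Exponential map at x: follow the geodesic from x in tangent direction v."""
    norm_v = np.linalg.norm(v)
    if norm_v < 1e-12:
        return x
    return np.cos(norm_v) * x + np.sin(norm_v) * (v / norm_v)

def sphere_log(x, y):
    """Logarithmic map at x of point y (inverse of the exponential map)."""
    theta = sphere_dist(x, y)
    if theta < 1e-12:
        return np.zeros_like(x)
    w = y - np.cos(theta) * x  # component of y orthogonal to x
    return theta * w / np.linalg.norm(w)

# Round trip: log followed by exp recovers the target point.
x = np.array([1.0, 0.0, 0.0])
y = np.array([0.0, 1.0, 0.0])
assert np.allclose(sphere_exp(x, sphere_log(x, y)), y)
```

In practice a library back-end such as geoopt is preferable, as the quoted text notes, precisely because naive formulas like these lose numerical stability near antipodal or coincident points.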
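The checkpoint-caching policy described in the Experiment Setup row can be sketched as a small helper that keeps the model state with the lowest validation MSE seen so far. The names here (`BestCheckpoint`, `update`) are hypothetical and not taken from the released code.

```python
import copy

class BestCheckpoint:
    """Caches the model state achieving the lowest validation MSE seen so far."""

    def __init__(self):
        self.best_mse = float("inf")
        self.state = None

    def update(self, val_mse, model_state):
        """Record the state if it improves on the best validation MSE; return True if cached."""
        if val_mse < self.best_mse:
            self.best_mse = val_mse
            # Deep-copy so subsequent training steps cannot mutate the cached state.
            self.state = copy.deepcopy(model_state)
            return True
        return False

ckpt = BestCheckpoint()
ckpt.update(0.42, {"w": [0.1]})
ckpt.update(0.57, {"w": [0.9]})  # worse validation MSE, so the cache is unchanged
assert ckpt.best_mse == 0.42
```

Evaluated periodically over the 200,000 training steps, the cached state is what gets reported at the end, rather than the final iterate.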