Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Generation
Authors: Guillaume Huguet, James Vuckovic, Kilian Fatras, Eric Thibodeau-Laufer, Pablo Lemos, Riashat Islam, Chenghao Liu, Jarrid Rector-Brooks, Tara Akhound-Sadegh, Michael Bronstein, Alexander Tong, Joey Bose
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically observe that FOLDFLOW-2 outperforms previous state-of-the-art protein structure-based generative models, improving over RFDiffusion in terms of unconditional generation across all metrics including designability, diversity, and novelty across all protein lengths, as well as exhibiting generalization on the task of equilibrium conformation sampling. |
| Researcher Affiliation | Collaboration | Guillaume Huguet (1,2,3), James Vuckovic (1), Kilian Fatras (1), Eric Thibodeau-Laufer (1), Pablo Lemos (1), Riashat Islam (1), Cheng-Hao Liu (1,3,4), Jarrid Rector-Brooks (1,2,3), Tara Akhound-Sadegh (1,2,4), Michael Bronstein (1,5,6), Alexander Tong (1,2,3), Avishek Joey Bose (1,5). 1: Dreamfold, 2: Université de Montréal, 3: Mila, 4: McGill University, 5: University of Oxford, 6: Aithyra |
| Pseudocode | No | The paper describes the model architecture and training procedures verbally and graphically (e.g., Figure 1 for architecture), but it does not include any formal, structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code can be found at https://github.com/DreamFold/FoldFlow |
| Open Datasets | Yes | We use a subset of PDB with resolution < 5Å downloaded from the PDB [Berman et al., 2000] on July 20, 2023. ... We began with a SwissProt data dump consisting of 532,003 structures predicted by AlphaFold2 [Jumper et al., 2021], accessed in February 2024. The AlphaFold2 predicted structure database is made available under a CC-BY-4.0 license for academic and commercial uses. |
| Dataset Splits | No | The paper describes the use of a "test set" for evaluation and a mixing ratio for synthetic data during training, but it does not explicitly specify a distinct validation set split (e.g., percentages or counts for training, validation, and test data). |
| Hardware Specification | Yes | FOLDFLOW-2 is coded in PyTorch and was trained on 2 A100 40GB NVIDIA GPUs for 4 days. |
| Software Dependencies | No | The paper mentions software components like PyTorch and other models (e.g., ESMFold, ProteinMPNN), but it does not provide specific version numbers for these software dependencies, which are necessary for reproducible descriptions. |
| Experiment Setup | Yes | See Table 7 for an overview of the experimental setup. ... Table 7 (Overview of Training Setup): Optimizer: ADAM [Kingma and Ba, 2014]; Learning rate: 0.0001; β1, β2, ε: 0.9, 0.999, 1e-8; Effective M (max squared residues per batch): 500k; Experimental structures per epoch: 33%; Minimum number of residues: 60; Maximum number of residues: 384; Sequence masking probability: 50% |
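
The Table 7 hyperparameters can be gathered into a config sketch. This is a minimal illustration, not the authors' released code: the dict keys, the `greedy_batches` helper, and the greedy fill rule are assumptions; the paper only states an effective budget M of 500k squared residues per batch, which we interpret here as filling each batch until the sum of squared sequence lengths would exceed M.

```python
# Hedged sketch of the Table 7 training setup (not the authors' code).
# All names below are hypothetical; only the numeric values come from the paper.
TRAIN_CONFIG = {
    "optimizer": "Adam",                    # Kingma and Ba [2014]
    "lr": 1e-4,                             # learning rate
    "betas": (0.9, 0.999),                  # β1, β2
    "eps": 1e-8,                            # ε
    "max_sq_residues_per_batch": 500_000,   # effective M
    "pct_experimental_per_epoch": 0.33,     # 33% experimental structures
    "min_residues": 60,
    "max_residues": 384,
    "seq_mask_prob": 0.5,                   # sequence masking probability
}

def greedy_batches(lengths, budget=TRAIN_CONFIG["max_sq_residues_per_batch"]):
    """Group proteins greedily so each batch's sum of L^2 stays within budget.

    An assumed reading of the "max squared residues per batch" parameter:
    attention cost scales with L^2, so batches are capped by total L^2.
    """
    batches, current, used = [], [], 0
    for L in lengths:
        cost = L * L
        if current and used + cost > budget:
            batches.append(current)  # flush the full batch
            current, used = [], 0
        current.append(L)
        used += cost
    if current:
        batches.append(current)
    return batches
```

With the maximum length of 384 residues, 384² = 147,456, so at most three such proteins fit in one 500k batch under this rule.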