SE(3)-Stochastic Flow Matching for Protein Backbone Generation
Authors: Joey Bose, Tara Akhound-Sadegh, Guillaume Huguet, Kilian Fatras, Jarrid Rector-Brooks, Cheng-Hao Liu, Andrei Cristian Nica, Maksym Korablyov, Michael M. Bronstein, Alexander Tong
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we validate FOLDFLOW on protein backbone generation of up to 300 amino acids leading to high-quality designable, diverse, and novel samples. |
| Researcher Affiliation | Collaboration | McGill University, Mila, Dreamfold, Université de Montréal, University of Oxford |
| Pseudocode | Yes | Algorithm 1 FOLDFLOW-SFM training on SE(3)^N, Algorithm 2 FOLDFLOW-SFM training on SO(3), Algorithm 3 FOLDFLOW-SFM inference. (A simplified flow-matching training step is sketched after the table.) |
| Open Source Code | Yes | Our code can be found at https://github.com/DreamFold/FoldFlow. |
| Open Datasets | Yes | We evaluate FOLDFLOW models in generating valid, diverse, and novel backbones by training on a subset of the Protein Data Bank (PDB) with 22,248 proteins. We use a subset of the PDB filtered with the same criteria as FrameDiff: specifically, we filter for monomers of length between 60 and 512 (inclusive) with resolution < 5 Å, downloaded from the PDB (Berman et al., 2000) on July 20, 2023. (This filter is sketched in code after the table.) |
| Dataset Splits | No | The paper mentions 'training dataset' and 'test samples' but does not explicitly describe a validation dataset split or percentages used for validation. |
| Hardware Specification | Yes | We train our model in PyTorch using distributed data-parallel (DDP) across four NVIDIA A100-80GB GPUs for roughly 2.5 days. |
| Software Dependencies | No | The paper mentions 'PyTorch' and 'OpenFold' but does not specify version numbers for these software dependencies. |
| Experiment Setup | Yes | We use the Adam optimizer with a constant learning rate of 10^-4, β1 = 0.9, β2 = 0.99. The batch size depends on the length N of the protein to maintain roughly constant memory usage; in practice, we set the effective batch size to eff_bs = max(round(#GPUs × 500,000 / N²), 1) for each step. We set λ_aux = 0.25 and weight the rotation loss with coefficient 0.5, compared to the translation loss with weight 1.0. (A sketch of this configuration follows the table.) |
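The paper's training algorithms operate on SE(3)^N, pairing an ℝ³ translation flow with an SO(3) rotation flow. As a reference point for the pseudocode listed above, here is a minimal conditional flow-matching training step on the translation component only; `VelocityNet` and its architecture are hypothetical stand-ins, and the SO(3) geodesic interpolation and stochastic bridge of FOLDFLOW-SFM are omitted.

```python
import torch
import torch.nn as nn

class VelocityNet(nn.Module):
    """Hypothetical stand-in for the paper's structure network."""
    def __init__(self, dim: int = 3, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim + 1, hidden), nn.SiLU(),
            nn.Linear(hidden, hidden), nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x_t: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Condition on time by concatenating t to the coordinates.
        return self.net(torch.cat([x_t, t.expand(x_t.shape[0], 1)], dim=-1))

def cfm_training_step(model, optimizer, x1):
    """One flow-matching step: regress the model onto the conditional
    target velocity (x1 - x0) along the straight-line path."""
    x0 = torch.randn_like(x1)      # prior (noise) sample
    t = torch.rand(1)              # uniform time in [0, 1]
    x_t = (1 - t) * x0 + t * x1    # linear interpolant between noise and data
    target = x1 - x0               # conditional velocity field
    loss = ((model(x_t, t) - target) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

# Usage with toy data in place of real backbone translations:
model = VelocityNet()
opt = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.99))
print(cfm_training_step(model, opt, torch.randn(64, 3)))
```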
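The PDB filtering criteria quoted in the Open Datasets row translate directly into a metadata query. A minimal sketch, assuming a hypothetical metadata table with `oligomeric_state`, `seq_len`, and `resolution` columns; the actual preprocessing lives in the FoldFlow repository.

```python
import pandas as pd

# Toy metadata frame standing in for the real PDB metadata dump.
meta = pd.DataFrame({
    "pdb_id": ["1abc", "2def", "3ghi"],
    "oligomeric_state": ["monomeric", "monomeric", "homodimeric"],
    "seq_len": [128, 700, 90],
    "resolution": [2.1, 1.8, 3.0],
})

# Quoted filter: monomers, length 60-512 inclusive, resolution < 5 Angstrom.
subset = meta[
    (meta["oligomeric_state"] == "monomeric")
    & meta["seq_len"].between(60, 512)   # inclusive bounds
    & (meta["resolution"] < 5.0)
]
print(subset["pdb_id"].tolist())  # -> ['1abc']; the paper's filter yields 22,248 chains
```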
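For concreteness, the quoted optimizer settings, the effective-batch-size rule (the paper's Eq. 46), and the loss weighting can be written out as follows; `model` and the loss tensors are placeholders, not the paper's actual modules.

```python
import torch
import torch.nn as nn

def effective_batch_size(num_gpus: int, n_residues: int) -> int:
    # eff_bs = max(round(#GPUs * 500,000 / N^2), 1): longer proteins get
    # smaller batches so per-step memory stays roughly constant.
    return max(round(num_gpus * 500_000 / n_residues**2), 1)

model = nn.Linear(3, 3)  # stand-in for the structure network
optimizer = torch.optim.Adam(
    model.parameters(), lr=1e-4, betas=(0.9, 0.99)  # constant LR, no schedule
)

print(effective_batch_size(num_gpus=4, n_residues=128))  # -> 122

# Quoted loss weighting: translation 1.0, rotation 0.5, lambda_aux = 0.25.
trans_loss = rot_loss = aux_loss = torch.tensor(1.0)  # dummy values
total_loss = trans_loss + 0.5 * rot_loss + 0.25 * aux_loss
```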