Sequence-Augmented SE(3)-Flow Matching For Conditional Protein Generation
Authors: Guillaume Huguet, James Vuckovic, Kilian Fatras, Eric Thibodeau-Laufer, Pablo Lemos, Riashat Islam, Chenghao Liu, Jarrid Rector-Brooks, Tara Akhound-Sadegh, Michael Bronstein, Alexander Tong, Joey Bose
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically observe that FOLDFLOW-2 outperforms previous state-of-the-art protein structure-based generative models, improving over RFDiffusion in terms of unconditional generation across all metrics including designability, diversity, and novelty across all protein lengths, as well as exhibiting generalization on the task of equilibrium conformation sampling. |
| Researcher Affiliation | Collaboration | Guillaume Huguet (1,2,3), James Vuckovic (1), Kilian Fatras (1), Eric Thibodeau-Laufer (1), Pablo Lemos (1), Riashat Islam (1), Cheng-Hao Liu (1,3,4), Jarrid Rector-Brooks (1,2,3), Tara Akhound-Sadegh (1,2,4), Michael Bronstein (1,5,6), Alexander Tong (1,2,3), Avishek Joey Bose (1,5). 1: Dreamfold, 2: Université de Montréal, 3: Mila, 4: McGill University, 5: University of Oxford, 6: Aithyra |
| Pseudocode | No | The paper describes the model architecture and training procedures verbally and graphically (e.g., Figure 1 for architecture), but it does not include any formal, structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Our code can be found at https://github.com/DreamFold/FoldFlow |
| Open Datasets | Yes | We use a subset of PDB with resolution < 5Å downloaded from the PDB [Berman et al., 2000] on July 20, 2023. ... We began with a SwissProt data dump consisting of 532,003 structures predicted by AlphaFold2 [Jumper et al., 2021], accessed in February 2024. The AlphaFold2 predicted structure database is made available under a CC-BY-4.0 license for academic and commercial uses. |
| Dataset Splits | No | The paper describes the use of a "test set" for evaluation and a mixing ratio for synthetic data during training, but it does not explicitly specify a distinct validation set split (e.g., percentages or counts for training, validation, and test data). |
| Hardware Specification | Yes | FOLDFLOW-2 is coded in PyTorch and was trained on 2 A100 40GB NVIDIA GPUs for 4 days. |
| Software Dependencies | No | The paper mentions software components like PyTorch and other models (e.g., ESMFold, ProteinMPNN), but it does not provide specific version numbers for these software dependencies, which are necessary for reproducible descriptions. |
| Experiment Setup | Yes | See Table 7 for an overview of the experimental setup. ... Table 7 (Overview of Training Setup): Optimizer: ADAM [Kingma and Ba, 2014]; Learning rate: 0.0001; β1, β2, ε: 0.9, 0.999, 1e-8; Effective M (max squared residues per batch): 500k; Experimental structures per epoch: 33%; Minimum number of residues: 60; Maximum number of residues: 384; Sequence masking probability: 50% |
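
The Table 7 hyperparameters can be gathered into a config sketch. This is a minimal illustration, not the authors' released code: the dict keys, the `greedy_batches` helper, and the greedy fill rule are assumptions; the paper only states an effective budget M of 500k squared residues per batch, which we interpret here as filling each batch until the sum of squared sequence lengths would exceed M.

```python
# Hedged sketch of the Table 7 training setup (not the authors' code).
# All names below are hypothetical; only the numeric values come from the paper.
TRAIN_CONFIG = {
    "optimizer": "Adam",                    # Kingma and Ba [2014]
    "lr": 1e-4,                             # learning rate
    "betas": (0.9, 0.999),                  # β1, β2
    "eps": 1e-8,                            # ε
    "max_sq_residues_per_batch": 500_000,   # effective M
    "pct_experimental_per_epoch": 0.33,     # 33% experimental structures
    "min_residues": 60,
    "max_residues": 384,
    "seq_mask_prob": 0.5,                   # sequence masking probability
}

def greedy_batches(lengths, budget=TRAIN_CONFIG["max_sq_residues_per_batch"]):
    """Group proteins greedily so each batch's sum of L^2 stays within budget.

    An assumed reading of the "max squared residues per batch" parameter:
    attention cost scales with L^2, so batches are capped by total L^2.
    """
    batches, current, used = [], [], 0
    for L in lengths:
        cost = L * L
        if current and used + cost > budget:
            batches.append(current)  # flush the full batch
            current, used = [], 0
        current.append(L)
        used += cost
    if current:
        batches.append(current)
    return batches
```

With the maximum length of 384 residues, 384² = 147,456, so at most three such proteins fit in one 500k batch under this rule.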