AlphaFold Meets Flow Matching for Generating Protein Ensembles

Authors: Bowen Jing, Bonnie Berger, Tommi Jaakkola

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "When trained and evaluated on the PDB, our method provides a superior combination of precision and diversity compared to AlphaFold with MSA subsampling. When further trained on ensembles from all-atom MD, our method accurately captures conformational flexibility, positional distributions, and higher-order ensemble observables for unseen proteins."
Researcher Affiliation | Academia | "CSAIL, Massachusetts Institute of Technology; Department of Mathematics, Massachusetts Institute of Technology."
Pseudocode | Yes | Algorithm 1 TRAINING and Algorithm 2 INFERENCE are provided on page 3.
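The paper's TRAINING and INFERENCE algorithms follow the standard flow-matching recipe: interpolate from a prior sample toward the data during training, then integrate the learned field at inference. A minimal generic sketch in PyTorch, assuming a Gaussian prior and a model that directly predicts the clean sample (the paper itself uses a harmonic prior and a fine-tuned AlphaFold as the denoiser; all names below are illustrative, not the authors' code):

```python
import torch

def train_step(model, x1, optimizer):
    """One flow-matching training step on a batch of structures x1, shape (B, N, 3)."""
    x0 = torch.randn_like(x1)          # prior sample (Gaussian here; the paper uses a harmonic prior)
    t = torch.rand(x1.shape[0], 1, 1)  # random time in [0, 1)
    xt = (1 - t) * x0 + t * x1         # linear interpolant between prior and data
    x1_hat = model(xt, t)              # denoiser predicts the clean sample
    loss = ((x1_hat - x1) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def sample(model, shape, steps=10):
    """Euler integration of the induced vector field v = (x1_hat - x) / (1 - t)."""
    x = torch.randn(shape)
    ts = torch.linspace(0.0, 1.0, steps + 1)
    for i in range(steps):
        t = ts[i].expand(shape[0], 1, 1)
        x1_hat = model(x, t)
        x = x + (ts[i + 1] - ts[i]) * (x1_hat - x) / (1 - ts[i])
    return x
```

Parameterizing the model to predict the clean sample (rather than the velocity) is what lets a structure predictor like AlphaFold serve as the denoiser.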
Open Source Code | Yes | "Code is available at https://github.com/bjing2016/alphaflow."
Open Datasets | Yes | "We fine-tune all weights of AlphaFold and ESMFold on the PDB with our flow matching framework, starting from their publicly available pretrained weights... Next, to demonstrate and assess the ability of our method to learn from MD ensembles, we continue fine-tuning both models on the ATLAS dataset of all-atom MD simulations (Vander Meersche et al., 2023)."
Dataset Splits | Yes | "Using training and validation cutoffs of May 1, 2018 and May 1, 2019, we obtain train/val/test splits of 1265/39/82 ensembles (2 excluded due to length)."
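The reported splits are time-based: entries are assigned by deposition date relative to the two cutoffs. A hedged sketch of this assignment logic (the helper name and input format are hypothetical; the authors' actual pipeline may differ):

```python
from datetime import date

def split_by_deposition_date(entries,
                             train_cutoff=date(2018, 5, 1),
                             val_cutoff=date(2019, 5, 1)):
    """entries: iterable of (pdb_id, deposition_date) pairs.

    Returns a dict of train/val/test ID lists: entries deposited before the
    train cutoff go to train, before the val cutoff to val, the rest to test.
    """
    splits = {"train": [], "val": [], "test": []}
    for pdb_id, deposited in entries:
        if deposited < train_cutoff:
            splits["train"].append(pdb_id)
        elif deposited < val_cutoff:
            splits["val"].append(pdb_id)
        else:
            splits["test"].append(pdb_id)
    return splits
```

Splitting on deposition date rather than at random avoids leakage from structures the pretrained models could have seen during training.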
Hardware Specification | Yes | "All training is done on a machine with 8x NVIDIA A100 GPUs and 2x Intel Xeon(R) Gold 6258R processors."
Software Dependencies | No | The paper mentions several software tools used, such as OpenFold (Ahdritz et al., 2022), MMseqs2 (Steinegger & Söding, 2017), ColabFold (Porter et al., 2023), and MDTraj (McGibbon et al., 2015), but it does not specify explicit version numbers for these ancillary software components.
Experiment Setup | Yes | "We train with crops of size 256, batch size of 64, no recycling, and no templates. AlphaFLOW is trained on the full set of auxiliary losses, except the structural violation loss and with the FAPE loss squared. ESMFLOW is trained on the FAPE, pLDDT, distogram, and supervised χ losses."
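The reported hyperparameters can be collected into a single config object. A sketch with hypothetical field names (the paper reports these values but not a config schema, so this is not the authors' actual code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    crop_size: int = 256
    batch_size: int = 64
    num_recycles: int = 0         # "no recycling"
    use_templates: bool = False   # "no templates"
    violation_loss: bool = False  # structural violation loss is excluded
    square_fape: bool = True      # AlphaFLOW trains with the FAPE loss squared

# AlphaFLOW: full auxiliary losses minus the violation loss, squared FAPE.
ALPHAFLOW = TrainConfig()
# ESMFLOW: trained on the FAPE, pLDDT, distogram, and supervised chi losses.
ESMFLOW = TrainConfig(square_fape=False)
```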