AlphaFold Meets Flow Matching for Generating Protein Ensembles

Authors: Bowen Jing, Bonnie Berger, Tommi Jaakkola

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "When trained and evaluated on the PDB, our method provides a superior combination of precision and diversity compared to AlphaFold with MSA subsampling. When further trained on ensembles from all-atom MD, our method accurately captures conformational flexibility, positional distributions, and higher-order ensemble observables for unseen proteins."
Researcher Affiliation | Academia | "CSAIL, Massachusetts Institute of Technology; Department of Mathematics, Massachusetts Institute of Technology."
Pseudocode | Yes | Algorithm 1 TRAINING and Algorithm 2 INFERENCE are provided on page 3.
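The paper's TRAINING and INFERENCE algorithms follow the standard flow-matching recipe: interpolate from a prior sample toward the data during training, then integrate the learned field at inference. A minimal generic sketch in PyTorch, assuming a Gaussian prior and a model that directly predicts the clean sample (the paper itself uses a harmonic prior and a fine-tuned AlphaFold as the denoiser; all names below are illustrative, not the authors' code):

```python
import torch

def train_step(model, x1, optimizer):
    """One flow-matching training step on a batch of structures x1, shape (B, N, 3)."""
    x0 = torch.randn_like(x1)          # prior sample (Gaussian here; the paper uses a harmonic prior)
    t = torch.rand(x1.shape[0], 1, 1)  # random time in [0, 1)
    xt = (1 - t) * x0 + t * x1         # linear interpolant between prior and data
    x1_hat = model(xt, t)              # denoiser predicts the clean sample
    loss = ((x1_hat - x1) ** 2).mean()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()

@torch.no_grad()
def sample(model, shape, steps=10):
    """Euler integration of the induced vector field v = (x1_hat - x) / (1 - t)."""
    x = torch.randn(shape)
    ts = torch.linspace(0.0, 1.0, steps + 1)
    for i in range(steps):
        t = ts[i].expand(shape[0], 1, 1)
        x1_hat = model(x, t)
        x = x + (ts[i + 1] - ts[i]) * (x1_hat - x) / (1 - ts[i])
    return x
```

Parameterizing the model to predict the clean sample (rather than the velocity) is what lets a structure predictor like AlphaFold serve as the denoiser.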
Open Source Code | Yes | "Code is available at https://github.com/bjing2016/alphaflow."
Open Datasets | Yes | "We fine-tune all weights of AlphaFold and ESMFold on the PDB with our flow matching framework, starting from their publicly available pretrained weights... Next, to demonstrate and assess the ability of our method to learn from MD ensembles, we continue fine-tuning both models on the ATLAS dataset of all-atom MD simulations (Vander Meersche et al., 2023)."
Dataset Splits | Yes | "Using training and validation cutoffs of May 1, 2018 and May 1, 2019, we obtain train/val/test splits of 1265/39/82 ensembles (2 excluded due to length)."
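The reported splits are time-based: entries are assigned by deposition date relative to the two cutoffs. A hedged sketch of this assignment logic (the helper name and input format are hypothetical; the authors' actual pipeline may differ):

```python
from datetime import date

def split_by_deposition_date(entries,
                             train_cutoff=date(2018, 5, 1),
                             val_cutoff=date(2019, 5, 1)):
    """entries: iterable of (pdb_id, deposition_date) pairs.

    Returns a dict of train/val/test ID lists: entries deposited before the
    train cutoff go to train, before the val cutoff to val, the rest to test.
    """
    splits = {"train": [], "val": [], "test": []}
    for pdb_id, deposited in entries:
        if deposited < train_cutoff:
            splits["train"].append(pdb_id)
        elif deposited < val_cutoff:
            splits["val"].append(pdb_id)
        else:
            splits["test"].append(pdb_id)
    return splits
```

Splitting on deposition date rather than at random avoids leakage from structures the pretrained models could have seen during training.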
Hardware Specification | Yes | "All training is done on a machine with 8x NVIDIA A100 GPUs and 2x Intel Xeon(R) Gold 6258R processors."
Software Dependencies | No | The paper mentions several software tools used, such as OpenFold (Ahdritz et al., 2022), MMseqs2 (Steinegger & Söding, 2017), ColabFold (Porter et al., 2023), and MDTraj (McGibbon et al., 2015), but it does not specify explicit version numbers for these ancillary software components.
Experiment Setup | Yes | "We train with crops of size 256, batch size of 64, no recycling, and no templates. AlphaFLOW is trained on the full set of auxiliary losses, except the structural violation loss and with the FAPE loss squared. ESMFLOW is trained on the FAPE, pLDDT, distogram, and supervised χ losses."
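The reported hyperparameters can be collected into a single config object. A sketch with hypothetical field names (the paper reports these values but not a config schema, so this is not the authors' actual code):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainConfig:
    crop_size: int = 256
    batch_size: int = 64
    num_recycles: int = 0         # "no recycling"
    use_templates: bool = False   # "no templates"
    violation_loss: bool = False  # structural violation loss is excluded
    square_fape: bool = True      # AlphaFLOW trains with the FAPE loss squared

# AlphaFLOW: full auxiliary losses minus the violation loss, squared FAPE.
ALPHAFLOW = TrainConfig()
# ESMFLOW: trained on the FAPE, pLDDT, distogram, and supervised chi losses.
ESMFLOW = TrainConfig(square_fape=False)
```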