AlphaFold Meets Flow Matching for Generating Protein Ensembles
Authors: Bowen Jing, Bonnie Berger, Tommi Jaakkola
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | When trained and evaluated on the PDB, our method provides a superior combination of precision and diversity compared to AlphaFold with MSA subsampling. When further trained on ensembles from all-atom MD, our method accurately captures conformational flexibility, positional distributions, and higher-order ensemble observables for unseen proteins. |
| Researcher Affiliation | Academia | 1CSAIL, Massachusetts Institute of Technology 2Department of Mathematics, Massachusetts Institute of Technology. |
| Pseudocode | Yes | Algorithm 1 (TRAINING) and Algorithm 2 (INFERENCE) are provided on page 3. |
| Open Source Code | Yes | Code is available at https://github.com/bjing2016/alphaflow. |
| Open Datasets | Yes | We fine-tune all weights of AlphaFold and ESMFold on the PDB with our flow matching framework, starting from their publicly available pretrained weights... Next, to demonstrate and assess the ability of our method to learn from MD ensembles, we continue fine-tuning both models on the ATLAS dataset of all-atom MD simulations (Vander Meersche et al., 2023). |
| Dataset Splits | Yes | Using training and validation cutoffs of May 1, 2018 and May 1, 2019, we obtain train/val/test splits of 1265/39/82 ensembles (2 excluded due to length). |
| Hardware Specification | Yes | All training is done on a machine with 8x NVIDIA A100 GPUs and 2x Intel Xeon(R) Gold 6258R processors. |
| Software Dependencies | No | The paper mentions several software tools used, such as OpenFold (Ahdritz et al., 2022), MMseqs2 (Steinegger & Söding, 2017), ColabFold, and MDTraj (McGibbon et al., 2015), but it does not specify explicit version numbers for these ancillary software components. |
| Experiment Setup | Yes | We train with crops of size 256, batch size of 64, no recycling, and no templates. AlphaFLOW is trained on the full set of auxiliary losses, except the structural violation loss and with the FAPE loss squared. ESMFLOW is trained on the FAPE, pLDDT, distogram, and supervised χ losses. |
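The dataset-splits row describes a temporal split: ensembles deposited before May 1, 2018 go to training, those deposited between the two cutoffs to validation, and the remainder to test. A minimal sketch of that scheme, assuming ensembles are available as `(name, deposition_date)` pairs (the helper name and data layout are illustrative, not from the paper or its codebase):

```python
from datetime import date

# Date cutoffs quoted in the reproducibility table.
TRAIN_CUTOFF = date(2018, 5, 1)
VAL_CUTOFF = date(2019, 5, 1)

def split_by_deposition_date(ensembles):
    """Partition (name, deposition_date) pairs into train/val/test name lists.

    Entries deposited before TRAIN_CUTOFF are training data, entries
    deposited on or after TRAIN_CUTOFF but before VAL_CUTOFF are
    validation data, and everything else is held out for testing.
    """
    train, val, test = [], [], []
    for name, deposited in ensembles:
        if deposited < TRAIN_CUTOFF:
            train.append(name)
        elif deposited < VAL_CUTOFF:
            val.append(name)
        else:
            test.append(name)
    return train, val, test
```

Applied to the full set of PDB ensembles (with the 2 over-length entries excluded beforehand), this kind of cutoff split is what yields the reported 1265/39/82 partition.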