Generative Modeling of Molecular Dynamics Trajectories
Authors: Bowen Jing, Hannes Stärk, Tommi Jaakkola, Bonnie Berger
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the full set of these capabilities on tetrapeptide simulations and show preliminary results on scaling to protein monomers. Altogether, our work illustrates how generative modeling can unlock value from MD data towards diverse downstream tasks that are not straightforward to address with existing methods or even MD itself. We evaluate MDGEN on the forward simulation, interpolation, upsampling, and inpainting tasks on tetrapeptides in a transferable setting (i.e., unseen test peptides). |
| Researcher Affiliation | Academia | Bowen Jing¹, Hannes Stärk¹, Tommi Jaakkola¹, Bonnie Berger¹˒² — ¹CSAIL, Massachusetts Institute of Technology; ²Dept. of Mathematics, Massachusetts Institute of Technology. {bjing, hstark}@mit.edu, tommi@csail.mit.edu, bab@mit.edu |
| Pseudocode | Yes | Algorithm 1: Velocity network, Algorithm 2: Diffusion Transformer Attention Layer, Algorithm 3: Invariant Point Attention Layer are provided in Appendix A.1. |
| Open Source Code | Yes | Code is available at https://github.com/bjing2016/mdgen. |
| Open Datasets | Yes | For proteins, we use explicit-solvent, all-atom simulations from the ATLAS dataset (Vander Meersche et al., 2024). To obtain tetrapeptide MD trajectories for training and evaluation, we run implicit- and explicit-solvent, all-atom simulations of 3000 training, 100 validation, and 100 test tetrapeptides for 100 ns. |
| Dataset Splits | Yes | For explicit-solvent settings (forward simulation, interpolation, inpainting), we run simulations for 3109 training, 100 validation, and 100 test peptides. For implicit-solvent settings (upsampling), we run simulations for 2646 training, 100 validation, and 100 test peptides. |
| Hardware Specification | Yes | MD runtimes in Table 2 are tabulated on an NVIDIA T4 GPU. All MDGEN experiments are carried out on NVIDIA A6000 GPUs. AlphaFlow and MSA subsampling runtimes in Table 4 are tabulated on NVIDIA A100 GPUs by Jing et al. (2024). |
| Software Dependencies | No | The paper mentions software like 'OpenMM (Eastman et al., 2017)' and 'PyEMMA (Scherer et al., 2015; Wehmeyer et al.)' and 'amber14 force field parameters', but it does not specify explicit version numbers for these software packages or libraries used in their experimental setup within the main text or appendices. |
| Experiment Setup | Yes | Unless otherwise specified, models are trained with trajectory timesteps of t = 10 ps. All simulations are integrated with a Langevin thermostat at 350 K with hydrogen bond constraints, timestep 2 fs, and friction coefficient 0.3 ps⁻¹ (explicit) or 0.1 ps⁻¹ (implicit). |
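As a sanity check on the simulation settings quoted in the table, the frame count, integrator step count, and per-step Langevin damping implied by a 100 ns trajectory follow directly from the stated numbers. The sketch below is an illustration of that arithmetic under those assumptions, not code from the paper:

```python
import math

# Settings quoted in the reproducibility table (assumed values for
# illustration -- this is not the authors' actual setup script).
TRAJ_LENGTH_NS = 100.0   # total simulation length: 100 ns
SAVE_INTERVAL_PS = 10.0  # trajectory timestep used for training: 10 ps
MD_TIMESTEP_FS = 2.0     # integrator timestep: 2 fs
FRICTION_PER_PS = 0.3    # Langevin friction, explicit solvent, in ps^-1

# Number of saved frames per trajectory at the 10 ps save interval.
n_frames = int(TRAJ_LENGTH_NS * 1000.0 / SAVE_INTERVAL_PS)

# Number of 2 fs integrator steps needed to cover 100 ns.
n_md_steps = int(TRAJ_LENGTH_NS * 1e6 / MD_TIMESTEP_FS)

# Velocity damping factor applied per integrator step by the
# Langevin thermostat: exp(-gamma * dt), with dt converted to ps.
damping = math.exp(-FRICTION_PER_PS * MD_TIMESTEP_FS * 1e-3)

print(n_frames)    # 10000 frames per 100 ns trajectory
print(n_md_steps)  # 50000000 integrator steps
print(round(damping, 6))
```

The large ratio between integrator steps and saved frames (5,000:1 here) is what makes direct generative modeling of the 10 ps-resolution trajectory attractive relative to re-running MD.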