Conditional Normalizing Flows for Active Learning of Coarse-Grained Molecular Representations
Authors: Henrik Schopmans, Pascal Friederich
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Using alanine dipeptide as an example, we show that our methods obtain a speedup over molecular dynamics simulations of approximately 15.9 to 216.2, compared to the speedup of 4.5 of the current state-of-the-art machine learning approach. |
| Researcher Affiliation | Academia | ¹Institute of Nanotechnology, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany. ²Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany. Correspondence to: Pascal Friederich <pascal.friederich@kit.edu>. |
| Pseudocode | No | The paper describes methods in text and uses figures, but does not include explicit pseudocode blocks or algorithms labeled as such. |
| Open Source Code | Yes | Our reference implementation of the described active learning workflow can be found on https://github.com/aimat-lab/coarse-graining-AL (v1.0). Code to reproduce all experiments is provided. |
| Open Datasets | Yes | As a ground-truth test dataset, we used the dataset provided by Stimper et al. (2022) and Midgley et al. (2023b), which was generated using replica exchange MD simulations with a total of 2.3 × 10¹⁰ potential energy and force evaluations. |
| Dataset Splits | Yes | In the last step of each iteration, we sample points in the CG space that exceed a defined threshold of the ensemble standard deviation using Metropolis Monte Carlo (MC). We broaden the obtained high-error points by sampling uniformly in a hypersphere around them in CG space. The broadened points are added to the AL dataset, where 80 % are used for training, and 20 % as test samples. |
| Hardware Specification | Yes | All experiments have been performed on a NVIDIA A100 40 GB GPU. Parts of this work were performed on the HoreKa supercomputer funded by the Ministry of Science, Research and the Arts Baden-Württemberg and by the Federal Ministry of Education and Research. |
| Software Dependencies | Yes | Energy evaluations and simulations were performed using OpenMM 8.0.0 with the reference platform (Eastman et al., 2017). |
| Experiment Setup | Yes | When training the flow by energy, we use a batch size of 8 and a learning rate of 5 × 10⁻³. We further clip gradients above a gradient norm of 20. The first AL iteration trains by energy for 12 epochs, all subsequent iterations use 7 epochs. |
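The dataset-splitting row above describes broadening high-error CG points by uniform sampling in a hypersphere, then splitting 80/20 into train/test. A minimal NumPy sketch of that step is shown below; the radius, sample counts, and function names are illustrative assumptions, not the paper's actual values or API.

```python
import numpy as np

rng = np.random.default_rng(0)

def broaden_points(points, radius, n_samples_per_point):
    """Sample uniformly inside a hypersphere of the given radius around
    each high-error point (dimension = CG-space dimension)."""
    dim = points.shape[1]
    broadened = []
    for p in points:
        # Uniform direction: normalize a standard-normal draw.
        directions = rng.normal(size=(n_samples_per_point, dim))
        directions /= np.linalg.norm(directions, axis=1, keepdims=True)
        # Uniform radius within the ball: r = radius * U^(1/dim).
        radii = radius * rng.uniform(size=(n_samples_per_point, 1)) ** (1.0 / dim)
        broadened.append(p + directions * radii)
    return np.concatenate(broadened, axis=0)

def train_test_split(samples, train_fraction=0.8):
    """Shuffle and split the broadened points 80/20, as in the paper."""
    idx = rng.permutation(len(samples))
    n_train = int(train_fraction * len(samples))
    return samples[idx[:n_train]], samples[idx[n_train:]]

high_error = rng.normal(size=(10, 2))  # placeholder high-error CG points
samples = broaden_points(high_error, radius=0.1, n_samples_per_point=50)
train, test = train_test_split(samples)
print(train.shape, test.shape)  # (400, 2) (100, 2)
```

The `U^(1/dim)` radius transform is what makes the samples uniform over the ball's volume rather than clustered near the center.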
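The experiment-setup row lists the training-by-energy hyperparameters: batch size 8, learning rate 5 × 10⁻³, and gradient-norm clipping at 20. A hedged PyTorch sketch of a training step with those settings follows; the model and loss are placeholders standing in for the conditional flow and its energy-based objective, and are not the paper's implementation.

```python
import torch
import torch.nn as nn

# Placeholder network; only the hyperparameters (batch size 8,
# lr 5e-3, grad-norm clip 20) come from the paper's description.
model = nn.Sequential(nn.Linear(2, 64), nn.Tanh(), nn.Linear(64, 2))
optimizer = torch.optim.Adam(model.parameters(), lr=5e-3)

def training_step(batch):
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()  # placeholder for the energy loss
    loss.backward()
    # Clip gradients whose total norm exceeds 20, as stated in the paper.
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=20.0)
    optimizer.step()
    return loss.item()

batch = torch.randn(8, 2)  # batch size 8
loss_value = training_step(batch)
print(loss_value)
```

Clipping by total gradient norm (rather than per-element clamping) preserves the gradient direction while bounding the step size, which helps stabilize energy-based flow training.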