Conditional Normalizing Flows for Active Learning of Coarse-Grained Molecular Representations

Authors: Henrik Schopmans, Pascal Friederich

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Using alanine dipeptide as an example, we show that our methods obtain a speedup to molecular dynamics simulations of approximately 15.9 to 216.2 compared to the speedup of 4.5 of the current state-of-the-art machine learning approach.
Researcher Affiliation | Academia | ¹Institute of Nanotechnology, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany. ²Institute of Theoretical Informatics, Karlsruhe Institute of Technology, Kaiserstr. 12, 76131 Karlsruhe, Germany. Correspondence to: Pascal Friederich <pascal.friederich@kit.edu>.
Pseudocode | No | The paper describes methods in text and uses figures, but does not include explicit pseudocode blocks or algorithms labeled as such.
Open Source Code | Yes | Our reference implementation of the described active learning workflow can be found at https://github.com/aimat-lab/coarse-graining-AL (v1.0). Code to reproduce all experiments is provided.
Open Datasets | Yes | As a ground-truth test dataset, we used the dataset provided by Stimper et al. (2022) and Midgley et al. (2023b), which was generated using replica exchange MD simulations with a total of 2.3 × 10¹⁰ potential energy and force evaluations.
Dataset Splits | Yes | In the last step of each iteration, we sample points in the CG space that exceed a defined threshold of the ensemble standard deviation using Metropolis Monte Carlo (MC). We broaden the obtained high-error points by sampling uniformly in a hypersphere around them in CG space. The broadened points are added to the AL dataset, where 80% are used for training and 20% as test samples.
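The broadening-and-splitting step quoted above can be sketched as follows. This is not the authors' code; the function names, the hypersphere radius, and the samples-per-point count are illustrative assumptions — only the uniform-in-hypersphere sampling and the 80/20 split come from the quoted description.

```python
import numpy as np

def sample_in_hypersphere(center, radius, n, rng):
    """Draw n points uniformly from the volume of a hypersphere around `center`."""
    d = center.shape[0]
    # Uniform direction: normalize a standard Gaussian sample.
    directions = rng.normal(size=(n, d))
    directions /= np.linalg.norm(directions, axis=1, keepdims=True)
    # Uniform in volume: radius scaled by U^(1/d), not by U.
    radii = radius * rng.uniform(size=(n, 1)) ** (1.0 / d)
    return center + directions * radii

def broaden_and_split(high_error_points, radius=0.1, n_per_point=10, seed=0):
    """Broaden high-error CG points, then split 80% train / 20% test."""
    rng = np.random.default_rng(seed)
    broadened = np.concatenate(
        [sample_in_hypersphere(p, radius, n_per_point, rng)
         for p in high_error_points]
    )
    rng.shuffle(broadened)
    n_train = int(0.8 * len(broadened))
    return broadened[:n_train], broadened[n_train:]
```

The `U^(1/d)` radius scaling is what makes the samples uniform in volume rather than clustered near the center; sampling the radius uniformly would over-represent the sphere's interior in higher dimensions.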
Hardware Specification | Yes | All experiments were performed on an NVIDIA A100 40 GB GPU. Parts of this work were performed on the HoreKa supercomputer funded by the Ministry of Science, Research and the Arts Baden-Württemberg and by the Federal Ministry of Education and Research.
Software Dependencies | Yes | Energy evaluations and simulations were performed using OpenMM 8.0.0 with the reference platform (Eastman et al., 2017).
Experiment Setup | Yes | When training the flow by energy, we use a batch size of 8 and a learning rate of 5 × 10⁻³. We further clip gradients above a gradient norm of 20. The first AL iteration trains by energy for 12 epochs; all subsequent iterations use 7 epochs.
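The gradient-clipping setting quoted above (clip gradients above a total norm of 20) corresponds to standard global-norm clipping, sketched here in plain NumPy as a minimal illustration; it mirrors what utilities like PyTorch's `clip_grad_norm_` do, but is not the authors' implementation.

```python
import numpy as np

def clip_by_global_norm(grads, max_norm=20.0):
    """Rescale a list of gradient arrays so their combined L2 norm is <= max_norm.

    If the global norm already lies below the threshold, the gradients
    are returned unchanged; otherwise every array is scaled by the same
    factor, preserving the overall gradient direction.
    """
    total_norm = np.sqrt(sum(np.sum(g ** 2) for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads
```

Scaling all parameter groups by one shared factor (rather than clipping each element) keeps the update direction intact while bounding its magnitude, which is why norm clipping is the usual choice when training flows by energy, where occasional high-energy samples can produce very large gradients.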