ET-Flow: Equivariant Flow-Matching for Molecular Conformer Generation

Authors: Majdi Hassan, Nikhil Shenoy, Jungyoon Lee, Hannes Stärk, Stephan Thaler, Dominique Beaini

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically evaluate ET-Flow by comparing the generated and ground-truth conformers in terms of distance-based RMSD (Section 4.2) and chemical-property-based metrics (Section 4.4). We present the general experimental setups in Section 4.1.
Researcher Affiliation | Collaboration | Mila & Université de Montréal; University of British Columbia; Massachusetts Institute of Technology; Valence Labs
Pseudocode | Yes | Algorithm 1: Training procedure; Algorithm 2: Inference procedure; Algorithm 3: Stochastic Sampler (a generic flow-matching training sketch follows the table)
Open Source Code | Yes | Code is available at https://github.com/shenoynikhil/ETFlow.
Open Datasets | Yes | We conduct our experiments on the GEOM dataset (Axelrod and Gomez-Bombarelli, 2022), which offers curated conformer ensembles produced through meta-dynamics in CREST (Pracht et al., 2024).
Dataset Splits | Yes | We use the train/validation/test split (243,473 / 30,433 / 1,000) provided in Ganea et al. (2021).
Hardware Specification | Yes | For GEOM-DRUGS, we train ET-Flow for a fixed 250 epochs with a batch size of 64 and 5000 training batches per epoch per GPU on 8 A100 GPUs. For GEOM-QM9, we train ET-Flow for 200 epochs with a batch size of 128, and use all of the training dataset per epoch on 4 A100 GPUs.
Software Dependencies | No | No specific software dependencies with version numbers were listed in the paper.
Experiment Setup | Yes | For GEOM-DRUGS, we train ET-Flow for a fixed 250 epochs with a batch size of 64 and 5000 training batches per epoch per GPU on 8 A100 GPUs. We use the Adam optimizer with a cosine-annealing learning-rate schedule that decays from a maximum of 10⁻³ to a minimum of 10⁻⁷ over 250 epochs, with a weight decay of 10⁻¹⁰. For GEOM-QM9, we train ET-Flow for 200 epochs with a batch size of 128, using all of the training dataset per epoch, on 4 A100 GPUs. We use a cosine-annealing learning-rate schedule from a maximum of 8×10⁻⁴ to a minimum of 10⁻⁷ over 100 epochs, after which the maximum is reduced by a factor of 0.05. We select checkpoints based on the lowest validation error. (An optimizer/schedule sketch also follows below.)
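
The Pseudocode row above lists Algorithm 1 as the training procedure. The following is a minimal sketch of a generic conditional flow-matching training step, assuming a Gaussian prior and a straight-line interpolation path; it is not the paper's exact Algorithm 1 (ET-Flow's choice of prior, alignment, and equivariant architecture are not reproduced here), and `model` and `feats` are hypothetical placeholders.

import torch

def flow_matching_loss(model, x1, feats):
    """Generic conditional flow-matching step (a sketch, not ET-Flow's Algorithm 1).

    x1: ground-truth conformer coordinates, shape (num_atoms, 3).
    feats: whatever atom/bond features the model consumes (hypothetical).
    """
    x0 = torch.randn_like(x1)             # sample from a Gaussian prior (for simplicity)
    t = torch.rand((), device=x1.device)  # time sampled uniformly in [0, 1]
    xt = (1.0 - t) * x0 + t * x1          # linear interpolation between prior and data
    v_target = x1 - x0                    # velocity of the straight-line path
    v_pred = model(xt, t, feats)          # network predicts the velocity field
    return torch.mean((v_pred - v_target) ** 2)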
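
Below is a minimal PyTorch sketch of the GEOM-DRUGS optimizer and schedule described in the Experiment Setup row, with a placeholder module standing in for the actual ET-Flow network and the per-epoch training loop elided.

import torch
import torch.nn as nn

model = nn.Linear(3, 3)  # placeholder; the real model is the ET-Flow network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-10)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=250, eta_min=1e-7)

for epoch in range(250):
    # ... 5000 training batches per GPU at batch size 64 would run here ...
    scheduler.step()  # cosine-anneal the learning rate from 1e-3 toward 1e-7 over 250 epochs

The GEOM-QM9 variant described above (maximum 8×10⁻⁴ annealed over 100 epochs, then restarting with the maximum scaled by 0.05) would require re-creating the scheduler with a lower maximum at the restart point; the sketch only covers the GEOM-DRUGS case.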