Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
ET-Flow: Equivariant Flow-Matching for Molecular Conformer Generation
Authors: Majdi Hassan, Nikhil Shenoy, Jungyoon Lee, Hannes Stärk, Stephan Thaler, Dominique Beaini
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically evaluate ET-Flow by comparing the generated and ground-truth conformers in terms of distance-based RMSD (Section 4.2) and chemical property based metrics (Section 4.4). We present the general experimental setups in Section 4.1. |
| Researcher Affiliation | Collaboration | ¹Mila & Université de Montréal, ²University of British Columbia, ³Massachusetts Institute of Technology, ⁴Valence Labs |
| Pseudocode | Yes | Algorithm 1: Training procedure, Algorithm 2: Inference procedure, Algorithm 3: Stochastic Sampler |
| Open Source Code | Yes | Code is available at https://github.com/shenoynikhil/ETFlow. |
| Open Datasets | Yes | We conduct our experiments on the GEOM dataset (Axelrod and Gomez-Bombarelli, 2022), which offers curated conformer ensembles produced through meta-dynamics in CREST (Pracht et al., 2024). |
| Dataset Splits | Yes | We use a train/validation/test (243473/30433/1000) split as provided in (Ganea et al., 2021) |
| Hardware Specification | Yes | For GEOM-DRUGS, we train ET-Flow for a fixed 250 epochs with a batch size of 64 and 5000 training batches per epoch per GPU on 8 A100 GPUs. For GEOM-QM9, we train ET-Flow for 200 epochs with a batch size of 128, and use all of the training dataset per epoch on 4 A100 GPUs. |
| Software Dependencies | No | No specific software dependencies with version numbers were listed in the paper. |
| Experiment Setup | Yes | For GEOM-DRUGS, we train ET-Flow for a fixed 250 epochs with a batch size of 64 and 5000 training batches per epoch per GPU on 8 A100 GPUs. For the learning rate, we use the Adam optimizer with a cosine annealing learning rate schedule that goes from a maximum of 10⁻³ to a minimum of 10⁻⁷ over 250 epochs, with a weight decay of 10⁻¹⁰. For GEOM-QM9, we train ET-Flow for 200 epochs with a batch size of 128, and use all of the training dataset per epoch on 4 A100 GPUs. We use the cosine annealing learning rate schedule with a maximum of 8×10⁻⁴ and a minimum of 10⁻⁷ over 100 epochs, after which the maximum is reduced by a factor of 0.05. We select checkpoints based on the lowest validation error. |
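The learning-rate schedules quoted in the Experiment Setup row can be made concrete with a short sketch. This is a minimal plain-Python illustration, assuming the standard cosine annealing formula lr = lr_min + ½(lr_max − lr_min)(1 + cos(π·t/T)). For GEOM-QM9 it further assumes that "reduced by a factor of 0.05" means the schedule restarts every 100 epochs with the peak multiplied by 0.05; the source wording does not settle this. The function names `cosine_annealing`, `drugs_lr`, and `qm9_lr` are hypothetical and are not taken from the ETFlow repository.

```python
import math


def cosine_annealing(epoch: int, period: int, lr_max: float, lr_min: float) -> float:
    """Standard cosine annealing within one cycle.

    Returns lr_max at epoch 0 and lr_min at epoch == period.
    """
    t = min(epoch, period) / period
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))


def drugs_lr(epoch: int) -> float:
    # GEOM-DRUGS (as reported): one cosine cycle from 1e-3 down to 1e-7 over 250 epochs.
    return cosine_annealing(epoch, period=250, lr_max=1e-3, lr_min=1e-7)


def qm9_lr(epoch: int, period: int = 100, decay: float = 0.05) -> float:
    # GEOM-QM9 (as reported): cosine from 8e-4 to 1e-7 over 100 epochs.
    # ASSUMPTION (hypothetical reading): the schedule then restarts, and each
    # new cycle's peak is the previous peak multiplied by `decay`.
    cycle, offset = divmod(epoch, period)
    lr_max = 8e-4 * decay ** cycle
    return cosine_annealing(offset, period=period, lr_max=lr_max, lr_min=1e-7)


if __name__ == "__main__":
    # Print a few points of each schedule over the QM9 training horizon (200 epochs).
    for e in (0, 50, 100, 150, 199):
        print(f"epoch {e:3d}: DRUGS lr={drugs_lr(e):.2e}  QM9 lr={qm9_lr(e):.2e}")
```

Under these assumptions, the second QM9 cycle (epochs 100 to 199) peaks at 4×10⁻⁵. The single-cycle DRUGS schedule matches the closed form behind PyTorch's `torch.optim.lr_scheduler.CosineAnnealingLR` with `T_max=250` and `eta_min=1e-7`. PyTorch's `CosineAnnealingWarmRestarts` restarts at the same peak, so a decaying peak like the one sketched for QM9 would need custom handling. Checking the released code is the reliable way to confirm which reading the authors used.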