Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Efficient Molecular Conformer Generation with SO(3)-Averaged Flow Matching and Reflow

Authors: Zhonglin Cao, Mario Geiger, Allan Dos Santos Costa, Danny Reidenbach, Karsten Kreis, Tomas Geffner, Franco Pellegrini, Guoqing Zhou, Emine Kucukbenli

ICML 2025 | Venue PDF | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type: Experimental
  "4. Experiments ... Following previous works, we train and evaluate our model on the GEOM-QM9 and GEOM-Drugs datasets (Axelrod & Gomez-Bombarelli, 2022). We follow the splitting strategy proposed by Ganea et al. (2021); Jing et al. (2022), and test our model on the same test set containing 1000 molecules for both QM9 and Drugs datasets. Dataset and splitting details are included in Sec. B.1. The major model evaluation metrics are the average minimum RMSD (AMR, the lower the better) and coverage (COV, the higher the better). Both AMR and coverage are reported for precision (AMR-P and COV-P) and recall (AMR-R and COV-R)."
Researcher Affiliation: Collaboration
  "1NVIDIA  2MIT Center for Bits and Atoms  3Work was completed during internship with NVIDIA."
Pseudocode: Yes
  "Algorithm 1 Averaged Flow with Reflow+Distillation Train"
Open Source Code: Yes
  "We provide the Python implementation of this formula in Appendix C.1."
Open Datasets: Yes
  "We train and evaluate our model on the GEOM-QM9 and GEOM-Drugs datasets (Axelrod & Gomez-Bombarelli, 2022)."
Dataset Splits: Yes
  "The train/val/test set of GEOM-Drugs contains 243473/30433/1000 molecules, respectively. The train/val/test set of GEOM-QM9 contains 106586/13323/1000 molecules, respectively."
Hardware Specification: Yes
  "The NequIP model is trained with the Averaged Flow for 990 epochs on the GEOM-Drugs dataset and 1500 epochs on the GEOM-QM9 dataset using 2 NVIDIA A5880 GPUs. ... The reflow and distillation are done using 4 NVIDIA A100 GPUs. ... We trained 2 variants of the DiT model, DiT (52M) and DiT-L (64M), using Averaged Flow. ... # GPUs: 8 / 8 / 2; GPU name: NVIDIA A100 / NVIDIA A100 / NVIDIA A5880"
Software Dependencies: No
  "We used the Tsitouras 5/4 solver (Tsitouras, 2011) implemented in the diffrax package with adaptive stepping. ... The model is implemented using e3nn-jax (Geiger & Smidt, 2022; Geiger et al., 2022)."
Experiment Setup: Yes
  "We used Adam optimizer with learning rate of 1e-2, which decays to 5e-3 after 600 epochs and to 1e-3 after 850 epochs. We selected the top-30 conformers for model training. ... The effective average batch size is 208 and 416 for Drugs and QM9 dataset, respectively. ... During the reflow stage, the model is finetuned for 870 epochs on Drugs and 1530 epochs on QM9. We used Adam optimizer with learning rate of 5e-3, which decays to 2.5e-3 after 450 epochs for Drugs (500 epochs for QM9), and to 5e-4 after 650 epochs for Drugs (900 epochs for QM9). ... We used exponential moving average (EMA) with a decay of 0.999 for all Averaged Flow, reflow, and distillation training. ... Table 5. Hyperparameters for DiT models training."
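The stepwise decay and EMA described in the quote can be sketched in a few lines of plain Python. The milestone values below are the quoted Averaged Flow settings for GEOM-Drugs; the function names are illustrative, not from the paper's code:

```python
def lr_schedule(epoch, base_lr=1e-2, milestones=((600, 5e-3), (850, 1e-3))):
    """Stepwise learning-rate decay: base_lr until the first milestone epoch,
    then the milestone value once that epoch is reached."""
    lr = base_lr
    for boundary, value in milestones:
        if epoch >= boundary:
            lr = value
    return lr

def ema_update(ema_params, params, decay=0.999):
    """One exponential-moving-average step over a flat list of parameters."""
    return [decay * e + (1.0 - decay) * p for e, p in zip(ema_params, params)]
```

For example, `lr_schedule(0)` returns 1e-2, `lr_schedule(700)` returns 5e-3, and `lr_schedule(900)` returns 1e-3, matching the quoted schedule.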