Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Stiefel Flow Matching for Moment-Constrained Structure Elucidation

Authors: Austin H Cheng, Alston Lo, Kin Long Kelvin Lee, Santiago Miret, Alan Aspuru-Guzik

ICLR 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	4 EXPERIMENTS We evaluate Euclidean diffusion models and Stiefel Flow Matching on the QM9 and GEOM datasets. For each example, the model takes in moments and molecular formula and produces K = 10 samples. ... Table 1: Experimental results on QM9. Stiefel FM shows no violation of moment constraints as shown in the Error metrics, and has the highest success rate for structure elucidation, with the lowest computational cost.
Researcher Affiliation	Collaboration	Austin H. Cheng (1,2), Alston Lo (1,2), Kin Long Kelvin Lee (3), Santiago Miret (3), Alan Aspuru-Guzik (1,2,4). Affiliations: (1) University of Toronto, (2) Vector Institute, (3) Intel Labs, (4) Acceleration Consortium.
Pseudocode	Yes	Algorithm 1: Computing a Stiefel geodesic γ(t) (Edelman et al., 1998). Algorithm 2: Computing the Stiefel logarithm (Zimmermann & Hüper, 2022). Algorithm 3: Sampling under Stiefel Flow Matching. Algorithm 4: Heuristic alignment algorithm.
Open Source Code	Yes	https://github.com/aspuru-guzik-group/stiefel FM
Open Datasets	Yes	Datasets. For QM9 (Ramakrishnan et al., 2014), we use the conformers provided by the GEOM dataset. We abbreviate GEOM-Drugs (Axelrod & Gomez-Bombarelli, 2022) as GEOM.
Dataset Splits	Yes	QM9 has train/val/test splits of 104265/13056/13033 molecules, while GEOM has splits of 233625/29203/29203 molecules, or 5537598/29203/29203 conformers.
Hardware Specification	Yes	Models were trained on 4 NVIDIA A100 40GB GPUs.
Software Dependencies	No	The paper lists numerous software libraries in the acknowledgments (e.g., PyTorch, PyTorch Lightning, RDKit, NumPy, SciPy, pandas), but it only cites the papers or development teams associated with them, without providing specific version numbers for the software used in the experiments. For example, it lists "PyTorch (Paszke et al., 2019)", which refers to the paper describing PyTorch, not a specific version used.
Experiment Setup	Yes	Table 3: General training and sampling hyperparameters (values given as QM9 / GEOM). Epochs: 1000 / 60; Batch size per GPU: 256 / 24; Optimizer: AdamW / AdamW; Learning rate: 10^-4 / 10^-4; Learning rate warmup steps: 2000 / 2000; Weight decay: 0.01 / 0.01; Gradient clipping: yes / yes; EMA decay: 0.9995 / 0.9995; KREED timesteps: 1000 / 1000; KREED schedule: polynomial / polynomial; Stiefel FM timesteps: 200 / 200. Table 4: Training and sampling hyperparameters for Stiefel Flow Matching (columns: Dataset, Model, Timestep sampling, OT, stochasticity γ; no Dataset values appear in the excerpt). Stiefel FM: uniform, no, 0.00; Stiefel FM-OT: uniform, yes, 0.00; Stiefel FM-OT-stoch: uniform, yes, 0.10; Stiefel FM-ln: logit-normal, no, 0.00; Stiefel FM-ln-OT: logit-normal, yes, 0.00.
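
The hyperparameters quoted in the Experiment Setup row can be written down as a small configuration object, which makes it easier to see what the reported numbers imply. The sketch below is illustrative only and is not taken from the authors' repository: the class name, field names, and helper functions are hypothetical. It assumes that the extracted learning rate "10 4" means 10^-4, that all 4 GPUs from the Hardware Specification row are used in data-parallel training, and that the final partial batch of each epoch is kept.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hypothetical container for the Table 3 values quoted above."""
    epochs: int
    batch_size_per_gpu: int
    optimizer: str = "AdamW"
    learning_rate: float = 1e-4      # assumption: extracted "10 4" read as 10^-4
    warmup_steps: int = 2000
    weight_decay: float = 0.01
    gradient_clipping: bool = True
    ema_decay: float = 0.9995
    kreed_timesteps: int = 1000
    stiefel_fm_timesteps: int = 200


# Per-dataset values as reported in the Experiment Setup row.
QM9 = TrainConfig(epochs=1000, batch_size_per_gpu=256)
GEOM = TrainConfig(epochs=60, batch_size_per_gpu=24)

# From the Hardware Specification row; assumes all GPUs train data-parallel.
NUM_GPUS = 4


def effective_batch_size(cfg: TrainConfig, num_gpus: int = NUM_GPUS) -> int:
    """Global batch size under the data-parallel assumption."""
    return cfg.batch_size_per_gpu * num_gpus


def steps_per_epoch(num_train: int, cfg: TrainConfig, num_gpus: int = NUM_GPUS) -> int:
    """Optimizer steps per epoch, assuming the last partial batch is kept."""
    return math.ceil(num_train / effective_batch_size(cfg, num_gpus))


if __name__ == "__main__":
    # Training-set size for QM9 taken from the Dataset Splits row.
    print("QM9 global batch:", effective_batch_size(QM9))
    print("GEOM global batch:", effective_batch_size(GEOM))
    print("QM9 steps/epoch:", steps_per_epoch(104265, QM9))
```

Under these assumptions the global batch size is 1024 for QM9 and 96 for GEOM. Whether GEOM epochs iterate over molecules or conformers is not stated in the excerpt, so the sketch leaves the training-set size as an argument.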