Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Sampling 3D Molecular Conformers with Diffusion Transformers

Authors: J. Thorben Frank, Winfried Ripken, Gregor Lied, Klaus-Robert Müller, Oliver Unke, Stefan Chmiela

NeurIPS 2025 | Venue PDF | LLM Run Details

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Experiments on standard conformer generation benchmarks (GEOM-QM9, -DRUGS, -XL) demonstrate that DiTMC achieves state-of-the-art precision and physical validity. Our results highlight how architectural choices and symmetry priors affect sample quality and efficiency, suggesting promising directions for large-scale generative modeling of molecular structures.
Researcher Affiliation	Collaboration	1 Technical University Berlin; 2 BIFOLD Berlin; 3 Google DeepMind; 4 MPI for Informatics, Saarbrücken; 5 Department of Artificial Intelligence, Korea University
Pseudocode	Yes	Algorithm 1 describes the computation of the training loss for our flow matching objective. We start by sampling from the prior $x_0 \sim p_0(x)$, the data distribution $x_1 \sim p_1(x)$, and a Gaussian distribution $\epsilon \sim \mathcal{N}(x; 0, I)$. ... Algorithm 2 ODE Sampling
Open Source Code	Yes	Code is available at https://github.com/ML4MolSim/dit_mc.
Open Datasets	Yes	We conduct our experiments on the GEOM dataset [62], comprising QM9 (133,258 small molecules) and AICures (304,466 drug-like molecules). Reference conformers are generated using CREST [63]. ... using the GEOM-XL dataset [46].
Dataset Splits	Yes	The data splits are taken from Ref. [64]. ... We use the train/test/val split from Geomol [64], using the same 1000 molecules for testing.
Hardware Specification	Yes	All models on GEOM-QM9 are trained for 250 epochs, which requires 2 days of training on an NVIDIA H100 GPU for aPE and rPE models and almost 4 days for PE(3). For GEOM-DRUGS, we fix the total compute budget per model to nine days on a single NVIDIA H100 GPU, due to computational constraints.
Software Dependencies	No	The paper mentions software like RDKit [73] and GFN2-xTB [75], but it does not specify exact version numbers for these or any other core software dependencies (e.g., Python, PyTorch, CUDA) required for replication.
Experiment Setup	Yes	Optimizer and Hyperparameters: We use the AdamW optimizer (weight decay 0.01) with batch size of 128 and learning rate of $\mu_{\max} = 3 \times 10^{-4}$ for GEOM-QM9 and $\mu_{\max} = 1 \times 10^{-4}$ for GEOM-DRUGS. First, we increase the initial learning rate of $\mu_0 = 10^{-5}$ up to $\mu_{\max}$ via a linear learning rate warmup over the first 1% of training steps. Afterwards, the learning rate is decreased via a cosine decay schedule to $\mu_{\min} = 0$ for GEOM-QM9 and $\mu_{\min} = 1 \times 10^{-5}$ for GEOM-DRUGS.
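
The learning-rate schedule quoted in the Experiment Setup row (linear warmup over the first 1% of steps, then cosine decay) can be sketched in a few lines of Python. This is a minimal illustration and not the authors' implementation: the function name `learning_rate`, the `SCHEDULES` dictionary, and the total step count of 100,000 are assumptions made for the example. Only the initial, peak, and final learning rates and the 1% warmup fraction come from the quoted text.

```python
import math


def learning_rate(step, total_steps, lr_max, lr_min, lr_init=1e-5, warmup_frac=0.01):
    """Linear warmup from lr_init to lr_max, then cosine decay to lr_min."""
    warmup_steps = max(1, round(warmup_frac * total_steps))
    if step < warmup_steps:
        # Linear interpolation between the initial and the peak learning rate.
        return lr_init + (lr_max - lr_init) * step / warmup_steps
    # Fraction of the decay phase completed, clipped to [0, 1].
    decay_steps = max(1, total_steps - warmup_steps)
    progress = min(1.0, (step - warmup_steps) / decay_steps)
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * progress))


# Peak and final learning rates as quoted in the table above.
SCHEDULES = {
    "GEOM-QM9": {"lr_max": 3e-4, "lr_min": 0.0},
    "GEOM-DRUGS": {"lr_max": 1e-4, "lr_min": 1e-5},
}

if __name__ == "__main__":
    total_steps = 100_000  # hypothetical; the quoted text gives no step count
    for name, cfg in SCHEDULES.items():
        for step in (0, 1_000, 50_000, 100_000):
            print(name, step, learning_rate(step, total_steps, **cfg))
```

With these assumptions, the schedule starts at 10^-5, reaches the peak rate when warmup ends, and arrives at the final rate on the last step. Deep learning frameworks ship their own schedulers, which would normally be preferred over a hand-written function in a real training run.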