Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Accelerating 3D Molecule Generative Models with Trajectory Diagnosis

Authors: Zhilong Zhang, Yuxuan Song, Yichun Wang, Jingjing Gong, Hanlin Wu, Dongzhan Zhou, Hao Zhou, Wei-Ying Ma

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type Experimental Extensive experiments demonstrate that our approach achieves competitive performance with approximately 10 sampling steps, 7.5× faster than previous state-of-the-art models and approximately 100× faster than diffusion-based models, offering a significant step towards scalable molecular generation. Code is available at https://github.com/GenSI-THUAIR/MolTD
Researcher Affiliation Collaboration Zhilong Zhang^{1,2}, Yuxuan Song^{1,3}, Yichun Wang^{4}, Jingjing Gong^{1}, Hanlin Wu^{1}, Dongzhan Zhou^{5}, Hao Zhou^{1,5}, and Wei-Ying Ma^{1}. 1: Institute of AI Industry Research (AIR), Tsinghua University; 2: Qiuzhen College, Tsinghua University; 3: Department of Computer Science and Technology, Tsinghua University; 4: Department of Industrial Engineering, Tsinghua University; 5: Shanghai Artificial Intelligence Laboratory
Pseudocode Yes Algorithm 1: Construction of a Geometric-Informed Prior
  Input: A set of $M$ molecules $\{g_n\}_{n=1}^{M}$ of the same size, and accuracy level $\alpha_p$.
  Output: The geometric-informed prior $\theta_p$.
  Initialize: Set the reference molecule $g_{\mathrm{ref}} = g_1$, and initialize the aligned molecules list Mol = [$g_{\mathrm{ref}}$].
  for $i = 2$ to $M$ do
    Compute the optimal permutation and rotation using the optimal transport (EOT) objective: $\pi_i^*, R_i^* = \arg\min_{\pi, R} \lVert \pi(R g_i) - g_{\mathrm{ref}} \rVert_2$
    Append the aligned molecule to the list: Mol.append($\pi_i^*(R_i^* g_i)$)
  end for
  Extract Information: Compute the average of the aligned molecules: $\bar{g} = \mathrm{Mean}(\mathrm{Mol})$
  Project to Parameter Space: $\theta_p = \mathbb{E}_{p_S(y \mid \bar{g}, \alpha_p)} \, \delta(\theta_{\alpha_p} - h(y, 0, \alpha_p))$
Open Source Code Yes Code is available at https://github.com/GenSI-THUAIR/MolTD
Open Datasets Yes We demonstrate the effectiveness of the proposed methods on two molecule datasets: QM9 [12] and GEOM-DRUG [13]. For the first time, our approach enables the generation of large molecules using approximately 10 steps while reaching new state-of-the-art performance on both datasets: on QM9, MOLTD achieves molecule stability of 93.16%, and on GEOM-DRUG it achieves atom stability of 86.88%.
Dataset Splits Yes And the data configurations directly follow previous works [43, 41, 30, 31].
Hardware Specification Yes The training process requires approximately 2000 epochs for QM9 and 20 epochs for DRUG using RTX 3090.
Software Dependencies No We implement the Bayesian Flow Network using EGNNs [47] within the PyTorch framework [48]. The latent invariant feature dimension k is set to 1 for QM9 and 2 for DRUG, significantly reducing the atomic feature dimensionality.
Experiment Setup Yes The latent invariant feature dimension k is set to 1 for QM9 and 2 for DRUG, significantly reducing the atomic feature dimensionality. Following the implementations of GeoBFN [31], we only take atom charges as atomic features. For training the parameter network Φ, we configure EGNNs with 9 layers and 256 hidden features for QM9, and 6 layers with 256 hidden features for DRUG, both trained with a batch size of 64. The model employs SiLU activations and is trained until convergence. Across all experiments, we adopt the Adam optimizer [49] with a fixed learning rate of $10^{-4}$ as the default training configuration.
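The alignment-and-averaging part of Algorithm 1 (the Pseudocode row) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes each molecule is given only as an N×3 coordinate array (atom types are ignored), centers every molecule at its centroid, solves the joint permutation-and-rotation problem by exhaustive search over atom orderings with a Kabsch rotation for each ordering, and omits the final projection to parameter space. All function names are hypothetical.

```python
import itertools

import numpy as np


def center(g):
    """Shift coordinates so that the centroid sits at the origin."""
    return g - g.mean(axis=0, keepdims=True)


def kabsch_rotation(g, g_ref):
    """Proper rotation R minimizing ||g @ R.T - g_ref|| for centered inputs."""
    h = g.T @ g_ref                       # 3x3 cross-covariance matrix
    u, _, vt = np.linalg.svd(h)
    # Flip the last axis if needed so that det(R) = +1 (no reflections).
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


def align_to_reference(g, g_ref):
    """Try every atom ordering; keep the ordering and rotation of lowest cost."""
    best_cost, best_aligned = np.inf, None
    for perm in itertools.permutations(range(len(g))):
        candidate = g[list(perm)]         # apply the permutation to the rows
        rotation = kabsch_rotation(candidate, g_ref)
        aligned = candidate @ rotation.T  # rotate every atom
        cost = np.linalg.norm(aligned - g_ref)
        if cost < best_cost:
            best_cost, best_aligned = cost, aligned
    return best_aligned


def mean_aligned_geometry(molecules):
    """Align all molecules to the first one and average the aligned coordinates."""
    centered = [center(np.asarray(g, dtype=float)) for g in molecules]
    g_ref = centered[0]
    aligned = [g_ref] + [align_to_reference(g, g_ref) for g in centered[1:]]
    return np.mean(aligned, axis=0)
```

The exhaustive search costs N! Kabsch solves, so it is only practical for a handful of atoms. For realistic molecule sizes, a common substitute is to alternate a Hungarian assignment (for example `scipy.optimize.linear_sum_assignment`) with the Kabsch step, which gives an approximate rather than an exact minimizer.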