Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al.: K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026.
Accelerating 3D Molecule Generative Models with Trajectory Diagnosis
Authors: Zhilong Zhang, Yuxuan Song, Yichun Wang, Jingjing Gong, Hanlin Wu, Dongzhan Zhou, Hao Zhou, Wei-Ying Ma
NeurIPS 2025 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments demonstrate that our approach achieves competitive performance with approximately 10 sampling steps, 7.5× faster than previous state-of-the-art models and approximately 100× faster than diffusion-based models, offering a significant step towards scalable molecular generation. Code is available at https://github.com/GenSI-THUAIR/MolTD |
| Researcher Affiliation | Collaboration | Zhilong Zhang¹,², Yuxuan Song¹,³, Yichun Wang⁴, Jingjing Gong¹, Hanlin Wu¹, Dongzhan Zhou⁵, Hao Zhou¹,⁵, Wei-Ying Ma¹. ¹Institute of AI Industry Research (AIR), Tsinghua University; ²Qiuzhen College, Tsinghua University; ³Department of Computer Science and Technology, Tsinghua University; ⁴Department of Industrial Engineering, Tsinghua University; ⁵Shanghai Artificial Intelligence Laboratory |
| Pseudocode | Yes | Algorithm 1: Construction of a Geometric-Informed Prior. Input: a set of $M$ molecules $\{g_n\}_{n=1}^{M}$ of the same size, and accuracy level $\alpha_p$. Output: the geometric-informed prior $\theta_p$. Initialize: set the reference molecule $g_{\mathrm{ref}} = g_1$ and initialize the aligned molecules list $\mathrm{Mol} = [g_{\mathrm{ref}}]$. For $i = 2$ to $M$: compute the optimal permutation and rotation using the optimal transport (EOT) objective, $\pi_i^*, R_i^* = \arg\min_{\pi, R} \lVert \pi(R g_i) - g_{\mathrm{ref}} \rVert_2$, and append the aligned molecule to the list: $\mathrm{Mol.append}(\pi_i^*(R_i^* g_i))$. End for. Extract information: compute the average of the aligned molecules, $\bar{g} = \mathrm{Mean}(\mathrm{Mol})$. Project to parameter space: $\theta_p = \mathbb{E}_{p_S(y \mid \bar{g}, \alpha_p)}\, \delta(\theta_{\alpha_p} - h(y, 0, \alpha_p))$. |
| Open Source Code | Yes | Code is available at https://github.com/GenSI-THUAIR/MolTD |
| Open Datasets | Yes | We demonstrate the effectiveness of the proposed methods on two molecule datasets: QM9 [12] and GEOM-DRUG [13]. For the first time, our approach enables the generation of large molecules using approximately 10 steps while reaching new state-of-the-art performance on both datasets: on QM9, MOLTD achieves molecule stability of 93.16%, and on GEOM-DRUG it achieves atom stability of 86.88%. |
| Dataset Splits | Yes | And the data configurations directly follow previous works [43, 41, 30, 31]. |
| Hardware Specification | Yes | The training process requires approximately 2000 epochs for QM9 and 20 epochs for DRUG using RTX 3090. |
| Software Dependencies | No | We implement the Bayesian Flow Network using EGNNs [47] within the PyTorch framework [48]. The latent invariant feature dimension k is set to 1 for QM9 and 2 for DRUG, significantly reducing the atomic feature dimensionality. |
| Experiment Setup | Yes | The latent invariant feature dimension k is set to 1 for QM9 and 2 for DRUG, significantly reducing the atomic feature dimensionality. Following the implementations of GeoBFN [31], we only take atom charges as atomic features. For training the parameter network Φ, we configure EGNNs with 9 layers and 256 hidden features for QM9, and 6 layers with 256 hidden features for DRUG, both trained with a batch size of 64. The model employs SiLU activations and is trained until convergence. Across all experiments, we adopt the Adam optimizer [49] with a fixed learning rate of $10^{-4}$ as the default training configuration. |
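
The alignment-and-averaging part of Algorithm 1 quoted in the Pseudocode row can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It makes several assumptions: molecules are plain `(n, 3)` coordinate arrays with atom types ignored, the joint minimization over permutation and rotation is done by brute-force enumeration of permutations with a Kabsch rotation fit for each (feasible only for very small `n`), and the final projection to parameter space is omitted because it depends on the Bayesian Flow Network sender distribution. The function names `kabsch`, `align`, and `mean_aligned` are hypothetical.

```python
import itertools

import numpy as np


def kabsch(P, Q):
    """Return the rotation R minimizing ||P @ R.T - Q|| for centered (n, 3) arrays."""
    H = P.T @ Q                      # 3x3 cross-covariance matrix
    U, _, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so that R is a proper rotation (det = +1).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    return Vt.T @ np.diag([1.0, 1.0, d]) @ U.T


def align(g, g_ref):
    """Brute-force argmin over atom permutations and rotations (tiny molecules only)."""
    best, best_err = None, np.inf
    for perm in itertools.permutations(range(len(g))):
        P = g[list(perm)]            # reorder atoms
        R = kabsch(P, g_ref)         # best rotation for this ordering
        cand = P @ R.T
        err = np.linalg.norm(cand - g_ref)
        if err < best_err:
            best, best_err = cand, err
    return best


def mean_aligned(mols):
    """Align every molecule to the first one and average the aligned coordinates."""
    mols = [np.asarray(m, dtype=float) for m in mols]
    mols = [m - m.mean(axis=0) for m in mols]   # remove translation
    g_ref = mols[0]
    aligned = [g_ref] + [align(g, g_ref) for g in mols[1:]]
    return np.mean(aligned, axis=0)
```

For realistic molecule sizes the factorial loop is intractable, so a practical version would replace it with an assignment solver restricted to atoms of the same type. The sketch only shows what the objective computes.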