Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Straight-Line Diffusion Model for Efficient 3D Molecular Generation

Authors: Yuyan Ni, Shikun Feng, Haohan Chi, Bowen Zheng, Huan-ang Gao, Wei-Ying Ma, Zhi-Ming Ma, Yanyan Lan

NeurIPS 2025

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct extensive experiments to demonstrate the potential of straight-line diffusion in 3D molecular generation and other domains. As shown in Figure 1, using only at most 10 or 15 sampling steps, SLDM surpasses EDM or EquiFM, GeoBFN with 1000 sampling steps, achieving up to 100- or 70-fold improvement in sampling efficiency. To validate the advantages of our method in molecular generation, we evaluate its overall performance and sampling efficiency in both unconditional and conditional generation scenarios.
Researcher Affiliation | Academia | 1 Academy of Mathematics and Systems Science, Chinese Academy of Sciences; 2 Zhongguancun Institute of Artificial Intelligence, China; 3 Institute for AI Industry Research (AIR), Tsinghua University; 4 University of Chinese Academy of Sciences; 5 Huazhong University of Science and Technology; 6 Beijing Frontier Research Center for Biological Structure, Tsinghua University; 7 Beijing Academy of Artificial Intelligence. Corresponding author: Yanyan Lan (EMAIL).
Pseudocode | Yes | The complete training and sampling procedure of straight-line diffusion are given in algorithm 1 and 2. The SLDM algorithms tailored for molecular generation are provided in Appendix B.
Open Source Code | Yes | The code is open-sourced at https://github.com/fengshikun/SLDM
Open Datasets | Yes | We evaluate our model using two widely adopted datasets for unconditional molecular generation, with all dataset splitting strictly following baseline settings [Hoogeboom et al., 2022, Song et al., 2024, 2023a]. QM9 [Ruddigkeit et al., 2012, Ramakrishnan et al., 2014] contains approximately 134,000 small organic molecules... GEOM-Drugs [Axelrod and Gomez-Bombarelli, 2022] focuses on drug-like molecules...
Dataset Splits | Yes | QM9 [Ruddigkeit et al., 2012, Ramakrishnan et al., 2014] contains approximately 134,000 small organic molecules with up to nine heavy atoms. It is split into training (100K), validation (18K), and test (13K) sets. GEOM-Drugs [Axelrod and Gomez-Bombarelli, 2022]... The dataset is randomly divided into training, validation, and test sets using an 8:1:1 ratio.
Hardware Specification | Yes | For QM9, it takes approximately 10 days on a single A100 GPU. For GEOM-drugs, it takes approximately 16 days on four A100 GPUs.
Software Dependencies | No | Optimizer Adam
Experiment Setup | Yes | The hyperparameter settings for molecular generation are detailed in Table 7. Settings follow UniGEM [Feng et al., 2024], with two additional tunable hyperparameters introduced by our generative algorithm: the noise variance σ and the temperature annealing rate ν. Table 7: Network and training hyperparameters. Embedding size: 256 for unconditional generation, 192 for conditional generation; Layer number: 9 for QM9, 4 for Geom-Drugs; Shared layers: 1; Batch size: 64 for QM9, 128 for Geom-Drugs; Train epoch: 3000 for QM9, 32 for Geom-Drugs; Learning rate: 1.00 × 10^-4; Optimizer: Adam; Sample steps T: 10 to 1000; Nucleation time: 10; Oversampling ratio: 0.5 for each branch; Loss weight: 1 for each loss term; Noise Variance σ: 0.05 for unconditional generation, 0.1 for conditional generation; Temperature Annealing Rate ν: 0.5 for unconditional generation, 3 for conditional generation; Non-uniform Discretization: False if T > 13.
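
The Table 7 settings quoted in the Experiment Setup row can be gathered into a small configuration object, which makes the per-dataset and per-mode differences easier to see. The following is a minimal Python sketch and not the authors' code: the names `SLDMConfig` and `make_config` and all field names are hypothetical, and the learning rate is read as 1.00 × 10^-4 on the assumption that the exponent sign was lost in extraction. The sample-step and non-uniform discretization entries are left out because they are sampling-time choices rather than fixed training values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SLDMConfig:
    """Hypothetical container for the Table 7 settings (names are illustrative)."""
    embedding_size: int
    num_layers: int
    batch_size: int
    train_epochs: int
    noise_variance: float            # sigma in the paper
    annealing_rate: float            # nu in the paper
    shared_layers: int = 1
    learning_rate: float = 1e-4      # assumed reading of the garbled "1.00 10 4"
    optimizer: str = "Adam"
    nucleation_time: int = 10
    oversampling_ratio: float = 0.5  # applied to each branch
    loss_weight: float = 1.0         # applied to each loss term


# Values that Table 7 lists per dataset.
_PER_DATASET = {
    "qm9": dict(num_layers=9, batch_size=64, train_epochs=3000),
    "geom-drugs": dict(num_layers=4, batch_size=128, train_epochs=32),
}

# Values that Table 7 lists per generation mode.
_UNCONDITIONAL = dict(embedding_size=256, noise_variance=0.05, annealing_rate=0.5)
_CONDITIONAL = dict(embedding_size=192, noise_variance=0.1, annealing_rate=3.0)


def make_config(dataset: str, conditional: bool = False) -> SLDMConfig:
    """Combine the per-dataset and per-mode rows of Table 7 into one config."""
    key = dataset.lower()
    if key not in _PER_DATASET:
        raise ValueError(f"unknown dataset: {dataset!r}")
    mode = _CONDITIONAL if conditional else _UNCONDITIONAL
    return SLDMConfig(**_PER_DATASET[key], **mode)
```

For example, `make_config("QM9")` yields 9 layers, batch size 64, and σ = 0.05, while `make_config("QM9", conditional=True)` switches to embedding size 192, σ = 0.1, and ν = 3. The sketch does not check whether a given dataset and mode combination was actually run in the paper.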