Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

NExT-Mol: 3D Diffusion Meets 1D Language Modeling for 3D Molecule Generation

Authors: Zhiyuan Liu, Yanchen Luo, Han Huang, Enzhi Zhang, Sihang Li, Junfeng Fang, Yaorui SHI, Xiang Wang, Kenji Kawaguchi, Tat-Seng Chua

ICLR 2025

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In this section, we evaluate NExT-Mol's performance on de novo 3D molecule generation and conditional 3D molecule generation. Further, we report results of 3D conformer prediction, the critical second step in our two-step generation process. Finally, we present ablation studies to demonstrate the effectiveness of each component of NExT-Mol.
Researcher Affiliation	Academia	1 National University of Singapore, 2 University of Science and Technology of China, 3 Chinese University of Hong Kong, 4 Hokkaido University EMAIL, EMAIL
Pseudocode	Yes	Algorithm 1: Training; Algorithm 2: Sampling 3D Conformers
Open Source Code	Yes	Our codes and pretrained checkpoints are available at https://github.com/acharkq/NExT-Mol.
Open Datasets	Yes	Datasets. As Table 1 shows, we evaluate on the popular GEOM-DRUGS (Axelrod & Gomez-Bombarelli, 2022), GEOM-QM9 (Axelrod & Gomez-Bombarelli, 2022), and QM9-2014 (Ramakrishnan et al., 2014) datasets. Among them, we focus on GEOM-DRUGS, which is the most pharmaceutically relevant and largest one. Due to different tasks incorporating different dataset splits, we separately fine-tune NExT-Mol for each task without sharing weights.
Dataset Splits	Yes	Evaluation. Following (Wang et al., 2024; Jing et al., 2022), we use the dataset split of 243473/30433/1000 for GEOM-DRUGS and 106586/13323/1000 for GEOM-QM9, provided by (Ganea et al., 2021).
Hardware Specification	Yes	The training was done on 4 NVIDIA A100-40G GPUs and took approximately two weeks.
Software Dependencies	No	The paper mentions software components like Flash-Attention, FSDP, SELFIES, and RDKit, but does not provide specific version numbers for any of them in the text.
Experiment Setup	Yes	Table 17: Hyperparameter for pretraining MoLlama. Table 18: Hyperparameters of the DMT-B and DMT-L models. DMT Settings. We use a dropout rate of 0.1 for QM9-2014 and 0.05 for GEOM-DRUGS. Following (Huang et al., 2024), we select only the conformer with the lowest energy for training on the GEOM-DRUGS dataset. For both datasets, we train DMT-B for 1000 epochs. The batch size for QM9-2014 is 2048 and the batch size for GEOM-DRUGS is 256.
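
The split sizes quoted in the Dataset Splits row can be turned into proportions with a few lines of Python. This is a minimal sketch, not code from the paper or its repository: the names `SPLITS` and `split_fractions` are hypothetical, and reading each triple as train/validation/test is an assumption based on the usual ordering, since the quoted text does not label the three numbers.

```python
# Split sizes as quoted in the "Dataset Splits" row.
# ASSUMPTION: each triple is ordered (train, validation, test); the quoted
# text gives only the numbers, not their labels.
SPLITS = {
    "GEOM-DRUGS": (243473, 30433, 1000),
    "GEOM-QM9": (106586, 13323, 1000),
}


def split_fractions(counts):
    """Return each split's share of the total, in the same order as `counts`."""
    total = sum(counts)
    return tuple(c / total for c in counts)


if __name__ == "__main__":
    for name, counts in SPLITS.items():
        train, val, test = split_fractions(counts)
        print(
            f"{name}: total={sum(counts)} "
            f"train={train:.1%} val={val:.1%} test={test:.1%}"
        )
```

Under that assumed ordering, the test portion is a fixed 1000 entries for both datasets, so its share of the total differs between GEOM-DRUGS and GEOM-QM9 even though the absolute size is the same.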