reproducibilityindex.ai

MolDiff: Addressing the Atom-Bond Inconsistency Problem in 3D Molecule Diffusion Generation

Authors: Xingang Peng, Jiaqi Guan, Qiang Liu, Jianzhu Ma

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	The empirical studies showed that our model outperforms previous approaches, achieving a three-fold improvement in success rate and generating molecules with significantly better quality.
Researcher Affiliation	Academia	(1) School of Intelligence Science and Technology, Peking University, Beijing, China; (2) Institute for Artificial Intelligence, Peking University, Beijing, China; (3) Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, USA; (4) University of Texas at Austin, Texas, USA; (5) Institute for AI Industry Research, Tsinghua University, Beijing, China.
Pseudocode	No	No structured pseudocode or algorithm blocks were found.
Open Source Code	Yes	The source codes will be provided at https://github.com/pengxingang/MolDiff.
Open Datasets	Yes	We utilized the GEOM-Drug dataset to train and assess our models, and included details about the data preprocessing in Appendix B. We downloaded the GEOM-Drug from the database website (Axelrod & Gómez-Bombarelli, 2022).
Dataset Splits	Yes	After filtering, we removed the hydrogen atoms and constructed the training, validation, and testing datasets with 231523, 28941, and 28940 molecules, respectively.
Hardware Specification	No	No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instance types) used for running experiments were provided.
Software Dependencies	No	The paper mentions tools and optimizers like "RDKit", "Open Babel", and "AdamW optimizer", but no specific version numbers for these or other software dependencies are provided.
Experiment Setup	Yes	We set the embedding dimensions of node types and edge types as 256 and 64, respectively, and all intermediate hidden dimensions are constant. The time embedding dimensions are 10. The graph neural networks contain six layers. We trained the diffusion network using AdamW optimizer with a learning rate of 1 × 10⁻⁴ and batch size 256 for 110,000 iterations. For the weights of the atom loss and bond loss, i.e., λ1 and λ2, we set λ1 = λ2 = 100 so that the losses of atom types, atom positions, and bond types were almost in the same magnitude. In our implementation, we chose the parameters s1 = 0.9999, sT = 0.0001, w = 3 for atom types and atom positions for the whole diffusion process t ∈ [1, T]. For the bond type, we used s1 = 0.9999, sT = 0.001, w = 3 during diffusion steps [1, 600] in the first stage and s1 = 0.001, sT = 0.0001, w = 2 during steps [600, 1000] in the second stage.
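
For readers attempting a reproduction, the reported settings in the Dataset Splits and Experiment Setup rows can be collected into a small configuration sketch. This is not taken from the authors' repository: every key name, the `ScheduleParams` class, and the helper functions are hypothetical and exist only for illustration. Two points are assumptions rather than reported facts: T = 1000 is inferred from the second-stage range [600, 1000], and step 600, which appears in both reported ranges, is assigned here to the first stage. The functional form of the noise schedule is not given in the rows above, so it is deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScheduleParams:
    """Noise-schedule parameters (s1, sT, w) as named in the Experiment Setup row."""
    s1: float
    sT: float
    w: float


# Hypothetical key names; the values are those quoted in the Experiment Setup row.
TRAIN_CONFIG = {
    "node_embed_dim": 256,
    "edge_embed_dim": 64,
    "time_embed_dim": 10,
    "num_gnn_layers": 6,
    "optimizer": "AdamW",
    "learning_rate": 1e-4,
    "batch_size": 256,
    "num_iterations": 110_000,
    "lambda_atom": 100.0,  # lambda_1
    "lambda_bond": 100.0,  # lambda_2
    "num_diffusion_steps": 1000,  # assumption: T inferred from the range [600, 1000]
}

# Atom types and atom positions use one schedule for the whole process.
ATOM_SCHEDULE = ScheduleParams(s1=0.9999, sT=0.0001, w=3)

# Bond types use two stages.
BOND_STAGE_1 = ScheduleParams(s1=0.9999, sT=0.001, w=3)
BOND_STAGE_2 = ScheduleParams(s1=0.001, sT=0.0001, w=2)
BOND_STAGE_BOUNDARY = 600  # assumption: step 600 itself belongs to the first stage


def bond_schedule_params(t: int) -> ScheduleParams:
    """Return the bond-type schedule parameters that apply at diffusion step t."""
    total_steps = TRAIN_CONFIG["num_diffusion_steps"]
    if not 1 <= t <= total_steps:
        raise ValueError(f"t must be in [1, {total_steps}], got {t}")
    return BOND_STAGE_1 if t <= BOND_STAGE_BOUNDARY else BOND_STAGE_2


def split_fractions(train: int, val: int, test: int) -> tuple:
    """Return each split's share of the total, rounded to three decimals."""
    total = train + val + test
    return tuple(round(n / total, 3) for n in (train, val, test))


if __name__ == "__main__":
    # Split sizes quoted in the Dataset Splits row.
    print(split_fractions(231523, 28941, 28940))
    print(bond_schedule_params(601))
```

With the reported split sizes, the helper shows that the 289,404 filtered molecules are divided roughly 80/10/10 between training, validation, and testing.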