MolDiff: Addressing the Atom-Bond Inconsistency Problem in 3D Molecule Diffusion Generation

Authors: Xingang Peng, Jiaqi Guan, Qiang Liu, Jianzhu Ma

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The empirical studies showed that our model outperforms previous approaches, achieving a three-fold improvement in success rate and generating molecules with significantly better quality.
Researcher Affiliation | Academia | 1 School of Intelligence Science and Technology, Peking University, Beijing, China; 2 Institute for Artificial Intelligence, Peking University, Beijing, China; 3 Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, USA; 4 University of Texas at Austin, Texas, USA; 5 Institute for AI Industry Research, Tsinghua University, Beijing, China.
Pseudocode | No | No structured pseudocode or algorithm blocks were found.
Open Source Code | Yes | The source codes will be provided at https://github.com/pengxingang/MolDiff.
Open Datasets | Yes | We utilized the GEOM-Drug dataset to train and assess our models, and included details about the data preprocessing in Appendix B. We downloaded GEOM-Drug from the database website (Axelrod & Gómez-Bombarelli, 2022).
Dataset Splits | Yes | After filtering, we removed the hydrogen atoms and constructed the training, validation, and testing datasets with 231,523, 28,941, and 28,940 molecules, respectively.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instance types) used for running experiments were provided.
Software Dependencies | No | The paper mentions tools and optimizers such as RDKit, Open Babel, and the AdamW optimizer, but no specific version numbers for these or other software dependencies are provided.
Experiment Setup | Yes | We set the embedding dimensions of node types and edge types to 256 and 64, respectively, and all intermediate hidden dimensions are constant. The time embedding dimension is 10. The graph neural networks contain six layers. We trained the diffusion network using the AdamW optimizer with a learning rate of 1×10⁻⁴ and a batch size of 256 for 110,000 iterations. For the weights of the atom loss and bond loss, i.e., λ1 and λ2, we set λ1 = λ2 = 100 so that the losses of atom types, atom positions, and bond types were of roughly the same magnitude. In our implementation, we chose the parameters s1 = 0.9999, sT = 0.0001, w = 3 for atom types and atom positions over the whole diffusion process t ∈ [1, T]. For the bond type, we used s1 = 0.9999, sT = 0.001, w = 3 during diffusion steps [1, 600] in the first stage and s1 = 0.001, sT = 0.0001, w = 2 during steps [600, 1000] in the second stage.
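The dataset split sizes and training hyperparameters reported above can be restated as a quick sanity-check script. This is a minimal sketch: all dictionary key names are illustrative assumptions, not identifiers from the authors' code, and T = 1000 for the atom schedule is inferred from the endpoint of the two-stage bond schedule.

```python
# Illustrative restatement of the reported setup; key names are assumptions.
splits = {"train": 231_523, "val": 28_941, "test": 28_940}
total = sum(splits.values())  # 289,404 molecules remain after filtering
# The reported counts correspond to roughly an 80/10/10 partition:
fractions = {k: round(v / total, 3) for k, v in splits.items()}

train_config = {
    "node_emb_dim": 256,
    "edge_emb_dim": 64,
    "time_emb_dim": 10,
    "num_gnn_layers": 6,
    "optimizer": "AdamW",
    "learning_rate": 1e-4,
    "batch_size": 256,
    "iterations": 110_000,
    "loss_weights": {"lambda_1": 100, "lambda_2": 100},
    # Noise-schedule parameters (s1, sT, w) per modality and stage;
    # T = 1000 is inferred from the bond schedule's second stage.
    "schedule": {
        "atoms": {"steps": (1, 1000), "s1": 0.9999, "sT": 0.0001, "w": 3},
        "bonds_stage_1": {"steps": (1, 600), "s1": 0.9999, "sT": 0.001, "w": 3},
        "bonds_stage_2": {"steps": (600, 1000), "s1": 0.001, "sT": 0.0001, "w": 2},
    },
}

print(fractions)  # → {'train': 0.8, 'val': 0.1, 'test': 0.1}
```

Collecting the numbers this way makes the magnitudes easy to check, e.g. that λ1 = λ2 = 100 applies to both the atom and bond losses and that the two bond-schedule stages meet at step 600.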