MolDiff: Addressing the Atom-Bond Inconsistency Problem in 3D Molecule Diffusion Generation
Authors: Xingang Peng, Jiaqi Guan, Qiang Liu, Jianzhu Ma
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | The empirical studies showed that our model outperforms previous approaches, achieving a three-fold improvement in success rate and generating molecules with significantly better quality. |
| Researcher Affiliation | Academia | (1) School of Intelligence Science and Technology, Peking University, Beijing, China; (2) Institute for Artificial Intelligence, Peking University, Beijing, China; (3) Department of Computer Science, University of Illinois Urbana-Champaign, Champaign, USA; (4) University of Texas at Austin, Texas, USA; (5) Institute for AI Industry Research, Tsinghua University, Beijing, China. |
| Pseudocode | No | No structured pseudocode or algorithm blocks were found. |
| Open Source Code | Yes | The source codes will be provided at https://github.com/pengxingang/MolDiff. |
| Open Datasets | Yes | We utilized the GEOM-Drug dataset to train and assess our models, and included details about the data preprocessing in Appendix B. We downloaded the GEOM-Drug from the database website (Axelrod & Gómez-Bombarelli, 2022). |
| Dataset Splits | Yes | After filtering, we removed the hydrogen atoms and constructed the training, validation, and testing datasets with 231523, 28941, and 28940 molecules, respectively. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory amounts, or cloud instance types) used for running experiments were provided. |
| Software Dependencies | No | The paper mentions tools and optimizers such as "RDKit", "Open Babel", and the "AdamW optimizer", but provides no version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | We set the embedding dimensions of node types and edge types as 256 and 64, respectively, and all intermediate hidden dimensions are constant. The time embedding dimensions are 10. The graph neural networks contain six layers. We trained the diffusion network using the AdamW optimizer with a learning rate of $1 \times 10^{-4}$ and batch size 256 for 110,000 iterations. For the weights of the atom loss and bond loss, i.e., $\lambda_1$ and $\lambda_2$, we set $\lambda_1 = \lambda_2 = 100$ so that the losses of atom types, atom positions, and bond types were almost in the same magnitude. In our implementation, we chose the parameters $s_1 = 0.9999$, $s_T = 0.0001$, $w = 3$ for atom types and atom positions for the whole diffusion process $t \in [1, T]$. For the bond type, we used $s_1 = 0.9999$, $s_T = 0.001$, $w = 3$ during diffusion steps $[1, 600]$ in the first stage and $s_1 = 0.001$, $s_T = 0.0001$, $w = 2$ during steps $[600, 1000]$ in the second stage. |
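
For anyone attempting a reproduction, the reported settings can be gathered into a single configuration object. The sketch below is one possible way to do that and is not taken from the MolDiff repository: every name in it (`NoiseSchedule`, `TRAINING`, `SPLIT_SIZES`, `split_fractions`, and the field names) is hypothetical, the learning rate is read as 1e-4, and the total step count T = 1000 is inferred from the bond schedule's second stage ending at step 1000. It records only the schedule parameters, not the schedule formula, which the table does not give.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NoiseSchedule:
    """Reported schedule parameters only; the schedule formula itself is not in the table."""
    s_start: float      # s_1 in the table
    s_end: float        # s_T in the table
    width: int          # w in the table
    first_step: int     # first diffusion step the parameters apply to
    last_step: int      # last diffusion step the parameters apply to


# Assumption: T = 1000, inferred from the second bond stage covering steps [600, 1000].
TOTAL_STEPS = 1000

# Atom types and atom positions share one schedule over the whole process.
ATOM_SCHEDULE = NoiseSchedule(0.9999, 0.0001, 3, 1, TOTAL_STEPS)

# Bond types use two stages with different parameters.
BOND_SCHEDULES = (
    NoiseSchedule(0.9999, 0.001, 3, 1, 600),
    NoiseSchedule(0.001, 0.0001, 2, 600, TOTAL_STEPS),
)

# Hypothetical key names; the values are those quoted in the table.
TRAINING = {
    "node_embed_dim": 256,
    "edge_embed_dim": 64,
    "time_embed_dim": 10,
    "num_gnn_layers": 6,
    "optimizer": "AdamW",
    "learning_rate": 1e-4,   # assumed reading of the reported learning rate
    "batch_size": 256,
    "iterations": 110_000,
    "lambda_atom": 100.0,    # lambda_1
    "lambda_bond": 100.0,    # lambda_2
}

# Molecule counts from the Dataset Splits row.
SPLIT_SIZES = {"train": 231_523, "validation": 28_941, "test": 28_940}


def split_fractions(sizes):
    """Return each split's share of the total number of molecules."""
    total = sum(sizes.values())
    return {name: count / total for name, count in sizes.items()}


if __name__ == "__main__":
    for name, frac in split_fractions(SPLIT_SIZES).items():
        print(f"{name}: {frac:.4f}")
```

Running the script prints the share of each split, which works out to roughly 80% training, 10% validation, and 10% testing. Hardware and software versions are absent from this configuration because the paper does not report them.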