Coarse-to-Fine: a Hierarchical Diffusion Model for Molecule Generation in 3D

Authors: Bo Qiang, Yuxuan Song, Minkai Xu, Jingjing Gong, Bowen Gao, Hao Zhou, Wei-Ying Ma, Yanyan Lan

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments demonstrate that HierDiff consistently improves the quality of molecule generation over existing methods.
Researcher Affiliation | Academia | (1) Department of Pharmaceutical Science, Peking University; (2) Institute for AI Industry Research (AIR), Tsinghua University; (3) Department of Computer Science, Stanford University. Correspondence to: Yanyan Lan <lanyanyan@tsinghua.edu.cn>.
Pseudocode | Yes | Algorithm 1: Training Algorithm for Node/edge decoding
Open Source Code | Yes | Code is available at https://github.com/qiangbo1222/HierDiff
Open Datasets | Yes | our main experiments are conducted on the dataset of GEOM-DRUG (Axelrod & Gomez-Bombarelli, 2022) and CrossDocked2020 (Francoeur et al., 2020).
Dataset Splits | No | The paper mentions training on 'GEOM-DRUG' and 'CrossDocked2020' and refers to 'training data' and 'test set' in discussions and tables, but it does not provide split percentages, absolute sample counts, or a splitting methodology for the training, validation, and test sets.
Hardware Specification | No | The paper mentions 'It took approximately 16 days to generate conformations for 400 different molecules on a 128-core server' but does not specify the GPU models, CPU models, or memory used to run its experiments.
Software Dependencies | No | The paper mentions software such as 'RDkit', 'MMF field', 'MD software XTB', 'MD software CREST', and 'Merck Force Field (MFF)' but does not give version numbers for any of them, which reproducibility requires.
Experiment Setup | Yes | In GEOM-DRUG experiments, we randomly selected 4 conformations of each molecule to train our model. The implicit hydrogen atoms are reconstructed using RDkit after all other heavy atoms are generated. ... Firstly, 50 initial conformations are generated for each molecule graph using RDkit and optimized by MMF field. Then, these conformations are further optimized by MD software XTB, while the energy terms are computed for each conformation. At last, we choose the conformation with the minimum energy to sample the ground truth conformations using MD software CREST. To balance efficiency and accuracy, we set the level of optimization to normal in the software for both energy computing and conformation sampling.
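
Two of the selection steps quoted in the Experiment Setup row (randomly keeping 4 conformations per molecule for training, and choosing the minimum-energy conformation as the seed for further sampling) can be illustrated with a minimal sketch. This is not the authors' code: the function names, the `(conformer_id, energy)` data layout, and the fixed seed are all assumptions made for illustration, and the sketch does not call RDKit, XTB, or CREST.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical record: (conformer identifier, energy in arbitrary units).
Conformer = Tuple[str, float]


def subsample_conformers(
    conformers_by_mol: Dict[str, List[Conformer]], k: int = 4, seed: int = 0
) -> Dict[str, List[Conformer]]:
    """Randomly keep at most k conformers per molecule (k=4 in the quoted setup)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    selected = {}
    for mol_id, confs in conformers_by_mol.items():
        # Molecules with fewer than k conformers keep everything they have.
        selected[mol_id] = rng.sample(confs, min(k, len(confs)))
    return selected


def lowest_energy(confs: List[Conformer]) -> Conformer:
    """Return the conformer with the minimum energy."""
    return min(confs, key=lambda conf: conf[1])


if __name__ == "__main__":
    # Toy data; the identifiers and energies are made up.
    data = {
        "mol_a": [("a0", 3.2), ("a1", 1.5), ("a2", 2.7), ("a3", 0.9), ("a4", 4.1)],
        "mol_b": [("b0", 0.4), ("b1", 0.2)],
    }
    print(subsample_conformers(data))
    print(lowest_energy(data["mol_a"]))
```

In the quoted pipeline the energies would come from the XTB optimization step; here they are supplied directly so the sketch stays self-contained.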