DecompOpt: Controllable and Decomposed Diffusion Models for Structure-based Molecular Optimization

Authors: Xiangxin Zhou, Xiwei Cheng, Yuwei Yang, Yu Bao, Liang Wang, Quanquan Gu

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 4 EXPERIMENTS
Researcher Affiliation | Collaboration | 1 School of Artificial Intelligence, University of Chinese Academy of Sciences; 2 Center for Research on Intelligent Perception and Computing (CRIPAC), State Key Laboratory of Multimodal Artificial Intelligence Systems (MAIS), Institute of Automation, Chinese Academy of Sciences (CASIA); 3 ByteDance Research; 4 Halıcıoğlu Data Science Institute, University of California San Diego
Pseudocode | Yes | Algorithm 1: Optimization Process
Open Source Code | No | The paper does not provide a direct statement or link to open-source code for the methodology described.
Open Datasets | Yes | Dataset: We utilized the CrossDocked2020 dataset (Francoeur et al., 2020) to train and evaluate our model.
Dataset Splits | No | The paper mentions 'validation loss' and that 'The evaluation is performed for every 1000 training steps. We utilized the CrossDocked2020 dataset (Francoeur et al., 2020) to train and evaluate our model. Additionally, we adopted the same filtering and splitting strategies as the previous work (Luo et al., 2021; Peng et al., 2022; Guan et al., 2023a). The strategy focuses on retaining high-quality complexes (RMSD < 1 Å) and diverse proteins (sequence identity < 30%), leading to 100,000 protein-binding complexes for training and 100 novel proteins for testing.' It does not specify the size or percentage of the validation split.
Hardware Specification | Yes | We trained our model on one NVIDIA GeForce GTX A100 GPU, and it could converge within 237k steps.
Software Dependencies | No | The paper mentions 'Open Babel' and 'rdkit functions' but does not specify their version numbers or other key software dependencies with specific versions.
Experiment Setup | Yes | We use Adam (Kingma & Ba, 2014) with initial learning rate = 0.0005 and betas = (0.95, 0.999) to train the model, and we set batch size = 16 and clip gradient norm = 8. ... We also schedule the learning rate to decay exponentially with a factor of 0.6 and a minimum learning rate of 1e-6. ... We set the number of diffusion steps to 1000. For the diffusion noise schedule, we use a sigmoid β schedule with β_1 = 1e-7 and β_T = 2e-3 for atom coordinates, and a cosine β schedule suggested in Nichol & Dhariwal (2021) with s = 0.01 for atom types and bond types.
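
The hyperparameters quoted in the Experiment Setup row map directly onto standard PyTorch/NumPy components. Below is a minimal, hypothetical sketch (not the authors' code) of how the optimizer, learning-rate decay, and the two diffusion noise schedules could be configured. The placeholder model, the ReduceLROnPlateau trigger for the exponential decay, and the sigmoid grid range are assumptions; the numeric values (lr = 0.0005, betas = (0.95, 0.999), factor 0.6, min lr 1e-6, T = 1000, β_1 = 1e-7, β_T = 2e-3, s = 0.01, clip norm 8) come from the quote.

import numpy as np
import torch

# Placeholder network standing in for the DecompOpt model (not the authors' architecture).
model = torch.nn.Linear(128, 128)

# Adam with the reported hyperparameters: lr = 0.0005, betas = (0.95, 0.999).
optimizer = torch.optim.Adam(model.parameters(), lr=5e-4, betas=(0.95, 0.999))

# Exponential decay by a factor of 0.6 down to a floor of 1e-6; tying the decay to a
# plateau in validation loss is an assumption, the paper only states the factor and floor.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.6, min_lr=1e-6, patience=10
)

def sigmoid_beta_schedule(num_steps=1000, beta_start=1e-7, beta_end=2e-3):
    # Sigmoid beta schedule for atom coordinates: squash an even grid through a sigmoid,
    # then rescale so the betas run from beta_start to beta_end.
    x = np.linspace(-6, 6, num_steps)
    s = 1.0 / (1.0 + np.exp(-x))
    return beta_start + (beta_end - beta_start) * (s - s.min()) / (s.max() - s.min())

def cosine_beta_schedule(num_steps=1000, s=0.01, max_beta=0.999):
    # Cosine schedule of Nichol & Dhariwal (2021) with offset s, used for atom and bond types.
    t = np.arange(num_steps + 1) / num_steps
    alphas_cumprod = np.cos((t + s) / (1 + s) * np.pi / 2) ** 2
    alphas_cumprod = alphas_cumprod / alphas_cumprod[0]
    betas = 1.0 - alphas_cumprod[1:] / alphas_cumprod[:-1]
    return np.clip(betas, 0.0, max_beta)

coord_betas = sigmoid_beta_schedule()  # T = 1000 steps for atom coordinates
type_betas = cosine_beta_schedule()    # T = 1000 steps for atom and bond types

# Inside the training loop (batch size 16), gradients would be clipped to the reported norm:
# torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=8)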