3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction

Authors: Jiaqi Guan, Wesley Wei Qian, Xingang Peng, Yufeng Su, Jian Peng, Jianzhu Ma

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To evaluate our model, we propose a comprehensive framework to evaluate the quality of sampled molecules from different dimensions. Empirical studies show our model could generate molecules with more realistic 3D structures and better affinities towards the protein targets, and improve binding affinity ranking and prediction without retraining.
Researcher Affiliation | Academia | Jiaqi Guan1, Wesley Wei Qian1, Xingang Peng2, Yufeng Su1, Jian Peng1, Jianzhu Ma3 — 1 Department of Computer Science, University of Illinois Urbana-Champaign; 2 School of Intelligence Science and Technology, Peking University; 3 Institute for AI Industry Research, Tsinghua University. {jiaqi, weiqian3, jianpeng}@illinois.edu, majianzhu@air.tsinghua.edu.cn
Pseudocode | Yes | We summarize the overall training and sampling procedure of TargetDiff in Appendix E. Algorithm 1: Training Procedure of TargetDiff; Algorithm 2: Sampling Procedure of TargetDiff.
Open Source Code | Yes | The model implementation, experimental data and model checkpoints can be found here: https://github.com/guanjq/targetdiff
Open Datasets | Yes | We use CrossDocked2020 (Francoeur et al., 2020) to train and evaluate TargetDiff.
Dataset Splits | No | The paper mentions '100,000 complexes for training and 100 novel complexes as references for testing' for the CrossDocked2020 dataset and refers to a 'validation loss' during training, but it does not give an explicit size or percentage for a validation split.
Hardware Specification | Yes | We trained our model on one NVIDIA GeForce GTX 3090 GPU, and it could converge within 24 hours and 200k steps.
Software Dependencies | No | The paper mentions 'Open Babel' and 'Adam' but does not provide version numbers for these or any other software dependencies.
Experiment Setup | Yes | The model is trained via the gradient descent method Adam (Kingma & Ba, 2014) with init learning rate=0.001, betas=(0.95, 0.999), batch size=4 and clip gradient norm=8. We multiply a factor α = 100 on the atom type loss to balance the scales of the two losses. During the training phase, we add a small Gaussian noise with a standard deviation of 0.1 to protein atom coordinates as data augmentation. We also schedule to decay the learning rate exponentially with a factor of 0.6 and a minimum learning rate of 1e-6. The learning rate is decayed if there is no improvement of the validation loss in 10 consecutive evaluations. The evaluation is performed every 2000 training steps.
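The reported setup maps naturally onto a standard PyTorch optimizer/scheduler configuration. The sketch below is a hedged illustration only: `ToyModel`, the synthetic batch tensors, and `evaluate` are placeholders for the actual equivariant diffusion model and CrossDocked2020 data pipeline (see https://github.com/guanjq/targetdiff for the real implementation); only the hyperparameters come from the excerpt above.

```python
# Hedged sketch of the training configuration described above (standard PyTorch API).
# ToyModel, the random tensors, and evaluate() are stand-ins, not the authors' code.
import torch
import torch.nn as nn


class ToyModel(nn.Module):
    """Stand-in for TargetDiff; returns a (position loss, atom-type loss) pair."""
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(3, 3)

    def forward(self, protein_pos, ligand_pos):
        pred = self.net(protein_pos.mean(dim=0, keepdim=True)) + ligand_pos
        pos_loss = (pred - ligand_pos).pow(2).mean()
        type_loss = pred.abs().mean()   # placeholder for the atom-type loss term
        return pos_loss, type_loss


model = ToyModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.95, 0.999))

# "Decay ... with a factor of 0.6 if there is no improvement of the validation loss
# in 10 consecutive evaluations" corresponds to a plateau-based scheduler.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.6, patience=10, min_lr=1e-6
)

ALPHA = 100.0        # weight on the atom-type loss
GRAD_CLIP = 8.0      # max gradient norm
POS_NOISE_STD = 0.1  # std of Gaussian jitter on protein coordinates (augmentation)
EVAL_EVERY = 2000    # evaluation interval in training steps


def evaluate(model):
    """Placeholder validation routine; returns a scalar validation loss."""
    with torch.no_grad():
        pos_loss, type_loss = model(torch.randn(10, 3), torch.randn(5, 3))
    return (pos_loss + ALPHA * type_loss).item()


for step in range(200_000):              # paper reports convergence within 200k steps
    protein_pos = torch.randn(10, 3)     # stand-in batch (batch size 4 in the paper)
    ligand_pos = torch.randn(5, 3)

    # Data augmentation: jitter protein atom coordinates with small Gaussian noise.
    protein_pos = protein_pos + POS_NOISE_STD * torch.randn_like(protein_pos)

    pos_loss, type_loss = model(protein_pos, ligand_pos)
    loss = pos_loss + ALPHA * type_loss  # balance the scales of the two losses

    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), GRAD_CLIP)
    optimizer.step()

    if (step + 1) % EVAL_EVERY == 0:
        scheduler.step(evaluate(model))  # LR decays when validation loss plateaus
```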