3D Equivariant Diffusion for Target-Aware Molecule Generation and Affinity Prediction
Authors: Jiaqi Guan, Wesley Wei Qian, Xingang Peng, Yufeng Su, Jian Peng, Jianzhu Ma
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | To evaluate our model, we propose a comprehensive framework to evaluate the quality of sampled molecules from different dimensions. Empirical studies show our model could generate molecules with more realistic 3D structures and better affinities towards the protein targets, and improve binding affinity ranking and prediction without retraining. |
| Researcher Affiliation | Academia | Jiaqi Guan¹, Wesley Wei Qian¹, Xingang Peng², Yufeng Su¹, Jian Peng¹, Jianzhu Ma³. ¹Department of Computer Science, University of Illinois Urbana-Champaign; ²School of Intelligence Science and Technology, Peking University; ³Institute for AI Industry Research, Tsinghua University. {jiaqi, weiqian3, jianpeng}@illinois.edu, majianzhu@air.tsinghua.edu.cn |
| Pseudocode | Yes | We summarize the overall training and sampling procedure of TargetDiff in Appendix E. Algorithm 1: Training Procedure of TargetDiff; Algorithm 2: Sampling Procedure of TargetDiff |
| Open Source Code | Yes | The model implementation, experimental data and model checkpoints can be found here: https://github.com/guanjq/targetdiff |
| Open Datasets | Yes | We use CrossDocked2020 (Francoeur et al., 2020) to train and evaluate TargetDiff. |
| Dataset Splits | No | The paper mentions '100,000 complexes for training and 100 novel complexes as references for testing' for the CrossDocked2020 dataset, and refers to 'validation loss' during training, but does not provide an explicit size or percentage for a validation split. |
| Hardware Specification | Yes | We trained our model on one NVIDIA GeForce GTX 3090 GPU, and it could converge within 24 hours and 200k steps. |
| Software Dependencies | No | The paper mentions 'Open Babel' and 'Adam' but does not provide version numbers for these or any other key software dependencies. |
| Experiment Setup | Yes | The model is trained via the gradient descent method Adam (Kingma & Ba, 2014) with init learning rate=0.001, betas=(0.95, 0.999), batch size=4 and clip gradient norm=8. We multiply a factor α = 100 on the atom type loss to balance the scales of the two losses. During the training phase, we add a small Gaussian noise with a standard deviation of 0.1 to protein atom coordinates as data augmentation. We also schedule to decay the learning rate exponentially with a factor of 0.6 and a minimum learning rate of 1e-6. The learning rate is decayed if there is no improvement for the validation loss in 10 consecutive evaluations. The evaluation is performed for every 2000 training steps. |
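
To make the reported hyperparameters concrete, the sketch below wires them into a PyTorch-style training loop. It is a minimal illustration, not the authors' released implementation (see the linked repository for that); `model`, `train_loader`, `val_loader`, the `get_diffusion_loss` loss interface, and `evaluate` are hypothetical placeholders.

```python
# A minimal sketch (not the authors' released code) of the optimization setup
# quoted above, assuming a standard PyTorch training loop.
import torch

ALPHA = 100.0       # weight on the atom-type loss (alpha = 100 in the paper)
NOISE_STD = 0.1     # std of Gaussian noise added to protein atom coordinates
EVAL_EVERY = 2000   # evaluate every 2000 training steps


def train(model, train_loader, val_loader, evaluate):
    optimizer = torch.optim.Adam(
        model.parameters(),
        lr=1e-3,              # init learning rate = 0.001
        betas=(0.95, 0.999),  # betas reported in the paper
    )
    # Decay the learning rate by a factor of 0.6 (floored at 1e-6) when the
    # validation loss shows no improvement over 10 consecutive evaluations.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
        optimizer, mode="min", factor=0.6, patience=10, min_lr=1e-6
    )

    for step, batch in enumerate(train_loader):  # batch size = 4
        # Data augmentation: perturb protein atom coordinates with small Gaussian noise.
        batch.protein_pos = batch.protein_pos + NOISE_STD * torch.randn_like(batch.protein_pos)

        # Hypothetical loss interface: coordinate (position) loss + atom-type loss.
        pos_loss, atom_type_loss = model.get_diffusion_loss(batch)
        loss = pos_loss + ALPHA * atom_type_loss

        optimizer.zero_grad()
        loss.backward()
        torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=8.0)  # clip gradient norm = 8
        optimizer.step()

        if (step + 1) % EVAL_EVERY == 0:
            val_loss = evaluate(model, val_loader)
            scheduler.step(val_loss)
```

Note that PyTorch's `ReduceLROnPlateau` with `factor=0.6`, `patience=10`, and `min_lr=1e-6` maps naturally onto the decay rule described in the quoted setup; the authors' actual scheduler may differ in detail.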