Interaction-based Retrieval-augmented Diffusion Models for Protein-specific 3D Molecule Generation
Authors: Zhilin Huang, Ling Yang, Xiangxin Zhou, Chujun Qin, Yijie Yu, Xiawu Zheng, Zikun Zhou, Wentao Zhang, Yu Wang, Wenming Yang
ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirical studies on the CrossDocked2020 dataset show that IRDIFF can generate molecules with more realistic 3D structures and achieve state-of-the-art binding affinities towards the protein targets, while maintaining proper molecular properties. |
| Researcher Affiliation | Collaboration | ¹Shenzhen International Graduate School, Tsinghua University; ²Peng Cheng Laboratory; ³Peking University; ⁴School of Artificial Intelligence, University of Chinese Academy of Sciences; ⁵China Southern Power Grid; ⁶Xiamen University. |
| Pseudocode | Yes | Algorithm 1 Training Procedure of IRDIFF; Algorithm 2 Sampling Procedure of IRDIFF |
| Open Source Code | Yes | The codes and models are available at https://github.com/YangLing0818/IRDiff. |
| Open Datasets | Yes | To pretrain PMINet with binding affinity signals, we use the PDBbind v2016 dataset (Liu et al., 2015), which is most frequently used in binding-affinity prediction tasks. As for molecular generation, following previous work (Luo et al., 2021; Peng et al., 2022; Guan et al., 2023a), we train and evaluate IRDIFF on the CrossDocked2020 dataset (Francoeur et al., 2020). |
| Dataset Splits | No | The paper specifies training and testing sets, stating "This produces 100,000 protein-ligand pairs for training and 100 proteins for testing." However, it does not explicitly provide details for a validation split. |
| Hardware Specification | Yes | We train PMINet on a single NVIDIA V100 GPU, and we use Adam as our optimizer with learning rate 0.001, betas = (0.95, 0.999), batch size 16. The experiments are conducted on the PDBbind v2016 dataset as mentioned in the main text. ... we train the parameterized diffusion denoising model of our IRDIFF on a single NVIDIA V100 GPU, and it could converge within 200k steps. |
| Software Dependencies | No | The paper mentions using "Adam as our optimizer" and "AutoDock Vina" as a tool, but it does not specify version numbers for any software libraries, frameworks (like PyTorch or TensorFlow), or other key dependencies required for reproduction. |
| Experiment Setup | Yes | We use Adam as our optimizer with learning rate 0.001, betas = (0.95, 0.999), batch size 4, and clipped gradient norm 8. We balance the atom type loss and atom position loss by applying a scaling factor λ = 100 to the atom type loss. We select the fixed sigmoid β schedule with β1 = 1e−7 and βT = 2e−3 as the variance schedule for atom coordinates, and the cosine β schedule with s = 0.01 for atom types. The number of diffusion steps is set to 1000. In practice, we set the number of neighbors kn = 32. (See the hedged sketches after this table.) |
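The two variance schedules quoted in the Experiment Setup row can be made concrete. The following is a minimal sketch assuming the common parameterizations: a sigmoid ramp rescaled to [β1, βT] for the coordinate schedule, and the Nichol & Dhariwal (2021) cosine schedule for atom types. The exact forms used in the IRDiff repository may differ; only the hyperparameter values (β1 = 1e−7, βT = 2e−3, s = 0.01, 1000 steps) come from the paper.

```python
import numpy as np

def sigmoid_beta_schedule(num_steps=1000, beta_1=1e-7, beta_T=2e-3):
    """Fixed sigmoid beta schedule for atom coordinates (assumed form):
    a linear ramp squashed through a sigmoid, rescaled to [beta_1, beta_T]."""
    x = np.linspace(-6.0, 6.0, num_steps)
    s = 1.0 / (1.0 + np.exp(-x))
    return beta_1 + (beta_T - beta_1) * (s - s.min()) / (s.max() - s.min())

def cosine_beta_schedule(num_steps=1000, s=0.01, max_beta=0.999):
    """Cosine beta schedule for atom types (Nichol & Dhariwal, 2021)."""
    t = np.arange(num_steps + 1) / num_steps
    alpha_bar = np.cos((t + s) / (1.0 + s) * np.pi / 2.0) ** 2
    betas = 1.0 - alpha_bar[1:] / alpha_bar[:-1]
    return np.clip(betas, 0.0, max_beta)

betas_pos = sigmoid_beta_schedule()   # variance schedule for coordinates
betas_type = cosine_beta_schedule()   # variance schedule for atom types
```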
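Similarly, the optimizer and loss-weighting settings reported in the Hardware Specification and Experiment Setup rows translate into a short training-step sketch. The model and the per-term losses below are hypothetical placeholders; only the Adam settings, the gradient-clipping norm of 8, the batch size of 4, and the λ = 100 atom-type loss weight are taken from the paper.

```python
import torch

model = torch.nn.Linear(128, 128)  # placeholder for the denoising network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.95, 0.999))
lambda_type = 100.0  # scaling factor on the atom-type loss (from the paper)

def training_step(loss_pos, loss_type):
    """One optimization step: weighted loss, backprop, clipped update."""
    loss = loss_pos + lambda_type * loss_type
    optimizer.zero_grad()
    loss.backward()
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=8.0)
    optimizer.step()
    return loss.item()

# Dummy usage with stand-in losses (batch size 4, per the paper).
x = torch.randn(4, 128)
pred = model(x)
loss_pos = ((pred - x) ** 2).mean()   # stand-in for the position loss
loss_type = pred.softmax(-1).mean()   # stand-in for the atom-type loss
training_step(loss_pos, loss_type)
```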