MolCRAFT: Structure-Based Drug Design in Continuous Parameter Space

Authors: Yanru Qu, Keyue Qiu, Yuxuan Song, Jingjing Gong, Jiawei Han, Mingyue Zheng, Hao Zhou, Wei-Ying Ma

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Empirical results show that our model consistently achieves superior performance in binding affinity with more stable 3D structures, demonstrating its ability to accurately model interatomic interactions. We conduct a comprehensive evaluation under controlled molecular sizes. Experiments show that our model generates high-affinity binders with feasible 3D poses.
Researcher Affiliation | Collaboration | Yanru Qu* (1, 2), Keyue Qiu* (1, 3), Yuxuan Song* (1, 3), Jingjing Gong (1), Jiawei Han (2), Mingyue Zheng (4), Hao Zhou (1), Wei-Ying Ma (1) ... 1: Institute for AI Industry Research (AIR), Tsinghua University; 2: University of Illinois Urbana-Champaign, USA; 3: Department of Computer Science and Technology, Tsinghua University; 4: Shanghai Institute of Materia Medica, Chinese Academy of Sciences.
Pseudocode | Yes | Algorithm 1 (Discrete-Time Loss) and Algorithm 2 (Sampling) are provided in Appendices A and B, respectively. (A hedged sketch of the discrete-time coordinate loss appears after this table.)
Open Source Code | Yes | Code is available at https://github.com/AlgoMole/MolCRAFT.
Open Datasets | Yes | We use the CrossDocked dataset (Francoeur et al., 2020a) for training and testing. It originally contains 22.5 million protein-ligand pairs; after the RMSD-based filtering and 30% sequence-identity split by Luo et al. (2021), this yields 100,000 training pairs and 100 test proteins.
Dataset Splits | Yes | Same evidence as above: the RMSD-based filtering and 30% sequence-identity split by Luo et al. (2021) yield 100,000 training pairs and 100 test proteins. (A hypothetical split-loading sketch follows the table.)
Hardware Specification | Yes | Training converges within 15 epochs on a single RTX 3090, taking around 24 hours.
Software Dependencies | No | The paper mentions software components such as 'PosNet3D', 'ReLU activation', 'Layer Normalization', and the 'Adam optimizer', but it does not specify version numbers for its software dependencies (e.g., the PyTorch version or specific library versions).
Experiment Setup | Yes | For the noise schedules, we use β1 = 1.5 for atom types and σ1 = 0.03 for atom coordinates, and train the model with a discrete-time loss over 1000 training steps. For training, we use the Adam optimizer with learning rate 0.005, batch size 8, and an exponential moving average of model parameters with a factor of 0.999. (A training-setup sketch appears below.)
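
To connect the pseudocode row to the reported noise schedule, here is a minimal sketch of a BFN-style discrete-time loss for atom coordinates, in the spirit of the paper's Algorithm 1. Only n = 1000 steps and σ1 = 0.03 come from the table; the model signature, the `protein_ctx` argument, and the tensor shapes are assumptions, and the companion discrete loss for atom types (β1 = 1.5) is omitted.

```python
import torch

def coord_discrete_time_loss(model, x, protein_ctx, n_steps=1000, sigma1=0.03):
    """Sketch of a BFN-style discrete-time loss for coordinates (cf. Algorithm 1).

    x: (num_atoms, 3) ground-truth ligand coordinates for one pocket.
    model(mu, t, protein_ctx) -> (num_atoms, 3) predicted clean coordinates;
    this signature is an assumption, not the paper's actual interface.
    """
    # Draw one training step i ~ U{1, ..., n}; the network sees t = (i - 1) / n.
    i = torch.randint(1, n_steps + 1, (1,), device=x.device).float()
    t = (i - 1) / n_steps
    # BFN accuracy schedule for continuous data: gamma(t) = 1 - sigma1^(2t).
    gamma = 1 - sigma1 ** (2 * t)
    # Noisy input parameter: mu ~ N(gamma * x, gamma * (1 - gamma) I).
    mu = gamma * x + (gamma * (1 - gamma)).sqrt() * torch.randn_like(x)
    x_hat = model(mu, t, protein_ctx)
    # Closed-form per-step weight from the BFN discrete-time derivation
    # for continuous data: n * (1 - sigma1^(2/n)) / (2 * sigma1^(2i/n)).
    weight = n_steps * (1 - sigma1 ** (2.0 / n_steps)) / (2 * sigma1 ** (2 * i / n_steps))
    return (weight * (x - x_hat).pow(2)).sum(-1).mean()
```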
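The dataset rows describe the standard CrossDocked preprocessing. A hypothetical loading sketch, assuming the torch-serialized split index commonly released with the Luo et al. (2021) pipeline; the file name `split_by_name.pt` and the dictionary layout are assumptions:

```python
import torch

# Load the train/test partition of CrossDocked protein-ligand pairs.
split = torch.load("split_by_name.pt")
train_pairs = split["train"]  # ~100,000 (pocket_file, ligand_file) pairs
test_pairs = split["test"]    # ligands for the 100 held-out test proteins
print(len(train_pairs), len(test_pairs))
```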
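The experiment-setup row translates directly into an optimizer configuration. A minimal PyTorch sketch, using a stand-in module for the actual network; only the learning rate, batch size, and EMA factor come from the paper:

```python
import copy
import torch

# Values quoted from the paper: Adam with lr 0.005, batch size 8, EMA 0.999.
LR, BATCH_SIZE, EMA_DECAY = 5e-3, 8, 0.999

model = torch.nn.Linear(3, 3)  # stand-in for the actual network (an assumption)
optimizer = torch.optim.Adam(model.parameters(), lr=LR)

# Keep a frozen copy of the weights and update it as a moving average.
ema_model = copy.deepcopy(model).requires_grad_(False)

@torch.no_grad()
def ema_update(ema, online, decay=EMA_DECAY):
    # ema <- decay * ema + (1 - decay) * online, parameter-wise.
    for p_ema, p in zip(ema.parameters(), online.parameters()):
        p_ema.mul_(decay).add_(p, alpha=1 - decay)
```

In this setup one would call `ema_update(ema_model, model)` after each `optimizer.step()` and evaluate with `ema_model`.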