Full-Atom Protein Pocket Design via Iterative Refinement
Authors: Zaixi Zhang, Zepu Lu, Zhongkai Hao, Marinka Zitnik, Qi Liu
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments show that FAIR surpasses existing methods in designing superior pocket sequences and structures, producing average improvement exceeding 10% in AAR and RMSD metrics. |
| Researcher Affiliation | Academia | Zaixi Zhang (1,2,4), Zepu Lu (1,2), Zhongkai Hao (3), Marinka Zitnik (4), Qi Liu (1,2). (1) Anhui Province Key Lab of Big Data Analysis and Application, School of Computer Science and Technology, University of Science and Technology of China; (2) State Key Laboratory of Cognitive Intelligence, Hefei, Anhui, China; (3) Dept. of Comp. Sci. and Tech., Institute for AI, THBI Lab, BNRist Center, Tsinghua-Bosch Joint ML Center, Tsinghua; (4) Harvard University. {zaixi, zplu}@mail.ustc.edu.cn, hzj21@mails.tsinghua.edu.cn, marinka@hms.harvard.edu, qiliuql@ustc.edu.cn |
| Pseudocode | Yes | Algorithms 1 and 2 in Appendix A outline model training and sampling. |
| Open Source Code | No | The paper provides links to the source code for baseline methods (e.g., PocketOptimizer, DEPACT, HSRN, Diffusion, MEAN) in Section B, but does not provide a concrete link or an explicit statement about the public availability of the source code for its own method, FAIR. |
| Open Datasets | Yes | We consider two widely used datasets for experimental evaluations. CrossDocked dataset [16] contains 22.5 million protein-molecule pairs generated through cross-docking. ... Binding MOAD dataset [21] contains around 41k experimentally determined protein-ligand complexes. |
| Dataset Splits | Yes | For data splitting, we use mmseqs2 [58] to cluster data at 30% sequence identity, and randomly draw 100k protein-ligand structure pairs for training and 100 pairs from the remaining clusters for testing and validation, respectively. ... We further filter and split the Binding MOAD dataset based on the proteins' enzyme commission number [4], resulting in 40k protein-ligand pairs for training, 100 pairs for validation, and 100 pairs for testing following previous work [54]. |
| Hardware Specification | Yes | To evaluate the efficiency of FAIR, we considered the generation time of different approaches using a single V100 GPU on the same machine. |
| Software Dependencies | No | The paper mentions software components such as 'Adam optimizer', 'AMBER ff14SB force field', 'Dunbrack rotamer library', 'OpenMM', 'QVina', and 'Rosetta', but does not provide specific version numbers for these software dependencies. |
| Experiment Setup | Yes | We train them for 50 epochs and select the checkpoint with the lowest loss on the validation set for testing. We use the Adam optimizer with a learning rate of 0.0001 for optimization. In FAIR, the default setting sets T1 and T2 as 5 and 10. The numbers of layers for the atom-level and residue-level encoders are 6 and 2, respectively. Ka and Kr are set as 24 and 8, respectively. The number of attention heads is set as 4. The hidden dimension d is set as 128. The standard deviation of the Gaussian noise added to the ligand coordinates in model training is 0.1. |
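
Since no FAIR source code is linked, the reported setup and split can only be approximated. The sketch below is not the authors' implementation: it collects the hyperparameters from the Experiment Setup row in a dataclass and shows one possible reading of the cluster-aware split from the Dataset Splits row. The names `FairSetup` and `split_by_cluster`, all field names, and the `cluster_of` mapping (assumed to be derived from a clustering at 30% sequence identity) are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FairSetup:
    """Values quoted in the Experiment Setup row; field names are hypothetical."""
    epochs: int = 50
    learning_rate: float = 1e-4      # Adam optimizer
    t1: int = 5                      # T1
    t2: int = 10                     # T2
    atom_encoder_layers: int = 6
    residue_encoder_layers: int = 2
    k_atom: int = 24                 # Ka
    k_residue: int = 8               # Kr
    attention_heads: int = 4
    hidden_dim: int = 128            # d
    ligand_noise_std: float = 0.1    # Gaussian noise on ligand coordinates


def split_by_cluster(cluster_of, n_train, n_eval, seed=0):
    """Randomly draw training pairs, then validation and test pairs
    from clusters that contain no training pair.

    cluster_of: dict mapping a complex ID to its cluster ID (assumed input).
    """
    rng = random.Random(seed)
    ids = sorted(cluster_of)         # sort first so the shuffle is reproducible
    rng.shuffle(ids)
    train = ids[:n_train]
    train_clusters = {cluster_of[i] for i in train}
    # Keep only complexes whose cluster never appears in the training set.
    held_out = [i for i in ids[n_train:] if cluster_of[i] not in train_clusters]
    if len(held_out) < 2 * n_eval:
        raise ValueError("not enough held-out complexes for validation and test")
    valid = held_out[:n_eval]
    test = held_out[n_eval:2 * n_eval]
    return train, valid, test
```

Under these assumptions, the CrossDocked split described in the table would correspond to `split_by_cluster(cluster_of, n_train=100_000, n_eval=100)`. Whether validation and test pairs must also come from distinct clusters is not stated in the quoted text, so the sketch does not enforce it.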