Graph Denoising Diffusion for Inverse Protein Folding
Authors: Kai Yi, Bingxin Zhou, Yiqing Shen, Pietro Liò, Yu Guang Wang
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 4 Experiments: We validate our GraDe-IF on recovering native protein sequences in CATH [30]. The performance is mainly compared with structure-aware SOTA models. The implementations for the main algorithms (see Appendix D) at https://github.com/ykiiiiii/GraDe_IF are programmed with PyTorch-Geometric (ver 2.2.0) and PyTorch (ver 1.12.1) and executed on an NVIDIA Tesla V100 GPU with 5,120 CUDA cores and 32GB HBM2 installed on an HPC cluster. |
| Researcher Affiliation | Academia | Kai Yi, University of New South Wales, kai.yi@unsw.edu.au; Bingxin Zhou, Shanghai Jiao Tong University, bingxin.zhou@sjtu.edu.cn; Yiqing Shen, Johns Hopkins University, yshen92@jhu.edu; Pietro Liò, University of Cambridge, pl219@cam.ac.uk; Yu Guang Wang, Shanghai Jiao Tong University & University of New South Wales, yuguang.wang@sjtu.edu.cn |
| Pseudocode | Yes | Algorithm 1 Training ... Algorithm 2 Sampling (DDPM) ... Algorithm 3 Sampling (DDIM) ... Algorithm 4 Partial Sampling (hedged sketches of the training and sampling loops appear after the table) |
| Open Source Code | Yes | The code is available at https://github.com/ykiiiiii/GraDe_IF. |
| Open Datasets | Yes | We employ CATH v4.2.0-based partitioning as conducted by GRAPHTRANS [18] and GVP [20]. Proteins are categorized based on CATH topology classification, leading to a division of 18,024 proteins for training, 608 for validation, and 1,120 for testing. In addition to the CATH dataset, we also evaluated our model using the TS50 and TS500 datasets. These datasets were introduced by DenseCPD [32], encompassing 9,888 structures for training, and two distinct test sets comprising 50 (TS50) and 500 (TS500) proteins, respectively. (A split-loading sketch appears after the table.) |
| Dataset Splits | Yes | Proteins are categorized based on CATH topology classification, leading to a division of 18,024 proteins for training, 608 for validation, and 1,120 for testing. |
| Hardware Specification | Yes | The implementations for the main algorithms (see Appendix D) at https://github.com/ykiiiiii/GraDe_IF are programmed with PyTorch-Geometric (ver 2.2.0) and PyTorch (ver 1.12.1) and executed on an NVIDIA Tesla V100 GPU with 5,120 CUDA cores and 32GB HBM2 installed on an HPC cluster. |
| Software Dependencies | Yes | The implementations for the main algorithms (see Appendix D) at https://github.com/ykiiiiii/GraDe_IF are programmed with PyTorch-Geometric (ver 2.2.0) and PyTorch (ver 1.12.1). (A version-check snippet appears after the table.) |
| Experiment Setup | Yes | The total time step of the diffusion model is configured as 500, adhering to a cosine schedule for noise [27]. For the denoising network, we implement six stacked EGNN blocks, each possessing a hidden dimension of 128. Our model undergoes training for a default of 200 epochs, making use of the Adam optimizer. A batch size of 64 and a learning rate of 0.0005 are applied during training. Moreover, to prevent overfitting, we incorporate a dropout rate of 0.1 into our model's architecture. (A schedule sketch appears after the table.) |
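
The Experiment Setup row reports 500 diffusion steps under a cosine noise schedule [27]. Below is a minimal sketch of that schedule in the formulation of Nichol & Dhariwal, which is most likely the schedule the citation refers to; the offset `s = 0.008` and the clamp on the betas are conventional defaults, not values confirmed by the paper.

```python
import math

import torch


def cosine_alpha_bar(num_steps: int = 500, s: float = 0.008) -> torch.Tensor:
    """Cumulative signal-retention schedule: alpha_bar_t = f(t) / f(0) with
    f(t) = cos^2(((t / T + s) / (1 + s)) * pi / 2)."""
    t = torch.linspace(0, num_steps, num_steps + 1)
    f = torch.cos(((t / num_steps + s) / (1 + s)) * math.pi / 2) ** 2
    return f / f[0]


# T = 500 matches the reported setup; per-step betas follow from ratios of
# consecutive alpha_bar values, clamped for numerical stability near t = T.
alpha_bar = cosine_alpha_bar(500)
betas = (1.0 - alpha_bar[1:] / alpha_bar[:-1]).clamp(max=0.999)
```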
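
The Pseudocode row lists Algorithm 1 (Training). The paper's exact corruption process is not quoted here, so the sketch below assumes a D3PM-style uniform transition over the 20 amino acid types; `model` (an EGNN denoiser taking noisy residue types, the backbone graph, and the timestep) and `graph` are hypothetical placeholders, not the paper's API.

```python
import torch
import torch.nn.functional as F

NUM_AA = 20  # amino acid alphabet size


def uniform_q_bar(alpha_bar_t: torch.Tensor) -> torch.Tensor:
    """Cumulative transition matrix of a uniform discrete diffusion: keep the
    residue type with probability alpha_bar_t, else resample uniformly."""
    return alpha_bar_t * torch.eye(NUM_AA) + (1 - alpha_bar_t) / NUM_AA


def training_step(model, x0, graph, alpha_bar, optimizer):
    """One denoising step: corrupt the native residue types x0 at a random
    timestep t, then train the network to recover x0 from the noisy
    sequence and the fixed backbone graph."""
    T = alpha_bar.shape[0] - 1
    t = torch.randint(1, T + 1, (1,)).item()

    # Sample x_t ~ q(x_t | x_0) independently for every residue in the chain.
    probs = F.one_hot(x0, NUM_AA).float() @ uniform_q_bar(alpha_bar[t])
    xt = torch.multinomial(probs, num_samples=1).squeeze(-1)

    # The denoiser predicts logits over the native types x_0.
    loss = F.cross_entropy(model(xt, graph, t), x0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```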
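
Algorithms 2 to 4 cover DDPM/DDIM and partial sampling. A simplified ancestral (DDPM-style) sampler is sketched below, reusing `NUM_AA`, `uniform_q_bar`, and `alpha_bar` from the sketches above; the true reverse posterior conditions on x_t as well as the predicted x_0, which this approximation omits, so treat it as illustrative only.

```python
@torch.no_grad()
def sample(model, graph, num_nodes, alpha_bar):
    """Start from uniform noise over the 20 residue types and iteratively
    denoise: predict x_0, then re-noise the prediction to level t - 1."""
    T = alpha_bar.shape[0] - 1
    xt = torch.randint(0, NUM_AA, (num_nodes,))
    for t in range(T, 0, -1):
        x0_probs = F.softmax(model(xt, graph, t), dim=-1)
        probs = x0_probs @ uniform_q_bar(alpha_bar[t - 1])
        xt = torch.multinomial(probs, num_samples=1).squeeze(-1)
    return xt  # one residue-type index per node
```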
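
The Open Datasets row reports the GRAPHTRANS/GVP split of CATH v4.2.0 (18,024 / 608 / 1,120 chains). A loading sketch, assuming a JSON split file with `train`/`validation`/`test` keys in the style of Ingraham et al.'s data release; the file name and key names are assumptions, not confirmed by the paper.

```python
import json

SPLIT_FILE = "chain_set_splits.json"  # hypothetical path to the split file

with open(SPLIT_FILE) as f:
    splits = json.load(f)

train_ids = splits["train"]      # expected: 18,024 chain IDs
val_ids = splits["validation"]   # expected: 608 chain IDs
test_ids = splits["test"]        # expected: 1,120 chain IDs
print(len(train_ids), len(val_ids), len(test_ids))
```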
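
The Software Dependencies row pins PyTorch 1.12.1 and PyTorch-Geometric 2.2.0; both versions are on PyPI (e.g. `pip install torch==1.12.1 torch-geometric==2.2.0`). A quick sanity check when reproducing the environment:

```python
import torch
import torch_geometric

# Fail fast if the environment drifts from the versions reported in the paper.
assert torch.__version__.startswith("1.12"), torch.__version__
assert torch_geometric.__version__.startswith("2.2"), torch_geometric.__version__
print(f"PyTorch {torch.__version__}, PyG {torch_geometric.__version__}, "
      f"CUDA available: {torch.cuda.is_available()}")
```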