Graph Denoising Diffusion for Inverse Protein Folding

Authors: Kai Yi, Bingxin Zhou, Yiqing Shen, Pietro Lió, Yuguang Wang

NeurIPS 2023

Reproducibility assessment (variable | result | supporting LLM response):

Research Type: Experimental (4 experiments)
LLM Response: "We validate our GraDe-IF on recovering native protein sequences in CATH [30]. The performance is mainly compared with structure-aware SOTA models. The implementations for the main algorithms (see Appendix D) at https://github.com/ykiiiiii/GraDe_IF are programmed with PyTorch-Geometric (ver 2.2.0) and PyTorch (ver 1.12.1) and executed on an NVIDIA Tesla V100 GPU with 5,120 CUDA cores and 32GB HBM2 installed on an HPC cluster."

Researcher Affiliation: Academia
LLM Response: "Kai Yi, University of New South Wales (kai.yi@unsw.edu.au); Bingxin Zhou, Shanghai Jiao Tong University (bingxin.zhou@sjtu.edu.cn); Yiqing Shen, Johns Hopkins University (yshen92@jhu.edu); Pietro Liò, University of Cambridge (pl219@cam.ac.uk); Yu Guang Wang, Shanghai Jiao Tong University and University of New South Wales (yuguang.wang@sjtu.edu.cn)"

Pseudocode: Yes
LLM Response: "Algorithm 1 Training ... Algorithm 2 Sampling (DDPM) ... Algorithm 3 Sampling (DDIM) ... Algorithm 4 Partial Sampling"

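The four algorithms follow the standard discrete denoising diffusion recipe: corrupt residue types forward in time, then learn to predict the clean sequence. Below is a rough, non-authoritative sketch of an Algorithm 1-style training step using a uniform transition matrix under a cosine schedule; the MLP denoiser, feature layout, and toy sequence are illustrative assumptions standing in for the paper's structure-conditioned EGNN.

```python
import torch
import torch.nn.functional as F

K, T = 20, 500  # amino-acid alphabet size, diffusion steps

def cosine_alpha_bar(T, s=0.008):
    # Cumulative probability of keeping the original residue type,
    # following the cosine noise schedule of Nichol & Dhariwal (2021).
    t = torch.linspace(0, T, T + 1)
    f = torch.cos((t / T + s) / (1 + s) * torch.pi / 2) ** 2
    return (f / f[0]).clamp(1e-5, 1.0)

alpha_bar = cosine_alpha_bar(T)

def q_sample(x0, t):
    # Forward noising with a uniform transition matrix: keep each
    # residue with probability alpha_bar[t], else resample uniformly.
    keep = torch.rand(x0.shape) < alpha_bar[t]
    noise = torch.randint(0, K, x0.shape)
    return torch.where(keep, x0, noise)

# Toy MLP denoiser over one-hot residues plus a normalized timestep
# feature; the paper instead conditions a graph EGNN on the backbone.
denoiser = torch.nn.Sequential(
    torch.nn.Linear(K + 1, 128), torch.nn.ReLU(), torch.nn.Linear(128, K)
)
opt = torch.optim.Adam(denoiser.parameters(), lr=5e-4)

def train_step(x0):
    # Algorithm 1 style: sample t, corrupt x0 -> xt, predict x0 back.
    t = torch.randint(1, T + 1, (1,)).item()
    xt = q_sample(x0, t)
    feats = torch.cat([F.one_hot(xt, K).float(),
                       torch.full((x0.size(0), 1), t / T)], dim=-1)
    loss = F.cross_entropy(denoiser(feats), x0)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

seq = torch.randint(0, K, (128,))  # a toy 128-residue sequence
print(train_step(seq))
```

Sampling (Algorithms 2 through 4) would iterate the learned reverse step from t = T down to 1, optionally skipping steps (DDIM) or holding known residues fixed (partial sampling).
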
Open Source Code: Yes
LLM Response: "The code is available on https://github.com/ykiiiiii/GraDe_IF."

Open Datasets: Yes
LLM Response: "We employ CATH v4.2.0-based partitioning as conducted by GraphTrans [18] and GVP [20]. Proteins are categorized based on CATH topology classification, leading to a division of 18,024 proteins for training, 608 for validation, and 1,120 for testing. In addition to the CATH dataset, we also evaluated our model using the TS50 and TS500 datasets. These datasets were introduced by DenseCPD [32], encompassing 9,888 structures for training and two distinct test sets comprising 50 (TS50) and 500 (TS500) proteins, respectively."

Dataset Splits: Yes
LLM Response: "Proteins are categorized based on CATH topology classification, leading to a division of 18,024 proteins for training, 608 for validation, and 1,120 for testing."

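Given the quoted counts, a minimal sanity check of the split file can catch partitioning drift when reproducing. The file name chain_set_splits.json and its train/validation/test keys are assumptions borrowed from the split files distributed with the GraphTrans/GVP line of work, not confirmed details of the GraDe_IF repository.

```python
import json

# Hypothetical check that a local copy of the CATH 4.2 topology split
# matches the counts reported in the paper (18,024 / 608 / 1,120).
with open("chain_set_splits.json") as f:
    splits = json.load(f)

for name, expected in [("train", 18024), ("validation", 608), ("test", 1120)]:
    print(f"{name}: {len(splits[name])} chains (expected {expected})")
```
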
Hardware Specification: Yes
LLM Response: "The implementations for the main algorithms (see Appendix D) at https://github.com/ykiiiiii/GraDe_IF are programmed with PyTorch-Geometric (ver 2.2.0) and PyTorch (ver 1.12.1) and executed on an NVIDIA Tesla V100 GPU with 5,120 CUDA cores and 32GB HBM2 installed on an HPC cluster."

Software Dependencies: Yes
LLM Response: "The implementations for the main algorithms (see Appendix D) at https://github.com/ykiiiiii/GraDe_IF are programmed with PyTorch-Geometric (ver 2.2.0) and PyTorch (ver 1.12.1)"

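Since the report pins exact framework versions, a short runtime check (a minimal sketch; the version strings come from the quoted excerpt) can confirm an environment matches before rerunning experiments:

```python
# Verify the pinned framework versions from the quoted excerpt
# (PyTorch 1.12.1, PyTorch-Geometric 2.2.0) are the ones installed.
import torch
import torch_geometric

print(torch.__version__)            # expected: 1.12.1
print(torch_geometric.__version__)  # expected: 2.2.0
assert torch.__version__.startswith("1.12")
assert torch_geometric.__version__.startswith("2.2")
```
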
Experiment Setup: Yes
LLM Response: "The total time step of the diffusion model is configured as 500, adhering to a cosine schedule for noise [27]. For the denoising network, we implement six stacked EGNN blocks, each possessing a hidden dimension of 128. Our model undergoes training for a default of 200 epochs, making use of the Adam optimizer. A batch size of 64 and a learning rate of 0.0005 are applied during training. Moreover, to prevent overfitting, we incorporate a dropout rate of 0.1 into our model's architecture."
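
A minimal sketch of this configuration follows. The EGNNBlock is a simplified stand-in (feature updates only, without the equivariant coordinate updates of a full EGNN), and the toy graph is illustrative; only the hyperparameters (six blocks, hidden dimension 128, dropout 0.1, Adam with learning rate 5e-4) come from the quoted setup.

```python
import torch
import torch.nn as nn

class EGNNBlock(nn.Module):
    # Simplified message-passing block; the paper's EGNN additionally
    # updates geometric features equivariantly.
    def __init__(self, dim=128, dropout=0.1):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU())
        self.upd = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(),
                                 nn.Dropout(dropout))

    def forward(self, h, edge_index):
        src, dst = edge_index  # directed edges src -> dst
        m = self.msg(torch.cat([h[src], h[dst]], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, m)  # sum incoming messages
        return h + self.upd(torch.cat([h, agg], dim=-1))

# Reported configuration: six blocks, hidden dim 128, dropout 0.1,
# Adam with lr 5e-4 (batch size 64 and 200 epochs in the full setup).
hidden, n_blocks = 128, 6
net = nn.ModuleList(EGNNBlock(hidden, dropout=0.1) for _ in range(n_blocks))
opt = torch.optim.Adam((p for blk in net for p in blk.parameters()), lr=5e-4)

# Toy forward pass over a 10-node graph with three directed edges.
h = torch.randn(10, hidden)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 0]])
for blk in net:
    h = blk(h, edge_index)
print(h.shape)  # torch.Size([10, 128])
```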