Graph Denoising Diffusion for Inverse Protein Folding

Authors: Kai Yi, Bingxin Zhou, Yiqing Shen, Pietro Lió, Yuguang Wang

NeurIPS 2023

Reproducibility assessment (variable | result | supporting LLM response):

Research Type: Experimental (4 experiments)
LLM Response: "We validate our GraDe-IF on recovering native protein sequences in CATH [30]. The performance is mainly compared with structure-aware SOTA models. The implementations for the main algorithms (see Appendix D) at https://github.com/ykiiiiii/GraDe_IF are programmed with PyTorch-Geometric (ver 2.2.0) and PyTorch (ver 1.12.1) and executed on an NVIDIA Tesla V100 GPU with 5,120 CUDA cores and 32GB HBM2 installed on an HPC cluster."

Researcher Affiliation: Academia
LLM Response: "Kai Yi, University of New South Wales (kai.yi@unsw.edu.au); Bingxin Zhou, Shanghai Jiao Tong University (bingxin.zhou@sjtu.edu.cn); Yiqing Shen, Johns Hopkins University (yshen92@jhu.edu); Pietro Liò, University of Cambridge (pl219@cam.ac.uk); Yu Guang Wang, Shanghai Jiao Tong University and University of New South Wales (yuguang.wang@sjtu.edu.cn)"

Pseudocode: Yes
LLM Response: "Algorithm 1 Training ... Algorithm 2 Sampling (DDPM) ... Algorithm 3 Sampling (DDIM) ... Algorithm 4 Partial Sampling"

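The four algorithms follow the standard discrete denoising diffusion recipe: corrupt residue types forward in time, then learn to predict the clean sequence. Below is a rough, non-authoritative sketch of an Algorithm 1-style training step using a uniform transition matrix under a cosine schedule; the MLP denoiser, feature layout, and toy sequence are illustrative assumptions standing in for the paper's structure-conditioned EGNN.

```python
import torch
import torch.nn.functional as F

K, T = 20, 500  # amino-acid alphabet size, diffusion steps

def cosine_alpha_bar(T, s=0.008):
    # Cumulative probability of keeping the original residue type,
    # following the cosine noise schedule of Nichol & Dhariwal (2021).
    t = torch.linspace(0, T, T + 1)
    f = torch.cos((t / T + s) / (1 + s) * torch.pi / 2) ** 2
    return (f / f[0]).clamp(1e-5, 1.0)

alpha_bar = cosine_alpha_bar(T)

def q_sample(x0, t):
    # Forward noising with a uniform transition matrix: keep each
    # residue with probability alpha_bar[t], else resample uniformly.
    keep = torch.rand(x0.shape) < alpha_bar[t]
    noise = torch.randint(0, K, x0.shape)
    return torch.where(keep, x0, noise)

# Toy MLP denoiser over one-hot residues plus a normalized timestep
# feature; the paper instead conditions a graph EGNN on the backbone.
denoiser = torch.nn.Sequential(
    torch.nn.Linear(K + 1, 128), torch.nn.ReLU(), torch.nn.Linear(128, K)
)
opt = torch.optim.Adam(denoiser.parameters(), lr=5e-4)

def train_step(x0):
    # Algorithm 1 style: sample t, corrupt x0 -> xt, predict x0 back.
    t = torch.randint(1, T + 1, (1,)).item()
    xt = q_sample(x0, t)
    feats = torch.cat([F.one_hot(xt, K).float(),
                       torch.full((x0.size(0), 1), t / T)], dim=-1)
    loss = F.cross_entropy(denoiser(feats), x0)
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

seq = torch.randint(0, K, (128,))  # a toy 128-residue sequence
print(train_step(seq))
```

Sampling (Algorithms 2 through 4) would iterate the learned reverse step from t = T down to 1, optionally skipping steps (DDIM) or holding known residues fixed (partial sampling).
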
Open Source Code: Yes
LLM Response: "The code is available on https://github.com/ykiiiiii/GraDe_IF."

Open Datasets: Yes
LLM Response: "We employ CATH v4.2.0-based partitioning as conducted by GraphTrans [18] and GVP [20]. Proteins are categorized based on CATH topology classification, leading to a division of 18,024 proteins for training, 608 for validation, and 1,120 for testing. In addition to the CATH dataset, we also evaluated our model using the TS50 and TS500 datasets. These datasets were introduced by DenseCPD [32], encompassing 9,888 structures for training and two distinct test sets comprising 50 (TS50) and 500 (TS500) proteins, respectively."

Dataset Splits: Yes
LLM Response: "Proteins are categorized based on CATH topology classification, leading to a division of 18,024 proteins for training, 608 for validation, and 1,120 for testing."

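Given the quoted counts, a minimal sanity check of the split file can catch partitioning drift when reproducing. The file name chain_set_splits.json and its train/validation/test keys are assumptions borrowed from the split files distributed with the GraphTrans/GVP line of work, not confirmed details of the GraDe_IF repository.

```python
import json

# Hypothetical check that a local copy of the CATH 4.2 topology split
# matches the counts reported in the paper (18,024 / 608 / 1,120).
with open("chain_set_splits.json") as f:
    splits = json.load(f)

for name, expected in [("train", 18024), ("validation", 608), ("test", 1120)]:
    print(f"{name}: {len(splits[name])} chains (expected {expected})")
```
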
Hardware Specification: Yes
LLM Response: "The implementations for the main algorithms (see Appendix D) at https://github.com/ykiiiiii/GraDe_IF are programmed with PyTorch-Geometric (ver 2.2.0) and PyTorch (ver 1.12.1) and executed on an NVIDIA Tesla V100 GPU with 5,120 CUDA cores and 32GB HBM2 installed on an HPC cluster."

Software Dependencies: Yes
LLM Response: "The implementations for the main algorithms (see Appendix D) at https://github.com/ykiiiiii/GraDe_IF are programmed with PyTorch-Geometric (ver 2.2.0) and PyTorch (ver 1.12.1)"

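Since the report pins exact framework versions, a short runtime check (a minimal sketch; the version strings come from the quoted excerpt) can confirm an environment matches before rerunning experiments:

```python
# Verify the pinned framework versions from the quoted excerpt
# (PyTorch 1.12.1, PyTorch-Geometric 2.2.0) are the ones installed.
import torch
import torch_geometric

print(torch.__version__)            # expected: 1.12.1
print(torch_geometric.__version__)  # expected: 2.2.0
assert torch.__version__.startswith("1.12")
assert torch_geometric.__version__.startswith("2.2")
```
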
Experiment Setup: Yes
LLM Response: "The total time step of the diffusion model is configured as 500, adhering to a cosine schedule for noise [27]. For the denoising network, we implement six stacked EGNN blocks, each possessing a hidden dimension of 128. Our model undergoes training for a default of 200 epochs, making use of the Adam optimizer. A batch size of 64 and a learning rate of 0.0005 are applied during training. Moreover, to prevent overfitting, we incorporate a dropout rate of 0.1 into our model's architecture."
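
A minimal sketch of this configuration follows. The EGNNBlock is a simplified stand-in (feature updates only, without the equivariant coordinate updates of a full EGNN), and the toy graph is illustrative; only the hyperparameters (six blocks, hidden dimension 128, dropout 0.1, Adam with learning rate 5e-4) come from the quoted setup.

```python
import torch
import torch.nn as nn

class EGNNBlock(nn.Module):
    # Simplified message-passing block; the paper's EGNN additionally
    # updates geometric features equivariantly.
    def __init__(self, dim=128, dropout=0.1):
        super().__init__()
        self.msg = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU())
        self.upd = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(),
                                 nn.Dropout(dropout))

    def forward(self, h, edge_index):
        src, dst = edge_index  # directed edges src -> dst
        m = self.msg(torch.cat([h[src], h[dst]], dim=-1))
        agg = torch.zeros_like(h).index_add_(0, dst, m)  # sum incoming messages
        return h + self.upd(torch.cat([h, agg], dim=-1))

# Reported configuration: six blocks, hidden dim 128, dropout 0.1,
# Adam with lr 5e-4 (batch size 64 and 200 epochs in the full setup).
hidden, n_blocks = 128, 6
net = nn.ModuleList(EGNNBlock(hidden, dropout=0.1) for _ in range(n_blocks))
opt = torch.optim.Adam((p for blk in net for p in blk.parameters()), lr=5e-4)

# Toy forward pass over a 10-node graph with three directed edges.
h = torch.randn(10, hidden)
edge_index = torch.tensor([[0, 1, 2], [1, 2, 0]])
for blk in net:
    h = blk(h, edge_index)
print(h.shape)  # torch.Size([10, 128])
```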