KW-Design: Pushing the Limit of Protein Design via Knowledge Refinement
Authors: Zhangyang Gao, Cheng Tan, Xingran Chen, Yijie Zhang, Jun Xia, Siyuan Li, Stan Z. Li
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We extensively evaluate our proposed method on the CATH, TS50, TS500, and PDB datasets and our results show that our KW-Design method outperforms the previous PiFold method by approximately 9% on the CATH dataset. |
| Researcher Affiliation | Academia | 1 Zhejiang University; 2 Research Center for Industries of the Future, Westlake University; 3 University of Michigan, USA; 4 McGill University, Canada |
| Pseudocode | Yes | Algorithm 1, MemoryNet framework usage: retrieve embeddings from the memory bank without the forward pass. (A hedged cache sketch follows the table.) |
| Open Source Code | Yes | The code is publicly available via GitHub. |
| Open Datasets | Yes | We evaluate the performance of KW-Design on multiple datasets, including CATH4.2, CATH4.3, TS50, TS500, and PDB. The CATH4.2 dataset consists of 18,024 proteins for training, 608 proteins for validation, and 1,120 proteins for testing, following the same data splitting as GraphTrans (Ingraham et al., 2019), GVP (Jing et al., 2020), and PiFold (Gao et al., 2023). |
| Dataset Splits | Yes | The CATH4.2 dataset consists of 18,024 proteins for training, 608 proteins for validation, and 1,120 proteins for testing, following the same data splitting as GraphTrans (Ingraham et al., 2019), GVP (Jing et al., 2020), and PiFold (Gao et al., 2023). The CATH4.3 dataset includes 16,153 structures for the training set, 1,457 for the validation set, and 1,797 for the test set, following the same data splitting as ESMIF (Hsu et al., 2022). (A split-loading sketch follows the table.) |
| Hardware Specification | Yes | The model is trained up to 20 epochs using the Adam optimizer on an NVIDIA V100. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and 'PyMOL' as tools used but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | The model is trained up to 20 epochs using the Adam optimizer on an NVIDIA V100. The batch size and learning rate used for training are 32 and 0.001, respectively. (A minimal training-loop sketch follows the table.) |
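
For readers who want to prototype the memory-bank idea quoted in the Pseudocode row, the following Python sketch caches per-input embeddings so that repeated inputs skip the forward pass. The `MemoryBank` class, its key scheme, and the `retrieve` signature are illustrative assumptions, not the authors' released code.

```python
import hashlib

import torch
import torch.nn as nn


class MemoryBank:
    """Toy cache of module outputs keyed by (module name, sequence).

    Illustrative only: KW-Design's actual bank, keys, and update rules
    may differ; this just demonstrates "retrieve embeddings from the
    memory bank without the forward pass".
    """

    def __init__(self):
        self._bank = {}  # key -> cached embedding tensor

    @staticmethod
    def _key(module_name: str, sequence: str) -> str:
        return hashlib.md5(f"{module_name}:{sequence}".encode()).hexdigest()

    def retrieve(self, module: nn.Module, module_name: str,
                 sequence: str, features: torch.Tensor) -> torch.Tensor:
        key = self._key(module_name, sequence)
        if key not in self._bank:  # cache miss: run the module once
            with torch.no_grad():
                self._bank[key] = module(features)
        return self._bank[key]     # cache hit: no forward pass needed
```

A call such as `bank.retrieve(encoder, "pretrain_encoder", seq, feats)` then only pays the encoder's cost the first time a given sequence is seen.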
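
The CATH 4.2 split sizes quoted in the Dataset Splits row can be sanity-checked against the split files distributed with Ingraham et al. (2019). The file names `chain_set.jsonl` and `chain_set_splits.json` and the split keys follow that repository's data release and are assumptions about the reader's local copy.

```python
import json

# Assumed file names from the Ingraham et al. (2019) data release;
# adjust paths and split keys ("train"/"validation"/"test") if your
# copy differs.
with open("chain_set_splits.json") as f:
    splits = json.load(f)

proteins = {}
with open("chain_set.jsonl") as f:
    for line in f:
        entry = json.loads(line)
        proteins[entry["name"]] = entry

train = [proteins[n] for n in splits["train"] if n in proteins]
valid = [proteins[n] for n in splits["validation"] if n in proteins]
test = [proteins[n] for n in splits["test"] if n in proteins]

# Expected CATH 4.2 sizes per the paper: 18024 / 608 / 1120
print(len(train), len(valid), len(test))
```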
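
The Experiment Setup row pins down the optimizer, batch size, learning rate, and epoch budget. A minimal PyTorch loop wiring those numbers together looks as follows; the model and data loader are dummy placeholders, since only the hyperparameters (Adam, batch size 32, lr 0.001, up to 20 epochs) come from the paper.

```python
import torch
import torch.nn as nn

# Placeholder stand-in for the KW-Design network; only the training
# hyperparameters below are taken from the paper.
model = nn.Linear(128, 20)  # e.g. per-residue features -> 20 amino-acid logits
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # lr = 0.001
criterion = nn.CrossEntropyLoss()


def train_loader():
    """Dummy loader yielding (features, labels) batches of size 32."""
    for _ in range(10):
        yield torch.randn(32, 128), torch.randint(0, 20, (32,))


for _ in range(20):  # trained up to 20 epochs per the paper
    for features, labels in train_loader():
        optimizer.zero_grad()
        loss = criterion(model(features), labels)
        loss.backward()
        optimizer.step()
```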