Protein 3D Graph Structure Learning for Robust Structure-Based Protein Property Prediction
Authors: Yufei Huang, Siyuan Li, Lirong Wu, Jin Su, Haitao Lin, Odin Zhang, Zihan Liu, Zhangyang Gao, Jiangbin Zheng, Stan Z. Li
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments have shown that our framework is model-agnostic and effective in improving the property prediction of both predicted structures and experimental structures. In this section, we first introduce the experimental setup for four standard protein property prediction tasks... We then conduct empirical experiments to demonstrate the effectiveness of the proposed framework SAO. We aim to answer five research questions as follows: Q1: Does the protein structure embedding bias problem generally exist in various protein property prediction tasks across different predictors? Q2: Is our proposed framework SAO model-agnostic? Q3: How effective is SAO for PGSL-RPA? Q4: How do key framework components impact the performance of SAO? Q5: How robust is SAO to less accurate or non-AlphaFold predicted protein structures? |
| Researcher Affiliation | Academia | Yufei Huang1,2, Siyuan Li1,2, Lirong Wu1,2, Jin Su1,2, Haitao Lin1,2, Odin Zhang1, Zihan Liu1,2, Zhangyang Gao1,2, Jiangbin Zheng1,2, Stan Z. Li2* 1 Zhejiang University, Hangzhou 2 AI Lab, Research Center for Industries of the Future, Westlake University {huangyufei, lisiyuan, sujin, wulirong, linhaitao, zhengjiangbin, Stan.ZQ.Li}@westlake.edu.cn, haotianzhang@zju.edu.cn |
| Pseudocode | No | The paper states 'The algorithmic description and more details, including the mask ratio, are provided in Appendix E.' However, Appendix E is not included in the provided document, so pseudocode is not present within this text. |
| Open Source Code | No | The paper does not explicitly state that source code for the described methodology is being released, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We adopt four tasks proposed in (Gligorijević et al. 2021) as downstream tasks for evaluation. Enzyme Commission (EC) number prediction aims to forecast the EC numbers of various proteins... Gene Ontology (GO) term prediction aims to annotate a protein with GO terms... Following (Gligorijević et al. 2021), we use the multi-cutoff split methods for EC and GO tasks to guarantee the test set contains only PDB chains with sequence identity less than 95% to the training set. |
| Dataset Splits | Yes | Following (Gligorijević et al. 2021), we use the multi-cutoff split methods for EC and GO tasks to guarantee the test set contains only PDB chains with sequence identity less than 95% to the training set. We pretrain encoders under SAO for 400 epochs on structures of corresponding downstream tasks and then fine-tune them on downstream tasks for specific epochs (EC: 100 epochs, GO-CC: 45 epochs, GO-MF and GO-BP: 100 epochs). |
| Hardware Specification | No | The paper mentions 'Implementation details can be referred to Appendix' and 'For other experimental details, interested readers can refer to the Appendix' but does not specify any hardware details like GPU models, CPU types, or memory used for the experiments within the provided text. |
| Software Dependencies | No | The paper does not provide specific version numbers for software dependencies such as libraries, frameworks (e.g., PyTorch, TensorFlow), or programming languages used in the experiments. |
| Experiment Setup | Yes | We pretrain encoders under SAO for 400 epochs on structures of corresponding downstream tasks and then fine-tune them on downstream tasks for specific epochs (EC: 100 epochs, GO-CC: 45 epochs, GO-MF and GO-BP: 100 epochs). Warmup and the exponential learning rate decay schedule are used with a start learning rate of 0.0, a max learning rate of 1e-4, and a decay factor of 0.99. |
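The learning-rate schedule quoted in the Experiment Setup row (warmup from 0.0 to a max of 1e-4, then exponential decay with factor 0.99) can be sketched as follows. The paper does not state the warmup length, so `warmup_epochs` below is a placeholder assumption, not a value from the paper:

```python
def lr_at_epoch(epoch: int,
                warmup_epochs: int = 10,   # assumed; not specified in the paper
                max_lr: float = 1e-4,
                decay: float = 0.99) -> float:
    """Warmup + exponential decay schedule as described in the paper.

    Linearly warms up from a start learning rate of 0.0 to `max_lr`,
    then decays by a factor of `decay` each subsequent epoch.
    """
    if epoch < warmup_epochs:
        # Linear warmup starting at 0.0.
        return max_lr * epoch / warmup_epochs
    # Exponential decay after warmup.
    return max_lr * decay ** (epoch - warmup_epochs)
```

For example, the rate is 0.0 at epoch 0, reaches 1e-4 at the end of warmup, and shrinks by 1% per epoch thereafter.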