Learning Complete Protein Representation by Dynamically Coupling of Sequence and Structure
Authors: Bozhen Hu, Cheng Tan, Jun Xia, Yue Liu, Lirong Wu, Jiangbin Zheng, Yongjie Xu, Yufei Huang, Stan Z. Li
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on benchmark datasets show that CoupleNet outperforms state-of-the-art methods, with particularly strong performance in low-sequence-similarity scenarios, adeptly identifying infrequently encountered functions and effectively capturing remote homology relationships in proteins. |
| Researcher Affiliation | Academia | Bozhen Hu¹,², Cheng Tan², Jun Xia², Yue Liu³, Lirong Wu², Jiangbin Zheng², Yongjie Xu², Yufei Huang², Stan Z. Li² (¹Zhejiang University, ²Westlake University, ³National University of Singapore) |
| Pseudocode | No | The paper describes the model architecture and message passing mechanism but does not include formal pseudocode or an algorithm block. |
| Open Source Code | Yes | Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material? Answer: [Yes] Justification: Codes are in the supplementary material. |
| Open Datasets | Yes | Following the tasks in IEConv [12] and GearNet [14], we evaluate CoupleNet on four protein tasks: protein fold classification, enzyme reaction classification, GO term prediction, and EC number prediction. Dataset statistics of our four downstream tasks are summarized in Table 4. |
| Dataset Splits | Yes | Dataset statistics of the four downstream tasks are summarized in Table 4 (train / validation / test): EC: 15,550 / 1,729 / 1,919; GO: 29,898 / 3,322 / 3,415; Fold: 12,312 / 736 / 718; Superfamily: 12,312 / 736 / 1,254; Family: 12,312 / 736 / 1,272; Reaction Classification: 29,215 / 2,562 / 5,651. These figures are transcribed in the first snippet after the table. |
| Hardware Specification | Yes | The proposed models are trained with the Adam optimizer [54] on a single NVIDIA A100 GPU, using PyTorch 1.13+cu117 and PyTorch Geometric 2.3.1 with CUDA 11.2. |
| Software Dependencies | Yes | The proposed models are trained with the Adam optimizer [54] on a single NVIDIA A100 GPU, using PyTorch 1.13+cu117 and PyTorch Geometric 2.3.1 with CUDA 11.2. A version-check snippet for this stack follows the table. |
| Experiment Setup | Yes | The learning rate is set to 0.001. The radius threshold increases from 4 to 16, l is set to a constant 11, and the number of feature channels is doubled at each stage. ...Four pooling layers are sufficient to achieve satisfactory results; every two message-passing layers are followed by an average pooling layer, giving eight message-passing layers in total. ...For the batch size, training epochs, etc., which influence the convergence speed of deep learning models, a grid search is used to select values; the details are shown in Table 5. A training skeleton sketching these settings also follows the table. |
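
For reference, the split sizes quoted from Table 4 can be captured directly in Python. This is a plain transcription of the reported statistics, not code from the paper:

```python
# Train / validation / test sizes as reported in Table 4 of the paper.
DATASET_SPLITS = {
    "EC": (15_550, 1_729, 1_919),
    "GO": (29_898, 3_322, 3_415),
    "Fold": (12_312, 736, 718),
    "Superfamily": (12_312, 736, 1_254),
    "Family": (12_312, 736, 1_272),
    "Reaction Classification": (29_215, 2_562, 5_651),
}

for name, (train, val, test) in DATASET_SPLITS.items():
    print(f"{name}: {train + val + test} proteins total "
          f"({train} train / {val} validation / {test} test)")
```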
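A quick way to confirm a local environment matches the reported stack (PyTorch 1.13+cu117, PyTorch Geometric 2.3.1, CUDA 11.2, single A100) is a version check like the one below. The expected values are taken from the table above; the script itself is ours, not the authors':

```python
import torch
import torch_geometric

# Versions reported in the paper's experiment setup.
print("PyTorch:", torch.__version__)                       # expected: 1.13.x+cu117
print("PyTorch Geometric:", torch_geometric.__version__)   # expected: 2.3.1
print("CUDA build:", torch.version.cuda)                   # expected: 11.x
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU:", torch.cuda.get_device_name(0))           # paper used a single A100
```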
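The experiment-setup row also translates naturally into a training skeleton. The sketch below is a minimal reading of the reported hyperparameters (Adam, learning rate 0.001, eight message-passing layers with an average pooling layer after every two, and feature channels doubling per stage). `CoupleNetSkeleton` and its `nn.Linear` blocks are hypothetical stand-ins for the paper's actual geometric message-passing layers, which ship in the supplementary material:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class CoupleNetSkeleton(nn.Module):
    """Hypothetical skeleton of the reported layout: eight message-passing
    layers, an average pooling layer after every two (four pooling stages),
    and feature channels doubling at each stage. The Linear blocks are
    placeholders for the paper's actual message-passing layers."""

    def __init__(self, in_channels: int = 64):
        super().__init__()
        self.stages = nn.ModuleList()
        channels = in_channels
        for _ in range(4):  # four pooling stages = eight placeholder layers
            self.stages.append(nn.Sequential(
                nn.Linear(channels, channels * 2), nn.ReLU(),      # layer 1 of the pair
                nn.Linear(channels * 2, channels * 2), nn.ReLU(),  # layer 2 of the pair
            ))
            channels *= 2
        self.out_channels = channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, residues, channels)
        for stage in self.stages:
            x = stage(x)
            # Average pooling over residues after every two layers.
            x = F.avg_pool1d(x.transpose(1, 2), kernel_size=2).transpose(1, 2)
        return x


model = CoupleNetSkeleton()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)  # reported learning rate 0.001
out = model(torch.randn(2, 16, 64))  # dummy batch: 2 "proteins", 16 residues each
print(out.shape)  # torch.Size([2, 1, 1024])
```

The channel doubling per stage mirrors the "number of feature channels is also doubled" remark, and the pooling cadence follows the stated "every two message-passing layers are followed by an average pooling layer"; everything else (input width, activation, dummy shapes) is an illustrative assumption.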