OntoProtein: Protein Pretraining With Gene Ontology Embedding
Authors: Ningyu Zhang, Zhen Bi, Xiaozhuan Liang, Siyuan Cheng, Haosen Hong, Shumin Deng, Qiang Zhang, Jiazhang Lian, Huajun Chen
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that OntoProtein can surpass state-of-the-art methods with pre-trained protein language models in TAPE benchmark and yield better performance compared with baselines in protein-protein interaction and protein function prediction. [...] We conduct extensive experiments in widespread protein tasks, including TAPE benchmark, protein-protein interaction prediction, and protein function prediction, which demonstrate the effectiveness of our proposed approach. |
| Researcher Affiliation | Academia | 1College of Computer Science and Technology, Zhejiang University 2School of Software Technology, Zhejiang University 3Alibaba-Zhejiang University Joint Research Institute of Frontier Technologies 4Hangzhou Innovation Center, Zhejiang University {zhangningyu,bizhen zju,liangxiaozhuan,22151070}@zju.edu.cn {231sm,12028071,jzlian,qiang.zhang.cs,huajunsir}@zju.edu.cn |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and datasets are available in https://github.com/zjunlp/OntoProtein. |
| Open Datasets | Yes | Pre-training Dataset To incorporate Gene Ontology knowledge into language models, we build a new pre-training dataset called ProteinKG25, which is a large-scale KG dataset with aligned descriptions and protein sequences respectively to GO terms and protein entities. [...] Our code and datasets are all available in the https://github.com/zjunlp/OntoProtein for reproducibility. |
| Dataset Splits | Yes | We deliver data splits for both the inductive and the transductive settings to promote future research. [...] We design two evaluation schemes, the transductive and the inductive settings, which simulate two scenarios of gene annotation in reality. [...] Table 6: Hyper-parameters for the downstream task. |
| Hardware Specification | Yes | We utilize Pytorch (Paszke et al. (2019)) to conduct experiments with Nvidia V100 GPUs. |
| Software Dependencies | No | We utilize Pytorch (Paszke et al. (2019)) to conduct experiments with Nvidia V100 GPUs. |
| Experiment Setup | Yes | This section details the training procedures and hyperparameters for each of the datasets. We utilize Pytorch (Paszke et al. (2019)) to conduct experiments with Nvidia V100 GPUs. In pre-training of OntoProtein, similar to Elnaggar et al. (2020), we use the same training protocol such as optimizer, learning rate schedule on BERT model. We set γ to 12.0 and the number of negative sampling to 128 in Equation 1. [...] Table 6: Hyper-parameters for the downstream task. |