Evaluating Protein Transfer Learning with TAPE

Authors: Roshan Rao, Nicholas Bhattacharya, Neil Thomas, Yan Duan, Xi Chen, John Canny, Pieter Abbeel, Yun S. Song

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We benchmark a range of approaches to semi-supervised protein representation learning, which span recent work as well as canonical sequence learning techniques. We find that self-supervised pretraining is helpful for almost all models on all tasks, more than doubling performance in some cases. Table 2 contains results for all benchmarked architectures and training procedures on all downstream tasks in TAPE.
Researcher Affiliation | Collaboration | Roshan Rao* (UC Berkeley, roshan_rao@berkeley.edu); Nicholas Bhattacharya* (UC Berkeley, nick_bhat@berkeley.edu); Neil Thomas* (UC Berkeley, nthomas@berkeley.edu); Yan Duan (covariant.ai, rocky@covariant.ai); Xi Chen (covariant.ai, peter@covariant.ai); John Canny (UC Berkeley, canny@berkeley.edu); Pieter Abbeel (UC Berkeley, pabbeel@berkeley.edu); Yun S. Song (UC Berkeley, yss@berkeley.edu)
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Toward this end, all data and code used to run these experiments are available at https://github.com/songlab-cal/tape. (See the installation sketch after this table.)
Open Datasets | Yes | Toward this end, all data and code used to run these experiments are available at https://github.com/songlab-cal/tape. We use Pfam [33], a database of thirty-one million protein domains used extensively in bioinformatics, as the pretraining corpus for TAPE. The data are from the ProteinNet dataset [25].
Dataset Splits | Yes | We curate tasks into specific training, validation, and test splits to ensure that each task tests biologically relevant generalization that transfers to real-life scenarios. For the remaining data we construct training and test sets using a random 95/5% split. (See the split sketch after this table.)
Hardware Specification | Yes | All self-supervised models are trained on four NVIDIA V100 GPUs for one week.
Software Dependencies | No | The paper mentions software components and architectures like LSTM, Transformer, ResNet, and ELMo, but does not provide specific version numbers for any programming languages, libraries, or frameworks used for implementation.
Experiment Setup | Yes | We use a 12-layer Transformer with a hidden size of 512 units and 8 attention heads, leading to a 38M-parameter model. Hyperparameters for the other models were chosen to approximately match the number of parameters in the Transformer. Our LSTM consists of two three-layer LSTMs with 1024 hidden units, corresponding to the forward and backward language models, whose outputs are concatenated in the final layer, similar to ELMo [5]. For the ResNet we use 35 residual blocks, each containing two convolutional layers with 256 filters, kernel size 9, and dilation rate 2. (See the configuration sketch after this table.)
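
Since the code and data are released at the GitHub URL cited above, here is a minimal sketch of fetching and installing the repository. The install command is an assumption and may differ from the actual setup steps; the repository's README is the authoritative source.

```python
# Hedged sketch: fetch and install the released TAPE code.
# Assumes git and pip are on PATH and that the repository is pip-installable;
# consult the repository's README for the authoritative instructions.
import subprocess

subprocess.run(
    ["git", "clone", "https://github.com/songlab-cal/tape.git"], check=True
)
subprocess.run(["pip", "install", "-e", "tape"], check=True)
```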
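
For the Dataset Splits row, the sketch below illustrates a random 95/5% train/test split like the one the paper describes for the remaining data. The seed, shuffling, and any sequence-identity filtering the authors may have applied are assumptions, not details taken from the paper.

```python
# Illustrative random 95/5% train/test split (not the authors' exact procedure).
import random

def split_95_5(examples, seed=0):
    """Shuffle the examples and return (train, test) lists with a 95/5 split."""
    rng = random.Random(seed)
    shuffled = list(examples)
    rng.shuffle(shuffled)
    cut = int(0.95 * len(shuffled))
    return shuffled[:cut], shuffled[cut:]

train, test = split_95_5([f"seq_{i}" for i in range(1000)])
print(len(train), len(test))  # 950 50
```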
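
For the Experiment Setup row, the sketch below instantiates a 12-layer, 512-unit, 8-head Transformer encoder in PyTorch to make the reported configuration concrete. It is not the authors' implementation: the vocabulary size, feed-forward width, and dropout defaults are assumptions, although with a 2048-unit feed-forward layer the encoder alone comes to roughly 38M parameters, in line with the reported model size.

```python
# Hedged sketch of a Transformer encoder with the reported shape
# (12 layers, hidden size 512, 8 attention heads). Vocabulary size and
# feed-forward width are assumptions, not values from the paper.
import torch
import torch.nn as nn

vocab_size = 30                      # assumed: amino acids plus special tokens
d_model, n_heads, n_layers = 512, 8, 12

embedding = nn.Embedding(vocab_size, d_model)
encoder_layer = nn.TransformerEncoderLayer(
    d_model=d_model, nhead=n_heads, dim_feedforward=2048, batch_first=True
)
encoder = nn.TransformerEncoder(encoder_layer, num_layers=n_layers)

tokens = torch.randint(0, vocab_size, (2, 128))   # (batch, sequence length)
hidden = encoder(embedding(tokens))               # -> (2, 128, 512)
print(sum(p.numel() for p in encoder.parameters()))  # roughly 38M parameters
```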