Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models

Authors: Francesca-Zhoufan Li, Ava P Amini, Yisong Yue, Kevin K Yang, Alex Xijie Lu

ICML 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We perform a systematic analysis of transfer learning using PLMs, conducting 370 experiments across a comprehensive suite of factors including different downstream tasks, architectures, model sizes, model depths, and pretraining time.
Researcher Affiliation Collaboration 1 Department of Bioengineering, California Institute of Technology, California, USA; 2 Initial work was conducted while F.Z.L. was an intern at Microsoft; 3 Microsoft Research, Massachusetts, USA; 4 Department of Computing and Mathematical Sciences, California Institute of Technology, California, USA.
Pseudocode No The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code Yes Code for all experiments is available at https://github.com/microsoft/protein-transfer
Open Datasets Yes We test a diverse set of tasks covering both property and structure prediction, different types of distribution shift relevant to protein engineering, and global versus local variation over the sequence (Supplementary section B.2). Structure prediction. We use the three-class secondary structure (SS3) task from TAPE with three independent test sets, SS3 CB513 (Cuff & Barton, 1999), SS3 TS115 (Yang et al., 2016), and SS3 CASP12 (Moult et al., 2018), where the objective is to predict whether each residue belongs to an α-helix, β-strand, or coil (Rao et al., 2019). Property prediction. We use the thermostability, subcellular localization, GB1, and AAV datasets from FLIP (Dallago et al., 2021).
Dataset Splits Yes Table A1: Summary of downstream prediction tasks [lists 'n Train sequences', 'n Val sequences', 'n Test sequences' for each task]. For the SS3 and subcellular localization tasks, we train linear classifiers with mini-batches in PyTorch and perform early stopping based on the validation set. For the regression tasks, we train ridge regression models with Scikit-learn (Buitinck et al., 2013), using a grid search on the validation set to tune the regularization strength.
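The regression-probe protocol quoted above (ridge regression fit on fixed embeddings, with the regularization strength tuned by a grid search on the validation set) can be sketched as follows. This is a minimal illustration, not the paper's code: the random arrays stand in for pooled PLM embeddings and fitness labels, and the alpha grid matches the values reported in the Experiment Setup row.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

# Hypothetical stand-ins for pooled PLM embeddings and fitness labels.
rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(200, 16)), rng.normal(size=200)
X_val, y_val = rng.normal(size=(50, 16)), rng.normal(size=50)

# Scale the regression targets, as the paper does with StandardScaler().
scaler = StandardScaler()
y_train_s = scaler.fit_transform(y_train.reshape(-1, 1)).ravel()
y_val_s = scaler.transform(y_val.reshape(-1, 1)).ravel()

# Grid search over the regularization strengths reported in the paper.
alphas = [1e-3, 1e-2, 1e-1, 1e0, 1e1]
best_alpha, best_score = None, -np.inf
for alpha in alphas:
    model = Ridge(alpha=alpha).fit(X_train, y_train_s)
    score = model.score(X_val, y_val_s)  # R^2 on the validation split
    if score > best_score:
        best_alpha, best_score = alpha, score

print(best_alpha)
```

The selected alpha would then be used to refit the probe before evaluating on the held-out test split.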
Hardware Specification No The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments.
Software Dependencies No The paper mentions using 'PyTorch' and 'Scikit-learn' but does not specify their version numbers, which is required for reproducibility.
Experiment Setup Yes For the regression linear probes implemented in scikit-learn, we performed a grid search over alpha values (controlling regularization strength) of [1e-03, 1e-02, 1e-01, 1e+00, 1e+01]. Output predicted fitness values are scaled with StandardScaler(). For the classification linear probes implemented in PyTorch, we use the Adam optimizer and set the learning rate to 1e-4 with a decay rate of 0.1. A batch size of 256 over 120 epochs was used for the annotation tasks and a batch size of 120 over 100 epochs for the secondary structure tasks. We implement early stopping on validation loss with a tolerance of 10 (after a minimum of 5 epochs).
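The early-stopping rule described in the setup (stop once validation loss has not improved for 10 epochs, after a minimum of 5 epochs) can be sketched in plain Python. The function and parameter names (`early_stop`, `patience`, `min_epochs`) are illustrative assumptions, not identifiers from the paper's code.

```python
def early_stop(val_losses, patience=10, min_epochs=5):
    """Return the 1-indexed epoch at which training halts.

    Stops once validation loss has gone `patience` epochs without a new
    best, but only after `min_epochs` epochs have completed; otherwise
    runs through all epochs.
    """
    best, best_epoch = float("inf"), 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best, best_epoch = loss, epoch
        if epoch >= min_epochs and epoch - best_epoch >= patience:
            return epoch
    return len(val_losses)

# Loss improves for 3 epochs, then plateaus: training halts 10 epochs
# after the best epoch (epoch 3), i.e. at epoch 13.
print(early_stop([1.0, 0.9, 0.8] + [0.85] * 20))
```

In the paper's setting this check would run once per epoch inside the PyTorch training loop for the classification probes.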