Feature Reuse and Scaling: Understanding Transfer Learning with Protein Language Models
Authors: Francesca-Zhoufan Li, Ava P Amini, Yisong Yue, Kevin K Yang, Alex Xijie Lu
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We perform a systematic analysis of transfer learning using PLMs, conducting 370 experiments across a comprehensive suite of factors including different downstream tasks, architectures, model sizes, model depths, and pretraining time. |
| Researcher Affiliation | Collaboration | ¹Department of Bioengineering, California Institute of Technology, California, USA; ²initial work was conducted while F.Z.L. was an intern at Microsoft; ³Microsoft Research, Massachusetts, USA; ⁴Department of Computing and Mathematical Sciences, California Institute of Technology, California, USA. |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code for all experiments is available at https://github.com/microsoft/protein-transfer |
| Open Datasets | Yes | We test a diverse set of tasks covering both property and structure prediction, different types of distribution shift relevant to protein engineering, and global versus local variation over the sequence (Supplementary section B.2). Structure prediction. We use the three-class secondary structure (SS3) task from TAPE with three independent test sets, SS3 CB513 (Cuff & Barton, 1999), SS3 TS115 (Yang et al., 2016), and SS3 CASP12 (Moult et al., 2018), where the objective is to predict whether each residue belongs to an α-helix, β-strand, or coil (Rao et al., 2019). Property prediction. We use the thermostability, subcellular localization, GB1, and AAV datasets from FLIP (Dallago et al., 2021). |
| Dataset Splits | Yes | Table A1: Summary of downstream prediction tasks [lists 'n Train sequences', 'n Val sequences', 'n Test sequences' for each task]. For the SS3 and subcellular localization tasks, we train linear classifiers with mini-batches in PyTorch and perform early stopping based on the validation set. For the regression tasks, we train ridge regression models with Scikit-learn (Buitinck et al., 2013), using a grid search on the validation set to tune the regularization strength. [Minimal sketches of both probe types appear after this table.] |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions using 'PyTorch' and 'Scikit-learn' but does not specify their version numbers, which are needed for exact reproducibility. |
| Experiment Setup | Yes | For the regression linear probes implemented in scikit-learn, we performed a grid search over alpha values (controlling regularization strength) of [1e-03, 1e-02, 1e-01, 1e+00, 1e+01]. Output predicted fitness values are scaled with StandardScaler(). For the classification linear probes implemented in PyTorch, we use the Adam optimizer with a learning rate of 1e-4 and a decay rate of 0.1. A batch size of 256 over 120 epochs was used for the annotation tasks, and a batch size of 120 over 100 epochs for the secondary structure tasks. We implement early stopping on validation loss with a tolerance of 10 (after a minimum of 5 epochs). |
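
The ridge-regression probe described in the Dataset Splits and Experiment Setup rows is straightforward to sketch. The following is a minimal reconstruction rather than the authors' pipeline (which lives in the linked repository): the variable names (`X_train`, `y_train`, and so on) are hypothetical placeholders for precomputed PLM embeddings and fitness labels, and applying `StandardScaler` to the regression targets is one reading of the paper's scaling remark.

```python
# Minimal sketch of the scikit-learn regression linear probe.
# X_* are hypothetical precomputed PLM embeddings; y_* are fitness labels.
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import StandardScaler

ALPHAS = [1e-03, 1e-02, 1e-01, 1e+00, 1e+01]  # grid reported in the paper

def fit_ridge_probe(X_train, y_train, X_val, y_val):
    # The paper says fitness values are scaled with StandardScaler(); whether
    # targets or predictions are scaled is ambiguous, so we scale the targets.
    scaler = StandardScaler()
    y_train_s = scaler.fit_transform(y_train.reshape(-1, 1)).ravel()
    y_val_s = scaler.transform(y_val.reshape(-1, 1)).ravel()

    # Grid search over the regularization strength on the validation set.
    best_model, best_mse = None, np.inf
    for alpha in ALPHAS:
        model = Ridge(alpha=alpha).fit(X_train, y_train_s)
        mse = np.mean((model.predict(X_val) - y_val_s) ** 2)
        if mse < best_mse:
            best_model, best_mse = model, mse
    return best_model, scaler
```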
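The classification probes (SS3 and subcellular localization) can be sketched in the same spirit. Again, this is a hedged reconstruction: the dataloaders and embedding dimension are placeholders, and reading the reported "decay rate of 0.1" as Adam's weight decay (rather than a learning-rate schedule) is an assumption.

```python
# Minimal sketch of the PyTorch classification linear probe with early stopping.
# train_loader/val_loader yield (embedding, label) batches; dim and n_classes
# are placeholders. weight_decay=0.1 is one interpretation of "decay rate of 0.1".
import torch
import torch.nn as nn

def train_linear_probe(train_loader, val_loader, dim, n_classes,
                       epochs=120, patience=10, min_epochs=5):
    probe = nn.Linear(dim, n_classes)
    opt = torch.optim.Adam(probe.parameters(), lr=1e-4, weight_decay=0.1)
    loss_fn = nn.CrossEntropyLoss()

    best_val, stale = float("inf"), 0
    for epoch in range(epochs):
        probe.train()
        for x, y in train_loader:  # mini-batch training, as described above
            opt.zero_grad()
            loss_fn(probe(x), y).backward()
            opt.step()

        # Mean validation loss drives early stopping.
        probe.eval()
        with torch.no_grad():
            val_loss = sum(loss_fn(probe(x), y).item()
                           for x, y in val_loader) / len(val_loader)

        # Stop after `patience` epochs without improvement, but only once a
        # minimum number of epochs has elapsed, mirroring the reported setup.
        if val_loss < best_val:
            best_val, stale = val_loss, 0
        else:
            stale += 1
            if epoch + 1 >= min_epochs and stale >= patience:
                break
    return probe
```

For the annotation tasks the paper reports a batch size of 256 over 120 epochs, and for the secondary structure tasks a batch size of 120 over 100 epochs; those settings would be fixed in the dataloaders and the `epochs` argument.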