Evolution-Inspired Loss Functions for Protein Representation Learning
Authors: Chengyue Gong, Adam Klivans, James Madigan Loy, Tianlong Chen, Qiang Liu, Daniel Jesus Diaz
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Across a variety of phenotypes and datasets, we demonstrate that EvoRank leads to dramatic improvements in zero-shot performance and can compete with models fine-tuned on experimental data. |
| Researcher Affiliation | Collaboration | University of Texas at Austin; Intelligent Proteins, LLC. |
| Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper does not explicitly state that open-source code is provided for the methodology described, nor does it provide a direct link to a code repository. |
| Open Datasets | Yes | For the self-supervised training, we use the same procedure as MutComputeX (d'Oelsnitz et al., 2023). Briefly, this dataset consists of a 90:10 split of 2,569,256 microenvironments sampled from 22,759 protein sequences clustered at 50% sequence similarity and having a structure resolution of at least 3 Å from the RCSB (November 2021). Our test data for the folding free energy changes and binding free energy changes are proposed in Diaz et al. (2023); Gong et al. (2023). (See the data-pipeline sketch after this table.) |
| Dataset Splits | No | The paper mentions a 90:10 split for the self-supervised training data, but it does not report the detailed train/validation/test splits needed to reproduce the evaluation. |
| Hardware Specification | Yes | Training the model typically requires approximately two GPU days on one A100. |
| Software Dependencies | No | The paper mentions individual components such as the AdamW optimizer, but it does not list the software libraries or version numbers required to replicate the environment. |
| Experiment Setup | Yes | Self-supervised training was done with the AdamW optimizer, a batch size of 512, a learning rate of 5 × 10⁻⁵, and a weight decay of 10⁻⁵. We first train using the soft-label loss in equation (2) for 100K iterations, and then refine with the EvoRank loss defined in equation (4) for an additional 100K iterations. (See the training-schedule sketch after this table.) |
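The dataset-construction procedure quoted in the Open Datasets row (resolution filter, redundancy reduction at 50% sequence identity, and a 90:10 split of sampled microenvironments) can be outlined as follows. This is a minimal, hedged sketch, not code from the paper or the MutComputeX pipeline: the `Entry` and `Microenvironment` containers and the `sample_microenvironments` stub are hypothetical placeholders, and the 50% sequence-identity clustering step is only noted in a comment.

```python
from dataclasses import dataclass
import random


@dataclass
class Entry:
    pdb_id: str
    resolution: float  # reported structure resolution in angstroms
    n_residues: int


@dataclass
class Microenvironment:
    pdb_id: str
    residue_index: int


def sample_microenvironments(entry: Entry) -> list:
    # Placeholder: one residue-centered microenvironment per residue.
    # The real pipeline extracts local 3D atomic microenvironments.
    return [Microenvironment(entry.pdb_id, i) for i in range(entry.n_residues)]


def build_split(entries, resolution_cutoff=3.0, train_frac=0.9, seed=0):
    # 1. Keep structures resolved at 3 angstroms or better.
    kept = [e for e in entries if e.resolution <= resolution_cutoff]
    # 2. (Omitted here) cluster sequences at 50% identity to reduce
    #    redundancy, e.g., with an external clustering tool.
    # 3. Sample microenvironments and split them 90:10.
    microenvs = [m for e in kept for m in sample_microenvironments(e)]
    random.Random(seed).shuffle(microenvs)
    n_train = int(train_frac * len(microenvs))
    return microenvs[:n_train], microenvs[n_train:]


# Toy usage with made-up entries: the 3.4 A structure is filtered out.
train, holdout = build_split([Entry("1ABC", 2.1, 120), Entry("2XYZ", 3.4, 300)])
print(len(train), len(holdout))
```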
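The two-phase schedule in the Experiment Setup row maps onto a short PyTorch loop. The sketch below is an assumption-laden illustration, not the authors' released training code: the model, dataset, and the `soft_label_loss` and `evorank_loss` callables (standing in for equations 2 and 4, which are not reproduced here) are user-supplied placeholders, and since the quoted text does not say whether optimizer state is reset between phases, a single AdamW instance is kept throughout.

```python
import torch
from torch.utils.data import DataLoader


def train_two_phase(model, dataset, soft_label_loss, evorank_loss,
                    iters_per_phase=100_000, batch_size=512,
                    lr=5e-5, weight_decay=1e-5, device="cuda"):
    """Phase 1: soft-label loss for 100K iterations; phase 2: EvoRank loss
    for another 100K iterations, with AdamW and the quoted hyperparameters."""
    model.to(device).train()
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr,
                                  weight_decay=weight_decay)
    loader = DataLoader(dataset, batch_size=batch_size,
                        shuffle=True, drop_last=True)

    for loss_fn in (soft_label_loss, evorank_loss):  # phase 1, then phase 2
        step = 0
        while step < iters_per_phase:
            for inputs, targets in loader:
                inputs, targets = inputs.to(device), targets.to(device)
                loss = loss_fn(model(inputs), targets)
                optimizer.zero_grad()
                loss.backward()
                optimizer.step()
                step += 1
                if step >= iters_per_phase:
                    break
    return model
```

At the reported batch size of 512, the 200K total iterations correspond to roughly two A100 GPU days, consistent with the Hardware Specification row.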