Evolution-Inspired Loss Functions for Protein Representation Learning

Authors: Chengyue Gong, Adam Klivans, James Madigan Loy, Tianlong Chen, Qiang Liu, Daniel Jesus Diaz

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Across a variety of phenotypes and datasets, we demonstrate that EvoRank leads to dramatic improvements in zero-shot performance and can compete with models fine-tuned on experimental data.
Researcher Affiliation | Collaboration | 1 University of Texas at Austin; 2 Intelligent Proteins, LLC.
Pseudocode | No | The paper does not contain pseudocode or a clearly labeled algorithm block.
Open Source Code | No | The paper does not explicitly state that open-source code is provided for the methodology described, nor does it provide a direct link to a code repository.
Open Datasets | Yes | For the self-supervised training, we use the same procedure as MutComputeX (d'Oelsnitz et al., 2023). Briefly, this dataset consists of a 90:10 split of 2,569,256 microenvironments sampled from 22,759 protein sequences clustered at 50% sequence similarity and having a structure resolution of at least 3 Å from the RCSB (November 2021). Our test data for the folding free energy changes and binding free energy changes are proposed in Diaz et al. (2023); Gong et al. (2023). (See the first sketch below for an illustration of this kind of split.)
Dataset Splits | No | The paper mentions a
Hardware Specification | Yes | Training the model typically requires approximately two GPU days on one A100.
Software Dependencies | No | The paper mentions
Experiment Setup | Yes | Self-supervised training was done with the AdamW optimizer, a batch size of 512, a learning rate of 5 × 10⁻⁵, and a weight decay of 10⁻⁵. We first train using the soft-label loss in equation (2) for 100K iterations, and then refine with the EvoRank loss defined in equation (4) for an additional 100K iterations. (See the second sketch below.)
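The Open Datasets row describes training data built from RCSB structures filtered by resolution, clustered at 50% sequence identity, and divided 90:10. The sketch below is a minimal, hedged illustration of that kind of resolution filter plus a cluster-aware 90:10 split; the record fields (pdb_id, resolution_angstrom, cluster_id), the choice to split at the cluster level, and the function name are assumptions for illustration, not the authors' pipeline.

```python
# Hedged sketch (not the paper's pipeline): filter structure entries by
# resolution and split 90:10 at the level of 50%-identity sequence clusters,
# so that similar sequences do not leak across the split. All field names
# and the cluster-level split choice are assumptions.
import random
from collections import defaultdict

def split_by_cluster(entries, max_resolution=3.0, train_frac=0.9, seed=0):
    """Return (train, test) lists of entries, split cluster-wise."""
    # Keep only structures at 3 Å resolution or better.
    kept = [e for e in entries if e["resolution_angstrom"] <= max_resolution]

    # Group entries by their 50%-sequence-identity cluster.
    clusters = defaultdict(list)
    for e in kept:
        clusters[e["cluster_id"]].append(e)

    # Shuffle cluster IDs and assign whole clusters to train or test.
    cluster_ids = sorted(clusters)
    random.Random(seed).shuffle(cluster_ids)
    n_train = int(train_frac * len(cluster_ids))
    train_ids = set(cluster_ids[:n_train])

    train = [e for cid in cluster_ids if cid in train_ids for e in clusters[cid]]
    test = [e for cid in cluster_ids if cid not in train_ids for e in clusters[cid]]
    return train, test

# Example usage with toy records (hypothetical data):
entries = [
    {"pdb_id": "1ABC", "resolution_angstrom": 1.8, "cluster_id": 7},
    {"pdb_id": "2XYZ", "resolution_angstrom": 3.4, "cluster_id": 7},  # filtered out
    {"pdb_id": "3DEF", "resolution_angstrom": 2.2, "cluster_id": 12},
]
train, test = split_by_cluster(entries)
```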
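As a concrete reading of the Experiment Setup row, here is a minimal PyTorch sketch of the two-stage schedule (AdamW, 5 × 10⁻⁵ learning rate, 10⁻⁵ weight decay, 100K iterations per stage). The model, data loader, and the two loss callables standing in for equations (2) and (4) are hypothetical placeholders; per the Open Source Code row, the paper's own training code is not available, so this is an assumption-laden sketch rather than the authors' implementation.

```python
# Hedged sketch (not the released training code): two-stage self-supervised
# training with the optimizer settings quoted in the Experiment Setup row.
# `model`, `data_loader`, `soft_label_loss`, and `evorank_loss` are
# hypothetical placeholders; the 512 batch size would be set when
# constructing `data_loader`.
import itertools
import torch

def train_two_stage(model, data_loader, soft_label_loss, evorank_loss,
                    iters_per_stage=100_000, device="cuda"):
    model.to(device).train()
    # Optimizer settings quoted from the paper's experiment setup.
    optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5, weight_decay=1e-5)

    # Stage 1: soft-label loss (eq. 2); Stage 2: EvoRank loss (eq. 4).
    for loss_fn in (soft_label_loss, evorank_loss):
        batches = itertools.islice(itertools.cycle(data_loader), iters_per_stage)
        for batch in batches:
            inputs, targets = (t.to(device) for t in batch)
            optimizer.zero_grad()
            loss = loss_fn(model(inputs), targets)
            loss.backward()
            optimizer.step()
    return model
```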