reproducibilityindex.ai

PoET: A generative model of protein families as sequences-of-sequences

Authors: Timothy Truong Jr, Tristan Bepler

NeurIPS 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	In extensive experiments on deep mutational scanning datasets, we show that PoET outperforms existing protein language models and evolutionary sequence models for variant function prediction across proteins of all MSA depths.
Researcher Affiliation	Industry	Timothy F. Truong Jr, OpenProtein.AI, NY, USA (ttruong@openprotein.ai); Tristan Bepler, OpenProtein.AI, NY, USA (tbepler@openprotein.ai)
Pseudocode	Yes	Algorithm 1 Tiered Transformer Decoder Layer
Open Source Code	Yes	Code and pre-trained model weights are available at https://github.com/OpenProteinAI/PoET.
Open Datasets	Yes	Models were trained on 29 million sets of homologous sequences. Each set corresponds to a sequence in UniRef50 Version 2103 and contains all of its homologs in UniRef50 found using Diamond [34]. We evaluate PoET on ProteinGym [11], the largest collection of such data yet, containing 87 datasets with substitution variants and 7 datasets with indel variants.
Dataset Splits	Yes	We use the same validation set as Notin et al. [11] for tuning hyperparameters.
Hardware Specification	Yes	We trained 57M parameter versions of PoET for up to 3 days on 7 x A100 GPUs with three context lengths: 4K, 8K, and 16K.
Software Dependencies	No	The paper mentions several software tools, such as Diamond, JackHMMER, MMseqs2, ColabFold, AlphaFold2, and MAFFT, but does not provide specific version numbers for these software dependencies, nor for other libraries or programming languages used.
Experiment Setup	Yes	We trained 57M parameter versions of PoET for up to 3 days on 7 x A100 GPUs with three context lengths: 4K, 8K, and 16K. We used the Adafactor optimizer [40] with initial learning rate 1e-2, square root learning rate decay, and otherwise default parameters. Hyperparameters for PoET variations used in ablation experiments (§5.2.1) are summarized in Table 3.
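The training data pairs each UniRef50 sequence with its homologs, and PoET models each such set as a sequence-of-sequences within a fixed context length (4K, 8K, or 16K). The following is a minimal sketch of packing one homolog set into a single context. It assumes character-level amino-acid tokens and a hypothetical `$` separator after each sequence; the function name `pack_homologs`, the separator, and the greedy stop-when-full policy are illustrative assumptions, not the released PoET code.

```python
from typing import List

# Hypothetical separator token marking the end of each homolog (assumption).
SEP = "$"


def pack_homologs(homologs: List[str], max_tokens: int) -> List[str]:
    """Concatenate homologs into one token list (a sequence-of-sequences).

    Each homolog is tokenized per residue and followed by SEP. Packing stops
    at the first homolog that would push the context past max_tokens.
    """
    context: List[str] = []
    for seq in homologs:
        tokens = list(seq) + [SEP]
        if len(context) + len(tokens) > max_tokens:
            break  # greedy policy (assumption): stop once the next homolog no longer fits
        context.extend(tokens)
    return context


if __name__ == "__main__":
    family = ["MKTAYIA", "MKSAYIA", "MRTAYLA"]
    ctx = pack_homologs(family, max_tokens=16)
    print("".join(ctx))
```

A real pipeline would also have to decide how homologs are ordered or sampled and how sequences longer than the context are handled; the sketch leaves those choices open.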
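The hardware specification implies a rough compute budget. As a back-of-the-envelope sketch, assuming each of the three context-length variants was trained as a separate run for the full 3-day upper bound on all 7 A100 GPUs, the arithmetic gives 3 × 24 × 7 = 504 A100-hours per variant and 1,512 A100-hours across all three. The helper `gpu_hours` is hypothetical.

```python
def gpu_hours(days: float, num_gpus: int) -> float:
    """Wall-clock days x 24 hours x number of GPUs used in parallel."""
    return days * 24 * num_gpus


# Upper bounds taken from the source: "up to 3 days on 7 x A100 GPUs".
per_variant = gpu_hours(days=3, num_gpus=7)
# Assumption: one separate run per context length (4K, 8K, 16K).
all_variants = 3 * per_variant

print(f"per variant: {per_variant} A100-hours; all three: {all_variants} A100-hours")
```

Because the source says "up to 3 days", these figures are upper bounds, not measured totals.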
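The experiment setup specifies Adafactor with an initial learning rate of 1e-2 and square-root decay, but it does not give the exact formula. The sketch below implements two plausible readings, both of which are assumptions. Reading A is plain inverse-square-root decay starting from 1e-2. Reading B is the relative step size min(1e-2, 1/sqrt(t)) proposed in the Adafactor paper [40]. The function names are hypothetical, and neither reading is confirmed as PoET's actual schedule.

```python
import math


def inverse_sqrt_lr(step: int, base_lr: float = 1e-2) -> float:
    """Reading A (assumption): lr = base_lr / sqrt(step), so step 1 gives base_lr."""
    return base_lr / math.sqrt(max(step, 1))


def adafactor_relative_step(step: int, cap: float = 1e-2) -> float:
    """Reading B (assumption): Adafactor-style relative step min(cap, 1 / sqrt(step))."""
    return min(cap, 1.0 / math.sqrt(max(step, 1)))


if __name__ == "__main__":
    # Steps are 1-indexed; compare the two readings at a few points.
    for step in (1, 100, 10_000, 40_000):
        print(step, inverse_sqrt_lr(step), adafactor_relative_step(step))
```

The two readings diverge quickly. Under reading A the rate falls to 1e-3 by step 100. Under reading B it stays at 1e-2 until step 10,000 and only decays after that. Anyone reproducing the runs should therefore confirm which schedule was used.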