MSA Transformer

Authors: Roshan M Rao, Jason Liu, Robert Verkuil, Joshua Meier, John Canny, Pieter Abbeel, Tom Sercu, Alexander Rives

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train an MSA Transformer model with 100M parameters on a large dataset (4.3 TB) of 26 million MSAs... The resulting model surpasses current state-of-the-art unsupervised structure learning methods by a wide margin... We study the MSA Transformer in a panel of structure prediction tasks, evaluating unsupervised contact prediction from the attentions of the model, and performance of features in supervised contact and secondary structure prediction pipelines.
Researcher Affiliation | Collaboration | UC Berkeley, Facebook AI Research, and New York University (work performed during internship at FAIR).
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and weights are available at https://github.com/facebookresearch/esm. (See the loading sketch below the table.)
Open Datasets | Yes | Models are trained on a dataset of 26 million MSAs. An MSA is generated for each UniRef50 (Suzek et al., 2007) sequence by searching UniClust30 (Mirdita et al., 2017) with HHblits (Steinegger et al., 2019).
Dataset Splits | Yes | We use the same validation methodology. A logistic regression with 144 parameters is fit on 20 training structures from the trRosetta dataset (Yang et al., 2019). This is then used to predict the probability of protein contacts on another 14842 structures from the trRosetta dataset (training structures are excluded). The secondary structure models are trained on the NetSurf training dataset. (See the regression sketch below the table.)
Hardware Specification | Yes | All models are trained on 32 V100 GPUs for 100k updates.
Software Dependencies | No | The paper mentions software such as HHblits but does not provide specific version numbers for the dependencies required to replicate the experiments.
Experiment Setup | Yes | We train a 100M-parameter model with 12 layers, 768 embedding size, and 12 attention heads, using a batch size of 512 MSAs, learning rate 10^-4, no weight decay, and an inverse square root learning rate schedule with 16000 warmup steps. (See the configuration sketch below the table.)
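The Open Source Code row points at the facebookresearch/esm repository. Below is a minimal sketch of loading a released MSA Transformer checkpoint through that package; the checkpoint name (esm_msa1b_t12_100M_UR50S), the batch-converter usage, and the output field names are taken from the repository's documentation and may differ between package versions, and the tiny alignment is only a placeholder.

```python
import torch
import esm

# Load a pretrained 12-layer MSA Transformer checkpoint and its alphabet
# (checkpoint name as listed in the facebookresearch/esm README).
model, alphabet = esm.pretrained.esm_msa1b_t12_100M_UR50S()
model.eval()
batch_converter = alphabet.get_batch_converter()

# An MSA is a list of (label, aligned_sequence) pairs of equal length;
# this hand-written three-sequence alignment is only a placeholder.
msa = [
    ("query", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("hom_1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("hom_2", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
]

# The MSA batch converter accepts a list of MSAs and returns tokens of
# shape (num_msas, num_sequences, seq_len + 1).
_, _, tokens = batch_converter([msa])

with torch.no_grad():
    out = model(tokens, repr_layers=[12], return_contacts=True)

contacts = out["contacts"]               # (1, L, L) contact probabilities for the query
embeddings = out["representations"][12]  # final-layer per-residue embeddings
print(contacts.shape, embeddings.shape)
```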
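The 144-parameter logistic regression in the Dataset Splits row corresponds to one weight per attention head (12 layers x 12 heads). The sketch below illustrates that setup with scikit-learn: the symmetrization and average product correction follow the paper's unsupervised contact prediction recipe, while the random placeholder attention maps, the placeholder contact labels, and the L1 regularization strength are illustrative assumptions rather than the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def head_features(row_attentions):
    """Per-residue-pair features from row attention maps.

    row_attentions: (num_layers, num_heads, L, L) for one protein, i.e.
    12 x 12 = 144 maps for the model described above. Each map is
    symmetrized and APC-corrected before being used as a feature.
    """
    num_layers, num_heads, L, _ = row_attentions.shape
    feats = []
    for layer in range(num_layers):
        for head in range(num_heads):
            a = row_attentions[layer, head]
            a = a + a.T                                                  # symmetrize
            apc = a.sum(0, keepdims=True) * a.sum(1, keepdims=True) / a.sum()
            feats.append(a - apc)                                        # average product correction
    return np.stack(feats, axis=-1)                                      # (L, L, 144)

# Placeholder attention maps and contact labels standing in for residue pairs
# drawn from the 20 trRosetta training structures.
rng = np.random.default_rng(0)
L = 64
attn = rng.random((12, 12, L, L))
pair_feats = head_features(attn).reshape(-1, 144)        # (L*L, 144)
labels = (rng.random(L * L) < 0.05).astype(int)          # placeholder binary contact labels

# One weight per attention head; the penalty type and strength are illustrative choices.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.15)
clf.fit(pair_feats, labels)
contact_probability = clf.predict_proba(pair_feats)[:, 1].reshape(L, L)
```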
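The Experiment Setup row lists the reported training hyperparameters. The sketch below collects them in one place and shows a common form of the inverse square root schedule; the dictionary key names and the linear warmup shape are assumptions, since the paper only names the schedule family, the peak learning rate, and the 16000-step warmup.

```python
# Hyperparameters quoted in the Experiment Setup row; the key names are my own.
msa_transformer_training = {
    "num_layers": 12,
    "embed_dim": 768,
    "num_attention_heads": 12,
    "batch_size_msas": 512,
    "peak_learning_rate": 1e-4,
    "weight_decay": 0.0,
    "lr_schedule": "inverse_sqrt",
    "warmup_steps": 16_000,
    "total_updates": 100_000,  # from the Hardware Specification row: 100k updates on 32 V100s
}

def inverse_sqrt_lr(step, peak_lr=1e-4, warmup_steps=16_000):
    """Inverse square root schedule: warm up to peak_lr, then decay as 1/sqrt(step).

    The linear warmup shape is an assumption; the paper specifies only the
    schedule family and the warmup length.
    """
    step = max(step, 1)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (warmup_steps / step) ** 0.5

# Example: learning rate at a few points of the 100k-update run.
for step in (1_000, 16_000, 50_000, 100_000):
    print(f"step {step:>6d}: lr = {inverse_sqrt_lr(step):.2e}")
```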