MSA Transformer

Authors: Roshan M Rao, Jason Liu, Robert Verkuil, Joshua Meier, John Canny, Pieter Abbeel, Tom Sercu, Alexander Rives

ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train an MSA Transformer model with 100M parameters on a large dataset (4.3 TB) of 26 million MSAs... The resulting model surpasses current state-of-the-art unsupervised structure learning methods by a wide margin... We study the MSA Transformer in a panel of structure prediction tasks, evaluating unsupervised contact prediction from the attentions of the model, and performance of features in supervised contact and secondary structure prediction pipelines.
Researcher Affiliation | Collaboration | UC Berkeley, Facebook AI Research, and New York University (work performed during internship at FAIR).
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code and weights are available at https://github.com/facebookresearch/esm. (See the loading sketch below the table.)
Open Datasets | Yes | Models are trained on a dataset of 26 million MSAs. An MSA is generated for each UniRef50 (Suzek et al., 2007) sequence by searching UniClust30 (Mirdita et al., 2017) with HHblits (Steinegger et al., 2019).
Dataset Splits | Yes | We use the same validation methodology. A logistic regression with 144 parameters is fit on 20 training structures from the trRosetta dataset (Yang et al., 2019). This is then used to predict the probability of protein contacts on another 14842 structures from the trRosetta dataset (training structures are excluded). The secondary structure models are trained on the NetSurf training dataset. (See the regression sketch below the table.)
Hardware Specification | Yes | All models are trained on 32 V100 GPUs for 100k updates.
Software Dependencies | No | The paper mentions software such as HHblits but does not provide specific version numbers for the dependencies required to replicate the experiments.
Experiment Setup | Yes | We train a 100M-parameter model with 12 layers, 768 embedding size, and 12 attention heads, using a batch size of 512 MSAs, learning rate 10^-4, no weight decay, and an inverse square root learning rate schedule with 16000 warmup steps. (See the configuration sketch below the table.)
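The Open Source Code row points at the facebookresearch/esm repository. Below is a minimal sketch of loading a released MSA Transformer checkpoint through that package; the checkpoint name (esm_msa1b_t12_100M_UR50S), the batch-converter usage, and the output field names are taken from the repository's documentation and may differ between package versions, and the tiny alignment is only a placeholder.

```python
import torch
import esm

# Load a pretrained 12-layer MSA Transformer checkpoint and its alphabet
# (checkpoint name as listed in the facebookresearch/esm README).
model, alphabet = esm.pretrained.esm_msa1b_t12_100M_UR50S()
model.eval()
batch_converter = alphabet.get_batch_converter()

# An MSA is a list of (label, aligned_sequence) pairs of equal length;
# this hand-written three-sequence alignment is only a placeholder.
msa = [
    ("query", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("hom_1", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
    ("hom_2", "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"),
]

# The MSA batch converter accepts a list of MSAs and returns tokens of
# shape (num_msas, num_sequences, seq_len + 1).
_, _, tokens = batch_converter([msa])

with torch.no_grad():
    out = model(tokens, repr_layers=[12], return_contacts=True)

contacts = out["contacts"]               # (1, L, L) contact probabilities for the query
embeddings = out["representations"][12]  # final-layer per-residue embeddings
print(contacts.shape, embeddings.shape)
```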
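The 144-parameter logistic regression in the Dataset Splits row corresponds to one weight per attention head (12 layers x 12 heads). The sketch below illustrates that setup with scikit-learn: the symmetrization and average product correction follow the paper's unsupervised contact prediction recipe, while the random placeholder attention maps, the placeholder contact labels, and the L1 regularization strength are illustrative assumptions rather than the authors' exact pipeline.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def head_features(row_attentions):
    """Per-residue-pair features from row attention maps.

    row_attentions: (num_layers, num_heads, L, L) for one protein, i.e.
    12 x 12 = 144 maps for the model described above. Each map is
    symmetrized and APC-corrected before being used as a feature.
    """
    num_layers, num_heads, L, _ = row_attentions.shape
    feats = []
    for layer in range(num_layers):
        for head in range(num_heads):
            a = row_attentions[layer, head]
            a = a + a.T                                                  # symmetrize
            apc = a.sum(0, keepdims=True) * a.sum(1, keepdims=True) / a.sum()
            feats.append(a - apc)                                        # average product correction
    return np.stack(feats, axis=-1)                                      # (L, L, 144)

# Placeholder attention maps and contact labels standing in for residue pairs
# drawn from the 20 trRosetta training structures.
rng = np.random.default_rng(0)
L = 64
attn = rng.random((12, 12, L, L))
pair_feats = head_features(attn).reshape(-1, 144)        # (L*L, 144)
labels = (rng.random(L * L) < 0.05).astype(int)          # placeholder binary contact labels

# One weight per attention head; the penalty type and strength are illustrative choices.
clf = LogisticRegression(penalty="l1", solver="liblinear", C=0.15)
clf.fit(pair_feats, labels)
contact_probability = clf.predict_proba(pair_feats)[:, 1].reshape(L, L)
```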
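The Experiment Setup row lists the reported training hyperparameters. The sketch below collects them in one place and shows a common form of the inverse square root schedule; the dictionary key names and the linear warmup shape are assumptions, since the paper only names the schedule family, the peak learning rate, and the 16000-step warmup.

```python
# Hyperparameters quoted in the Experiment Setup row; the key names are my own.
msa_transformer_training = {
    "num_layers": 12,
    "embed_dim": 768,
    "num_attention_heads": 12,
    "batch_size_msas": 512,
    "peak_learning_rate": 1e-4,
    "weight_decay": 0.0,
    "lr_schedule": "inverse_sqrt",
    "warmup_steps": 16_000,
    "total_updates": 100_000,  # from the Hardware Specification row: 100k updates on 32 V100s
}

def inverse_sqrt_lr(step, peak_lr=1e-4, warmup_steps=16_000):
    """Inverse square root schedule: warm up to peak_lr, then decay as 1/sqrt(step).

    The linear warmup shape is an assumption; the paper specifies only the
    schedule family and the warmup length.
    """
    step = max(step, 1)
    if step < warmup_steps:
        return peak_lr * step / warmup_steps
    return peak_lr * (warmup_steps / step) ** 0.5

# Example: learning rate at a few points of the 100k-update run.
for step in (1_000, 16_000, 50_000, 100_000):
    print(f"step {step:>6d}: lr = {inverse_sqrt_lr(step):.2e}")
```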