MSA Transformer
Authors: Roshan M Rao, Jason Liu, Robert Verkuil, Joshua Meier, John Canny, Pieter Abbeel, Tom Sercu, Alexander Rives
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train an MSA Transformer model with 100M parameters on a large dataset (4.3 TB) of 26 million MSAs... The resulting model surpasses current state-of-the-art unsupervised structure learning methods by a wide margin... We study the MSA Transformer in a panel of structure prediction tasks, evaluating unsupervised contact prediction from the attentions of the model, and performance of features in supervised contact and secondary structure prediction pipelines. |
| Researcher Affiliation | Collaboration | UC Berkeley; Facebook AI Research (FAIR); New York University. Work performed during internship at FAIR. |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code and weights available at https://github.com/facebookresearch/esm (a minimal loading sketch follows the table). |
| Open Datasets | Yes | Models are trained on a dataset of 26 million MSAs. An MSA is generated for each UniRef50 (Suzek et al., 2007) sequence by searching UniClust30 (Mirdita et al., 2017) with HHblits (Steinegger et al., 2019) (an MSA-generation sketch follows the table). |
| Dataset Splits | Yes | We use the same validation methodology. A logistic regression with 144 parameters is fit on 20 training structures from the trRosetta dataset (Yang et al., 2019). This is then used to predict the probability of protein contacts on another 14,842 structures from the trRosetta dataset (training structures are excluded). The models are trained on the Netsurf training dataset (a contact-regression sketch follows the table). |
| Hardware Specification | Yes | All models are trained on 32 V100 GPUs for 100k updates. |
| Software Dependencies | No | The paper mentions software like HHblits, but does not provide specific version numbers for any software dependencies required to replicate the experiments. |
| Experiment Setup | Yes | We train a 100M-parameter model with 12 layers, embedding size 768, and 12 attention heads, using a batch size of 512 MSAs, learning rate 10⁻⁴, no weight decay, and an inverse square root learning rate schedule with 16,000 warmup steps (a learning-rate schedule sketch follows the table). |
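
For the Open Source Code row: the repository above distributes both code and pretrained weights. Below is a minimal, hedged sketch of loading the MSA Transformer through the esm Python package; the package name (fair-esm), the checkpoint identifier (esm_msa1b_t12_100M_UR50S), and the output key names are assumptions drawn from that repository and should be checked against the installed version.

```python
# Minimal loading sketch (assumed package name: fair-esm, installable via pip).
# The checkpoint identifier below is taken from the facebookresearch/esm repo;
# verify it against the README of the installed version.
import torch
import esm

model, alphabet = esm.pretrained.esm_msa1b_t12_100M_UR50S()
model.eval()
batch_converter = alphabet.get_batch_converter()

# A toy MSA: a list of (label, aligned sequence) pairs of equal length.
msa = [
    ("query", "MKTVRQERLKSIVRILERSKEPVSGAQ"),
    ("hom_1", "MKTVRQERLKSIVRILERSKEPVSGAQ"),
    ("hom_2", "MKTIRQERLKSIVRLLERSKEPVSGAQ"),
]
labels, strs, tokens = batch_converter([msa])  # batch of one MSA

with torch.no_grad():
    out = model(tokens, repr_layers=[12], need_head_weights=True)

embeddings = out["representations"][12]  # per-residue features from the last layer
row_attn = out["row_attentions"]         # tied row attentions used for contact prediction
```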
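For the Open Datasets row: each MSA is built by searching UniClust30 with HHblits. The sketch below wraps a single HHblits call in Python; the database path, iteration count, and thread count are illustrative placeholders, not the paper's exact search settings.

```python
# Hedged sketch: generating one MSA with HHblits against UniClust30, roughly as the
# dataset description states. Paths, iteration count (-n), and thread count are
# placeholders, not the paper's exact settings.
import subprocess

def build_msa(query_fasta: str, uniclust30_db: str, out_a3m: str,
              n_iters: int = 3, cpus: int = 4) -> None:
    """Run HHblits to produce an A3M-format MSA for a single query sequence."""
    cmd = [
        "hhblits",
        "-i", query_fasta,      # query sequence (FASTA)
        "-d", uniclust30_db,    # UniClust30 database basename
        "-oa3m", out_a3m,       # output alignment in A3M format
        "-n", str(n_iters),     # number of search iterations (assumed value)
        "-cpu", str(cpus),
    ]
    subprocess.run(cmd, check=True)

# build_msa("P12345.fasta", "/data/uniclust30/uniclust30", "P12345.a3m")
```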
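For the Dataset Splits row: the 144 parameters match one weight per attention head of the 100M model (12 layers × 12 heads). The sketch below fits a logistic regression on symmetrized, APC-corrected row attentions; the regularization choice and contact definition are assumptions, not the paper's exact pipeline.

```python
# Minimal sketch of a 144-feature logistic regression for contact prediction:
# one feature per attention head (12 layers x 12 heads), computed from
# symmetrized, APC-corrected row attentions.
import numpy as np
from sklearn.linear_model import LogisticRegression

def apc(x: np.ndarray) -> np.ndarray:
    """Average product correction applied to an L x L attention map."""
    a1 = x.sum(axis=0, keepdims=True)
    a2 = x.sum(axis=1, keepdims=True)
    return x - (a1 * a2) / x.sum()

def head_features(row_attn: np.ndarray) -> np.ndarray:
    """row_attn: (layers, heads, L, L) -> features of shape (L*L, layers*heads)."""
    layers, heads, L, _ = row_attn.shape
    feats = np.empty((layers * heads, L, L), dtype=np.float64)
    for i in range(layers):
        for j in range(heads):
            m = row_attn[i, j]
            m = 0.5 * (m + m.T)            # symmetrize
            feats[i * heads + j] = apc(m)  # average product correction
    return feats.reshape(layers * heads, L * L).T

# X: stacked features from the 20 training structures; y: binary contact labels
# for each residue pair (contact definition is an assumption here).
# clf = LogisticRegression(penalty="l1", solver="liblinear").fit(X, y)
# probs = clf.predict_proba(head_features(new_row_attn))[:, 1].reshape(L, L)
```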
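For the Experiment Setup row: an inverse square root schedule ramps the learning rate up during warmup and then decays it proportionally to 1/√step. The sketch below uses the quoted peak rate of 10⁻⁴ and 16,000 warmup steps; the linear warmup ramp and the optimizer named in the comments are assumptions.

```python
# Sketch of an inverse square root schedule with 16,000 warmup steps and peak
# learning rate 1e-4, matching the hyperparameters quoted in the setup row.
# The linear warmup ramp is the common fairseq-style choice (an assumption here).
import math

PEAK_LR = 1e-4
WARMUP_STEPS = 16_000

def inverse_sqrt_lr(step: int) -> float:
    step = max(step, 1)
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS         # linear warmup
    return PEAK_LR * math.sqrt(WARMUP_STEPS / step)  # inverse-sqrt decay

# Usage with a PyTorch optimizer (optimizer choice is an assumption; the paper
# reports no weight decay):
#   opt = torch.optim.Adam(model.parameters(), lr=PEAK_LR, weight_decay=0.0)
#   sched = torch.optim.lr_scheduler.LambdaLR(opt, lambda s: inverse_sqrt_lr(s) / PEAK_LR)
```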