BERTology Meets Biology: Interpreting Attention in Protein Language Models
Authors: Jesse Vig, Ali Madani, Lav R. Varshney, Caiming Xiong, Richard Socher, Nazneen Rajani
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate a set of methods for analyzing protein Transformer models through the lens of attention. We show that attention: (1) captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure, (2) targets binding sites, a key functional component of proteins, and (3) focuses on progressively more complex biophysical properties with increasing layer depth. We find this behavior to be consistent across three Transformer architectures (BERT, ALBERT, XLNet) and two distinct protein datasets. (A sketch of the contact-map agreement computation appears after this table.) |
| Researcher Affiliation | Collaboration | Jesse Vig1 Ali Madani1 Lav R. Varshney1,2 Caiming Xiong1 Richard Socher1 Nazneen Fatema Rajani1 1Salesforce Research, 2University of Illinois at Urbana-Champaign {jvig,amadani,cxiong,rsocher,nazneen.rajani}@salesforce.com varshney@illinois.edu |
| Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks. |
| Open Source Code | Yes | Code for visualization and analysis is available at https://github.com/salesforce/provis. |
| Open Datasets | Yes | For our analyses of amino acids and contact maps, we use a curated dataset from TAPE based on ProteinNet (AlQuraishi, 2019; Fox et al., 2013; Berman et al., 2000; Moult et al., 2018)... For the analysis of secondary structure and binding sites we use the Secondary Structure dataset (Rao et al., 2019; Berman et al., 2000; Moult et al., 2018; Klausen et al., 2019) from TAPE. We obtained token-level binding site and protein modification labels from the Protein Data Bank (Berman et al., 2000). |
| Dataset Splits | Yes | For the diagnostic classifier, we used the respective training splits for training and the validation splits for evaluation (a generic probing sketch appears after this table). See Appendix B.4 for additional details. Table 2 (datasets used in analysis): ProteinNet — 25299 train / 224 validation; Secondary Structure — 8678 train / 2170 validation; Binding Sites / PTM — 5734 train / 1418 validation. |
| Hardware Specification | Yes | Experiments were performed on a single 16GB Tesla V100 GPU. |
| Software Dependencies | No | The paper mentions several models and tools such as BERT, ALBERT, XLNet, TAPE, ProtTrans, and NGL Viewer, but it does not specify any version numbers for these software dependencies or underlying libraries (e.g., PyTorch, TensorFlow). |
| Experiment Setup | Yes | We set the attention threshold θ to 0.3 to select for high-confidence attention while retaining sufficient data for analysis. We truncate all protein sequences to a length of 512 to reduce memory requirements. (A hedged sketch of extracting attention under these settings appears after the table.) |
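
The experiment setup above (attention threshold θ = 0.3, sequences truncated to 512 tokens) can be approximated with any protein Transformer that exposes its attention weights. The sketch below is a minimal example using the HuggingFace `transformers` ProtBert checkpoint (`Rostlab/prot_bert`) rather than the TAPE models analyzed in the paper; the checkpoint choice, example sequence, and truncation handling are illustrative assumptions, not the authors' exact pipeline (see https://github.com/salesforce/provis for that).

```python
import torch
from transformers import BertModel, BertTokenizer

# Assumption: ProtBert (ProtTrans) as a stand-in for the TAPE BERT model
# analyzed in the paper; both are BERT-style protein language models.
tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertModel.from_pretrained("Rostlab/prot_bert", output_attentions=True)
model.eval()

# ProtBert expects space-separated amino acids; this toy sequence is illustrative.
sequence = "M K T A Y I A K Q R Q I S F V K S H F S R Q L E E R"
inputs = tokenizer(sequence, return_tensors="pt",
                   truncation=True, max_length=512)  # truncate as in the paper

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape [batch, heads, seq_len, seq_len].
attn = torch.stack(outputs.attentions).squeeze(1)  # [layers, heads, len, len]
high_confidence = attn > 0.3                       # θ = 0.3 threshold from the paper
print(attn.shape, high_confidence.float().mean().item())
```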
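The contact-map finding quoted in the first row can be read as a simple agreement statistic: the share of high-confidence attention (above the θ = 0.3 threshold) that falls on amino-acid pairs that are in contact in the folded structure. The sketch below is a hedged, toy-data illustration of that computation for a single head and protein; the exact weighting and aggregation across heads, layers, and proteins follow the paper and its released code, not this snippet.

```python
import numpy as np

def attention_agreement(attn, contact_map, theta=0.3):
    """Share of above-threshold attention that lands on contacting residue pairs.

    attn        : [seq_len, seq_len] attention weights of one head for one protein
    contact_map : [seq_len, seq_len] boolean matrix (True = residues in contact)
    """
    high = attn > theta                      # keep only high-confidence attention
    if high.sum() == 0:
        return float("nan")                  # head never attends above the threshold
    return float((high & contact_map).sum() / high.sum())

# Toy example with random data, only to illustrate shapes and the call.
rng = np.random.default_rng(0)
attn = rng.random((64, 64))
contacts = rng.random((64, 64)) < 0.1        # hypothetical sparse contact map
print(attention_agreement(attn, contacts))
```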
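The diagnostic classifier mentioned in the dataset-splits row is trained on the training split and evaluated on the validation split. The paper's probe is specified in its Appendix B.4; the sketch below is only a generic stand-in using a scikit-learn logistic-regression probe on per-token embeddings, with hypothetical file names for pre-extracted features and binding-site labels.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical pre-extracted arrays: per-token embeddings from one model layer
# and binary binding-site labels, split as in Table 2 (train vs. validation).
X_train = np.load("train_token_embeddings.npy")   # [n_train_tokens, hidden_dim]
y_train = np.load("train_binding_labels.npy")     # [n_train_tokens]
X_val   = np.load("val_token_embeddings.npy")
y_val   = np.load("val_binding_labels.npy")

# A linear probe: if it predicts binding sites well, the layer encodes them.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)
print("validation accuracy:", probe.score(X_val, y_val))
```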