BERTology Meets Biology: Interpreting Attention in Protein Language Models

Authors: Jesse Vig, Ali Madani, Lav R. Varshney, Caiming Xiong, Richard Socher, Nazneen Rajani

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate a set of methods for analyzing protein Transformer models through the lens of attention. We show that attention: (1) captures the folding structure of proteins, connecting amino acids that are far apart in the underlying sequence, but spatially close in the three-dimensional structure, (2) targets binding sites, a key functional component of proteins, and (3) focuses on progressively more complex biophysical properties with increasing layer depth. We find this behavior to be consistent across three Transformer architectures (BERT, ALBERT, XLNet) and two distinct protein datasets.
Researcher Affiliation | Collaboration | Jesse Vig¹, Ali Madani¹, Lav R. Varshney¹,², Caiming Xiong¹, Richard Socher¹, Nazneen Fatema Rajani¹ (¹Salesforce Research, ²University of Illinois at Urbana-Champaign); {jvig,amadani,cxiong,rsocher,nazneen.rajani}@salesforce.com, varshney@illinois.edu
Pseudocode | No | The paper does not contain any pseudocode or clearly labeled algorithm blocks.
Open Source Code | Yes | Code for visualization and analysis is available at https://github.com/salesforce/provis.
Open Datasets | Yes | For our analyses of amino acids and contact maps, we use a curated dataset from TAPE based on ProteinNet (AlQuraishi, 2019; Fox et al., 2013; Berman et al., 2000; Moult et al., 2018)... For the analysis of secondary structure and binding sites we use the Secondary Structure dataset (Rao et al., 2019; Berman et al., 2000; Moult et al., 2018; Klausen et al., 2019) from TAPE. We obtained token-level binding site and protein modification labels from the Protein Data Bank (Berman et al., 2000).
Dataset Splits | Yes | For the diagnostic classifier, we used the respective training splits for training and the validation splits for evaluation. See Appendix B.4 for additional details. Table 2 (Datasets used in analysis): ProteinNet: 25,299 train / 224 validation; Secondary Structure: 8,678 train / 2,170 validation; Binding Sites / PTM: 5,734 train / 1,418 validation.
Hardware Specification | Yes | Experiments were performed on a single 16 GB Tesla V100 GPU.
Software Dependencies | No | The paper mentions several models and tools such as BERT, ALBERT, XLNet, TAPE, ProtTrans, and NGL Viewer, but it does not specify any version numbers for these software dependencies or underlying libraries (e.g., PyTorch, TensorFlow).
Experiment Setup | Yes | We set the attention threshold θ to 0.3 to select for high-confidence attention while retaining sufficient data for analysis. We truncate all protein sequences to a length of 512 to reduce memory requirements.
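As a rough illustration of the analysis described in the "Research Type" and "Experiment Setup" rows (the share of high-confidence attention, with weight above θ = 0.3, that aligns with a structural property such as residue-residue contact), here is a minimal numpy sketch. It is a hypothetical stand-in rather than the authors' implementation, which lives in the provis repository; the function and variable names below are made up.

```python
import numpy as np

def attention_property_agreement(attn, prop, theta=0.3):
    """Share of high-confidence attention (weight > theta) that lands on
    token pairs (i, j) for which a structural property holds, e.g. residues
    i and j being in contact in the folded structure.

    attn  : (L, L) attention matrix for one head and one protein
    prop  : (L, L) boolean matrix, True where the property holds
    theta : attention threshold (0.3 in the setup quoted above)
    """
    mask = attn > theta                  # keep only high-confidence attention
    if not mask.any():
        return float("nan")              # this head never attends above theta
    weights = attn[mask]
    hits = prop[mask].astype(float)
    return float((weights * hits).sum() / weights.sum())

# Toy usage with random stand-ins for a 128-residue protein; real inputs would
# be per-head attention from the model and contact maps derived from ProteinNet.
L = 128
attn = np.random.dirichlet(np.ones(L), size=L)   # rows sum to 1, like softmax attention
contacts = np.random.rand(L, L) < 0.05           # sparse placeholder "contact map"
print(attention_property_agreement(attn, contacts))
```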
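Attention matrices like the attn input above can be pulled from one of the public protein language models the paper discusses. The sketch below uses the Hugging Face transformers API with the ProtTrans ProtBert checkpoint (Rostlab/prot_bert); this may not be the exact checkpoint or extraction path the authors used, whose pipeline is in provis. The 512-token truncation mirrors the setup quoted in the last row of the table.

```python
import torch
from transformers import BertModel, BertTokenizer

# ProtTrans BERT checkpoints expect uppercase, space-separated amino acids
# and no lower-casing by the tokenizer.
tokenizer = BertTokenizer.from_pretrained("Rostlab/prot_bert", do_lower_case=False)
model = BertModel.from_pretrained("Rostlab/prot_bert", output_attentions=True)
model.eval()

sequence = "M K T A Y I A K Q R"                 # toy 10-residue sequence
inputs = tokenizer(sequence, return_tensors="pt", truncation=True, max_length=512)

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one (batch, heads, seq, seq) tensor per
# layer; positions include the special tokens added by the tokenizer.
attn = torch.stack(outputs.attentions).squeeze(1)  # (layers, heads, seq, seq)
print(attn.shape)
```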