Self-Attention Attribution: Interpreting Information Interactions Inside Transformer

Authors: Yaru Hao, Li Dong, Furu Wei, Ke Xu

AAAI 2021, pp. 12963-12971 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We take BERT as an example to conduct extensive studies. For example, on the MNLI dataset, adding one adversarial pattern into the premise can drop the accuracy of entailment from 82.87% to 0.8%.
Researcher Affiliation | Collaboration | 1 Beihang University, 2 Microsoft Research; {haoyaru@,kexu@nlsde.}buaa.edu.cn, {lidong1,fuwei}@microsoft.com
Pseudocode | Yes | Algorithm 1: Attribution Tree Construction (the attribution score it consumes is sketched below).
Open Source Code | No | The paper does not contain any explicit statement about releasing source code for the proposed ATTATTR method, nor does it provide a link to a code repository.
Open Datasets | Yes | We perform BERT fine-tuning and conduct experiments on four classification datasets. MNLI (Williams, Nangia, and Bowman 2018)... RTE (Dagan, Glickman, and Magnini 2006; Bar-Haim et al. 2006; Giampiccolo et al. 2007; Bentivogli et al. 2009)... SST-2 (Socher et al. 2013)... MRPC (Dolan and Brockett 2005)...
Dataset Splits | Yes | We use the same data split as in (Wang et al. 2019). We calculate Ih on 200 examples sampled from the held-out dataset. (See the head-importance sketch below.)
Hardware Specification | Yes | For a sequence of 128 tokens, the attribution time of the BERT-base model takes about one second on an Nvidia-v100 GPU card.
Software Dependencies | No | The paper mentions using 'BERT-base-cased' and the fine-tuning settings suggested in Devlin et al. (2019), but does not provide specific software version numbers for libraries or environments such as Python, PyTorch, or TensorFlow.
Experiment Setup | Yes | When fine-tuning BERT, we follow the settings and the hyper-parameters suggested in (Devlin et al. 2019). In our experiments, we set m to 20, which performs well in practice. We set τ = 0.4 for layers l < 12. ... we set τ to 0 for the last layer. (The roles of m and τ are illustrated in the sketches below.)
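
To make the quoted setup concrete: the score that Algorithm 1 consumes is the paper's attention attribution, Attr_h(A) = A_h ⊙ (1/m) Σ_{k=1}^{m} ∂F((k/m)·A)/∂A_h, an m-step Riemann approximation of integrated gradients over a head's attention map (m = 20 in the quoted setup). Below is a minimal, self-contained sketch of that computation. ToyAttention and its attn_override argument are illustrative stand-ins invented here, since no official code is released; the paper applies the same computation to each attention head of a fine-tuned BERT.

```python
import torch
import torch.nn as nn

class ToyAttention(nn.Module):
    """Single-head self-attention followed by a mean-pool linear classifier."""
    def __init__(self, dim=16, num_classes=2):
        super().__init__()
        self.qkv = nn.Linear(dim, 3 * dim)
        self.cls = nn.Linear(dim, num_classes)

    def forward(self, x, attn_override=None):
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / q.size(-1) ** 0.5, dim=-1)
        if attn_override is not None:
            attn = attn_override          # run the model on a scaled copy of A
        out = attn @ v
        return self.cls(out.mean(dim=1)), attn

def attention_attribution(model, x, target, m=20):
    """Attr(A) ~= A * (1/m) * sum_k dF((k/m) * A) / dA  (Riemann sum, m steps)."""
    with torch.no_grad():
        _, attn = model(x)                # the head's actual attention map A
    total_grad = torch.zeros_like(attn)
    for k in range(1, m + 1):
        scaled = ((k / m) * attn).requires_grad_(True)   # interpolation point
        logits, _ = model(x, attn_override=scaled)
        (grad,) = torch.autograd.grad(logits[0, target], scaled)
        total_grad += grad
    return attn * total_grad / m          # elementwise product with A

x = torch.randn(1, 8, 16)                 # one 8-token sequence, hidden dim 16
attr = attention_attribution(ToyAttention(), x, target=0, m=20)
print(attr.shape)                          # torch.Size([1, 8, 8]) token pairs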
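
The Ih quoted in the Dataset Splits row is the paper's head-importance statistic, the expected maximum attribution score of a head over examples, I_h = E_x[max Attr_h(A)], estimated on 200 held-out examples. A hedged sketch follows, reusing attention_attribution from the block above; the held_out list of (inputs, label) pairs is a hypothetical stand-in for the sampled held-out data.

```python
def head_importance(model, held_out, num_samples=200, m=20):
    """Estimate I_h = E_x[max Attr_h(A)] over sampled held-out examples."""
    maxima = []
    for x, label in held_out[:num_samples]:
        attr = attention_attribution(model, x, target=label, m=m)
        maxima.append(attr.max())          # largest interaction in this head
    return torch.stack(maxima).mean()      # average over the sample

# Usage with the toy model above; random data stands in for the held-out set.
model = ToyAttention()
held_out = [(torch.randn(1, 8, 16), 0) for _ in range(5)]
print(float(head_importance(model, held_out, num_samples=5)))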
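
Lastly, the τ values in the Experiment Setup row act as per-layer cut-offs when the attribution tree decides which token-to-token interaction edges to keep. The sketch below shows only that filtering step: the per-layer max normalization is my assumption rather than the paper's exact procedure, and attr_per_layer is a hypothetical list of per-layer attribution matrices such as those produced by the first sketch.

```python
import torch

def filter_edges(attr_per_layer, tau_default=0.4):
    """Keep edge (i, j) in layer l only if its normalized attribution exceeds
    tau: tau = 0.4 for layers below the last, tau = 0 for the last layer,
    matching the quoted setup."""
    edges = []
    last = len(attr_per_layer) - 1
    for l, attr in enumerate(attr_per_layer):
        tau = 0.0 if l == last else tau_default
        # Assumed normalization: scale each layer's scores into [-1, 1].
        a = attr / attr.abs().max().clamp_min(1e-12)
        for i, j in (a.squeeze(0) > tau).nonzero().tolist():
            edges.append((l, i, j))   # (layer, receiving token, sending token)
    return edges

# Usage: per-layer attributions for one example (random here for brevity).
attrs = [torch.randn(1, 8, 8) for _ in range(12)]
print(len(filter_edges(attrs)))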