Causal Interpretation of Self-Attention in Pre-Trained Transformers
Authors: Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6 Empirical Evaluation. In this section we demonstrate how a causal graph constructed from a self-attention matrix in a Transformer based model can be used to explain which specific symbols in an input sequence are the causes of the Transformer output. We experiment on the tasks of sentiment classification, which classifies an input sequence, and recommendation systems, which generates a candidate list of recommended symbols (top-k) for the next item. (A simplified attention-to-graph sketch appears after this table.) |
| Researcher Affiliation | Industry | Raanan Y. Rohekar Intel Labs raanan.yehezkel@intel.com; Yaniv Gurwicz Intel Labs yaniv.gurwicz@intel.com; Shami Nisimov Intel Labs shami.nisimov@intel.com |
| Pseudocode | Yes | Algorithm 1: CLEANN: CausaL Explanations from Attention in Neural Networks |
| Open Source Code | Yes | Implementation tools are in https://github.com/IntelLabs/causality-lab. |
| Open Datasets | Yes | We experiment on the tasks of sentiment classification of movie reviews from IMDB (20) using a pre-trained BERT model (9) that was fine-tuned for the task. [...] For empirical evaluation, we use the BERT4Rec recommender (36), pre-trained on the MovieLens 1M dataset (15)... (See the attention-extraction sketch after this table.) |
| Dataset Splits | No | The paper states it uses IMDB and MovieLens 1M datasets but does not provide specific details on how these datasets were split into training, validation, and test sets, or reference standard splits with specific percentages/counts. |
| Hardware Specification | No | The paper does not provide any specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using a 'pre-trained BERT model' and 'BERT4Rec recommender' but does not specify version numbers for these or any other software dependencies, such as programming languages or libraries. |
| Experiment Setup | No | The paper mentions fine-tuning and pre-training models (BERT, BERT4Rec) but does not provide specific experimental setup details such as hyperparameter values (e.g., learning rate, batch size, epochs), optimizer settings, or model initialization details. |
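
As a rough illustration of the attention-extraction step behind the Open Datasets row, the sketch below pulls the last-layer self-attention of a BERT sequence classifier for a single IMDB-style review and ranks tokens by their attention from the classification position. This is not the paper's CLEANN procedure; the checkpoint name (`textattack/bert-base-uncased-imdb`), the choice of the last layer, and the head-averaging are assumptions made only for illustration.

```python
# Hedged sketch: extract a self-attention matrix from a BERT sentiment
# classifier and rank tokens by attention from the [CLS] position.
# NOT the paper's CLEANN algorithm; the checkpoint name and the use of
# head-averaged last-layer attention are illustrative assumptions.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "textattack/bert-base-uncased-imdb"  # assumed checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name, output_attentions=True
)
model.eval()

review = "A beautifully shot film with a hollow, predictable plot."
inputs = tokenizer(review, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions: one (batch, heads, seq, seq) tensor per layer.
# Average the last layer's heads into a single seq x seq matrix.
attn = outputs.attentions[-1].mean(dim=1)[0]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Attention from the [CLS] position (used for classification) to each token,
# as a crude proxy for "which symbols influence the output".
cls_row = attn[0]
ranked = sorted(zip(tokens, cls_row.tolist()), key=lambda p: -p[1])
print("Predicted class:", outputs.logits.argmax(-1).item())
print("Top tokens by attention from [CLS]:", ranked[:10])
```

A self-attention matrix like `attn` is the kind of object from which the paper constructs its causal graph over input symbols.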
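
The Empirical Evaluation and Pseudocode rows describe constructing a causal graph from a self-attention matrix and using it to identify which input symbols are causes of the output. Algorithm 1 (CLEANN) does this with a causal-discovery procedure that is not reproduced here; the sketch below is only a much simpler stand-in that thresholds one attention matrix into a directed graph and returns the positions with a directed path into the output position. The threshold value, the edge-direction convention, and the toy matrix are assumptions made for illustration.

```python
# Hedged sketch: threshold an attention matrix into a directed graph and
# read off the positions connected to the output position. A simplification,
# not the discovery procedure in the paper's Algorithm 1 (CLEANN).
import numpy as np
import networkx as nx

def attention_to_graph(attn: np.ndarray, threshold: float = 0.1) -> nx.DiGraph:
    """Add an edge j -> i whenever position i attends to position j with
    weight >= `threshold` (the threshold is an arbitrary illustrative choice)."""
    g = nx.DiGraph()
    n = attn.shape[0]
    g.add_nodes_from(range(n))
    for i in range(n):
        for j in range(n):
            if i != j and attn[i, j] >= threshold:
                g.add_edge(j, i, weight=float(attn[i, j]))
    return g

def explanation_set(attn: np.ndarray, output_pos: int, threshold: float = 0.1):
    """Positions with a directed path into `output_pos` in the thresholded graph."""
    g = attention_to_graph(attn, threshold)
    return sorted(nx.ancestors(g, output_pos))

# Toy 4-position attention matrix; position 0 plays the role of the output symbol.
attn = np.array([
    [0.70, 0.25, 0.03, 0.02],  # position 0 attends mainly to itself and to 1
    [0.05, 0.60, 0.30, 0.05],  # position 1 attends to 2
    [0.02, 0.03, 0.90, 0.05],  # position 2 mostly attends to itself
    [0.05, 0.05, 0.05, 0.85],  # position 3 mostly attends to itself
])
print(explanation_set(attn, output_pos=0))  # [1, 2]; position 3 is excluded
```

The paper's procedure learns the graph rather than thresholding it, so the explanation sets it produces will generally differ from this naive version.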