Causal Interpretation of Self-Attention in Pre-Trained Transformers

Authors: Raanan Y. Rohekar, Yaniv Gurwicz, Shami Nisimov

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | From Section 6 (Empirical Evaluation): In this section we demonstrate how a causal graph constructed from a self-attention matrix in a Transformer-based model can be used to explain which specific symbols in an input sequence are the causes of the Transformer output. We experiment on the tasks of sentiment classification, which classifies an input sequence, and recommendation systems, which generate a candidate list of recommended symbols (top-k) for the next item. (An illustrative attention-extraction sketch follows this table.)
Researcher Affiliation | Industry | Raanan Y. Rohekar, Intel Labs, raanan.yehezkel@intel.com; Yaniv Gurwicz, Intel Labs, yaniv.gurwicz@intel.com; Shami Nisimov, Intel Labs, shami.nisimov@intel.com
Pseudocode | Yes | Algorithm 1: CLEANN (CausaL Explanations from Attention in Neural Networks). (A naive, non-CLEANN graph-construction sketch follows this table.)
Open Source Code | Yes | Implementation tools are available at https://github.com/IntelLabs/causality-lab.
Open Datasets | Yes | We experiment on the tasks of sentiment classification of movie reviews from IMDB (20) using a pre-trained BERT model (9) that was fine-tuned for the task. [...] For empirical evaluation, we use the BERT4Rec recommender (36), pre-trained on the MovieLens 1M dataset (15)... (A hedged data-access sketch follows this table.)
Dataset Splits | No | The paper states that it uses the IMDB and MovieLens 1M datasets but does not say how they were split into training, validation, and test sets, nor does it reference standard splits with specific percentages or counts.
Hardware Specification | No | The paper does not provide any details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper mentions using a pre-trained BERT model and the BERT4Rec recommender but does not specify version numbers for these or for any other software dependencies, such as programming languages or libraries.
Experiment Setup | No | The paper mentions fine-tuning and pre-training models (BERT, BERT4Rec) but does not provide experimental setup details such as hyperparameter values (e.g., learning rate, batch size, epochs), optimizer settings, or model initialization.
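To make the Research Type row concrete, here is a minimal sketch of how one might obtain the self-attention matrix that the paper's analysis starts from, using the HuggingFace transformers library with a BERT classifier fine-tuned on IMDB reviews. This is not the authors' code, and the checkpoint path is a placeholder, since the paper does not name the exact fine-tuned model.

```python
# Minimal sketch (not the paper's code): extract a self-attention matrix from a
# fine-tuned BERT sentiment classifier, the raw quantity the paper's analysis uses.
# Assumes the HuggingFace transformers library; the checkpoint path is a placeholder.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

checkpoint = "path/to/bert-finetuned-imdb"  # placeholder, not specified in the paper
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSequenceClassification.from_pretrained(checkpoint, output_attentions=True)
model.eval()

review = "The film starts slowly but the final act is genuinely moving."
inputs = tokenizer(review, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# outputs.attentions is a tuple with one tensor per layer,
# each of shape (batch, num_heads, seq_len, seq_len).
last_layer = outputs.attentions[-1]        # final self-attention layer
attn_matrix = last_layer.mean(dim=1)[0]    # average over heads -> (seq_len, seq_len)
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0].tolist())
print(attn_matrix.shape, tokens[:5])
```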
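As a purely illustrative companion to the Pseudocode row, the snippet below builds a naive token-level directed graph by thresholding the head-averaged attention matrix and lists which tokens point at the classification position. The thresholding heuristic is an assumption made here only to show the kind of token-level graph an explanation is read from; it is not Algorithm 1 (CLEANN), whose actual procedure is given in the paper and the linked repository.

```python
# Illustrative sketch only: threshold a head-averaged attention matrix into a
# directed graph over input tokens. This is NOT Algorithm 1 (CLEANN) from the paper.
import numpy as np

def attention_to_graph(attn: np.ndarray, tokens: list[str], threshold: float = 0.1):
    """Return directed edges (tokens[j] -> tokens[i]) whenever position i attends to j above threshold."""
    edges = []
    n = attn.shape[0]
    for i in range(n):
        for j in range(n):
            if i != j and attn[i, j] >= threshold:
                edges.append((tokens[j], tokens[i]))
    return edges

# Example with the matrix/tokens produced by the previous sketch:
# edges = attention_to_graph(attn_matrix.numpy(), tokens)
# cls_parents = [src for src, dst in edges if dst == "[CLS]"]
# print(cls_parents)  # candidate input symbols feeding the classification token
```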
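For the Open Datasets row, the sketch below shows one way to obtain the two public datasets named in the paper. The paper does not state how it accessed them, so the HuggingFace datasets hub copy of IMDB and the GroupLens download page for MovieLens 1M are assumptions made here, not the authors' pipeline.

```python
# Hedged sketch: obtain the two public datasets named in the paper.
from datasets import load_dataset

imdb = load_dataset("imdb")   # standard release: 25k train / 25k test reviews
print(imdb)                   # splits and sizes as distributed on the hub

# MovieLens 1M is distributed as a zip archive by GroupLens:
# https://grouplens.org/datasets/movielens/1m/
# BERT4Rec-style preprocessing (e.g. leave-one-out splits) is not specified in the paper.
```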