Attention Meets Post-hoc Interpretability: A Mathematical Perspective

Authors: Gianluigi Lopardo, Frederic Precioso, Damien Garreau

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we mathematically study a simple attention-based architecture and pinpoint the differences between post-hoc and attention-based explanations. We show that they provide quite different results, and that, despite their limitations, post-hoc methods are capable of capturing more useful insights than merely examining the attention weights. All our theoretical claims are supported by mathematical proofs and empirical validation, detailed in the Appendix. The code for the model and the experiments is available at https://github.com/gianluigilopardo/attention_meets_xai. We have conducted numerical experiments on a multi-head, multi-layer architecture. We trained a classifier with 6 layers and 6 heads on the IMDb dataset (refer to Appendix F), achieving an accuracy of 82.22%. Our interest lies in exploring the relationship between LIME and the attention weights. (A hypothetical sketch of such a comparison appears after the table.)
Researcher Affiliation | Academia | Gianluigi Lopardo 1,2, Frederic Precioso 1,3, Damien Garreau 4. 1 Université Côte d'Azur, Inria, CNRS; 2 LJAD; 3 I3S; 4 Julius-Maximilians-Universität Würzburg.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | The code for the model and the experiments is available at https://github.com/gianluigilopardo/attention_meets_xai.
Open Datasets | Yes | We trained the model on the IMDb dataset (Maas et al., 2011), which was preprocessed using standard tokenization and padding techniques.
Dataset Splits | Yes | The dataset was split into training, validation, and test sets with sizes of 20,000, 5,000, and 25,000 samples, respectively. (See the data-preparation sketch after the table.)
Hardware Specification | Yes | All of the experiments presented in this paper have been performed on a PyTorch implementation of the model presented in Section 2 and run on a single Nvidia A100 GPU.
Software Dependencies | No | The paper mentions a PyTorch implementation but does not specify version numbers for PyTorch or other software dependencies.
Experiment Setup | Yes | The model parameters were set as follows: T_max = 256, d_e = 128, d_att = 64, d_out = 64. The model was trained for 10 epochs using a batch size of 16. We employed the AdamW optimizer with a learning rate of 0.0001 and used cross-entropy loss as the optimization objective. (See the model and training sketch after the table.)
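
The reported split (20,000 training, 5,000 validation, 25,000 test samples) can be reproduced along the following lines. This is a minimal sketch, assuming the Hugging Face `datasets` package and an arbitrary seed; the authors' own preprocessing (tokenization and padding to T_max = 256) is not reproduced here.

```python
# Hypothetical data preparation matching the reported 20,000 / 5,000 / 25,000 split.
# The `datasets` package and the seed are assumptions, not taken from the paper.
from datasets import load_dataset

imdb = load_dataset("imdb")                            # 25,000 train / 25,000 test reviews
split = imdb["train"].train_test_split(test_size=5_000, seed=0)
train_set, val_set = split["train"], split["test"]     # 20,000 / 5,000 samples
test_set = imdb["test"]                                # 25,000 samples
```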
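The architecture and training details collected in the table can be assembled into the PyTorch sketch below. It is not the authors' implementation (that is in the linked repository); the vocabulary size, residual connections, and mean pooling are assumptions made only to keep the example self-contained and runnable with the stated dimensions (6 layers, 6 heads of size d_att = 64, embeddings of size d_e = 128).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hyperparameters reported in the table; VOCAB_SIZE and N_CLASSES are assumed.
T_MAX, D_E, D_ATT, D_OUT = 256, 128, 64, 64
N_LAYERS, N_HEADS, VOCAB_SIZE, N_CLASSES = 6, 6, 30_000, 2

class Head(nn.Module):
    """One attention head with its own d_att-dimensional projections."""
    def __init__(self):
        super().__init__()
        self.q, self.k, self.v = (nn.Linear(D_E, D_ATT) for _ in range(3))

    def forward(self, x):                                    # x: (batch, T_MAX, D_E)
        scores = self.q(x) @ self.k(x).transpose(1, 2) / D_ATT ** 0.5
        weights = F.softmax(scores, dim=-1)                  # attention weights (batch, T, T)
        return weights @ self.v(x), weights

class Block(nn.Module):
    """Multi-head attention layer; concatenated heads are projected back to d_e."""
    def __init__(self):
        super().__init__()
        self.heads = nn.ModuleList([Head() for _ in range(N_HEADS)])
        self.proj = nn.Linear(N_HEADS * D_ATT, D_E)

    def forward(self, x):
        concat = torch.cat([h(x)[0] for h in self.heads], dim=-1)
        return x + self.proj(concat)                         # residual connection (assumed)

class Classifier(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_E)
        self.blocks = nn.Sequential(*[Block() for _ in range(N_LAYERS)])
        self.out = nn.Sequential(nn.Linear(D_E, D_OUT), nn.ReLU(), nn.Linear(D_OUT, N_CLASSES))

    def forward(self, ids):                                  # ids: (batch, T_MAX) token indices
        h = self.blocks(self.embed(ids))
        return self.out(h.mean(dim=1))                       # mean-pool tokens, then classify

model = Classifier()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)   # AdamW, lr = 0.0001
loss_fn = nn.CrossEntropyLoss()
# Training loop skeleton: 10 epochs, batches of 16 (ids, labels) pairs from a DataLoader.
# for epoch in range(10):
#     for ids, labels in train_loader:
#         optimizer.zero_grad()
#         loss_fn(model(ids), labels).backward()
#         optimizer.step()
```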
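The relationship between LIME and the attention weights, mentioned under Research Type, can be probed roughly as follows. This is an illustrative continuation of the sketch above, not the experiment from the repository: the `lime` package (Ribeiro et al.) is assumed, and `tokenize` (a string to padded token ids) and `review` (a raw IMDb review) are hypothetical placeholders for the authors' preprocessing and an input text.

```python
import torch
from lime.lime_text import LimeTextExplainer

# `tokenize` (string -> LongTensor of T_MAX padded token ids) and `review` (a raw
# IMDb review string) are hypothetical placeholders, not defined in the paper.

def predict_proba(texts):
    """Black-box function expected by LIME: list of strings -> (n, 2) class probabilities."""
    ids = torch.stack([tokenize(t) for t in texts])
    with torch.no_grad():
        return torch.softmax(model(ids), dim=-1).numpy()

# 1) Post-hoc explanation: per-word LIME coefficients for the predicted class.
explainer = LimeTextExplainer(class_names=["negative", "positive"])
explanation = explainer.explain_instance(review, predict_proba, num_features=10)
lime_weights = dict(explanation.as_list())

# 2) Attention-based explanation: capture every head's weights with forward hooks.
captured = []
hooks = [head.register_forward_hook(lambda module, inputs, outputs: captured.append(outputs[1]))
         for block in model.blocks for head in block.heads]
with torch.no_grad():
    model(tokenize(review).unsqueeze(0))
for hook in hooks:
    hook.remove()

# Average attention received by each token, over layers, heads, and query positions.
attention_per_token = torch.stack(captured).mean(dim=(0, 1, 2)).numpy()
```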