ATMAN: Understanding Transformer Predictions Through Memory Efficient Attention Manipulation
Authors: Björn Deiseroth, Mayukh Deb, Samuel Weinbach, Manuel Brack, Patrick Schramowski, Kristian Kersting
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our exhaustive experiments on text and image-text benchmarks demonstrate that ATMAN outperforms current state-of-the-art gradient-based methods on several metrics and models while being computationally efficient. |
| Researcher Affiliation | Collaboration | Björn Deiseroth (1,2,3), Mayukh Deb (1), Samuel Weinbach (1), Manuel Brack (2,4), Patrick Schramowski (2,3,4,5), Kristian Kersting (2,3,4) — 1 Aleph Alpha, 2 Technical University Darmstadt, 3 Hessian Center for Artificial Intelligence (hessian.AI), 4 German Research Center for Artificial Intelligence (DFKI), 5 LAION |
| Pseudocode | No | The paper describes the proposed method through text and diagrams (e.g., Fig. 2a), but it does not include a formal pseudocode or algorithm block; an illustrative sketch of the core idea is given after this table. |
| Open Source Code | Yes | Source code: https://github.com/Aleph-Alpha/AtMan |
| Open Datasets | Yes | For evaluation, we used the Stanford Question Answering (QA) Dataset (SQuAD) [23]. |
| Dataset Splits | No | The paper describes sampling strategies for its evaluation (e.g., 'randomly sample 200 images per class on the filtered set') but does not specify explicit train/validation/test dataset splits with percentages, counts, or predefined splits. |
| Hardware Specification | Yes | Fig. 5 illustrates the runtime and memory consumption on a single NVIDIA A100 80GB GPU. |
| Software Dependencies | No | The paper mentions using 'Captum' as a library for integrating some methods but does not provide specific version numbers for Captum or other software dependencies. |
| Experiment Setup | Yes | We fixed the parameter κ = 0.7 of Eq. 6 and f = 0.9 of Eq. 4 throughout this work. They were determined empirically by running a line sweep on a randomly sampled subset of the Open Images dataset once, cf. Fig. 11. |
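
Since the paper provides no formal algorithm block, the minimal sketch below illustrates the perturbation idea the rows above refer to: suppress the attention paid to one input token, re-run the forward pass, and take the resulting increase in the target token's cross-entropy loss as that token's relevance. Everything here is an assumption chosen for a self-contained example — the toy single-head attention layer with random weights, the decision to scale pre-softmax scores, and all variable names — and it is not the authors' implementation from the linked repository.

```python
# Illustrative sketch (not the authors' code) of attention-suppression probing,
# the core idea behind ATMAN as described in the paper. Model, shapes, and the
# point at which the scaling is applied are assumptions made for illustration.
import torch
import torch.nn.functional as F

torch.manual_seed(0)

seq_len, d_model, vocab = 8, 32, 100
suppression = 0.9  # analogous to the paper's suppression factor f = 0.9

# Toy single-head "transformer" pieces with random weights.
emb = torch.nn.Embedding(vocab, d_model)
W_q = torch.nn.Linear(d_model, d_model, bias=False)
W_k = torch.nn.Linear(d_model, d_model, bias=False)
W_v = torch.nn.Linear(d_model, d_model, bias=False)
lm_head = torch.nn.Linear(d_model, vocab, bias=False)

tokens = torch.randint(0, vocab, (seq_len,))
target = torch.randint(0, vocab, (1,)).item()  # token whose loss we probe


@torch.no_grad()  # the method is perturbation-based; no gradients are needed
def target_loss(suppress_idx=None):
    """Run the toy attention block; optionally scale down the attention
    scores towards key position `suppress_idx` before the softmax."""
    x = emb(tokens)                        # (seq, d_model)
    q, k, v = W_q(x), W_k(x), W_v(x)
    scores = q @ k.T / d_model ** 0.5      # (seq, seq) pre-softmax scores
    if suppress_idx is not None:
        # Assumption: suppression is modeled as scaling the pre-softmax
        # scores towards the probed key position.
        scores[:, suppress_idx] *= (1.0 - suppression)
    attn = scores.softmax(dim=-1)
    out = attn @ v
    logits = lm_head(out[-1])              # predict from the last position
    return F.cross_entropy(logits.unsqueeze(0), torch.tensor([target])).item()


base = target_loss()
# Relevance of each input token = loss increase when its attention is suppressed.
relevance = [target_loss(i) - base for i in range(seq_len)]
print(relevance)
```

A faithful implementation would apply the suppression across all layers and heads of the actual model, and (per the paper's Eq. 6 with κ = 0.7) would additionally suppress tokens whose embeddings are cosine-similar to the probed token, which matters for image inputs; both details are omitted from this sketch.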