Multi Resolution Analysis (MRA) for Approximate Self-Attention
Authors: Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M. Fung, Vikas Singh
ICML 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We undertake an extensive set of experiments and demonstrate that this multi-resolution scheme outperforms most efficient self-attention proposals and is favorable for both short and long sequences. We perform a broad set of experiments to evaluate the practical performance profile of our MRA-based self-attention module. |
| Researcher Affiliation | Collaboration | ¹University of Wisconsin, Madison, USA; ²American Family Insurance, Madison, USA. |
| Pseudocode | Yes | Algorithm 1 (Constructing the set J) and Algorithm 2 (Computing ÂV); a simplified sketch of this block-selection idea appears after the table. |
| Open Source Code | Yes | Code is available at https://github.com/mlpen/mra-attention. |
| Open Datasets | Yes | RoBERTa language modeling (Liu et al., 2019), English Wikipedia and BookCorpus (Zhu et al., 2015), WikiHop (Welbl et al., 2018), Stories dataset (Trinh & Le, 2018), RealNews dataset (Zellers et al., 2019), Long Range Arena (LRA) (Tay et al., 2021), ImageNet (Russakovsky et al., 2015), CIFAR-10 (Krizhevsky et al., 2009). |
| Dataset Splits | No | The paper mentions using a validation set for evaluation (e.g., 'validation Masked Language Modeling (MLM) accuracy') but does not provide specific details on the dataset splits (e.g., percentages or sample counts for train/validation/test sets). |
| Hardware Specification | Yes | The efficiency is measured on a single Nvidia RTX 3090. |
| Software Dependencies | No | The paper mentions using Hugging Face and other implementations from official GitHub repos but does not provide specific version numbers for any software dependencies. |
| Experiment Setup | Yes | Table 8. Hyperparameters for all experiments. In this section, we provide more details about the experiments. We run all experiments on an 8× Nvidia RTX 3090 server. The hyperparameters of each experiment are summarized in Tab. 8. |
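
For orientation, the sketch below illustrates the general block-sparse idea behind the two reported algorithms, Algorithm 1 (constructing a set J of high-weight blocks) and Algorithm 2 (computing the approximation ÂV on those blocks), in a simplified two-level form. The block size, the per-row block budget, and all function and variable names are illustrative assumptions for this note, not the authors' multi-resolution implementation; the official code is at https://github.com/mlpen/mra-attention.

```python
# Minimal two-level sketch (assumed, not the paper's code): pick high-weight key blocks
# per query block from a coarse pass, then compute softmax attention only on those blocks.
import torch

def block_sparse_attention(Q, K, V, block_size=32, blocks_per_row=8):
    """Q, K, V: (seq_len, dim) with seq_len divisible by block_size."""
    n, d = Q.shape
    nb = n // block_size

    # Coarse resolution: score every (query-block, key-block) pair from block means.
    Qb = Q.view(nb, block_size, d).mean(dim=1)            # (nb, d)
    Kb = K.view(nb, block_size, d).mean(dim=1)            # (nb, d)
    coarse_scores = Qb @ Kb.T / d ** 0.5                  # (nb, nb)

    # "Construct J": for each query block, keep the highest-scoring key blocks.
    top_blocks = coarse_scores.topk(blocks_per_row, dim=-1).indices  # (nb, blocks_per_row)

    out = torch.zeros_like(V)
    offsets = torch.arange(block_size)
    for qb in range(nb):
        rows = slice(qb * block_size, (qb + 1) * block_size)
        # Expand the selected key blocks to full-resolution token indices.
        cols = (top_blocks[qb, :, None] * block_size + offsets).reshape(-1)
        # "Compute ÂV": softmax restricted to the selected key/value tokens.
        scores = Q[rows] @ K[cols].T / d ** 0.5           # (block_size, blocks_per_row*block_size)
        out[rows] = torch.softmax(scores, dim=-1) @ V[cols]
    return out

# Toy usage: a 512-token sequence with 64-dimensional heads.
Q, K, V = (torch.randn(512, 64) for _ in range(3))
approx_out = block_sparse_attention(Q, K, V)
print(approx_out.shape)  # torch.Size([512, 64])
```

This sketch uses a single coarse level plus a full-resolution refinement; the paper's scheme works across multiple resolutions, which is what the reported Algorithms 1 and 2 formalize.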