Multi Resolution Analysis (MRA) for Approximate Self-Attention

Authors: Zhanpeng Zeng, Sourav Pal, Jeffery Kline, Glenn M Fung, Vikas Singh

ICML 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We undertake an extensive set of experiments and demonstrate that this multi-resolution scheme outperforms most efficient self-attention proposals and is favorable for both short and long sequences. We perform a broad set of experiments to evaluate the practical performance profile of our MRA-based self-attention module.
Researcher Affiliation | Collaboration | 1 University of Wisconsin, Madison, USA; 2 American Family Insurance, Madison, USA.
Pseudocode | Yes | Algorithm 1 (constructing the set J) and Algorithm 2 (computing ÂV); an illustrative sketch of the coarse-to-fine idea follows the table below.
Open Source Code | Yes | Code is available at https://github.com/mlpen/mra-attention.
Open Datasets | Yes | RoBERTa language modeling (Liu et al., 2019), English Wikipedia and BookCorpus (Zhu et al., 2015), WikiHop (Welbl et al., 2018), Stories dataset (Trinh & Le, 2018), RealNews dataset (Zellers et al., 2019), Long Range Arena (LRA) (Tay et al., 2021), ImageNet (Russakovsky et al., 2015), CIFAR-10 (Krizhevsky et al., 2009).
Dataset Splits | No | The paper reports validation metrics (e.g., validation Masked Language Modeling (MLM) accuracy) but does not specify the dataset splits, such as percentages or sample counts for the train/validation/test sets.
Hardware Specification | Yes | The efficiency is measured on a single Nvidia RTX 3090.
Software Dependencies | No | The paper mentions using Hugging Face and other implementations from official GitHub repositories but does not provide version numbers for any software dependencies.
Experiment Setup | Yes | Table 8: Hyperparameters for all experiments. In this section, we provide more details about the experiments. We run all experiments on an 8x Nvidia RTX 3090 server. The hyperparameters of each experiment are summarized in Tab. 8.
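
For readers who want a concrete picture of what the cited Algorithms 1 and 2 compute, the sketch below only illustrates the general coarse-to-fine idea of a multi-resolution approximation of self-attention: score the attention matrix at block resolution, refine the highest-scoring block pairs exactly, and fall back to the coarse score elsewhere. The function name mra_attention_sketch, the block size, and the refinement budget are illustrative assumptions, not the authors' implementation; the released code at https://github.com/mlpen/mra-attention is the authoritative reference.

    # Minimal two-level sketch (assumed structure, not the paper's Algorithms 1-2):
    # approximate attention at block resolution, then refine only the block pairs
    # with the largest coarse scores at full resolution.
    import numpy as np

    def mra_attention_sketch(Q, K, V, block=32, budget=8):
        """Approximate softmax(Q K^T / sqrt(d)) V with a coarse-to-fine block scheme.

        Q, K, V: (n, d) arrays with n divisible by `block`.
        budget:  number of (query-block, key-block) pairs refined exactly.
        """
        n, d = Q.shape
        nb = n // block
        scale = 1.0 / np.sqrt(d)

        # Coarse level: average queries/keys within each block (low-resolution view).
        Qc = Q.reshape(nb, block, d).mean(axis=1)          # (nb, d)
        Kc = K.reshape(nb, block, d).mean(axis=1)          # (nb, d)
        coarse = Qc @ Kc.T * scale                         # (nb, nb) block-level scores

        # Analogue of constructing the set J: keep the block pairs with the
        # largest coarse scores for high-resolution refinement.
        flat = np.argsort(coarse, axis=None)[::-1][:budget]
        J = set(zip(*np.unravel_index(flat, coarse.shape)))

        # Analogue of computing the approximate attention times V: exact logits
        # inside refined blocks, the coarse block score everywhere else.
        logits = np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)  # (n, n)
        for i, j in J:
            qs = slice(i * block, (i + 1) * block)
            ks = slice(j * block, (j + 1) * block)
            logits[qs, ks] = Q[qs] @ K[ks].T * scale

        # Row-wise softmax of the approximate attention matrix, then weight V.
        logits -= logits.max(axis=1, keepdims=True)
        A = np.exp(logits)
        A /= A.sum(axis=1, keepdims=True)
        return A @ V

    # Tiny smoke test on random inputs.
    rng = np.random.default_rng(0)
    Q, K, V = (rng.standard_normal((128, 16)) for _ in range(3))
    out = mra_attention_sketch(Q, K, V)   # shape (128, 16)

Note that this sketch still materializes the full n x n matrix for clarity; the point of the paper's block bookkeeping is precisely to avoid doing so.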