Recasting Self-Attention with Holographic Reduced Representations
Authors: Mohammad Mahmudul Alam, Edward Raff, Stella Biderman, Tim Oates, James Holt
ICML 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We test our method in two settings: using the Long Range Arena (LRA) to compare with prior approaches and a real-world task in malware detection. These results show several benefits to the Hrrformer: it is near state-of-the-art in terms of accuracy... and Experiments are performed to validate the effectiveness of the method in terms of time and space complexity in known benchmarks. |
| Researcher Affiliation | Collaboration | 1Department of Computer Science and Electrical Engineering, University of Maryland, Baltimore County, Baltimore, MD, USA 2Laboratory for Physical Sciences, College Park, MD, USA 3Booz Allen Hamilton, McLean, VA, USA 4EleutherAI. Correspondence to: Edward Raff <Raff_Edward@bah.com>, Tim Oates <oates@cs.umbc.edu>. |
| Pseudocode | Yes | The code of the Hrrformer self-attention model is written in JAX. Below is a code snippet of the Multi-headed Hrrformer attention. and Figure 7: Multi-headed Hrrformer Self-attention. (A hedged single-head sketch of the underlying HRR operations follows this table.) |
| Open Source Code | Yes | Code is available at https://github.com/NeuromorphicComputationResearchProgram/Hrrformer |
| Open Datasets | Yes | Our first result is running many of the current popular and state-of-the-art (SOTA) xformers on the real-world classification task of the Ember malware detection dataset (Anderson & Roth, 2018). and Our second result will use the Long Range Arena (LRA) (Tay et al., 2020c) which has become a standard for evaluations in this space. |
| Dataset Splits | Yes | EMBER is a benchmark dataset for the malware classification task (Anderson & Roth, 2018). The benchmark contains 600K labeled training samples (300K malicious, 300K benign) and 200K labeled test samples (100K malicious, 100K benign). and The Long Range Arena (LRA) (Tay et al., 2020c) benchmark comprises 6 diverse tasks covering image, text, math, language, and spatial modeling under long context scenarios ranging from 1K to 16K. |
| Hardware Specification | Yes | Each of the models is trained for a total of 10 epochs in 16 NVIDIA TESLA PH402 32GB GPUs. and To measure these results, a single NVIDIA TESLA PH402 32GB GPU is utilized with a fixed batch size of 4 and a maximum sequence length of 4000 with an embedding size of 32 and feature size of 64. |
| Software Dependencies | No | The code of the Hrrformer self-attention model is written in JAX. No version numbers for JAX or other dependencies are provided. |
| Experiment Setup | Yes | The dropout rate is chosen to be 0.1, the learning rate is 10⁻³ with an exponential decay rate of 0.85. Each of the models is trained for a total of 10 epochs in 16 NVIDIA TESLA PH402 32GB GPUs. and Table 3: List of the hyperparameters used in the Long Range Arena (LRA) benchmark and EMBER malware classification task. (A hedged sketch of this schedule follows this table.) |
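
The paper's Figure 7 and the linked repository give the exact multi-headed Hrrformer attention in JAX. As a reading aid, below is a minimal single-head sketch of the underlying Holographic Reduced Representation operations it builds on: binding keys to values by circular convolution (computed via the FFT), superposing the bindings, unbinding with the queries' approximate inverse, and weighting the original values by cosine similarity of the retrieved vectors. Function names (`binding`, `approx_inverse`, `hrr_attention`) and the single-head simplification are illustrative, not the repository's API.

```python
import jax
import jax.numpy as jnp


def binding(a, b):
    # HRR binding: circular convolution computed in the Fourier domain
    return jnp.fft.irfft(jnp.fft.rfft(a) * jnp.fft.rfft(b), n=a.shape[-1])


def approx_inverse(a):
    # HRR approximate inverse: complex conjugation in the Fourier domain
    return jnp.fft.irfft(jnp.conj(jnp.fft.rfft(a)), n=a.shape[-1])


def unbinding(s, a):
    # Retrieve the vector bound to `a` from the superposition `s`
    return binding(s, approx_inverse(a))


def hrr_attention(q, k, v, eps=1e-6):
    """Single-head HRR-style attention sketch; q, k, v have shape (seq_len, dim)."""
    s = binding(k, v).sum(axis=0, keepdims=True)   # superpose key-value bindings: (1, dim)
    v_hat = unbinding(s, q)                        # query-wise retrieval: (seq_len, dim)
    cos = jnp.sum(v_hat * v, axis=-1) / (
        jnp.linalg.norm(v_hat, axis=-1) * jnp.linalg.norm(v, axis=-1) + eps)
    w = jax.nn.softmax(cos)                        # per-token weights: (seq_len,)
    return w[:, None] * v                          # weighted values: (seq_len, dim)


# Example usage with random arrays standing in for learned Q/K/V projections
key = jax.random.PRNGKey(0)
q, k, v = jax.random.normal(key, (3, 128, 64))
out = hrr_attention(q, k, v)
```

Because the bindings are summed into a single superposition vector, the per-query cost stays linear in sequence length, which is the source of the time and memory advantage the paper reports over quadratic self-attention.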
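
The reported training settings (learning rate 10⁻³ with exponential decay rate 0.85, dropout 0.1, 10 epochs) map naturally onto an Optax schedule. The sketch below is an assumption-laden illustration, not the authors' configuration: the optimizer choice (Adam), the per-epoch decay interval, and `steps_per_epoch` are all placeholders not stated in the quoted text.

```python
import optax

# Hypothetical value; depends on dataset size and batch size
steps_per_epoch = 1000

schedule = optax.exponential_decay(
    init_value=1e-3,                    # reported initial learning rate
    transition_steps=steps_per_epoch,   # assumed: decay applied once per epoch
    decay_rate=0.85,                    # reported exponential decay rate
)
optimizer = optax.adam(learning_rate=schedule)  # optimizer choice is an assumption
```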