Elliptical Attention

Authors: Stefan Nielsen, Laziz Abdullaev, Rachel S.Y. Teo, Tan Nguyen

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate the advantages of Elliptical Attention over the baseline dot-product attention and state-of-the-art attention methods on various practical tasks, including object classification, image segmentation, and language modeling across different data modalities.
Researcher Affiliation | Collaboration | Stefan K. Nielsen, FPT Software AI Center, Ha Noi, Vietnam, stefannvkp@fpt.com; Laziz U. Abdullaev, Department of Mathematics, National University of Singapore, Singapore 119077, Singapore, laziz.abdullaev@u.nus.edu; Rachel S.Y. Teo, Department of Mathematics, National University of Singapore, Singapore 119077, Singapore, rachel.teo@u.nus.edu; Tan M. Nguyen, Department of Mathematics, National University of Singapore, Singapore 119077, Singapore, tanmn@nus.edu.sg
Pseudocode | Yes | Pseudocode for the Elliptical Attention computation is provided in Appendix F.12.
Open Source Code | Yes | The code is publicly available at https://github.com/stefvk/Elliptical-Attention.
Open Datasets | Yes | We pretrain and evaluate our models on the WikiText-103 benchmark in comparison with the standard baseline Transformer [82], Performer [9], Transformer-MGK [52], FourierFormer [54], and the robust kernel density estimation-based Transformers, including Transformer-SPKDE and Transformer-MoM [23].
Dataset Splits | Yes | The validation and test sets consist of 60 articles with 218K and 246K tokens, respectively.
Hardware Specification | Yes | All models are trained and evaluated on two NVIDIA A100 SXM4 40GB GPUs.
Software Dependencies | No | The paper mentions 'default PyTorch settings' but does not specify version numbers for PyTorch or any other software libraries or dependencies.
Experiment Setup | Yes | We trained with Adam using a starting learning rate of 0.00025 and cosine scheduling under default PyTorch settings. We used a batch size of 96 and trained for 120 epochs with 2000 warmup steps. The train and evaluation target lengths were set to 256.
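
The paper's actual pseudocode lives in Appendix F.12 and the repository linked in the table above; it is not reproduced here. Purely as an illustrative sketch, the snippet below implements attention whose similarity is a Gaussian kernel of a diagonal Mahalanobis distance rather than the standard scaled dot product, which is the kind of elliptical (axis-wise stretched) neighborhood the paper describes. The function name `mahalanobis_attention`, the temperature choice, and taking the diagonal metric `m` as an input (the paper instead estimates its metric from the data; see Appendix F.12) are assumptions for illustration, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def mahalanobis_attention(q, k, v, m=None):
    """Attention with a Gaussian kernel of a diagonal Mahalanobis distance.

    q, k, v : (batch, heads, seq_len, head_dim) tensors.
    m       : (head_dim,) non-negative diagonal of the metric; if None, the
              identity is used and this reduces to an isotropic
              (squared-Euclidean) kernel.
    """
    d = q.size(-1)
    if m is None:
        m = q.new_ones(d)
    qm = q * m                                      # query coords scaled by the metric
    q_sq = (qm * q).sum(-1, keepdim=True)           # q^T M q  -> (B, H, N, 1)
    k_sq = ((k * m) * k).sum(-1).unsqueeze(-2)      # k^T M k  -> (B, H, 1, N)
    cross = qm @ k.transpose(-2, -1)                # q^T M k  -> (B, H, N, N)
    dist_sq = q_sq + k_sq - 2.0 * cross             # ||q_i - k_j||_M^2
    attn = F.softmax(-dist_sq / (2.0 * d ** 0.5), dim=-1)  # temperature is an assumption
    return attn @ v

# Toy usage with a random diagonal metric, for shape-checking only.
q = k = v = torch.randn(2, 8, 256, 64)
out = mahalanobis_attention(q, k, v, m=torch.rand(64))
```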
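
The optimizer and schedule reported in the Experiment Setup row translate fairly directly into PyTorch. The sketch below is a minimal, hypothetical reconstruction using only the stated hyperparameters (Adam, learning rate 0.00025, cosine scheduling, 2000 warmup steps, 120 epochs); the placeholder model, the steps-per-epoch value, and the particular warmup-plus-cosine composition are assumptions, since the paper only says "cosine scheduling under default PyTorch settings".

```python
import torch

model = torch.nn.Linear(256, 256)   # placeholder; the real model is the Transformer under study
steps_per_epoch = 1_000             # assumption: depends on corpus size, batch size 96, length 256
epochs, warmup_steps = 120, 2_000
total_steps = epochs * steps_per_epoch

optimizer = torch.optim.Adam(model.parameters(), lr=0.00025)   # default betas/eps

# Linear warmup for 2000 steps, then cosine decay over the remaining steps.
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=1e-3, end_factor=1.0, total_iters=warmup_steps)
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps - warmup_steps)
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_steps])

# scheduler.step() would then be called once per optimization step during training.
```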