Relative Positional Encoding for Transformers with Linear Complexity
Authors: Antoine Liutkus, Ondřej Cífka, Shih-Lun Wu, Umut Şimşekli, Yi-Hsuan Yang, Gaël Richard
ICML 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We illustrate the performance of our approach on the Long-Range Arena benchmark and on music generation. We study the impact of SPE on performance on the Long Range Arena benchmark (Tay et al., 2021) and two music generation tasks. Our results demonstrate better validation losses and extrapolation ability. We evaluate the proposed method in the Long-Range Arena (LRA; Tay et al., 2021), a benchmark for efficient Transformers, consisting of sequence classification tasks with a focus on long-range dependencies. The results of the benchmark are given in Table 1. |
| Researcher Affiliation | Collaboration | 1Inria, Zenith Team, UMR LIRMM, Univ. Montpellier, France 2LTCI, Télécom Paris, Institut Polytechnique de Paris, France 3Research Center for IT Innovation, Academia Sinica, Taiwan 4National Taiwan University, Taiwan 5Taiwan AI Labs, Taiwan 6INRIA, Département d'Informatique de l'École Normale Supérieure, PSL Research University, Paris, France. |
| Pseudocode | Yes | Algorithm 1 Stochastic Positional Encoding. Input: position kernel P(m, n), number of replicas R, initial M × D and N × D queries Q and keys K. (A hedged sketch of this procedure is given after the table.) |
| Open Source Code | Yes | We provide additional resources on our companion website (https://cifkao.github.io/spe/), including Python implementations of SPE for PyTorch and JAX/Flax. |
| Open Datasets | Yes | We evaluate the proposed method in the Long-Range Arena (LRA; Tay et al., 2021)... We use the following tasks from this benchmark: ListOps... Text: movie review sentiment analysis on the IMDB corpus (Maas et al., 2011); Retrieval: article similarity classification on the All About NLP (AAN) corpus (Radev et al., 2013); Image: object recognition on the CIFAR10 dataset (Krizhevsky, 2009)... We train Performers for music generation... on a dataset composed of 1,747 pop piano tracks, encoded using the recently proposed Revamped MIDI-derived format (REMI; Huang & Yang, 2020). |
| Dataset Splits | Yes | We hold out 5% of the songs as the validation set. We adopt the configuration of Tay et al., only changing the PE and the batch sizes/learning rates to allow training on limited hardware with similar results. All other hyperparameters are kept identical to the original LRA. We display validation cross-entropy computed with teacher forcing (Williams & Zipser, 1989) in Figure 3, as a function of the target token position. |
| Hardware Specification | No | The paper mentions training "on limited hardware" but does not specify any particular GPU models, CPU types, or other hardware components used for the experiments. |
| Software Dependencies | No | The paper mentions "Python implementations of SPE for PyTorch and JAX/Flax" but does not provide specific version numbers for these software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | We adopt the configuration of Tay et al., only changing the PE and the batch sizes/learning rates to allow training on limited hardware with similar results. All other hyperparameters are kept identical to the original LRA. We train Performers for music generation, with 24 layers and 8 heads per layer on a dataset... The sequences are composed of metrical tokens: bar, subbeat, and tempo... We train the models with sequence length N = 2048... The models (24-layer Performers with 8 attention heads) are trained on an accompaniment dataset... training sequences of length N = 512... (These hyperparameters are collected in the configuration sketch after the table.) |
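
Algorithm 1 (quoted in the Pseudocode row) constructs random positional replicas Q̄, K̄ whose expected inner product reproduces the relative positional kernel P, so that position-aware attention can then be computed by any linear-complexity mechanism such as the Performer. Below is a minimal NumPy sketch of the sinusoidal variant for illustration only; the function names (`sine_spe`, `apply_spe`) and the random placeholder parameters are our own assumptions, not the authors' published API, and in the actual model the frequencies, phases, and gains are learned.

```python
import numpy as np

def sine_spe(num_pos, num_feats, num_sines=5, num_realizations=64, rng=None):
    """Sketch of sinusoidal SPE: returns qbar, kbar of shape (M, D, R) such
    that E[sum_r qbar[m, d, r] * kbar[n, d, r]] is a stationary kernel P_d(m - n)."""
    rng = np.random.default_rng() if rng is None else rng
    # Learned parameters in the real model; random placeholders here (assumption).
    freqs = rng.uniform(0.0, 0.5, (num_feats, num_sines))         # f_k
    phases = rng.uniform(0.0, 2 * np.pi, (num_feats, num_sines))  # theta_k
    gains = rng.uniform(0.5, 1.0, (num_feats, num_sines))         # lambda_k

    pos = np.arange(num_pos)[:, None, None]               # (M, 1, 1)
    ang_q = 2 * np.pi * freqs[None] * pos + phases[None]  # queries carry a phase offset
    ang_k = 2 * np.pi * freqs[None] * pos                 # keys do not
    # Stack cos/sin components into (M, D, 2K) modulation matrices, so that
    # omega_q(m) . omega_k(n) = sum_k lambda_k^2 cos(2 pi f_k (m - n) + theta_k).
    omega_q = np.concatenate([gains[None] * np.cos(ang_q),
                              gains[None] * np.sin(ang_q)], axis=-1)
    omega_k = np.concatenate([gains[None] * np.cos(ang_k),
                              gains[None] * np.sin(ang_k)], axis=-1)

    # Shared Gaussian noise couples queries and keys: E[z @ z.T] = I.
    z = rng.standard_normal((num_feats, 2 * num_sines, num_realizations))
    z /= np.sqrt(num_realizations)
    qbar = np.einsum('mdk,dkr->mdr', omega_q, z)
    kbar = np.einsum('ndk,dkr->ndr', omega_k, z)
    return qbar, kbar

def apply_spe(q, k, qbar, kbar):
    """Combine content queries/keys (M, D) and (N, D) with the positional
    replicas, so that qhat @ khat.T approximates
    sum_d Q[m, d] * K[n, d] * P_d(m - n) in expectation."""
    m, d, r = qbar.shape
    qhat = (q[:, :, None] * qbar).reshape(m, d * r)
    khat = (k[:, :, None] * kbar).reshape(k.shape[0], d * r)
    return qhat, khat
```

The resulting `qhat`/`khat` can replace the original queries and keys in a kernelized attention, which is what keeps the overall complexity linear in the sequence length.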
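
For reference, the music-generation hyperparameters quoted in the Experiment Setup row can be collected into a single configuration object. This is purely a convenience sketch: the class and field names are hypothetical, and only the values come from the quoted text.

```python
from dataclasses import dataclass

@dataclass
class MusicGenConfig:
    """Values quoted from the paper; names are our own (hypothetical)."""
    num_layers: int = 24              # 24-layer Performer
    num_heads: int = 8                # 8 attention heads per layer
    pop_piano_seq_len: int = 2048     # N = 2048 (pop piano, REMI tokens)
    accompaniment_seq_len: int = 512  # N = 512 (accompaniment dataset)
    validation_split: float = 0.05    # 5% of songs held out for validation
```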