Long Range Arena: A Benchmark for Efficient Transformers
Authors: Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3 EXPERIMENTAL RESULTS. Table 1: Experimental results on Long-Range Arena benchmark. |
| Researcher Affiliation | Industry | 1Google Research 2Google DeepMind {yitay, dehghani}@google.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Our framework, which we plan to open source, is written in JAX/FLAX. |
| Open Datasets | Yes | We use the IMDb reviews (Maas et al., 2011) dataset, which is a commonly used dataset to benchmark document classification. We use the ACL Anthology Network (AAN; Radev et al., 2013) dataset. In LRA, we use the CIFAR-10 dataset (Krizhevsky, 2009) for the image classification task. |
| Dataset Splits | Yes | averaged over 1K random samples from the validation set. |
| Hardware Specification | Yes | We conduct experiments on 4x4 TPU V3 Chips. Benchmarks are run on 4x4 TPU V3 Chips. |
| Software Dependencies | No | Our framework, which we plan to open source, is written in JAX/FLAX. We implement our benchmark (which includes the task, evaluators, and models) in Python 3 and Jax/Flax. No specific version numbers for JAX/FLAX or other libraries are provided. |
| Experiment Setup | Yes | All our xformer models have an embedding dimension of 512, 8 heads, 6 layers and a feed-forward dimensions of 2048. We train all models for 5K steps. All xformer models are parameterized by the same number of layers, heads and hidden dimensions, namely 8 heads, 512 hidden dimensions and d = 2048 for positional FFN layers. We use 6 layers for all xformers. The learning rate is 0.05 with weight decay of 0.1. We use Adam with warmup. All models are trained for 20K steps and a batch size of 32. (A hedged JAX/Flax sketch of this configuration follows the table.) |
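The hyperparameters quoted in the Experiment Setup row are enough to sketch the shared "xformer" configuration in JAX/Flax, the framework the authors say they use. The sketch below is a minimal illustration, not the authors' released code: the pre-LayerNorm layout, the module names (`EncoderBlock`, `Encoder`), and the 1,000-step warmup length are assumptions; only the layer count, head count, hidden sizes, learning rate, weight decay, and batch size come from the paper.

```python
# Minimal sketch of the quoted xformer configuration: 6 layers, 8 heads,
# 512-dim embeddings, 2048-dim FFN, Adam with warmup, lr 0.05, weight decay
# 0.1, batch size 32. Module names and warmup length are assumptions.

import jax
import jax.numpy as jnp
import flax.linen as nn
import optax


class EncoderBlock(nn.Module):
    """One vanilla Transformer encoder layer with the quoted sizes."""
    emb_dim: int = 512
    num_heads: int = 8
    ffn_dim: int = 2048

    @nn.compact
    def __call__(self, x):
        # Self-attention sub-layer with residual connection.
        # Pre-LayerNorm placement is an assumption, not stated in the paper.
        attn = nn.SelfAttention(num_heads=self.num_heads)(nn.LayerNorm()(x))
        x = x + attn
        # Position-wise feed-forward sub-layer (d = 2048 in the paper).
        h = nn.Dense(self.ffn_dim)(nn.LayerNorm()(x))
        h = nn.relu(h)
        x = x + nn.Dense(self.emb_dim)(h)
        return x


class Encoder(nn.Module):
    """Stack of 6 encoder layers, shared across all xformer variants."""
    num_layers: int = 6

    @nn.compact
    def __call__(self, x):
        for _ in range(self.num_layers):
            x = EncoderBlock()(x)
        return x


# Optimizer sketch: Adam with linear warmup and weight decay. Using adamw
# (decoupled weight decay) and a 1,000-step warmup are assumptions; the
# peak learning rate 0.05 and weight decay 0.1 are quoted from the paper.
schedule = optax.linear_schedule(init_value=0.0, end_value=0.05,
                                 transition_steps=1000)
optimizer = optax.adamw(learning_rate=schedule, weight_decay=0.1)

# Shape check on dummy inputs: a batch of 32 sequences, length 128, dim 512.
params = Encoder().init(jax.random.PRNGKey(0), jnp.ones((32, 128, 512)))
```

In practice a task head (a token or pixel embedding in front of the encoder and a classifier on top) would surround this stack; those task-specific pieces are omitted here.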