Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Long Range Arena: A Benchmark for Efficient Transformers
Authors: Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, Jinfeng Rao, Liu Yang, Sebastian Ruder, Donald Metzler
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 3 EXPERIMENTAL RESULTS. Table 1: Experimental results on Long-Range Arena benchmark. |
| Researcher Affiliation | Industry | ¹Google Research, ²Google DeepMind EMAIL |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | Our framework, which we plan to open source, is written in JAX/FLAX. |
| Open Datasets | Yes | We use the IMDb reviews (Maas et al., 2011) dataset, which is a commonly used dataset to benchmark document classification. We use the ACL Anthology Network (AAN; Radev et al., 2013) dataset. In LRA, we use the CIFAR-10 dataset (Krizhevsky, 2009) for the image classification task. |
| Dataset Splits | Yes | averaged over 1K random samples from the validation set. |
| Hardware Specification | Yes | Benchmarks are run on 4x4 TPU V3 Chips. We conduct experiments on 4x4 TPU V3 Chips. |
| Software Dependencies | No | Our framework, which we plan to open source, is written in JAX/FLAX. We implement our benchmark (which includes the task, evaluators, and models) in Python 3 and Jax/Flax. No specific version numbers for JAX/FLAX or other libraries are provided. |
| Experiment Setup | Yes | All our xformer models have an embedding dimension of 512, 8 heads, 6 layers and a feed-forward dimensions of 2048. We train all models for 5K steps. All xformer models are parameterized by the same number of layers, heads and hidden dimensions, namely 8 heads, 512 hidden dimensions and d = 2048 for positional FFN layers. We use 6 layers for all xformers. The learning rate is 0.05 with weight decay of 0.1. We use Adam with warmup. All models are trained for 20K steps and a batch size of 32. |
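
The hyperparameters quoted in the Experiment Setup row can be gathered into a small configuration object, sketched below. This is an illustration only, not the authors' code: the class name `XformerSetup` and its field names are hypothetical. The row appears to concatenate quotes from different parts of the paper, so treating these values as one shared configuration is an assumption. Because the row quotes both 5K and 20K training steps without saying which task each applies to, `train_steps` has no default and must be passed explicitly. The warmup length is not reported and is left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class XformerSetup:
    """Hypothetical container for the values quoted in the Experiment Setup row."""

    # Not given a default: the row quotes both 5K and 20K steps.
    train_steps: int

    # Model shape, as quoted for all xformer models.
    emb_dim: int = 512
    num_heads: int = 8
    num_layers: int = 6
    mlp_dim: int = 2048  # feed-forward (positional FFN) dimension

    # Optimization values as quoted; Adam with warmup (warmup length not reported).
    learning_rate: float = 0.05
    weight_decay: float = 0.1
    batch_size: int = 32

    def __post_init__(self) -> None:
        # Multi-head attention needs the embedding to split evenly across heads.
        if self.emb_dim % self.num_heads != 0:
            raise ValueError("emb_dim must be divisible by num_heads")

    @property
    def head_dim(self) -> int:
        """Per-head dimension implied by the quoted embedding size and head count."""
        return self.emb_dim // self.num_heads


# Example: one of the two step counts quoted in the row.
setup = XformerSetup(train_steps=5_000)
print(setup.head_dim, setup.num_layers, setup.batch_size)
```

Leaving `train_steps` as a required argument means the sketch does not silently pick one of the two quoted step counts.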