Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
General-purpose, long-context autoregressive modeling with Perceiver AR
Authors: Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, Charlie Nash, Mateusz Malinowski, Sander Dieleman, Oriol Vinyals, Matthew Botvinick, Ian Simon, Hannah Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, Joao Carreira, Jesse Engel
ICML 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that this architecture produces excellent results on several real-world domains with long-range context: RGB-level images (Section 5.2), tokenized language (Sections 5.3 to 5.5), and audio or symbolic music (Section 5.6). We demonstrate that Perceiver AR can learn to perfectly recognize long-context patterns over distances of at least 100k tokens on a synthetic copy task with known ground-truth structure (Section 5.1.1). |
| Researcher Affiliation | Industry | ¹Google Research, Brain Team; ²DeepMind. Correspondence to: Curtis Hawthorne <EMAIL>, Andrew Jaegle <EMAIL>. |
| Pseudocode | No | See Appendix C for in-depth mathematical description of Perceivers and the Perceiver AR architecture and Appendix E for additional technical details. |
| Open Source Code | Yes | Model code is available at https://github.com/google-research/perceiver-ar. |
| Open Datasets | Yes | To test this architecture's capabilities in the image modality, we use the downsampled ImageNet dataset (van den Oord et al., 2016b) at the 64 × 64 resolution. |
| Dataset Splits | Yes | After 750k steps, we achieve 3.40 bits/dim on the validation set, exceeding the performance of previous autoregressive models (Table 3). |
| Hardware Specification | Yes | Training and evaluation were done on either TPUv2 or TPUv3 clusters. |
| Software Dependencies | No | We use the Adam optimizer (Kingma & Ba, 2015) as implemented in the Optax framework (Hessel et al., 2020) with b1 = 0.1, b2 = 0.999, eps = 1e-8, a base learning rate of 3e-4, and a 10k step linear warmup. |
| Experiment Setup | Yes | We use the Adam optimizer (Kingma & Ba, 2015) as implemented in the Optax framework (Hessel et al., 2020) with b1 = 0.1, b2 = 0.999, eps = 1e-8, a base learning rate of 3e-4, and a 10k step linear warmup. |
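
The optimizer settings quoted in the Experiment Setup row can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' code and does not use Optax; it only restates the quoted constants (b1, b2, eps, base learning rate, warmup length) in a scalar Adam update. The function names `learning_rate` and `adam_step` are hypothetical, and holding the learning rate constant after warmup is an assumption, since the quoted text does not say what happens after the first 10k steps.

```python
import math

# Constants as quoted in the Experiment Setup row.
BASE_LR = 3e-4          # base learning rate
WARMUP_STEPS = 10_000   # "10k step linear warmup"
B1, B2, EPS = 0.1, 0.999, 1e-8


def learning_rate(step: int) -> float:
    """Linear warmup from 0 to BASE_LR over WARMUP_STEPS.

    Assumption: the rate stays at BASE_LR after warmup; the quoted
    text does not specify any later decay.
    """
    return BASE_LR * min(step, WARMUP_STEPS) / WARMUP_STEPS


def adam_step(param: float, grad: float, m: float, v: float, step: int):
    """One scalar Adam update with bias correction; `step` counts from 1."""
    m = B1 * m + (1.0 - B1) * grad          # first-moment estimate
    v = B2 * v + (1.0 - B2) * grad * grad   # second-moment estimate
    m_hat = m / (1.0 - B1 ** step)          # bias-corrected first moment
    v_hat = v / (1.0 - B2 ** step)          # bias-corrected second moment
    param -= learning_rate(step) * m_hat / (math.sqrt(v_hat) + EPS)
    return param, m, v
```

With b1 = 0.1 the first-moment average gives each new gradient a weight of 0.9, so the update direction tracks the most recent gradient much more closely than with the common default of 0.9.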