Perceiver IO: A General Architecture for Structured Inputs & Outputs

Authors: Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier J Henaff, Matthew Botvinick, Andrew Zisserman, Oriol Vinyals, Joao Carreira

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | The same architecture achieves strong results on tasks spanning natural language and visual understanding, multi-task and multi-modal reasoning, and StarCraft II. ... To probe the generality of Perceiver IO, we evaluate it on several domains including language understanding (Wikipedia+C4 masked language modeling), visual understanding (Sintel/KITTI optical flow and ImageNet classification), multi-modal (Kinetics autoencoding and AudioSet classification) & multi-task settings (multi-task GLUE), and symbolic representations for games (StarCraft II).
Researcher Affiliation | Industry | Andrew Jaegle, Sebastian Borgeaud, Jean-Baptiste Alayrac, Carl Doersch, Catalin Ionescu, David Ding, Skanda Koppula, Daniel Zoran, Andrew Brock, Evan Shelhamer, Olivier Hénaff, Matthew M. Botvinick, Andrew Zisserman, Oriol Vinyals, João Carreira ... All experiments were conducted using JAX (Bradbury et al., 2018) and the DeepMind JAX ecosystem (Babuschkin et al., 2020).
Pseudocode | No | No pseudocode or clearly labeled algorithm blocks were found.
Open Source Code | No | The paper does not provide an explicit statement or link for open-sourcing the Perceiver IO code. It mentions JAX (Bradbury et al., 2018) and the DeepMind JAX ecosystem (Babuschkin et al., 2020), but these are general tools the authors used, not an implementation of their specific method.
Open Datasets | Yes | language understanding (Wikipedia+C4 masked language modeling), visual understanding (Sintel/KITTI optical flow and ImageNet classification), multi-modal (Kinetics autoencoding and AudioSet classification) & multi-task settings (multi-task GLUE), and symbolic representations for games (StarCraft II). ... We pretrain on the Masked Language Modeling (MLM) task proposed in Devlin et al. (2019) using a large text corpus obtained by combining English Wikipedia and C4 (Raffel et al., 2020).
Dataset Splits | Yes | We finetune Perceiver IO on the GLUE Benchmark (Wang et al., 2019), reporting the best performance on the dev set for a fixed-size sweep of finetuning hyperparameters.
Hardware Specification | Yes | All experiments were conducted using JAX (Bradbury et al., 2018) and the DeepMind JAX ecosystem (Babuschkin et al., 2020). ... We use a batch size of 1024 and 64 TPUs. ... Our most expensive model achieves approximately 0.8 frames/sec on a 2017 TITAN Xp, and our lightweight model (with conv downsampling and RAFT-style upsampling) achieves 3.3 frames/sec... On the publicly-available TPU v3, however, our most expensive model achieves 4.4 frames/sec on a single TPU core, and 17.8 frames/sec for the lightweight model.
Software Dependencies | No | All experiments were conducted using JAX (Bradbury et al., 2018) and the DeepMind JAX ecosystem (Babuschkin et al., 2020). ... An efficient Tensorflow implementation of RAFT (Sun et al., 2020) (received courtesy of the authors) achieves only 1.6 frames/sec on the same hardware. The paper mentions software like JAX and TensorFlow but does not provide specific version numbers.
Experiment Setup | Yes | We finetune Perceiver IO on the GLUE Benchmark (Wang et al., 2019), reporting the best performance on the dev set for a fixed-size sweep of finetuning hyperparameters. ... We use LAMB with a simple learning rate schedule consisting of a flat learning rate of 2 × 10⁻³ for 55 epochs, after which the learning rate is decayed to 0 over the final 55 epochs following a cosine decay schedule (Loshchilov & Hutter, 2017). We use a batch size of 1024 and 64 TPUs.