Resurrecting Recurrent Neural Networks for Long Sequences

Authors: Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that careful design of deep RNNs using standard signal propagation arguments can recover the impressive performance of deep SSMs on long-range reasoning tasks, while matching their training speed. To achieve this, we analyze and ablate a series of changes to standard RNNs including linearizing and diagonalizing the recurrence, using better parameterizations and initializations, and ensuring careful normalization of the forward pass. Our results provide new insights on the origins of the impressive performance of deep SSMs, and introduce an RNN block called the Linear Recurrent Unit (or LRU) that matches both their performance on the Long Range Arena benchmark and their computational efficiency." (A sketch of the parameterization and normalization described here appears after the table.)
Researcher Affiliation | Collaboration | "*Work done at DeepMind. ¹Department of Computer Science, ETH Zurich, Switzerland. ²DeepMind, London, United Kingdom. Correspondence to: Antonio Orvieto <antonio.orvieto@inf.ethz.ch>, Soham De <sohamde@google.com>."
Pseudocode | Yes | "A. Simplified Implementation of the Linear Recurrent Unit: We present here a simplified JAX implementation (Bradbury et al., 2018) of the Linear Recurrent Unit (LRU)."
Open Source Code | Yes | "A. Simplified Implementation of the Linear Recurrent Unit: We present here a simplified JAX implementation (Bradbury et al., 2018) of the Linear Recurrent Unit (LRU)." (A hedged re-creation of such a forward pass appears after the table.)
Open Datasets | Yes | "We consider the Long Range Arena benchmark (Tay et al., 2020), a set of tasks designed to test the ability of models to do long-range sequence modelling (we use coloured images instead of grayscale images for the sequential CIFAR-10 classification task)."
Dataset Splits | Yes | "We consider the Long Range Arena benchmark (Tay et al., 2020), a set of tasks designed to test the ability of models to do long-range sequence modelling (we use coloured images instead of grayscale images for the sequential CIFAR-10 classification task)." (The pixel-sequence setup for sequential CIFAR-10 is sketched after the table.)
Hardware Specification | Yes | "Speeds (steps/sec) during training on an A100 GPU."
Software Dependencies | No | "We present here a simplified JAX implementation (Bradbury et al., 2018) of the Linear Recurrent Unit (LRU)."
Experiment Setup | Yes | "Table 10. List of all the hyper-parameters used for each task for the LRU model."
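
The "better parameterizations and initializations" and "careful normalization of the forward pass" quoted in the Research Type row refer to parameterizing the eigenvalues of the diagonal recurrence in a stable exponential form and rescaling the input projection. Below is a minimal JAX sketch of that idea; the function names and the default ring radii r_min/r_max and phase range are illustrative assumptions, not the paper's exact Appendix A code.

import jax
import jax.numpy as jnp

def init_lru_diagonal(key, N, r_min=0.9, r_max=0.999, max_phase=6.28):
    """Sample N complex eigenvalues uniformly on a ring inside the unit disk.

    r_min, r_max and max_phase are illustrative defaults (assumptions),
    chosen so |lambda| starts close to 1 for long-range tasks.
    """
    k1, k2 = jax.random.split(key)
    u1 = jax.random.uniform(k1, (N,))
    u2 = jax.random.uniform(k2, (N,))
    # A uniform density over the annulus requires sampling |lambda|**2 uniformly.
    radius = jnp.sqrt(u1 * (r_max**2 - r_min**2) + r_min**2)
    # Store log-magnitude and log-phase: for any real-valued parameters,
    # |lambda| = exp(-exp(nu_log)) < 1, so the recurrence stays stable.
    nu_log = jnp.log(-jnp.log(radius))
    theta_log = jnp.log(max_phase * u2)
    return nu_log, theta_log

def materialize_lambda(nu_log, theta_log):
    """lambda = exp(-exp(nu_log) + 1j * exp(theta_log)), always inside the unit disk."""
    return jnp.exp(-jnp.exp(nu_log) + 1j * jnp.exp(theta_log))

def gamma_normalization(nu_log):
    """Per-channel input rescaling gamma = sqrt(1 - |lambda|**2).

    Keeps the hidden-state variance O(1) even as |lambda| approaches 1,
    which is the 'careful normalization of the forward pass' quoted above.
    """
    return jnp.sqrt(1.0 - jnp.exp(-2.0 * jnp.exp(nu_log)))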
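The Pseudocode and Open Source Code rows point to the paper's Appendix A. The sketch below re-creates the core of such a layer from the recurrence x_k = Lambda x_{k-1} + B_norm u_k with readout y_k = Re(C x_k) + D * u_k, using jax.lax.associative_scan for parallel training. Treat it as a hedged reconstruction under those assumptions, not the authors' verbatim code.

import jax
import jax.numpy as jnp

def binary_operator_diag(elem_i, elem_j):
    """Associative operator for the diagonal linear recurrence
    x_k = a_k * x_{k-1} + bu_k, enabling a parallel prefix scan."""
    a_i, bu_i = elem_i
    a_j, bu_j = elem_j
    return a_j * a_i, a_j * bu_i + bu_j

def lru_forward(lambda_diag, B_norm, C, D, u):
    """Apply one LRU layer to a length-L sequence u of shape (L, H).

    lambda_diag: (N,) complex eigenvalues (see materialize_lambda above);
    B_norm: (N, H) complex input matrix, pre-multiplied by gamma;
    C: (H, N) complex readout; D: (H,) real elementwise skip connection.
    """
    L = u.shape[0]
    lambda_elements = jnp.repeat(lambda_diag[None, :], L, axis=0)
    bu_elements = jax.vmap(lambda u_k: B_norm @ u_k)(u)
    # Compute all L hidden states in O(log L) sequential depth.
    _, x = jax.lax.associative_scan(binary_operator_diag,
                                    (lambda_elements, bu_elements))
    # Real part of the readout plus the skip path.
    return jax.vmap(lambda x_k, u_k: (C @ x_k).real + D * u_k)(x, u)

At inference time the same layer can instead be unrolled as a plain O(L) recurrence, which is the usual RNN efficiency advantage the abstract alludes to.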
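On the sequential CIFAR-10 remark in the dataset rows: in the standard sCIFAR setup each 32x32 image is unrolled pixel by pixel into a length-1024 sequence, and the report's note about coloured images means all three channels are kept at each step. A one-line sketch (the function name is ours, for illustration):

import jax.numpy as jnp

def image_to_sequence(img):
    """Flatten a (32, 32, 3) CIFAR-10 image into a (1024, 3) pixel sequence,
    keeping the RGB channels (coloured, not grayscale) at each step."""
    return jnp.reshape(img, (32 * 32, 3))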