Resurrecting Recurrent Neural Networks for Long Sequences

Authors: Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De

ICML 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We show that careful design of deep RNNs using standard signal propagation arguments can recover the impressive performance of deep SSMs on long-range reasoning tasks, while matching their training speed. To achieve this, we analyze and ablate a series of changes to standard RNNs including linearizing and diagonalizing the recurrence, using better parameterizations and initializations, and ensuring careful normalization of the forward pass. Our results provide new insights on the origins of the impressive performance of deep SSMs, and introduce an RNN block called the Linear Recurrent Unit (or LRU) that matches both their performance on the Long Range Arena benchmark and their computational efficiency." (A sketch of the parameterization and normalization described here appears after the table.)
Researcher Affiliation | Collaboration | "*Work done at DeepMind. ¹Department of Computer Science, ETH Zurich, Switzerland. ²DeepMind, London, United Kingdom. Correspondence to: Antonio Orvieto <antonio.orvieto@inf.ethz.ch>, Soham De <sohamde@google.com>."
Pseudocode | Yes | "A. Simplified Implementation of the Linear Recurrent Unit: We present here a simplified JAX implementation (Bradbury et al., 2018) of the Linear Recurrent Unit (LRU)."
Open Source Code | Yes | "A. Simplified Implementation of the Linear Recurrent Unit: We present here a simplified JAX implementation (Bradbury et al., 2018) of the Linear Recurrent Unit (LRU)." (A hedged re-creation of such a forward pass appears after the table.)
Open Datasets | Yes | "We consider the Long Range Arena benchmark (Tay et al., 2020), a set of tasks designed to test the ability of models to do long-range sequence modelling (we use coloured images instead of grayscale images for the sequential CIFAR-10 classification task)."
Dataset Splits | Yes | "We consider the Long Range Arena benchmark (Tay et al., 2020), a set of tasks designed to test the ability of models to do long-range sequence modelling (we use coloured images instead of grayscale images for the sequential CIFAR-10 classification task)." (The pixel-sequence setup for sequential CIFAR-10 is sketched after the table.)
Hardware Specification | Yes | "Speeds (steps/sec) during training on an A100 GPU."
Software Dependencies | No | "We present here a simplified JAX implementation (Bradbury et al., 2018) of the Linear Recurrent Unit (LRU)."
Experiment Setup | Yes | "Table 10. List of all the hyper-parameters used for each task for the LRU model."
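
The "better parameterizations and initializations" and "careful normalization of the forward pass" quoted in the Research Type row refer to parameterizing the eigenvalues of the diagonal recurrence in a stable exponential form and rescaling the input projection. Below is a minimal JAX sketch of that idea; the function names and the default ring radii r_min/r_max and phase range are illustrative assumptions, not the paper's exact Appendix A code.

import jax
import jax.numpy as jnp

def init_lru_diagonal(key, N, r_min=0.9, r_max=0.999, max_phase=6.28):
    """Sample N complex eigenvalues uniformly on a ring inside the unit disk.

    r_min, r_max and max_phase are illustrative defaults (assumptions),
    chosen so |lambda| starts close to 1 for long-range tasks.
    """
    k1, k2 = jax.random.split(key)
    u1 = jax.random.uniform(k1, (N,))
    u2 = jax.random.uniform(k2, (N,))
    # A uniform density over the annulus requires sampling |lambda|**2 uniformly.
    radius = jnp.sqrt(u1 * (r_max**2 - r_min**2) + r_min**2)
    # Store log-magnitude and log-phase: for any real-valued parameters,
    # |lambda| = exp(-exp(nu_log)) < 1, so the recurrence stays stable.
    nu_log = jnp.log(-jnp.log(radius))
    theta_log = jnp.log(max_phase * u2)
    return nu_log, theta_log

def materialize_lambda(nu_log, theta_log):
    """lambda = exp(-exp(nu_log) + 1j * exp(theta_log)), always inside the unit disk."""
    return jnp.exp(-jnp.exp(nu_log) + 1j * jnp.exp(theta_log))

def gamma_normalization(nu_log):
    """Per-channel input rescaling gamma = sqrt(1 - |lambda|**2).

    Keeps the hidden-state variance O(1) even as |lambda| approaches 1,
    which is the 'careful normalization of the forward pass' quoted above.
    """
    return jnp.sqrt(1.0 - jnp.exp(-2.0 * jnp.exp(nu_log)))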
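The Pseudocode and Open Source Code rows point to the paper's Appendix A. The sketch below re-creates the core of such a layer from the recurrence x_k = Lambda x_{k-1} + B_norm u_k with readout y_k = Re(C x_k) + D * u_k, using jax.lax.associative_scan for parallel training. Treat it as a hedged reconstruction under those assumptions, not the authors' verbatim code.

import jax
import jax.numpy as jnp

def binary_operator_diag(elem_i, elem_j):
    """Associative operator for the diagonal linear recurrence
    x_k = a_k * x_{k-1} + bu_k, enabling a parallel prefix scan."""
    a_i, bu_i = elem_i
    a_j, bu_j = elem_j
    return a_j * a_i, a_j * bu_i + bu_j

def lru_forward(lambda_diag, B_norm, C, D, u):
    """Apply one LRU layer to a length-L sequence u of shape (L, H).

    lambda_diag: (N,) complex eigenvalues (see materialize_lambda above);
    B_norm: (N, H) complex input matrix, pre-multiplied by gamma;
    C: (H, N) complex readout; D: (H,) real elementwise skip connection.
    """
    L = u.shape[0]
    lambda_elements = jnp.repeat(lambda_diag[None, :], L, axis=0)
    bu_elements = jax.vmap(lambda u_k: B_norm @ u_k)(u)
    # Compute all L hidden states in O(log L) sequential depth.
    _, x = jax.lax.associative_scan(binary_operator_diag,
                                    (lambda_elements, bu_elements))
    # Real part of the readout plus the skip path.
    return jax.vmap(lambda x_k, u_k: (C @ x_k).real + D * u_k)(x, u)

At inference time the same layer can instead be unrolled as a plain O(L) recurrence, which is the usual RNN efficiency advantage the abstract alludes to.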
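On the sequential CIFAR-10 remark in the dataset rows: in the standard sCIFAR setup each 32x32 image is unrolled pixel by pixel into a length-1024 sequence, and the report's note about coloured images means all three channels are kept at each step. A one-line sketch (the function name is ours, for illustration):

import jax.numpy as jnp

def image_to_sequence(img):
    """Flatten a (32, 32, 3) CIFAR-10 image into a (1024, 3) pixel sequence,
    keeping the RGB channels (coloured, not grayscale) at each step."""
    return jnp.reshape(img, (32 * 32, 3))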