Resurrecting Recurrent Neural Networks for Long Sequences
Authors: Antonio Orvieto, Samuel L Smith, Albert Gu, Anushan Fernando, Caglar Gulcehre, Razvan Pascanu, Soham De
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that careful design of deep RNNs using standard signal propagation arguments can recover the impressive performance of deep SSMs on long-range reasoning tasks, while matching their training speed. To achieve this, we analyze and ablate a series of changes to standard RNNs including linearizing and diagonalizing the recurrence, using better parameterizations and initializations, and ensuring careful normalization of the forward pass. Our results provide new insights on the origins of the impressive performance of deep SSMs, and introduce an RNN block called the Linear Recurrent Unit (or LRU) that matches both their performance on the Long Range Arena benchmark and their computational efficiency. |
| Researcher Affiliation | Collaboration | *Work done at DeepMind. 1Department of Computer Science, ETH Zurich, Switzerland. 2DeepMind, London, United Kingdom. Correspondence to: Antonio Orvieto <antonio.orvieto@inf.ethz.ch>, Soham De <sohamde@google.com>. |
| Pseudocode | Yes | A. Simplified Implementation of the Linear Recurrent Unit We present here a simplified JAX implementation (Bradbury et al., 2018) of the Linear Recurrent Unit (LRU). |
| Open Source Code | Yes | A. Simplified Implementation of the Linear Recurrent Unit We present here a simplified JAX implementation (Bradbury et al., 2018) of the Linear Recurrent Unit (LRU). |
| Open Datasets | Yes | We consider the Long Range Arena benchmark (Tay et al., 2020), a set of tasks designed to test the ability of models to do long-range sequence modelling (we use coloured images instead of grayscale images for the sequential CIFAR-10 classification task). |
| Dataset Splits | Yes | We consider the Long Range Arena benchmark (Tay et al., 2020), a set of tasks designed to test the ability of models to do long-range sequence modelling (we use coloured images instead of grayscale images for the sequential CIFAR-10 classification task). |
| Hardware Specification | Yes | Speeds (steps/sec) during training on a A100 GPU. |
| Software Dependencies | No | We present here a simplified JAX implementation (Bradbury et al., 2018) of the Linear Recurrent Unit (LRU). |
| Experiment Setup | Yes | Table 10. List of all the hyper-parameters used for each task for the LRU model. |
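The paper's Appendix A provides a simplified JAX implementation of the Linear Recurrent Unit. As a rough illustration of the core idea the table's quotes describe (a linearized, diagonalized recurrence with a stable exponential parameterization and normalized forward pass), here is a minimal NumPy sketch. It is not the authors' code: the dimensions, initial magnitude range, and variable names (`nu`, `theta`, `gamma`) are illustrative assumptions, and it uses a sequential loop where the paper's JAX version uses a parallel scan.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions: input/output dim H, state dim N, sequence length T
H, N, T = 4, 8, 16

# Stable exponential parameterization of the diagonal recurrence:
# lambda = exp(-exp(nu) + i*theta), so |lambda| = exp(-exp(nu)) < 1 always.
r = rng.uniform(0.9, 0.999, N)        # assumed initial magnitudes near 1
nu = np.log(-np.log(r))               # log-log magnitude parameter
theta = rng.uniform(0, 2 * np.pi / 10, N)
lam = np.exp(-np.exp(nu) + 1j * theta)

# Complex input/output projections
B = (rng.normal(size=(N, H)) + 1j * rng.normal(size=(N, H))) / np.sqrt(2 * H)
C = (rng.normal(size=(H, N)) + 1j * rng.normal(size=(H, N))) / np.sqrt(N)

# Normalization of the forward pass: keeps state variance O(1) as |lambda| -> 1
gamma = np.sqrt(1 - np.abs(lam) ** 2)

def lru_forward(u):
    """Run the diagonal linear recurrence x_t = lam * x_{t-1} + gamma * (B u_t),
    emitting y_t = Re(C x_t). Sequential scan for clarity; the paper's JAX
    implementation uses a parallel scan for training speed."""
    x = np.zeros(N, dtype=complex)
    ys = []
    for u_t in u:
        x = lam * x + gamma * (B @ u_t)
        ys.append((C @ x).real)
    return np.stack(ys)

u = rng.normal(size=(T, H))
y = lru_forward(u)
print(y.shape)  # -> (16, 4)
```

The exponential parameterization is the key stability device: gradient updates to `nu` and `theta` can never push an eigenvalue outside the unit disk, which is what lets the linear recurrence run over very long sequences without exploding.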