Parallelizing Linear Recurrent Neural Nets Over Sequence Length
Authors: Eric Martin, Chris Cundy
ICLR 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We develop a parallel linear recurrence CUDA kernel and show that it can be applied to immediately speed up training and inference of several state of the art RNN architectures by up to 9x. We abstract recent work on linear RNNs into a new framework of linear surrogate RNNs and develop a linear surrogate model for the long short-term memory unit, the GILR-LSTM, that utilizes parallel linear recurrence. We extend sequence learning to new extremely long sequence regimes that were previously out of reach by successfully training a GILR-LSTM on a synthetic sequence classification task with a one million timestep dependency. |
| Researcher Affiliation | Academia | Eric Martin, eric@ericmart.in; Chris Cundy, Department of Computer Science, University of California, Berkeley, Berkeley, CA 94720, USA, c.cundy@berkeley.edu; currently at the Future of Humanity Institute, University of Oxford, Oxford, UK |
| Pseudocode | Yes | Algorithm 1 Parallel linear recurrence on p processors. (A hedged sketch of the blocked-scan idea appears below the table.) |
| Open Source Code | Yes | The parallel linear recurrence CUDA kernel and TensorFlow bindings are available at https://github.com/eamartin/parallelizing_linear_rnns. |
| Open Datasets | No | The paper describes a synthetic dataset generated for the experiment but does not provide access information (link, DOI, specific citation) for a publicly available or open dataset. "The input consists of sequences of length n where for n > 0 each element is a randomly chosen one-hot vector x in p-dimensional space." |
| Dataset Splits | No | No specific training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) are provided. The paper states: "We continually generated random sequences to serve as input data." |
| Hardware Specification | Yes | We ran all experiments on a NVIDIA K80 GPU |
| Software Dependencies | No | The paper mentions TensorFlow but does not provide specific version numbers for it or any other software dependencies. |
| Experiment Setup | Yes | We controlled for GPU memory usage within these experiments by fixing bT = 65,536 for minibatch size b and sequence length T, and chose a popular architecture consisting of two stacked RNN layers with 256 hidden units and an input size of 4. ... A brief search over learning rate and batch size was carried out to find the parameters which allow the network to converge most rapidly for all runs. |
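
The Pseudocode row above points to Algorithm 1 (parallel linear recurrence on p processors). As a reading aid, here is a minimal NumPy sketch of the blocked-scan idea behind such an algorithm: each chunk of the sequence is evaluated independently from a zero state, a short sequential pass propagates the true state across chunk boundaries, and each chunk is then corrected in parallel. This is an illustration only, not the released CUDA kernel; the function and variable names (`linear_recurrence_chunked`, `lam`, `h0`, `p`) are placeholders of mine.

```python
import numpy as np

def linear_recurrence_sequential(lam, x, h0):
    """Reference implementation of h_t = lam_t * h_{t-1} + x_t (elementwise)."""
    h = h0.copy()
    out = np.empty_like(x)
    for t in range(x.shape[0]):
        h = lam[t] * h + x[t]
        out[t] = h
    return out

def linear_recurrence_chunked(lam, x, h0, p):
    """Blocked evaluation of the same recurrence (illustrative sketch).

    Phases 1 and 3 are independent per chunk and could run on p processors;
    only phase 2 is sequential, and it touches p states rather than T.
    Assumes p <= T so that no chunk is empty.
    """
    T = x.shape[0]
    chunks = np.array_split(np.arange(T), p)

    # Phase 1: run each chunk from a zero state and record the chunk's
    # cumulative decay (the product of lam over the chunk).
    local = np.empty_like(x)
    chunk_decay = []
    for c in chunks:
        h = np.zeros_like(h0)
        d = np.ones_like(h0)
        for t in c:
            h = lam[t] * h + x[t]
            d = d * lam[t]
            local[t] = h
        chunk_decay.append(d)

    # Phase 2: short sequential pass over chunk boundaries to recover each
    # chunk's true initial state.
    starts = [h0]
    for i in range(p - 1):
        starts.append(chunk_decay[i] * starts[i] + local[chunks[i][-1]])

    # Phase 3: correct each chunk's local outputs with its incoming state.
    out = np.empty_like(x)
    for i, c in enumerate(chunks):
        d = np.ones_like(h0)
        for t in c:
            d = d * lam[t]
            out[t] = local[t] + d * starts[i]
    return out

# Quick check that the blocked version matches the sequential reference.
rng = np.random.default_rng(0)
T, dim = 1024, 8
lam = rng.uniform(0.5, 1.0, size=(T, dim))
x = rng.normal(size=(T, dim))
h0 = np.zeros(dim)
assert np.allclose(linear_recurrence_sequential(lam, x, h0),
                   linear_recurrence_chunked(lam, x, h0, p=4))
```

Because phase 2 only walks over p boundary states, the work per processor scales roughly as T/p plus a small serial term, which is where the reported speedups come from.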
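
The Research Type row also mentions the GILR-LSTM, a linear surrogate for the LSTM built on a gated impulse linear recurrence. Assuming the GILR layer takes the form h_t = g_t * h_{t-1} + (1 - g_t) * i_t, with the gate g_t and candidate i_t computed from x_t alone (my reading of the paper; the helper name `gilr_layer` and the weight names `Wg`, `bg`, `Wi`, `bi` are illustrative), the recurrence is linear in h and can reuse the chunked evaluator above with lam_t = g_t and input (1 - g_t) * i_t:

```python
import numpy as np

def gilr_layer(x_seq, Wg, bg, Wi, bi, h0, p=4):
    """Sketch of a gated impulse linear recurrence layer (assumed gating form).

    x_seq: (T, d_in) inputs; Wg, Wi: (d_in, d) weights; bg, bi: (d,) biases.
    Because g and i do not depend on the hidden state, the recurrence
    h_t = g_t * h_{t-1} + (1 - g_t) * i_t is linear in h.
    """
    g = 1.0 / (1.0 + np.exp(-(x_seq @ Wg + bg)))  # gates act as the decays lam_t
    i = np.tanh(x_seq @ Wi + bi)                  # candidate "impulse" values
    return linear_recurrence_chunked(g, (1.0 - g) * i, h0, p)
```

Stacking such layers (the Experiment Setup row describes two stacked layers with 256 hidden units) keeps all nonlinear computation pointwise in time, which is what lets the sequence dimension be parallelized.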