Parallelizing Linear Recurrent Neural Nets Over Sequence Length

Authors: Eric Martin, Chris Cundy

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We develop a parallel linear recurrence CUDA kernel and show that it can be applied to immediately speed up training and inference of several state of the art RNN architectures by up to 9x. We abstract recent work on linear RNNs into a new framework of linear surrogate RNNs and develop a linear surrogate model for the long short-term memory unit, the GILR-LSTM, that utilizes parallel linear recurrence. We extend sequence learning to new extremely long sequence regimes that were previously out of reach by successfully training a GILR-LSTM on a synthetic sequence classification task with a one million timestep dependency.
Researcher Affiliation | Academia | Eric Martin, eric@ericmart.in; Chris Cundy, Department of Computer Science, University of California, Berkeley, Berkeley, CA 94720, USA, c.cundy@berkeley.edu; currently at the Future of Humanity Institute, University of Oxford, Oxford, UK
Pseudocode | Yes | Algorithm 1: Parallel linear recurrence on p processors (a NumPy sketch of this blocked scheme follows the table)
Open Source Code | Yes | The parallel linear recurrence CUDA kernel and TensorFlow bindings are available at https://github.com/eamartin/parallelizing_linear_rnns
Open Datasets | No | The paper describes a synthetic dataset generated for the experiment but does not provide access information (link, DOI, specific citation) for a publicly available or open dataset. "The input consists of sequences of length n where for n > 0 each element is a randomly chosen one-hot vector x in p-dimensional space." (A sketch of this kind of input generation follows the table.)
Dataset Splits | No | No specific training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) are provided. The paper states: "We continually generated random sequences to serve as input data."
Hardware Specification | Yes | We ran all experiments on an NVIDIA K80 GPU.
Software Dependencies | No | The paper mentions TensorFlow but does not provide specific version numbers for it or any other software dependencies.
Experiment Setup | Yes | We controlled for GPU memory usage within these experiments by fixing bT = 65,536 for minibatch size b and sequence length T, and chose a popular architecture consisting of two stacked RNN layers with 256 hidden units and an input size of 4. ... A brief search over learning rate and batch size was carried out to find the parameters which allow the network to converge most rapidly for all runs. (A configuration sketch follows the table.)
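
The Pseudocode and Research Type rows above describe the paper's core primitive: a blocked, parallel evaluation of the linear recurrence h[t] = lam[t] * h[t-1] + x[t] over p processors (Algorithm 1). The following is a minimal NumPy sketch of that idea, assuming a diagonal (elementwise) decay and simulating the per-chunk processors with an ordinary Python loop; the function and variable names are illustrative and are not taken from the paper or its released kernel.

import numpy as np

def sequential_linear_recurrence(lam, x):
    # Reference O(T) serial scan: h[t] = lam[t] * h[t-1] + x[t], with h[-1] = 0.
    h = np.zeros_like(x)
    prev = np.zeros(x.shape[1])
    for t in range(x.shape[0]):
        prev = lam[t] * prev + x[t]
        h[t] = prev
    return h

def blocked_linear_recurrence(lam, x, num_chunks):
    # Chunked evaluation in the spirit of Algorithm 1; each chunk's phase-1 and
    # phase-3 work is independent and could run on its own processor.
    T, d = x.shape
    assert T % num_chunks == 0, "this sketch assumes an even split"
    L = T // num_chunks
    lam_c = lam.reshape(num_chunks, L, d)
    x_c = x.reshape(num_chunks, L, d)

    # Phase 1 (parallel over chunks): local scan from a zero state, plus each
    # chunk's total decay product.
    local = np.zeros_like(x_c)
    decay_prod = np.empty((num_chunks, d))
    for c in range(num_chunks):
        local[c] = sequential_linear_recurrence(lam_c[c], x_c[c])
        decay_prod[c] = np.prod(lam_c[c], axis=0)

    # Phase 2 (serial, O(p)): propagate the hidden state entering each chunk.
    carry = np.zeros((num_chunks, d))
    for c in range(1, num_chunks):
        carry[c] = decay_prod[c - 1] * carry[c - 1] + local[c - 1, -1]

    # Phase 3 (parallel over chunks): fold the incoming state into local results.
    cum_decay = np.cumprod(lam_c, axis=1)
    return (local + cum_decay * carry[:, None, :]).reshape(T, d)

# Sanity check against the serial scan.
rng = np.random.default_rng(0)
lam = rng.uniform(0.5, 1.0, size=(64, 3))
x = rng.normal(size=(64, 3))
assert np.allclose(blocked_linear_recurrence(lam, x, num_chunks=8),
                   sequential_linear_recurrence(lam, x))

The phase-1 and phase-3 loops are embarrassingly parallel over chunks while phase 2 costs only O(p), which is the structure the released CUDA kernel is reported to parallelize on the GPU.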
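
The Open Datasets row quotes the paper's synthetic input: sequences whose elements are randomly chosen one-hot vectors in p-dimensional space, generated on the fly rather than distributed as a dataset. A small illustrative generator is sketched below; the classification labels are not described in the excerpt, so only inputs are produced, and all names are made up for the sketch.

import numpy as np

def random_one_hot_sequences(batch_size, seq_len, p, rng):
    # Each timestep of each sequence is an independently chosen one-hot vector
    # in p-dimensional space, matching the quoted description of the inputs.
    idx = rng.integers(0, p, size=(batch_size, seq_len))
    return np.eye(p, dtype=np.float32)[idx]   # shape (batch_size, seq_len, p)

rng = np.random.default_rng(0)
batch = random_one_hot_sequences(batch_size=8, seq_len=1024, p=4, rng=rng)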
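
The Experiment Setup row fixes the product bT = 65,536 of minibatch size b and sequence length T so that GPU memory stays roughly constant as T varies, with two stacked recurrent layers of 256 hidden units and an input size of 4. That bookkeeping is sketched below; the configuration keys are illustrative and do not come from the paper's code.

# Fixed product of minibatch size and sequence length, as quoted above.
BT_PRODUCT = 65_536

def throughput_config(seq_len):
    # b = 65,536 / T keeps memory usage roughly constant as T grows.
    assert BT_PRODUCT % seq_len == 0, "T must divide b*T in this sketch"
    return {
        "batch_size": BT_PRODUCT // seq_len,
        "seq_len": seq_len,
        "num_layers": 2,        # two stacked RNN layers
        "hidden_units": 256,    # quoted hidden size
        "input_size": 4,
    }

# e.g. T = 1,024 gives b = 64; T = 65,536 gives b = 1.
configs = [throughput_config(T) for T in (256, 1_024, 16_384, 65_536)]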