Parallelizing Linear Recurrent Neural Nets Over Sequence Length

Authors: Eric Martin, Chris Cundy

ICLR 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We develop a parallel linear recurrence CUDA kernel and show that it can be applied to immediately speed up training and inference of several state of the art RNN architectures by up to 9x. We abstract recent work on linear RNNs into a new framework of linear surrogate RNNs and develop a linear surrogate model for the long short-term memory unit, the GILR-LSTM, that utilizes parallel linear recurrence. We extend sequence learning to new extremely long sequence regimes that were previously out of reach by successfully training a GILR-LSTM on a synthetic sequence classification task with a one million timestep dependency.
Researcher Affiliation | Academia | Eric Martin, eric@ericmart.in; Chris Cundy, Department of Computer Science, University of California, Berkeley, Berkeley, CA 94720, USA, c.cundy@berkeley.edu; currently at the Future of Humanity Institute, University of Oxford, Oxford, UK
Pseudocode | Yes | Algorithm 1: Parallel linear recurrence on p processors (a NumPy sketch of this blocked scheme follows the table)
Open Source Code | Yes | The parallel linear recurrence CUDA kernel and TensorFlow bindings are available at https://github.com/eamartin/parallelizing_linear_rnns
Open Datasets | No | The paper describes a synthetic dataset generated for the experiment but does not provide access information (link, DOI, specific citation) for a publicly available or open dataset. "The input consists of sequences of length n where for n > 0 each element is a randomly chosen one-hot vector x in p-dimensional space." (A sketch of this kind of input generation follows the table.)
Dataset Splits | No | No specific training/validation/test dataset splits (e.g., percentages, sample counts, or references to predefined splits) are provided. The paper states: "We continually generated random sequences to serve as input data."
Hardware Specification | Yes | We ran all experiments on an NVIDIA K80 GPU.
Software Dependencies | No | The paper mentions TensorFlow but does not provide specific version numbers for it or any other software dependencies.
Experiment Setup | Yes | We controlled for GPU memory usage within these experiments by fixing bT = 65,536 for minibatch size b and sequence length T, and chose a popular architecture consisting of two stacked RNN layers with 256 hidden units and an input size of 4. ... A brief search over learning rate and batch size was carried out to find the parameters which allow the network to converge most rapidly for all runs. (A configuration sketch follows the table.)
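
The Pseudocode and Research Type rows above describe the paper's core primitive: a blocked, parallel evaluation of the linear recurrence h[t] = lam[t] * h[t-1] + x[t] over p processors (Algorithm 1). The following is a minimal NumPy sketch of that idea, assuming a diagonal (elementwise) decay and simulating the per-chunk processors with an ordinary Python loop; the function and variable names are illustrative and are not taken from the paper or its released kernel.

import numpy as np

def sequential_linear_recurrence(lam, x):
    # Reference O(T) serial scan: h[t] = lam[t] * h[t-1] + x[t], with h[-1] = 0.
    h = np.zeros_like(x)
    prev = np.zeros(x.shape[1])
    for t in range(x.shape[0]):
        prev = lam[t] * prev + x[t]
        h[t] = prev
    return h

def blocked_linear_recurrence(lam, x, num_chunks):
    # Chunked evaluation in the spirit of Algorithm 1; each chunk's phase-1 and
    # phase-3 work is independent and could run on its own processor.
    T, d = x.shape
    assert T % num_chunks == 0, "this sketch assumes an even split"
    L = T // num_chunks
    lam_c = lam.reshape(num_chunks, L, d)
    x_c = x.reshape(num_chunks, L, d)

    # Phase 1 (parallel over chunks): local scan from a zero state, plus each
    # chunk's total decay product.
    local = np.zeros_like(x_c)
    decay_prod = np.empty((num_chunks, d))
    for c in range(num_chunks):
        local[c] = sequential_linear_recurrence(lam_c[c], x_c[c])
        decay_prod[c] = np.prod(lam_c[c], axis=0)

    # Phase 2 (serial, O(p)): propagate the hidden state entering each chunk.
    carry = np.zeros((num_chunks, d))
    for c in range(1, num_chunks):
        carry[c] = decay_prod[c - 1] * carry[c - 1] + local[c - 1, -1]

    # Phase 3 (parallel over chunks): fold the incoming state into local results.
    cum_decay = np.cumprod(lam_c, axis=1)
    return (local + cum_decay * carry[:, None, :]).reshape(T, d)

# Sanity check against the serial scan.
rng = np.random.default_rng(0)
lam = rng.uniform(0.5, 1.0, size=(64, 3))
x = rng.normal(size=(64, 3))
assert np.allclose(blocked_linear_recurrence(lam, x, num_chunks=8),
                   sequential_linear_recurrence(lam, x))

The phase-1 and phase-3 loops are embarrassingly parallel over chunks while phase 2 costs only O(p), which is the structure the released CUDA kernel is reported to parallelize on the GPU.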
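
The Open Datasets row quotes the paper's synthetic input: sequences whose elements are randomly chosen one-hot vectors in p-dimensional space, generated on the fly rather than distributed as a dataset. A small illustrative generator is sketched below; the classification labels are not described in the excerpt, so only inputs are produced, and all names are made up for the sketch.

import numpy as np

def random_one_hot_sequences(batch_size, seq_len, p, rng):
    # Each timestep of each sequence is an independently chosen one-hot vector
    # in p-dimensional space, matching the quoted description of the inputs.
    idx = rng.integers(0, p, size=(batch_size, seq_len))
    return np.eye(p, dtype=np.float32)[idx]   # shape (batch_size, seq_len, p)

rng = np.random.default_rng(0)
batch = random_one_hot_sequences(batch_size=8, seq_len=1024, p=4, rng=rng)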
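
The Experiment Setup row fixes the product bT = 65,536 of minibatch size b and sequence length T so that GPU memory stays roughly constant as T varies, with two stacked recurrent layers of 256 hidden units and an input size of 4. That bookkeeping is sketched below; the configuration keys are illustrative and do not come from the paper's code.

# Fixed product of minibatch size and sequence length, as quoted above.
BT_PRODUCT = 65_536

def throughput_config(seq_len):
    # b = 65,536 / T keeps memory usage roughly constant as T grows.
    assert BT_PRODUCT % seq_len == 0, "T must divide b*T in this sketch"
    return {
        "batch_size": BT_PRODUCT // seq_len,
        "seq_len": seq_len,
        "num_layers": 2,        # two stacked RNN layers
        "hidden_units": 256,    # quoted hidden size
        "input_size": 4,
    }

# e.g. T = 1,024 gives b = 64; T = 65,536 gives b = 1.
configs = [throughput_config(T) for T in (256, 1_024, 16_384, 65_536)]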