Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
Authors: Aaron Voelker, Ivana Kajić, Chris Eliasmith
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Backpropagation across LMUs outperforms equivalently-sized LSTMs on a chaotic time-series prediction task, improves memory capacity by two orders of magnitude, and significantly reduces training and inference times. LMUs can efficiently handle temporal dependencies spanning 100,000 time-steps, converge rapidly, and use few internal state-variables to learn complex functions spanning long windows of time, exceeding state-of-the-art performance among RNNs on permuted sequential MNIST. These results are due to the network's disposition to learn scale-invariant features independently of step size. Backpropagation through the ODE solver allows each layer to adapt its internal time-step, enabling the network to learn task-relevant time-scales. We demonstrate that LMU memory cells can be implemented using m recurrently-connected Poisson spiking neurons, O(m) time and memory, with error scaling as O(d/√m). We discuss implementations of LMUs on analog and digital neuromorphic hardware. (A minimal sketch of the Legendre state-space underlying the LMU memory follows this table.) |
| Researcher Affiliation | Collaboration | Aaron R. Voelker1,2, Ivana Kajić1, Chris Eliasmith1,2; 1Centre for Theoretical Neuroscience, Waterloo, ON; 2Applied Brain Research, Inc.; {arvoelke, i2kajic, celiasmith}@uwaterloo.ca |
| Pseudocode | No | The paper provides mathematical derivations and diagrams (e.g., Figure 2, Figure 6) but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code for the LMU and our experiments are published on GitHub: https://github.com/abr/neurips2019 |
| Open Datasets | Yes | The permuted sequential MNIST (psMNIST) digit classification task [22] is commonly used to assess the ability of RNN models to learn complex temporal relationships [2, 7, 8, 21, 25]. |
| Dataset Splits | Yes | For the LMU and feed-forward baseline, we extended the code from Chandar et al. [7] in order to ensure that the training, validation, and test data were identical with the same permutation seed and batch size. (A minimal fixed-seed permutation and split sketch follows this table.) |
| Hardware Specification | No | The paper mentions that models 'run on CPUs and GPUs' but does not specify any particular CPU or GPU models, processor types, or memory details used for the experiments. |
| Software Dependencies | No | The paper mentions 'Keras and the TensorFlow backend [1]' as software used, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The activation function f is set to tanh. All models are implemented with Keras and the TensorFlow backend [1] and run on CPUs and GPUs. We use the Adam optimizer [20] with default hyperparameters, monitor the validation loss to save the best model, and train until convergence or 500 epochs. We note that our method does not require layer normalization, gradient clipping, or other regularization techniques. (A corresponding Keras training sketch follows this table.) |
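
The claims summarized in the Research Type row rest on the Legendre delay state-space that defines the LMU memory cell. The following is a minimal sketch, assuming NumPy/SciPy; the variable names (`d` for the order, `theta` for the window length, `dt` for the simulation time-step) and the example values are our own, and zero-order hold is used here as one common discretization choice rather than a claim about the authors' exact pipeline.

```python
import numpy as np
from scipy.signal import cont2discrete

def lmu_state_space(d, theta, dt=1.0):
    """Legendre delay system of order d and window length theta.

    Returns discretized (A, B) for the memory update m[t] = A m[t-1] + B u[t].
    Variable names and the values used below are illustrative assumptions.
    """
    q = np.arange(d, dtype=np.float64)
    r = (2 * q + 1)[:, None] / theta                    # (2i + 1) / theta
    i, j = np.meshgrid(q, q, indexing="ij")
    # Continuous dynamics m'(t) = A m(t) + B u(t), with 1/theta folded in
    A = np.where(i < j, -1.0, (-1.0) ** (i - j + 1)) * r
    B = ((-1.0) ** q)[:, None] * r
    # Zero-order-hold discretization for a fixed time-step dt
    Ad, Bd, *_ = cont2discrete((A, B, np.eye(d), np.zeros((d, 1))),
                               dt=dt, method="zoh")
    return Ad, Bd

Ad, Bd = lmu_state_space(d=6, theta=100.0)              # small example for inspection
```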
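For the Dataset Splits row, the reproducibility-relevant detail is that every model sees the same pixel permutation and the same partition. A minimal sketch, assuming TensorFlow's bundled MNIST loader; the seed value and the 50k/10k/10k split are illustrative assumptions, not values taken from the paper or from Chandar et al. [7].

```python
import numpy as np
import tensorflow as tf

def load_psmnist(seed=0):
    """Permuted sequential MNIST with a fixed permutation seed so that the
    training, validation, and test sets share the identical pixel ordering."""
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784, 1).astype("float32") / 255.0
    x_test = x_test.reshape(-1, 784, 1).astype("float32") / 255.0
    perm = np.random.RandomState(seed).permutation(784)  # one permutation, reused everywhere
    x_train, x_test = x_train[:, perm], x_test[:, perm]
    # Hold out the last 10k training images for validation (assumed split)
    return ((x_train[:50000], y_train[:50000]),
            (x_train[50000:], y_train[50000:]),
            (x_test, y_test))
```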
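The Experiment Setup row translates almost directly into a Keras training configuration. The sketch below is not the authors' script: the loss, metric, and checkpoint file name are assumptions, while Adam with default hyperparameters, validation-loss checkpointing, and the 500-epoch cap follow the quoted setup. Consistent with the paper's note, no layer normalization or gradient clipping is added.

```python
import tensorflow as tf

def train(model, train_data, val_data):
    """Train with Adam defaults, keep the best model by validation loss,
    and stop after at most 500 epochs, as described in the quoted setup."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),        # default hyperparameters
        loss="sparse_categorical_crossentropy",      # assumed loss for digit classification
        metrics=["accuracy"],
    )
    best = tf.keras.callbacks.ModelCheckpoint(
        "best_model.h5", monitor="val_loss", save_best_only=True
    )
    return model.fit(train_data, validation_data=val_data,
                     epochs=500, callbacks=[best])
```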