Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks
Authors: Aaron Voelker, Ivana Kajić, Chris Eliasmith
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Backpropagation across LMUs outperforms equivalently-sized LSTMs on a chaotic time-series prediction task, improves memory capacity by two orders of magnitude, and significantly reduces training and inference times. LMUs can efficiently handle temporal dependencies spanning 100,000 time-steps, converge rapidly, and use few internal state-variables to learn complex functions spanning long windows of time, exceeding state-of-the-art performance among RNNs on permuted sequential MNIST. These results are due to the network's disposition to learn scale-invariant features independently of step size. Backpropagation through the ODE solver allows each layer to adapt its internal time-step, enabling the network to learn task-relevant time-scales. We demonstrate that LMU memory cells can be implemented using m recurrently-connected Poisson spiking neurons, O(m) time and memory, with error scaling as O(d/√m). We discuss implementations of LMUs on analog and digital neuromorphic hardware. (A minimal sketch of the Legendre state-space underlying the LMU memory follows this table.) |
| Researcher Affiliation | Collaboration | Aaron R. Voelker1,2, Ivana Kajić1, Chris Eliasmith1,2; 1Centre for Theoretical Neuroscience, Waterloo, ON; 2Applied Brain Research, Inc.; {arvoelke, i2kajic, celiasmith}@uwaterloo.ca |
| Pseudocode | No | The paper provides mathematical derivations and diagrams (e.g., Figure 2, Figure 6) but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | Yes | The source code for the LMU and our experiments are published on GitHub: https://github.com/abr/neurips2019 |
| Open Datasets | Yes | The permuted sequential MNIST (psMNIST) digit classification task [22] is commonly used to assess the ability of RNN models to learn complex temporal relationships [2, 7, 8, 21, 25]. |
| Dataset Splits | Yes | For the LMU and feed-forward baseline, we extended the code from Chandar et al. [7] in order to ensure that the training, validation, and test data were identical with the same permutation seed and batch size. (A minimal fixed-seed permutation and split sketch follows this table.) |
| Hardware Specification | No | The paper mentions that models 'run on CPUs and GPUs' but does not specify any particular CPU or GPU models, processor types, or memory details used for the experiments. |
| Software Dependencies | No | The paper mentions 'Keras and the TensorFlow backend [1]' as software used, but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | The activation function f is set to tanh. All models are implemented with Keras and the TensorFlow backend [1] and run on CPUs and GPUs. We use the Adam optimizer [20] with default hyperparameters, monitor the validation loss to save the best model, and train until convergence or 500 epochs. We note that our method does not require layer normalization, gradient clipping, or other regularization techniques. (A corresponding Keras training sketch follows this table.) |
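
The claims summarized in the Research Type row rest on the Legendre delay state-space that defines the LMU memory cell. The following is a minimal sketch, assuming NumPy/SciPy; the variable names (`d` for the order, `theta` for the window length, `dt` for the simulation time-step) and the example values are our own, and zero-order hold is used here as one common discretization choice rather than a claim about the authors' exact pipeline.

```python
import numpy as np
from scipy.signal import cont2discrete

def lmu_state_space(d, theta, dt=1.0):
    """Legendre delay system of order d and window length theta.

    Returns discretized (A, B) for the memory update m[t] = A m[t-1] + B u[t].
    Variable names and the values used below are illustrative assumptions.
    """
    q = np.arange(d, dtype=np.float64)
    r = (2 * q + 1)[:, None] / theta                    # (2i + 1) / theta
    i, j = np.meshgrid(q, q, indexing="ij")
    # Continuous dynamics m'(t) = A m(t) + B u(t), with 1/theta folded in
    A = np.where(i < j, -1.0, (-1.0) ** (i - j + 1)) * r
    B = ((-1.0) ** q)[:, None] * r
    # Zero-order-hold discretization for a fixed time-step dt
    Ad, Bd, *_ = cont2discrete((A, B, np.eye(d), np.zeros((d, 1))),
                               dt=dt, method="zoh")
    return Ad, Bd

Ad, Bd = lmu_state_space(d=6, theta=100.0)              # small example for inspection
```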
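For the Dataset Splits row, the reproducibility-relevant detail is that every model sees the same pixel permutation and the same partition. A minimal sketch, assuming TensorFlow's bundled MNIST loader; the seed value and the 50k/10k/10k split are illustrative assumptions, not values taken from the paper or from Chandar et al. [7].

```python
import numpy as np
import tensorflow as tf

def load_psmnist(seed=0):
    """Permuted sequential MNIST with a fixed permutation seed so that the
    training, validation, and test sets share the identical pixel ordering."""
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
    x_train = x_train.reshape(-1, 784, 1).astype("float32") / 255.0
    x_test = x_test.reshape(-1, 784, 1).astype("float32") / 255.0
    perm = np.random.RandomState(seed).permutation(784)  # one permutation, reused everywhere
    x_train, x_test = x_train[:, perm], x_test[:, perm]
    # Hold out the last 10k training images for validation (assumed split)
    return ((x_train[:50000], y_train[:50000]),
            (x_train[50000:], y_train[50000:]),
            (x_test, y_test))
```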
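The Experiment Setup row translates almost directly into a Keras training configuration. The sketch below is not the authors' script: the loss, metric, and checkpoint file name are assumptions, while Adam with default hyperparameters, validation-loss checkpointing, and the 500-epoch cap follow the quoted setup. Consistent with the paper's note, no layer normalization or gradient clipping is added.

```python
import tensorflow as tf

def train(model, train_data, val_data):
    """Train with Adam defaults, keep the best model by validation loss,
    and stop after at most 500 epochs, as described in the quoted setup."""
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),        # default hyperparameters
        loss="sparse_categorical_crossentropy",      # assumed loss for digit classification
        metrics=["accuracy"],
    )
    best = tf.keras.callbacks.ModelCheckpoint(
        "best_model.h5", monitor="val_loss", save_best_only=True
    )
    return model.fit(train_data, validation_data=val_data,
                     epochs=500, callbacks=[best])
```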