Long Expressive Memory for Sequence Modeling

Authors: T. Konstantin Rusch, Siddhartha Mishra, N. Benjamin Erichson, Michael W. Mahoney

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our empirical results, ranging from image and time-series classification through dynamical systems prediction to keyword spotting and language modeling, demonstrate that LEM outperforms state-of-the-art recurrent neural networks, gated recurrent units, and long short-term memory models. We provide an extensive empirical evaluation of LEM on a wide variety of data sets, including image and sequence classification, dynamical systems prediction, keyword spotting, and language modeling, thereby demonstrating that LEM outperforms or is comparable to state-of-the-art RNNs, GRUs and LSTMs in each task (Section 5).
Researcher Affiliation | Academia | T. Konstantin Rusch (ETH Zürich, trusch@ethz.ch); Siddhartha Mishra (ETH Zürich, smishra@ethz.ch); N. Benjamin Erichson (University of Pittsburgh, erichson@pitt.edu); Michael W. Mahoney (ICSI and UC Berkeley, mmahoney@stat.berkeley.edu)
Pseudocode | No | The paper presents the LEM update as mathematical equations but contains no clearly labeled "Pseudocode" or "Algorithm" block. (A hedged code sketch of one such update step is given after this table.)
Open Source Code | Yes | All code to reproduce our results can be found at https://github.com/tk-rusch/LEM.
Open Datasets | Yes | We consider three experiments based on two widely-used image recognition data sets, i.e., MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al., 2009)... The Google Speech Commands data set V2 (Warden, 2018)... Penn Treebank (PTB) corpus (Marcus et al., 1993), preprocessed by Mikolov et al. (2010).
Dataset Splits | Yes | Following Morrill et al. (2021) and Rusch & Mishra (2021b), we divide the data into a train, validation and test set according to a 70%, 15%, 15% ratio. (A minimal split sketch follows the table.)
Hardware Specification | Yes | All experiments were run on CPU, namely Intel Xeon Gold 5118 and AMD EPYC 7H12, except for Google12, PTB character-level and PTB word-level, which were run on a GeForce RTX 2080 Ti GPU.
Software Dependencies | No | The paper mentions "language modelling code: https://github.com/deepmind/lamb" but does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | Details of the training procedure for each experiment can be found in SM A. The hyperparameters are selected based on a random search algorithm, where we present the rounded hyperparameters for the best performing LEM model (based on a validation set) on each task in Table 8. (A generic random-search sketch follows the table.)
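
For the Pseudocode row above: the paper defines LEM through its update equations rather than an algorithm block. The following is a minimal PyTorch-style sketch of what one such update step could look like, assuming a two-gate multiscale structure (learned time-scale gates blending the previous states with tanh candidates); the exact equations, nonlinearities, and handling of the Δt factor are assumptions here and should be checked against the official code at https://github.com/tk-rusch/LEM.

```python
import torch
import torch.nn as nn

class LEMCell(nn.Module):
    """Hedged sketch of one LEM step: two learned time-scale gates blend the
    previous hidden states y, z with candidate updates. Assumed form, not the
    authors' reference implementation."""

    def __init__(self, input_size: int, hidden_size: int, dt: float = 1.0):
        super().__init__()
        self.dt = dt
        # Affine maps of the concatenated (input, previous state) pair:
        # two for the time-scale gates, two for the candidate states.
        self.gate_y = nn.Linear(input_size + hidden_size, hidden_size)
        self.gate_z = nn.Linear(input_size + hidden_size, hidden_size)
        self.cand_z = nn.Linear(input_size + hidden_size, hidden_size)
        self.cand_y = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, u, y, z):
        uy = torch.cat([u, y], dim=-1)
        dt_y = self.dt * torch.sigmoid(self.gate_y(uy))  # time scale for y
        dt_z = self.dt * torch.sigmoid(self.gate_z(uy))  # time scale for z
        z_new = (1.0 - dt_z) * z + dt_z * torch.tanh(self.cand_z(uy))
        y_new = (1.0 - dt_y) * y + dt_y * torch.tanh(
            self.cand_y(torch.cat([u, z_new], dim=-1)))
        return y_new, z_new

# Illustrative usage: unroll over a sequence of shape (time, batch, features).
cell = LEMCell(input_size=32, hidden_size=64)
y = z = torch.zeros(8, 64)
for u in torch.randn(100, 8, 32):
    y, z = cell(u, y, z)
```

The gated convex-combination form is what gives the cell its multiscale character: each unit can learn its own effective step size between 0 and Δt.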
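
For the Dataset Splits row: a minimal sketch of a 70%/15%/15% train/validation/test split using torch.utils.data.random_split; the dataset object and the seed are placeholders, not values taken from the paper.

```python
import torch
from torch.utils.data import random_split

def split_70_15_15(dataset, seed: int = 0):
    """Split a torch Dataset into train/val/test with a 70/15/15 ratio.
    The seed is a placeholder; the paper does not specify one."""
    n = len(dataset)
    n_train = int(0.70 * n)
    n_val = int(0.15 * n)
    n_test = n - n_train - n_val  # remainder goes to the test set
    generator = torch.Generator().manual_seed(seed)
    return random_split(dataset, [n_train, n_val, n_test], generator=generator)
```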
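
For the Experiment Setup row: the paper selects hyperparameters by random search and validation-set performance (the best values are reported in Table 8). Below is a generic sketch of such a loop; the search space, the train_and_evaluate callable, and the trial count are illustrative placeholders, not the authors' settings.

```python
import math
import random

# Illustrative search space; the ranges actually used in the paper are not
# reproduced here (see Table 8 of the paper for the selected values).
SEARCH_SPACE = {
    "hidden_size": [64, 128, 256],
    "learning_rate": (1e-4, 1e-2),  # sampled log-uniformly
    "batch_size": [32, 64, 128],
}

def sample_config(rng: random.Random) -> dict:
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "hidden_size": rng.choice(SEARCH_SPACE["hidden_size"]),
        "learning_rate": 10 ** rng.uniform(math.log10(lo), math.log10(hi)),
        "batch_size": rng.choice(SEARCH_SPACE["batch_size"]),
    }

def random_search(train_and_evaluate, n_trials: int = 20, seed: int = 0):
    """train_and_evaluate(config) -> validation score; a placeholder callable
    that trains a model with the given config and returns its validation metric."""
    rng = random.Random(seed)
    best_config, best_val = None, float("-inf")
    for _ in range(n_trials):
        config = sample_config(rng)
        val_score = train_and_evaluate(config)
        if val_score > best_val:
            best_config, best_val = config, val_score
    return best_config, best_val
```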