Long Expressive Memory for Sequence Modeling
Authors: T. Konstantin Rusch, Siddhartha Mishra, N. Benjamin Erichson, Michael W. Mahoney
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our empirical results, ranging from image and time-series classification through dynamical systems prediction to keyword spotting and language modeling, demonstrate that LEM outperforms state-of-the-art recurrent neural networks, gated recurrent units, and long short-term memory models. We provide an extensive empirical evaluation of LEM on a wide variety of data sets, including image and sequence classification, dynamical systems prediction, keyword spotting, and language modeling, thereby demonstrating that LEM outperforms or is comparable to state-of-the-art RNNs, GRUs and LSTMs in each task (Section 5). |
| Researcher Affiliation | Academia | T. Konstantin Rusch ETH Zürich trusch@ethz.ch Siddhartha Mishra ETH Zürich smishra@ethz.ch N. Benjamin Erichson University of Pittsburgh erichson@pitt.edu Michael W. Mahoney ICSI and UC Berkeley mmahoney@stat.berkeley.edu |
| Pseudocode | No | The paper presents mathematical equations and formulas but no clearly labeled "Pseudocode" or "Algorithm" block (a hedged PyTorch sketch of the LEM update equations is given below the table). |
| Open Source Code | Yes | All code to reproduce our results can be found at https://github.com/tk-rusch/LEM. |
| Open Datasets | Yes | We consider three experiments based on two widely-used image recognition data sets, i.e., MNIST (LeCun et al., 1998) and CIFAR-10 (Krizhevsky et al., 2009)... The Google Speech Commands data set V2 (Warden, 2018)... Penn Treebank (PTB) corpus (Marcus et al., 1993), preprocessed by Mikolov et al. (2010). |
| Dataset Splits | Yes | Following Morrill et al. (2021) and Rusch & Mishra (2021b), we divide the data into a train, validation and test set according to a 70%, 15%, 15% ratio. |
| Hardware Specification | Yes | All experiments were run on CPU, namely Intel Xeon Gold 5118 and AMD EPYC 7H12, except for Google12, PTB character-level and PTB word-level, which were run on a GeForce RTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions "language modelling code: https://github.com/deepmind/lamb" but does not list specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9). |
| Experiment Setup | Yes | Details of the training procedure for each experiment can be found in SM A. The hyperparameters are selected based on a random search algorithm, where we present the rounded hyperparameters for the best performing LEM model (based on a validation set) on each task in Table 8. (A generic sketch of such a random search loop is also given below the table.) |
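
Since the paper specifies LEM only through its update equations rather than a pseudocode block, the following is a minimal, hedged PyTorch sketch of a single LEM cell step as those equations read: two sigmoid-gated, input-dependent time steps rescale the updates of the two hidden states z and y, with tanh candidates. The layer names, the default Δt = 1, and the usage loop are illustrative assumptions, not code from the authors' repository.

```python
import torch
import torch.nn as nn


class LEMCell(nn.Module):
    """Hedged sketch of one LEM step as the paper's update equations read:
        dt1 = dt * sigmoid(W1 y + V1 u + b1)
        dt2 = dt * sigmoid(W2 y + V2 u + b2)
        z_n = (1 - dt1) * z + dt1 * tanh(Wz y + Vz u + bz)
        y_n = (1 - dt2) * y + dt2 * tanh(Wy z_n + Vy u + by)
    Layer names below are illustrative, not the authors' identifiers."""

    def __init__(self, input_size: int, hidden_size: int, dt: float = 1.0):
        super().__init__()
        self.dt = dt
        self.gate1 = nn.Linear(input_size + hidden_size, hidden_size)
        self.gate2 = nn.Linear(input_size + hidden_size, hidden_size)
        self.cand_z = nn.Linear(input_size + hidden_size, hidden_size)
        self.cand_y = nn.Linear(input_size + hidden_size, hidden_size)

    def forward(self, u, y, z):
        uy = torch.cat([u, y], dim=-1)
        dt1 = self.dt * torch.sigmoid(self.gate1(uy))   # input-dependent time step for z
        dt2 = self.dt * torch.sigmoid(self.gate2(uy))   # input-dependent time step for y
        z = (1.0 - dt1) * z + dt1 * torch.tanh(self.cand_z(uy))
        uz = torch.cat([u, z], dim=-1)                  # note: y-update uses the *new* z
        y = (1.0 - dt2) * y + dt2 * torch.tanh(self.cand_y(uz))
        return y, z


# Usage sketch: scan a (batch, seq_len, input_size) sequence with one cell.
cell = LEMCell(input_size=1, hidden_size=128)
u_seq = torch.randn(32, 784, 1)            # e.g. a sequential-MNIST-style input
y = z = torch.zeros(32, 128)
for t in range(u_seq.size(1)):
    y, z = cell(u_seq[:, t], y, z)
```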
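The experiment-setup row states that hyperparameters were chosen by random search and selected on a validation set; the actual search space and budget are in the paper's SM A and the chosen values in its Table 8, so the loop below is only a generic illustration. The SEARCH_SPACE values, the trial budget of 50, and the train_and_validate helper are hypothetical placeholders.

```python
import math
import random


def train_and_validate(cfg):
    """Hypothetical placeholder: train an LEM model with `cfg` and return a
    validation score. A random number is returned only so the sketch runs."""
    return random.random()


# Hypothetical search space; the real ranges are not reproduced here.
SEARCH_SPACE = {
    "hidden_size": [64, 128, 256],
    "learning_rate": (1e-4, 1e-2),   # sampled log-uniformly below
    "batch_size": [32, 64, 128],
}


def sample_config():
    lo, hi = SEARCH_SPACE["learning_rate"]
    return {
        "hidden_size": random.choice(SEARCH_SPACE["hidden_size"]),
        "learning_rate": 10 ** random.uniform(math.log10(lo), math.log10(hi)),
        "batch_size": random.choice(SEARCH_SPACE["batch_size"]),
    }


best_score, best_cfg = float("-inf"), None
for trial in range(50):                     # trial budget is an assumption
    cfg = sample_config()
    score = train_and_validate(cfg)         # model selection on the validation set
    if score > best_score:
        best_score, best_cfg = score, cfg

print(f"best validation score {best_score:.3f} with config {best_cfg}")
```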