Lipschitz Recurrent Neural Networks

Authors: N. Benjamin Erichson, Omri Azencot, Alejandro Queiruga, Liam Hodgkinson, Michael W. Mahoney

ICLR 2021

Reproducibility assessment: each variable below lists the result and the supporting excerpt (LLM response).
Research Type: Experimental. "Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks, including computer vision, language modeling and speech prediction tasks."
Researcher Affiliation: Collaboration. N. Benjamin Erichson (ICSI and UC Berkeley, erichson@berkeley.edu); Omri Azencot (Ben-Gurion University, azencot@cs.bgu.ac.il); Alejandro Queiruga (Google Research, afq@google.com); Liam Hodgkinson (ICSI and UC Berkeley, liam.hodgkinson@berkeley.edu); Michael W. Mahoney (ICSI and UC Berkeley, mmahoney@stat.berkeley.edu).
Pseudocode: No. The paper describes the proposed model and methods using mathematical equations and textual descriptions, but it does not include any structured pseudocode or algorithm blocks.
Open Source Code: Yes. Research code is shared at github.com/erichson/LipschitzRNN.
Open Datasets: Yes. "The model is applied to ordered and permuted pixel-by-pixel MNIST classification, as well as to audio data using the TIMIT dataset." ... "Next, we consider the TIMIT dataset (Garofolo, 1993)." ... "Penn Tree Bank (PTB) (Marcus et al., 1993)."
Dataset Splits: Yes. "To compare our results with those of other models, we used the common train / validation / test split: 3690 utterances from 462 speakers for training, 192 utterances for validation, and 400 utterances for testing." ... "The dataset is composed of a train / validation / test set, where 5017K characters are used for training, 393K characters are used for validation and 442K characters are used for testing."
Hardware Specification: No. The paper mentions support from "Amazon AWS and Google Cloud" but does not provide specific hardware details such as GPU/CPU models, memory, or processor types used for running experiments.
Software Dependencies: No. The paper mentions using PyHessian (Yao et al., 2019) but does not provide version numbers for this or any other software library or dependency used for the experiments.
Experiment Setup: Yes. "For tuning we utilized a standard training procedure using a non-exhaustive random search within the following plausible ranges for our weight parameterization: β ∈ {0.65, 0.7, 0.75, 0.8}, γ ∈ [0.001, 1.0]. For Adam we explored learning rates between 0.001 and 0.005, and for SGD we considered 0.1. For the step size we explored values in the range 0.001 to 1.0." ... "Table 8: Tuning parameters used for our experimental results and the performance evaluated with 12 different seed values for the parameter initialization of the model."
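The quoted tuning procedure can be sketched as a small random-search loop over the stated ranges. This is a minimal illustration, not the authors' code: `train_and_evaluate` is a hypothetical stand-in for training the Lipschitz RNN and returning a validation score, and the trial count and seed are assumptions.

```python
import random

# Ranges quoted from the paper's experiment-setup description.
BETAS = [0.65, 0.7, 0.75, 0.8]     # discrete grid for beta
GAMMA_RANGE = (0.001, 1.0)         # continuous range for gamma
LR_RANGE = (0.001, 0.005)          # Adam learning rates
STEP_RANGE = (0.001, 1.0)          # step size

def sample_config(rng):
    """Draw one hyperparameter configuration from the stated ranges."""
    return {
        "beta": rng.choice(BETAS),
        "gamma": rng.uniform(*GAMMA_RANGE),
        "lr": rng.uniform(*LR_RANGE),
        "step": rng.uniform(*STEP_RANGE),
    }

def random_search(train_and_evaluate, n_trials=20, seed=0):
    """Non-exhaustive random search: keep the best-scoring configuration.

    `train_and_evaluate` is a hypothetical callable mapping a config dict
    to a validation score (higher is better); it is not from the paper.
    """
    rng = random.Random(seed)
    best_cfg, best_score = None, float("-inf")
    for _ in range(n_trials):
        cfg = sample_config(rng)
        score = train_and_evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score
```

A separate seed would then be varied for model parameter initialization (the paper reports performance over 12 seeds) while the tuned configuration is held fixed.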