Lipschitz Recurrent Neural Networks
Authors: N. Benjamin Erichson, Omri Azencot, Alejandro Queiruga, Liam Hodgkinson, Michael W. Mahoney
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments demonstrate that the Lipschitz RNN can outperform existing recurrent units on a range of benchmark tasks, including computer vision, language modeling and speech prediction tasks. |
| Researcher Affiliation | Collaboration | N. Benjamin Erichson (ICSI and UC Berkeley, erichson@berkeley.edu); Omri Azencot (Ben-Gurion University, azencot@cs.bgu.ac.il); Alejandro Queiruga (Google Research, afq@google.com); Liam Hodgkinson (ICSI and UC Berkeley, liam.hodgkinson@berkeley.edu); Michael W. Mahoney (ICSI and UC Berkeley, mmahoney@stat.berkeley.edu) |
| Pseudocode | No | The paper describes the proposed model and methods using mathematical equations and textual descriptions, but it does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Research code is shared via github.com/erichson/LipschitzRNN. |
| Open Datasets | Yes | The model is applied to ordered and permuted pixel-by-pixel MNIST classification, as well as to audio data using the TIMIT dataset. ... Next, we consider the TIMIT dataset (Garofolo, 1993)... ... Penn Tree Bank (PTB) (Marcus et al., 1993). |
| Dataset Splits | Yes | To compare our results with those of other models, we used the common train / validation / test split: 3690 utterances from 462 speakers for training, 192 utterances for validation, and 400 utterances for testing. ... The dataset is composed of a train / validation / test set, where 5017K characters are used for training, 393K characters are used for validation and 442K characters are used for testing. |
| Hardware Specification | No | The paper mentions support from 'Amazon AWS and Google Cloud' but does not provide specific hardware details such as GPU/CPU models, memory, or processor types used for running experiments. |
| Software Dependencies | No | The paper mentions using 'PyHessian (Yao et al., 2019)' but does not provide specific version numbers for this or any other software libraries or dependencies used for the experiments. |
| Experiment Setup | Yes | For tuning we utilized a standard training procedure using a non-exhaustive random search within the following plausible ranges for our weight parameterization: β ∈ {0.65, 0.7, 0.75, 0.8}, γ ∈ [0.001, 1.0]. For Adam we explored learning rates between 0.001 and 0.005, and for SGD we considered 0.1. For the step size we explored values in the range 0.001 to 1.0. ... Table 8: Tuning parameters used for our experimental results and the performance evaluated with 12 different seed values for the parameter initialization of the model. |
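The β and γ values quoted in the Experiment Setup row parameterize the hidden-to-hidden matrix of the Lipschitz RNN. A minimal NumPy sketch of that construction follows, assuming the paper's form A<sub>β,γ</sub> = (1−β)(M+Mᵀ) + β(M−Mᵀ) − γI together with a forward-Euler update of the continuous-time dynamics; the function names are my own, and exact constants should be checked against the released code.

```python
import numpy as np

def build_A(M, beta=0.75, gamma=0.001):
    """Hidden-to-hidden matrix, assumed form from the paper:
    A = (1 - beta) * (M + M.T) + beta * (M - M.T) - gamma * I.
    Larger beta emphasizes the skew-symmetric (rotation-like) part;
    gamma > 0 shifts eigenvalue real parts toward the stable half-plane."""
    n = M.shape[0]
    sym = M + M.T    # symmetric part
    skew = M - M.T   # skew-symmetric part
    return (1 - beta) * sym + beta * skew - gamma * np.eye(n)

def euler_step(h, x, A, W, U, b, eps=0.1):
    """One forward-Euler step of the assumed Lipschitz RNN dynamics
    h' = A h + tanh(W h + U x + b), with step size eps."""
    return h + eps * (A @ h + np.tanh(W @ h + U @ x + b))

rng = np.random.default_rng(0)
n, d = 8, 4
M = rng.standard_normal((n, n)) / np.sqrt(n)
A = build_A(M, beta=0.75, gamma=0.001)

# With beta = 1.0 the symmetric part vanishes, so A is skew-symmetric
# minus gamma * I and every eigenvalue has real part exactly -gamma.
A_skew = build_A(M, beta=1.0, gamma=0.1)
print(np.allclose(np.linalg.eigvals(A_skew).real, -0.1))
```

The eigenvalue check illustrates why γ matters for stability: the skew-symmetric component contributes only imaginary spectrum, and the −γI shift keeps the real parts strictly negative.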