A Max-Affine Spline Perspective of Recurrent Neural Networks

Authors: Zichao Wang, Randall Balestriero, Richard Baraniuk

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on several datasets of various modalities demonstrate and validate each of the above conclusions.
Researcher Affiliation | Academia | Zichao Wang, Randall Balestriero & Richard G. Baraniuk, Department of Electrical and Computer Engineering, Rice University, Houston, TX 77005, USA. {zw16,rb42,richb}@rice.edu
Pseudocode | No | The paper provides mathematical derivations and theorems but does not include any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not include any explicit statements about releasing source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | MNIST: the dataset consists of 60k images in the training set and 10k images in the test set; we randomly select 10k images from the training set as a validation set. ... SST-2: the dataset consists of 6,920, 872, and 1,821 sentences in the training, validation, and test sets, respectively. ... Bird Audio Dataset: the dataset consists of 7,000 field-recording signals of 10 seconds sampled at 44 kHz from the Freesound audio archive (Stowell & Plumbley, 2014).
Dataset Splits | Yes | MNIST: the dataset consists of 60k images in the training set and 10k images in the test set; we randomly select 10k images from the training set as a validation set. ... SST-2: the dataset consists of 6,920, 872, and 1,821 sentences in the training, validation, and test sets, respectively.
Hardware Specification | No | The paper does not specify any hardware details such as GPU or CPU models used for the experiments; it only refers to running 'experiments' in general terms.
Software Dependencies | No | The paper mentions 'PyTorch default values' for MNIST preprocessing and the Python sklearn package for t-SNE visualization, but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | Experiment setup for the various datasets is summarized in Table 2. Some of the experiments do not appear in the main text but in the appendix; we include the setup for those experiments as well. A setting common to all experiments is that we use a learning rate scheduler: when the validation loss plateaus for 5 consecutive epochs, we reduce the current learning rate by a factor of 0.7. ... In all experiments we use ReLU RNNs with 128-dimensional hidden states and with the recurrent weight matrix W_r^(ℓ) initialized as an identity matrix (Le et al., 2015; Talathi & Vartak, 2016). ... We use σ_ϵ chosen in {0.001, 0.01, 0.1, 1, 5} and learning rates in {1e-5, 1e-4, 1.5e-4, 2e-4} for RMSprop and {1e-7, 1e-6, 1.25e-6, 1.5e-6} for plain SGD.
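The dataset-split and experiment-setup rows above describe a fairly concrete configuration even though the authors released no code. The following PyTorch sketch is an illustrative reconstruction under stated assumptions, not the authors' implementation: it holds out 10k of the 60k MNIST training images for validation, builds a ReLU RNN with a 128-dimensional hidden state whose recurrent weight matrix is initialized to the identity, and attaches a plateau scheduler that cuts the learning rate by a factor of 0.7 after a 5-epoch plateau. The row-wise treatment of MNIST images as length-28 sequences, the linear readout, the class name ReLURNN, and the particular learning rate picked from the reported grid are all assumptions.

```python
# Hypothetical sketch of the reported setup; names and data handling are
# assumptions, since the paper does not release code.
import torch
import torch.nn as nn
from torch.utils.data import random_split
from torchvision import datasets, transforms

# MNIST: 60k training images, of which 10k are held out at random for validation.
train_full = datasets.MNIST("data", train=True, download=True,
                            transform=transforms.ToTensor())
train_set, val_set = random_split(train_full, [50_000, 10_000])

class ReLURNN(nn.Module):
    """ReLU RNN with 128-dim hidden state and identity-initialized W_r."""
    def __init__(self, input_size=28, hidden_size=128, num_classes=10):
        super().__init__()
        self.rnn = nn.RNN(input_size, hidden_size,
                          nonlinearity="relu", batch_first=True)
        # Identity initialization of the recurrent weight matrix
        # (Le et al., 2015; Talathi & Vartak, 2016).
        with torch.no_grad():
            self.rnn.weight_hh_l0.copy_(torch.eye(hidden_size))
        self.readout = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        # x: (batch, time, input_size); e.g. an MNIST image reshaped to
        # (28, 28) and read row by row (an assumption about the sequencing).
        out, _ = self.rnn(x)
        return self.readout(out[:, -1])  # classify from the last hidden state

model = ReLURNN()
# Learning rate chosen from the reported RMSprop grid {1e-5, 1e-4, 1.5e-4, 2e-4}.
optimizer = torch.optim.RMSprop(model.parameters(), lr=1e-4)
# Reduce the learning rate by a factor of 0.7 when the validation loss
# plateaus for 5 consecutive epochs; call scheduler.step(val_loss) each epoch.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.7, patience=5)
```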