A Max-Affine Spline Perspective of Recurrent Neural Networks
Authors: Zichao Wang, Randall Balestriero, Richard Baraniuk
ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Extensive experiments on several datasets of various modalities demonstrate and validate each of the above conclusions. |
| Researcher Affiliation | Academia | Zichao Wang, Randall Balestriero & Richard G. Baraniuk Department of Electrical and Computer Engineering Rice University Houston, TX 77005, USA {zw16,rb42,richb}@rice.edu |
| Pseudocode | No | The paper provides mathematical derivations and theorems but does not include any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not include any explicit statements about releasing source code for the described methodology, nor does it provide a link to a code repository. |
| Open Datasets | Yes | MNIST. The dataset consists of 60k images in the training set and 10k images in the test set. We randomly select 10k images from the training set as validation set. ... SST-2. The dataset consists of 6920, 872, 1821 sentences in the training, validation and test set, respectively. ... Bird Audio Dataset. The dataset consists of 7,000 field recording signals of 10 seconds sampled at 44 kHz from the Freesound (Stowell & Plumbley, 2014) audio archive |
| Dataset Splits | Yes | MNIST. The dataset consists of 60k images in the training set and 10k images in the test set. We randomly select 10k images from the training set as validation set. ... SST-2. The dataset consists of 6920, 872, 1821 sentences in the training, validation and test set, respectively. |
| Hardware Specification | No | The paper does not specify any hardware details such as GPU or CPU models used for the experiments. It only refers to running 'experiments' in general terms. |
| Software Dependencies | No | The paper mentions 'PyTorch default values' for MNIST preprocessing and the 'Python sklearn package' for t-SNE visualization, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Experiment setup for various datasets is summarized in Table 2. Some of the experiments do not appear in the main text but in the appendix; we include setup for those experiments as well. A setting common to all experiments is that we use a learning rate scheduler so that when validation loss plateaus for 5 consecutive epochs, we reduce the current learning rate by a factor of 0.7. ... In all experiments we use ReLU RNNs with 128-dimensional hidden states and with the recurrent weight matrix W_r^{(ℓ)} initialized as an identity matrix (Le et al., 2015; Talathi & Vartak, 2016). ... We use σ_ϵ chosen in {0.001, 0.01, 0.1, 1, 5} and learning rates in {1e-5, 1e-4, 1.5e-4, 2e-4} for RMSprop and {1e-7, 1e-6, 1.25e-6, 1.5e-6} for plain SGD. |
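
The Dataset Splits row above fixes the MNIST partition at 50k train / 10k validation / 10k test, with the validation images drawn randomly from the training set. A minimal sketch of how such a split could be reproduced, assuming `torchvision` and an arbitrary seed (neither is specified by the paper):

```python
# Sketch of the MNIST split described in the Dataset Splits row:
# 60k training images, 10k of them randomly held out for validation,
# plus the standard 10k test set. The seed choice is an assumption.
import torch
from torch.utils.data import random_split
from torchvision import datasets, transforms

transform = transforms.ToTensor()  # the paper reports "PyTorch default values" for preprocessing

train_full = datasets.MNIST("data", train=True, download=True, transform=transform)
test_set = datasets.MNIST("data", train=False, download=True, transform=transform)

# Randomly select 10k of the 60k training images as the validation set.
train_set, val_set = random_split(
    train_full, [50_000, 10_000], generator=torch.Generator().manual_seed(0)
)
```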
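
The Experiment Setup row pins down three concrete choices: ReLU RNNs with 128-dimensional hidden states, identity initialization of the recurrent weight matrix, and a plateau-based learning rate schedule (factor 0.7, patience 5 epochs). A minimal PyTorch sketch of that configuration, assuming an MNIST-style input size and one learning rate from the reported grid:

```python
# Sketch of the common experiment setup; input_size, the chosen learning
# rate, and the evaluate() helper are assumptions for illustration only.
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=28, hidden_size=128, nonlinearity="relu", batch_first=True)
with torch.no_grad():
    rnn.weight_hh_l0.copy_(torch.eye(128))  # identity-initialized recurrent weights

optimizer = torch.optim.RMSprop(rnn.parameters(), lr=1.5e-4)  # one value from the reported grid
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.7, patience=5  # reduce LR by 0.7 after 5 flat epochs
)

# Per epoch, after training:
# val_loss = evaluate(rnn, val_loader)  # hypothetical validation helper
# scheduler.step(val_loss)
```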