Full-Capacity Unitary Recurrent Neural Networks
Authors: Scott Wisdom, Thomas Powers, John Hershey, Jonathan Le Roux, Les Atlas
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We confirm the utility of our claims by empirically evaluating our new full-capacity uRNNs on both synthetic and natural data, achieving superior performance compared to both LSTMs and the original restricted-capacity uRNNs. |
| Researcher Affiliation | Collaboration | 1 Department of Electrical Engineering, University of Washington {swisdom, tcpowers, atlas}@uw.edu 2 Mitsubishi Electric Research Laboratories (MERL) {hershey, leroux}@merl.com |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | All code to replicate our results is available from https://github.com/stwisdom/urnn. |
| Open Datasets | Yes | We use the TIMIT dataset [17]. ... For the task of system identification, we consider the problem of learning the dynamics of a nonlinear dynamical system that has the form (1), given a dataset of inputs and outputs of the system. ... pixel-by-pixel MNIST and permuted pixel-by-pixel MNIST |
| Dataset Splits | Yes | For all experiments, the number of training, validation, and test sequences are 20000, 1000, and 1000, respectively. ... According to common practice [18], we use a training set with 3690 utterances from 462 speakers, a validation set of 400 utterances, an evaluation set of 192 utterances. ... We use 5000 of the 60000 training examples as a validation set to perform early stopping with a patience of 5. |
| Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU model, CPU type, memory) used for the experiments. It only mentions 'All models are implemented in Theano'. |
| Software Dependencies | No | The paper mentions 'All models are implemented in Theano [16]' but does not provide a specific version number for Theano or any other software libraries, which is necessary for reproducible setup. |
| Experiment Setup | Yes | The learning rate is 0.001 with a batch size of 50 for all experiments. ... The full-capacity uRNN uses a hidden state size of N = 128 with no gradient normalization. To match the number of parameters (≈22k), we use N = 470 for the restricted-capacity uRNN, and N = 68 for the LSTM. ... For the LSTM and restricted-capacity uRNNs, we use RMSprop [15] with a learning rate of 0.001, momentum 0.9, and averaging parameter 0.1. For the full-capacity uRNN, we also use RMSprop to optimize all network parameters, except for the recurrence matrix, for which we use stochastic gradient descent along the Stiefel manifold using the update (6) with a fixed learning rate of 0.001 and no gradient normalization. ... We use 5000 of the 60000 training examples as a validation set to perform early stopping with a patience of 5. The loss function is cross-entropy. (See the sketches after the table.) |
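
The "Experiment Setup" row quotes the paper's use of stochastic gradient descent along the Stiefel manifold (the paper's update (6)) for the recurrence matrix of the full-capacity uRNN. The following is a minimal NumPy sketch of that style of update, assuming a Cayley-transform retraction of the form W ← (I + (λ/2)A)⁻¹(I − (λ/2)A)W with A = GWᴴ − WGᴴ; the function name and toy check are illustrative only, and the authors' actual Theano implementation is in the linked repository.

```python
import numpy as np

def stiefel_update(W, G, lr=1e-3):
    """One gradient step along the Stiefel manifold via a Cayley-style
    retraction (in the spirit of the paper's update (6)). W: current unitary
    recurrence matrix, G: Euclidean gradient dL/dW, both complex N x N."""
    A = G @ W.conj().T - W @ G.conj().T          # skew-Hermitian term
    I = np.eye(W.shape[0], dtype=W.dtype)
    # Solving (I + lr/2 * A) W_new = (I - lr/2 * A) W keeps W_new unitary
    return np.linalg.solve(I + (lr / 2) * A, (I - (lr / 2) * A) @ W)

# Toy check (hypothetical): the update preserves unitarity numerically.
rng = np.random.default_rng(0)
N = 8
M = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W, _ = np.linalg.qr(M)                            # random unitary start
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W_next = stiefel_update(W, G, lr=1e-3)
print(np.allclose(W_next.conj().T @ W_next, np.eye(N)))  # True (to tolerance)
```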
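
The same row, together with "Dataset Splits", mentions holding out 5000 of the 60000 MNIST training examples as a validation set and early stopping with a patience of 5. A minimal sketch of that protocol is below; the function names and training-loop structure are illustrative and not taken from the paper's code.

```python
import numpy as np

def split_train_val(X, y, n_val=5000, seed=0):
    """Hold out n_val examples from the training data as a validation set."""
    idx = np.random.default_rng(seed).permutation(len(X))
    val, train = idx[:n_val], idx[n_val:]
    return (X[train], y[train]), (X[val], y[val])

def train_with_early_stopping(train_epoch, evaluate, patience=5, max_epochs=200):
    """Run train_epoch() each epoch; stop once the validation loss has not
    improved for `patience` consecutive epochs (early stopping)."""
    best_loss, best_epoch = np.inf, 0
    for epoch in range(max_epochs):
        train_epoch()              # one pass over the training set
        val_loss = evaluate()      # e.g. cross-entropy on the held-out set
        if val_loss < best_loss:
            best_loss, best_epoch = val_loss, epoch
        elif epoch - best_epoch >= patience:
            break                  # patience exhausted
    return best_loss
```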