Full-Capacity Unitary Recurrent Neural Networks

Authors: Scott Wisdom, Thomas Powers, John Hershey, Jonathan Le Roux, Les Atlas

NeurIPS 2016

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We confirm the utility of our claims by empirically evaluating our new full-capacity uRNNs on both synthetic and natural data, achieving superior performance compared to both LSTMs and the original restricted-capacity uRNNs.
Researcher Affiliation | Collaboration | 1 Department of Electrical Engineering, University of Washington {swisdom, tcpowers, atlas}@uw.edu; 2 Mitsubishi Electric Research Laboratories (MERL) {hershey, leroux}@merl.com
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | Yes | All code to replicate our results is available from https://github.com/stwisdom/urnn.
Open Datasets | Yes | We use the TIMIT dataset [17]. ... For the task of system identification, we consider the problem of learning the dynamics of a nonlinear dynamical system that has the form (1), given a dataset of inputs and outputs of the system. ... pixel-by-pixel MNIST and permuted pixel-by-pixel MNIST
Dataset Splits | Yes | For all experiments, the number of training, validation, and test sequences are 20000, 1000, and 1000, respectively. ... According to common practice [18], we use a training set with 3690 utterances from 462 speakers, a validation set of 400 utterances, and an evaluation set of 192 utterances. ... We use 5000 of the 60000 training examples as a validation set to perform early stopping with a patience of 5.
Hardware Specification | No | The paper does not provide specific details about the hardware (e.g., GPU model, CPU type, memory) used for the experiments. It only mentions "All models are implemented in Theano".
Software Dependencies | No | The paper mentions "All models are implemented in Theano [16]" but does not provide a specific version number for Theano or any other software libraries, which is necessary for a reproducible setup.
Experiment Setup | Yes | The learning rate is 0.001 with a batch size of 50 for all experiments. ... The full-capacity uRNN uses a hidden state size of N = 128 with no gradient normalization. To match the number of parameters (≈22k), we use N = 470 for the restricted-capacity uRNN, and N = 68 for the LSTM. ... For the LSTM and restricted-capacity uRNNs, we use RMSprop [15] with a learning rate of 0.001, momentum 0.9, and averaging parameter 0.1. For the full-capacity uRNN, we also use RMSprop to optimize all network parameters, except for the recurrence matrix, for which we use stochastic gradient descent along the Stiefel manifold using the update (6) with a fixed learning rate of 0.001 and no gradient normalization. ... We use 5000 of the 60000 training examples as a validation set to perform early stopping with a patience of 5. The loss function is cross-entropy.
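The distinctive piece of the experiment setup above is the Stiefel-manifold step applied to the recurrence matrix of the full-capacity uRNN (the paper's update (6)): a Cayley-style retraction that keeps the matrix exactly unitary after each gradient step. The NumPy sketch below illustrates that kind of update under stated assumptions; the function name, the exact construction of the skew-Hermitian term A, and the toy usage are illustrative choices, not the authors' Theano code (which is available from the repository linked above).

```python
# Minimal sketch of a Cayley-style Stiefel-manifold step for a unitary
# recurrence matrix, in the spirit of the paper's update (6).
# Assumption: A = G W^H - W G^H, where G is the Euclidean gradient of the
# loss with respect to W; this is not the authors' released implementation.
import numpy as np

def stiefel_update(W, G, lr=0.001):
    """One manifold step for the recurrence matrix.

    W  : (N, N) complex unitary matrix (current recurrence matrix)
    G  : (N, N) complex Euclidean gradient of the loss w.r.t. W
    lr : fixed learning rate (0.001 in the paper, no gradient normalization)
    """
    N = W.shape[0]
    # Skew-Hermitian term built from the gradient and the current W.
    A = G @ W.conj().T - W @ G.conj().T
    I = np.eye(N, dtype=W.dtype)
    # Cayley-style retraction: (I + (lr/2) A)^{-1} (I - (lr/2) A) W.
    return np.linalg.solve(I + (lr / 2.0) * A, (I - (lr / 2.0) * A) @ W)

# Toy usage: a random unitary W stays (numerically) unitary after the step.
rng = np.random.default_rng(0)
N = 8
Q, _ = np.linalg.qr(rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N)))
G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
W_next = stiefel_update(Q, G)
print(np.allclose(W_next.conj().T @ W_next, np.eye(N), atol=1e-10))  # True
```

Because A is skew-Hermitian, the Cayley factor is itself unitary, so the updated matrix remains on the unitary group without any re-projection step; this is what lets the full-capacity uRNN use the entire space of unitary recurrence matrices rather than a restricted parameterization.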