Unitary Evolution Recurrent Neural Networks
Authors: Martin Arjovsky, Amar Shah, Yoshua Bengio
ICML 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate the potential of this architecture by achieving state of the art results in several hard tasks involving very long-term dependencies. and In this section we explore the performance of our uRNN in relation to (a) RNN with tanh activations, (b) IRNN (Le et al., 2015), that is an RNN with ReLU activations and with the recurrent weight matrix initialized to the identity, and (c) LSTM (Hochreiter & Schmidhuber, 1997) models. |
| Researcher Affiliation | Academia | Martin Arjovsky MARJOVSKY@DC.UBA.AR Amar Shah AS793@CAM.AC.UK Yoshua Bengio Universidad de Buenos Aires, University of Cambridge, Université de Montréal. Yoshua Bengio is a CIFAR Senior Fellow. |
| Pseudocode | No | The paper describes the architecture and mathematical operations but does not include structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | This along with other implementation details are discussed in Section 4, and the code used for the experiments is available online. |
| Open Datasets | Yes | We chose a handful of tasks to evaluate the performance of the various models. The tasks were especially created to be pathologically hard, and have been used as benchmarks for testing the ability of a model to capture long-term memory (Hochreiter & Schmidhuber, 1997; Le et al., 2015; Graves et al., 2014; Martens & Sutskever, 2011) and Pixel-by-pixel MNIST from (LeCun et al., 1998). |
| Dataset Splits | No | The paper discusses training and testing performance but does not provide specific details on validation dataset splits, percentages, or cross-validation methodology. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU/GPU models, memory specifications) used for running its experiments. It mentions 'GPU memory' generally but not specific hardware used. |
| Software Dependencies | No | The paper mentions using 'Theano' but does not specify its version number or any other software dependencies with their respective versions, which is necessary for reproducible ancillary software details. |
| Experiment Setup | Yes | In each experiment we use a learning rate of 10^-3 and a decay rate of 0.9. For the LSTM and RNN models, we had to clip gradients at 1 to avoid exploding gradients. and We initialize V and U (the input and output matrices) as in (Glorot & Bengio, 2010), with weights sampled independently from uniforms, U[−√(6/(n_in + n_out)), +√(6/(n_in + n_out))]. and The biases, b and b_o, are initialized to 0. and The diagonal weights for D1, D2 and D3 are sampled from a uniform, U[−π, π]. |
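
The initialization recipe quoted in the Experiment Setup row is concrete enough to sketch in code. Below is a minimal NumPy illustration, not the authors' released Theano implementation: the layer sizes, variable names, and the complex-diagonal form D = diag(exp(iθ)) are assumptions for illustration, following only the quoted Glorot uniform bounds, zero biases, and U[−π, π] sampling for the diagonal weights.

```python
import numpy as np

rng = np.random.default_rng(0)

def glorot_uniform(n_in, n_out):
    # Glorot & Bengio (2010): weights ~ U[-sqrt(6/(n_in+n_out)), +sqrt(6/(n_in+n_out))]
    limit = np.sqrt(6.0 / (n_in + n_out))
    return rng.uniform(-limit, limit, size=(n_in, n_out))

# Illustrative sizes (hypothetical; the paper varies these per task).
n_input, n_hidden, n_output = 1, 128, 10

V = glorot_uniform(n_input, 2 * n_hidden)    # input matrix (assuming real/imag parts stacked)
U = glorot_uniform(2 * n_hidden, n_output)   # output matrix
b = np.zeros(2 * n_hidden)                   # hidden bias, initialized to 0
b_o = np.zeros(n_output)                     # output bias, initialized to 0

# Angles for the diagonal factors D1, D2, D3 of the unitary recurrence,
# sampled from U[-pi, pi] as quoted above; diag(exp(i * theta)) is unitary.
theta = {k: rng.uniform(-np.pi, np.pi, size=n_hidden) for k in ("D1", "D2", "D3")}
D1 = np.diag(np.exp(1j * theta["D1"]))
```

Note that the quoted learning rate of 10^-3 together with a decay rate of 0.9 corresponds to RMSProp-style optimizer settings, and the gradient clipping at 1 applies to the LSTM and RNN baselines rather than the uRNN itself.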