Orthogonal Recurrent Neural Networks with Scaled Cayley Transform
Authors: Kyle Helfrich, Devin Willmott, Qiang Ye
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the following experiments, we compare the scoRNN against LSTM and several other orthogonal and unitary RNN models. Figure 1 compares each model's performance for T = 1000 and T = 2000, with the baseline cross-entropy given as a dashed line. |
| Researcher Affiliation | Academia | 1Department of Mathematics, University of Kentucky, Lexington, Kentucky, USA. Correspondence to: Kyle Helfrich <kyle.helfrich@uky.edu>, Devin Willmott <devin.willmott@uky.edu>. |
| Pseudocode | No | The paper describes the update scheme using mathematical equations and textual descriptions but does not provide structured pseudocode or algorithm blocks (a sketch of the described parameterization appears after the table). |
| Open Source Code | Yes | Code for these experiments is available at https://github.com/SpartinStuff/scoRNN. |
| Open Datasets | Yes | We ran two experiments based around classifying samples from the well-known MNIST dataset (LeCun et al.). To see how the models performed on audio data, speech prediction was performed on the TIMIT dataset (Garofolo et al., 1993), a collection of real-world speech recordings. |
| Dataset Splits | No | The paper mentions training and testing sets, and in the TIMIT experiment, it mentions 'validation and testing sets', but it does not consistently provide explicit details on the size or percentage of the validation split for all experiments. |
| Hardware Specification | Yes | All models were run on the same machine, which has an Intel Core i5-7400 processor and an nVidia GeForce GTX 1080 GPU. |
| Software Dependencies | No | The paper mentions using 'TensorFlow' but does not specify a version number for it or any other software dependencies. |
| Experiment Setup | Yes | For each experiment, we found optimal hyperparameters for scoRNN using a grid search...Input and output weights used a learning rate of 10^-3, while the recurrent parameters used a learning rate of 10^-4 (for n = 170) or 10^-5 (for n = 360 and n = 512). For scoRNN, we used the Adam optimizer with learning rate 10^-3 to train input and output parameters, and RMSprop with a learning rate of 10^-3 (for n = 224) or 10^-4 (for n = 322, 425) to train the recurrent weight matrix. The number of negative eigenvalues used was ρ = n/10. The LSTM forget gate bias was initialized to -4. (A configuration sketch based on these reported values follows the table.) |
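
The update scheme referenced in the Pseudocode row is given in the paper only through equations. As an illustration, the following is a minimal NumPy sketch of the scaled Cayley parameterization W = (I + A)^{-1}(I - A)D that the paper builds on, where A is skew-symmetric and D is a diagonal matrix of ±1 entries; the random initialization and matrix sizes below are chosen purely for demonstration and are not the paper's initialization scheme.

```python
import numpy as np

def scaled_cayley(A, D):
    """Scaled Cayley transform: W = (I + A)^{-1} (I - A) D.
    W is orthogonal whenever A is skew-symmetric and D is a
    diagonal matrix with +/-1 entries."""
    n = A.shape[0]
    I = np.eye(n)
    # Solve (I + A) W = (I - A) D rather than forming the inverse explicitly.
    return np.linalg.solve(I + A, (I - A) @ D)

# Illustrative construction (not the authors' initialization):
# a small random skew-symmetric A and a D with rho = n/10 negative entries.
n, rho = 50, 5
M = 0.01 * np.random.randn(n, n)
A = M - M.T                                    # skew-symmetric
D = np.diag([-1.0] * rho + [1.0] * (n - rho))  # scaling matrix
W = scaled_cayley(A, D)
print(np.allclose(W.T @ W, np.eye(n)))         # True: W is orthogonal
```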
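
The Experiment Setup row reports two optimizers applied to different parameter groups. The snippet below is a minimal sketch of that arrangement, assuming TensorFlow 2 Keras optimizers; the variable names, shapes, and dummy loss are illustrative placeholders, not the authors' code, and the learning rates shown are the values reported for one of the TIMIT configurations.

```python
import tensorflow as tf

# Toy stand-ins for the two parameter groups named above: input/output
# weights trained with Adam (lr 1e-3) and a recurrent parameter trained
# with RMSprop (lr 1e-4, the value reported for the larger hidden sizes).
io_vars = [tf.Variable(tf.random.normal([16, 4]), name="output_weights")]
rec_vars = [tf.Variable(tf.random.normal([16, 16]), name="recurrent_param")]

io_opt = tf.keras.optimizers.Adam(learning_rate=1e-3)
rec_opt = tf.keras.optimizers.RMSprop(learning_rate=1e-4)

@tf.function
def train_step():
    with tf.GradientTape() as tape:
        # Dummy quadratic loss so the example runs end to end; the actual
        # experiments use task losses (classification or regression).
        loss = tf.reduce_sum(tf.square(rec_vars[0] @ io_vars[0]))
    grads = tape.gradient(loss, io_vars + rec_vars)
    io_opt.apply_gradients(zip(grads[:len(io_vars)], io_vars))
    rec_opt.apply_gradients(zip(grads[len(io_vars):], rec_vars))
    return loss

for _ in range(3):
    train_step()
```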