Orthogonal Recurrent Neural Networks with Scaled Cayley Transform
Authors: Kyle Helfrich, Devin Willmott, Qiang Ye
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In the following experiments, we compare the scoRNN against LSTM and several other orthogonal and unitary RNN models. Figure 1 compares each model's performance for T = 1000 and T = 2000, with the baseline cross-entropy given as a dashed line. |
| Researcher Affiliation | Academia | 1Department of Mathematics, University of Kentucky, Lexington, Kentucky, USA. Correspondence to: Kyle Helfrich <kyle.helfrich@uky.edu>, Devin Willmott <devin.willmott@uky.edu>. |
| Pseudocode | No | The paper describes the update scheme using mathematical equations and textual descriptions but does not provide structured pseudocode or algorithm blocks (a sketch of the described parameterization appears after the table). |
| Open Source Code | Yes | Code for these experiments is available at https://github.com/SpartinStuff/scoRNN. |
| Open Datasets | Yes | We ran two experiments based around classifying samples from the well-known MNIST dataset (LeCun et al.). To see how the models performed on audio data, speech prediction was performed on the TIMIT dataset (Garofolo et al., 1993), a collection of real-world speech recordings. |
| Dataset Splits | No | The paper mentions training and testing sets, and in the TIMIT experiment, it mentions 'validation and testing sets', but it does not consistently provide explicit details on the size or percentage of the validation split for all experiments. |
| Hardware Specification | Yes | All models were run on the same machine, which has an Intel Core i5-7400 processor and an nVidia GeForce GTX 1080 GPU. |
| Software Dependencies | No | The paper mentions using 'TensorFlow' but does not specify a version number for it or any other software dependencies. |
| Experiment Setup | Yes | For each experiment, we found optimal hyperparameters for scoRNN using a grid search...Input and output weights used a learning rate of 10^-3, while the recurrent parameters used a learning rate of 10^-4 (for n = 170) or 10^-5 (for n = 360 and n = 512). For scoRNN, we used the Adam optimizer with learning rate 10^-3 to train input and output parameters, and RMSprop with a learning rate of 10^-3 (for n = 224) or 10^-4 (for n = 322, 425) to train the recurrent weight matrix. The number of negative eigenvalues used was ρ = n/10. The LSTM forget gate bias was initialized to -4. (A configuration sketch based on these reported values follows the table.) |
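
The update scheme referenced in the Pseudocode row is given in the paper only through equations. As an illustration, the following is a minimal NumPy sketch of the scaled Cayley parameterization W = (I + A)^{-1}(I - A)D that the paper builds on, where A is skew-symmetric and D is a diagonal matrix of ±1 entries; the random initialization and matrix sizes below are chosen purely for demonstration and are not the paper's initialization scheme.

```python
import numpy as np

def scaled_cayley(A, D):
    """Scaled Cayley transform: W = (I + A)^{-1} (I - A) D.
    W is orthogonal whenever A is skew-symmetric and D is a
    diagonal matrix with +/-1 entries."""
    n = A.shape[0]
    I = np.eye(n)
    # Solve (I + A) W = (I - A) D rather than forming the inverse explicitly.
    return np.linalg.solve(I + A, (I - A) @ D)

# Illustrative construction (not the authors' initialization):
# a small random skew-symmetric A and a D with rho = n/10 negative entries.
n, rho = 50, 5
M = 0.01 * np.random.randn(n, n)
A = M - M.T                                    # skew-symmetric
D = np.diag([-1.0] * rho + [1.0] * (n - rho))  # scaling matrix
W = scaled_cayley(A, D)
print(np.allclose(W.T @ W, np.eye(n)))         # True: W is orthogonal
```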
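
The Experiment Setup row reports two optimizers applied to different parameter groups. The snippet below is a minimal sketch of that arrangement, assuming TensorFlow 2 Keras optimizers; the variable names, shapes, and dummy loss are illustrative placeholders, not the authors' code, and the learning rates shown are the values reported for one of the TIMIT configurations.

```python
import tensorflow as tf

# Toy stand-ins for the two parameter groups named above: input/output
# weights trained with Adam (lr 1e-3) and a recurrent parameter trained
# with RMSprop (lr 1e-4, the value reported for the larger hidden sizes).
io_vars = [tf.Variable(tf.random.normal([16, 4]), name="output_weights")]
rec_vars = [tf.Variable(tf.random.normal([16, 16]), name="recurrent_param")]

io_opt = tf.keras.optimizers.Adam(learning_rate=1e-3)
rec_opt = tf.keras.optimizers.RMSprop(learning_rate=1e-4)

@tf.function
def train_step():
    with tf.GradientTape() as tape:
        # Dummy quadratic loss so the example runs end to end; the actual
        # experiments use task losses (classification or regression).
        loss = tf.reduce_sum(tf.square(rec_vars[0] @ io_vars[0]))
    grads = tape.gradient(loss, io_vars + rec_vars)
    io_opt.apply_gradients(zip(grads[:len(io_vars)], io_vars))
    rec_opt.apply_gradients(zip(grads[len(io_vars):], rec_vars))
    return loss

for _ in range(3):
    train_step()
```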