Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections
Authors: Zakaria Mhammedi, Andrew Hellicar, Ashfaqur Rahman, James Bailey
ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our results show that the orthogonal constraint on the transition matrix applied through our parametrisation gives similar benefits to the unitary constraint, without the time complexity limitations. (Section 5, Experiments) |
| Researcher Affiliation | Academia | ¹The University of Melbourne, Parkville, Australia; ²Data61, CSIRO, Australia. |
| Pseudocode | Yes | Algorithm 1: Local forward and backward propagations at time step t. (A hedged sketch of one such forward step is given after this table.) |
| Open Source Code | Yes | Our implementation can be found at https://github.com/zmhammedi/Orthogonal_RNN. |
| Open Datasets | Yes | We used the MNIST image dataset. We tested the oRNN on the task of character-level prediction using the Penn Tree Bank Corpus. |
| Dataset Splits | Yes | We split the dataset into training (55000 instances), validation (5000 instances), and test sets (10000 instances). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., CPU, GPU models, memory, or cluster specifications). |
| Software Dependencies | No | All RNN models were implemented using the python library theano (Theano Development Team, 2016). We implemented the one-step FP and BP algorithms described in Algorithm 1 using C code. The paper mentions the use of Theano and C code but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | All RNN models were implemented using the python library theano (Theano Development Team, 2016). We set its activation function to the leaky_ReLU defined as φ(x) = max(x/10, x). For all experiments, we used the adam method for stochastic gradient descent (Kingma & Ba, 2014). We initialised all the parameters using uniform distributions similar to (Arjovsky et al., 2016). The biases of all models were set to zero, except for the forget bias of the LSTM, which we set to 5 to facilitate the learning of long-term dependencies (Koutník et al., 2014). All the learning rates were set to 10⁻³. We chose a batch size of 50. We experimented with (mini-batch size, learning rate) ∈ {(1, 10⁻⁴), (50, 10⁻³)}. The learning rate was set to 0.0001 for both models with a mini-batch size of 1. (A summary of this configuration is sketched after the table.) |
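
The Pseudocode row refers to Algorithm 1 of the paper, which performs the local forward and backward propagations of the Householder parametrisation at a single time step. As a rough illustration only, below is a minimal NumPy sketch of the forward part of such a step, assuming the transition matrix is expressed as a product of full-dimensional Householder reflections and using the leaky-ReLU activation quoted in the Experiment Setup row. It is not the authors' Theano/C implementation, and the function names (`leaky_relu`, `householder_product`, `ornn_step`) are hypothetical.

```python
import numpy as np

def leaky_relu(x):
    # Activation quoted in the paper: phi(x) = max(x/10, x).
    return np.maximum(x / 10.0, x)

def householder_product(us, h):
    # Apply a product of Householder reflections H(u) = I - 2 u u^T / ||u||^2
    # to the vector h, one reflection at a time. This keeps the cost linear in
    # the number of reflections instead of forming the n x n matrix explicitly.
    for u in reversed(us):  # application order is an assumption of this sketch
        h = h - 2.0 * u * (u @ h) / (u @ u)
    return h

def ornn_step(us, V, b, h_prev, x_t):
    # One forward step h_t = phi(W h_{t-1} + V x_t + b), with W represented
    # implicitly by the list of reflection vectors `us`.
    return leaky_relu(householder_product(us, h_prev) + V @ x_t + b)

# Toy usage with hypothetical sizes.
rng = np.random.default_rng(0)
n, n_in, m = 8, 3, 4
us = [rng.standard_normal(n) for _ in range(m)]
V = rng.standard_normal((n, n_in))
h = ornn_step(us, V, np.zeros(n), np.zeros(n), rng.standard_normal(n_in))
```

Because each Householder reflection is orthogonal, the implied transition matrix is orthogonal by construction, which is the property the parametrisation relies on. In the paper the reflections may also act only on trailing sub-vectors of the hidden state; the full-dimensional case above is a simplification.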
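
For the Experiment Setup row, the hyperparameters quoted from the paper can be collected into a single configuration summary. This is a hedged sketch only; the authors' implementation uses Theano with a C kernel for the one-step forward/backward propagation, and the key names below are hypothetical.

```python
# Hypothetical key names; values are taken from the quoted experiment setup.
training_config = {
    "optimizer": "adam",          # Kingma & Ba (2014)
    "learning_rate": 1e-3,        # reduced to 1e-4 when the mini-batch size is 1
    "batch_size": 50,
    "activation": "leaky_ReLU",   # phi(x) = max(x/10, x)
    "weight_init": "uniform",     # similar to Arjovsky et al. (2016)
    "bias_init": 0.0,             # all biases zero ...
    "lstm_forget_bias": 5.0,      # ... except the LSTM forget bias, set to 5
}
```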