Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections

Authors: Zakaria Mhammedi, Andrew Hellicar, Ashfaqur Rahman, James Bailey

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our results show that the orthogonal constraint on the transition matrix applied through our parametrisation gives similar benefits to the unitary constraint, without the time complexity limitations. (Section 5: Experiments)
Researcher Affiliation | Academia | The University of Melbourne, Parkville, Australia; Data61, CSIRO, Australia.
Pseudocode | Yes | Algorithm 1: Local forward and backward propagations at time step t. (An illustrative sketch of this forward step follows the table.)
Open Source Code | Yes | Our implementation can be found at https://github.com/zmhammedi/Orthogonal_RNN.
Open Datasets | Yes | We used the MNIST image dataset. We tested the oRNN on the task of character level prediction using the Penn Tree Bank Corpus.
Dataset Splits | Yes | We split the dataset into training (55000 instances), validation (5000 instances), and test sets (10000 instances).
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments (e.g., CPU, GPU models, memory, or cluster specifications).
Software Dependencies | No | All RNN models were implemented using the python library theano (Theano Development Team, 2016). We implemented the one-step FP and BP algorithms described in Algorithm 1 using C code. The paper mentions the use of Theano and C code but does not provide specific version numbers for these or any other software dependencies.
Experiment Setup | Yes | All RNN models were implemented using the python library theano (Theano Development Team, 2016). We set its activation function to the leaky_ReLU defined as φ(x) = max(x/10, x). For all experiments, we used the adam method for stochastic gradient descent (Kingma & Ba, 2014). We initialised all the parameters using uniform distributions similar to (Arjovsky et al., 2016). The biases of all models were set to zero, except for the forget bias of the LSTM, which we set to 5 to facilitate the learning of long-term dependencies (Koutník et al., 2014). All the learning rates were set to 10⁻³. We chose a batch size of 50. We experimented with (mini-batch size, learning rate) ∈ {(1, 10⁻⁴), (50, 10⁻³)}. The learning rate was set to 0.0001 for both models with a mini-batch size of 1. (A sketch of the Adam update with these settings also follows the table.)
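The Algorithm 1 row above refers to the local forward and backward propagation built on the paper's Householder parametrisation of the transition matrix. The following is a minimal NumPy sketch, not the authors' Theano/C implementation, of how one forward step can apply a product of Householder reflections without ever forming the full matrix; the function names, the toy sizes, and the use of unstructured reflection vectors (the paper uses vectors with a structured zero pattern) are simplifying assumptions.

```python
import numpy as np

def leaky_relu(x):
    # Activation reported in the paper: phi(x) = max(x / 10, x).
    return np.maximum(x / 10.0, x)

def apply_householder_product(us, h):
    """Compute W h where W = H(u_k) ... H(u_1) is a product of Householder
    reflections H(u) = I - 2 u u^T / ||u||^2. Each reflection is applied in
    O(n), so the product costs O(n k) and W is never materialised."""
    for u in us:                     # H(u_1) acts on h first, H(u_k) last
        h = h - 2.0 * u * (u @ h) / (u @ u)
    return h

def ornn_forward_step(us, V, b, h_prev, x_t):
    """One hidden-state update h_t = phi(W h_{t-1} + V x_t + b), with the
    orthogonal W represented implicitly by the reflection vectors `us`."""
    return leaky_relu(apply_householder_product(us, h_prev) + V @ x_t + b)

# Toy usage with illustrative sizes: hidden size n = 8, k = 4 reflections,
# input size d = 3 (these numbers are not taken from the paper).
rng = np.random.default_rng(0)
n, k, d = 8, 4, 3
us = [rng.standard_normal(n) for _ in range(k)]   # reflection vectors u_1..u_k
V = 0.1 * rng.standard_normal((n, d))             # input-to-hidden weights
b = np.zeros(n)                                   # hidden bias
h_t = ornn_forward_step(us, V, b, np.zeros(n), rng.standard_normal(d))
```

The backward pass in Algorithm 1 propagates gradients through the same chain of reflections in reverse order, which is what keeps the per-step cost linear in the number of reflections rather than quadratic in the hidden size.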
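The Experiment Setup row reports Adam (Kingma & Ba, 2014) with a learning rate of 10⁻³ and a batch size of 50. Below is a generic Adam update step in NumPy using those reported hyperparameters; the β1, β2, and ε values are the standard Adam defaults and are an assumption here, since the paper does not state them.

```python
import numpy as np

# Learning rate and batch size as reported in the paper; beta1/beta2/eps are
# the standard Adam defaults (Kingma & Ba, 2014) and are assumed, not stated.
CONFIG = {"learning_rate": 1e-3, "batch_size": 50,
          "beta1": 0.9, "beta2": 0.999, "eps": 1e-8}

def adam_update(param, grad, m, v, t, cfg=CONFIG):
    """One Adam step for a single parameter array.
    m and v are the running first/second moment estimates; t is the step count."""
    m = cfg["beta1"] * m + (1.0 - cfg["beta1"]) * grad
    v = cfg["beta2"] * v + (1.0 - cfg["beta2"]) * grad ** 2
    m_hat = m / (1.0 - cfg["beta1"] ** t)    # bias-corrected first moment
    v_hat = v / (1.0 - cfg["beta2"] ** t)    # bias-corrected second moment
    param = param - cfg["learning_rate"] * m_hat / (np.sqrt(v_hat) + cfg["eps"])
    return param, m, v
```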