Kronecker Recurrent Units

Authors: Cijo Jose, Moustapha Cissé, François Fleuret

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results on seven standard data-sets reveal that KRU can reduce the number of parameters by three orders of magnitude in the recurrent weight matrix compared to the existing recurrent models, without trading the statistical performance.
Researcher Affiliation | Collaboration | 1 Idiap Research Institute, 2 École Polytechnique Fédérale de Lausanne (EPFL), 3 Facebook AI Research.
Pseudocode | No | The paper describes the model mathematically but does not include a pseudocode block or a clearly labeled algorithm.
Open Source Code | No | We will release our library to reproduce all the results which we report in this paper.
Open Datasets | Yes | Our experimental results on seven standard data-sets... Copy memory problem (Hochreiter & Schmidhuber, 1997)... Adding problem (Hochreiter & Schmidhuber, 1997)... Pixel by Pixel MNIST... Penn Tree Bank data-set (Marcus et al., 1993)... JSB Chorales and Piano-midi... TIMIT data-set (Garofolo et al., 1993).
Dataset Splits | Yes | The size of the MNIST training set is 60K among which we choose 5K as the validation set. The models are trained on the remaining 55K points. ... Penn Tree Bank is composed of 5017K characters in the training set, 393K characters in the validation set and 442K characters in the test set. ... TIMIT contains a training set of 3696 utterances among which we use 184 as the validation set.
Hardware Specification | No | The paper mentions "modern BLAS libraries" and "custom CUDA kernels" but does not specify any particular hardware components such as CPU/GPU models, memory, or the computing platform used for the experiments.
Software Dependencies | No | Existing deep learning libraries such as Theano (Bergstra et al., 2011), Tensorflow (Abadi et al., 2016) and Pytorch (Paszke et al., 2017) do not support fast primitives for Kronecker products with arbitrary number of factors. So we wrote custom CUDA kernels for Kronecker forward and backward operations. All our models are implemented in C++. (A minimal sketch of the Kronecker matrix-vector product such kernels compute appears below the table.)
Experiment Setup | Yes | All the models were trained using RMSprop with a learning rate of 1e-3, decay of 0.9 and a batch size of 20. ... All the models were trained using RMSprop with a learning rate of 1e-3 and a batch size of 20 or 50... All our models were trained for 50 epochs with a batch size of 50 and using ADAM (Kingma & Ba, 2014). We use a learning rate of 1e-3... Back-propagation through time (BPTT) is unrolled for 30 time frames... (A hedged configuration sketch using these hyperparameters follows below.)
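The parameter-count claim in the Research Type row and the custom-kernel remark in the Software Dependencies row rest on the same piece of linear algebra: a recurrent weight matrix stored as a Kronecker product of small factors can be applied to a hidden state without ever materializing the full matrix. The sketch below is a minimal NumPy illustration of that product, not the authors' C++/CUDA implementation; the function name, the choice of eight 2x2 factors, and the 256-dimensional example are assumptions chosen only to make the "three orders of magnitude" figure concrete.

```python
import numpy as np

def kron_matvec(factors, x):
    """Apply (W_1 ⊗ W_2 ⊗ ... ⊗ W_F) to x without building the full matrix.

    factors: list of small square matrices [W_1, ..., W_F], W_i of shape (n_i, n_i).
    x: vector of length n_1 * n_2 * ... * n_F.
    """
    dims = [f.shape[0] for f in factors]
    t = x.reshape(dims)                            # view x as a tensor, one axis per factor
    for i, f in enumerate(factors):
        t = np.tensordot(f, t, axes=([1], [i]))    # contract factor i with tensor axis i
        t = np.moveaxis(t, 0, i)                   # restore the original axis order
    return t.reshape(-1)

# Toy check against the dense Kronecker product: eight 2x2 factors hold
# 8 * 4 = 32 parameters, while the equivalent dense 256x256 recurrent matrix
# holds 65,536 -- roughly the three-orders-of-magnitude gap quoted above.
rng = np.random.default_rng(0)
factors = [rng.standard_normal((2, 2)) for _ in range(8)]
x = rng.standard_normal(2 ** 8)
dense = factors[0]
for f in factors[1:]:
    dense = np.kron(dense, f)
assert np.allclose(kron_matvec(factors, x), dense @ x)
```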
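For the Experiment Setup row, the quoted hyperparameters can be mapped onto a present-day framework as follows. This is an assumption-laden sketch rather than the authors' training code: the paper's models are implemented in C++ with custom CUDA kernels, so the PyTorch calls, the placeholder model, and the variable names below are illustrative only.

```python
import torch

# Placeholder module standing in for a KRU cell; not the authors' model.
model = torch.nn.RNN(input_size=1, hidden_size=128)

# One reported configuration: RMSprop, learning rate 1e-3, decay 0.9
# (PyTorch exposes the RMSprop decay as `alpha`), batch size 20.
rmsprop = torch.optim.RMSprop(model.parameters(), lr=1e-3, alpha=0.9)

# Another reported configuration: Adam with learning rate 1e-3, batch size 50,
# 50 epochs, and back-propagation through time unrolled for 30 time frames.
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
batch_size, epochs, bptt_steps = 50, 50, 30
```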