Kronecker Recurrent Units

Authors: Cijo Jose, Moustapha Cissé, François Fleuret

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results on seven standard data-sets reveal that KRU can reduce the number of parameters by three orders of magnitude in the recurrent weight matrix compared to the existing recurrent models, without trading the statistical performance.
Researcher Affiliation | Collaboration | 1 Idiap Research Institute, 2 École Polytechnique Fédérale de Lausanne (EPFL), 3 Facebook AI Research.
Pseudocode | No | The paper describes the model mathematically but does not include a pseudocode block or a clearly labeled algorithm.
Open Source Code | No | We will release our library to reproduce all the results which we report in this paper.
Open Datasets | Yes | Our experimental results on seven standard data-sets... Copy memory problem (Hochreiter & Schmidhuber, 1997)... Adding problem (Hochreiter & Schmidhuber, 1997)... Pixel by Pixel MNIST... Penn Tree Bank data-set (Marcus et al., 1993)... JSB Chorales and Piano-midi... TIMIT data-set (Garofolo et al., 1993).
Dataset Splits | Yes | The size of the MNIST training set is 60K among which we choose 5K as the validation set. The models are trained on the remaining 55K points. ... Penn Tree Bank is composed of 5017K characters in the training set, 393K characters in the validation set and 442K characters in the test set. ... TIMIT contains a training set of 3696 utterances among which we use 184 as the validation set.
Hardware Specification | No | The paper mentions "modern BLAS libraries" and "custom CUDA kernels" but does not specify any particular hardware components such as CPU/GPU models, memory, or the computing platform used for the experiments.
Software Dependencies | No | Existing deep learning libraries such as Theano (Bergstra et al., 2011), Tensorflow (Abadi et al., 2016) and Pytorch (Paszke et al., 2017) do not support fast primitives for Kronecker products with arbitrary number of factors. So we wrote custom CUDA kernels for Kronecker forward and backward operations. All our models are implemented in C++. (A minimal sketch of the Kronecker matrix-vector product such kernels compute appears below the table.)
Experiment Setup | Yes | All the models were trained using RMSprop with a learning rate of 1e-3, decay of 0.9 and a batch size of 20. ... All the models were trained using RMSprop with a learning rate of 1e-3 and a batch size of 20 or 50... All our models were trained for 50 epochs with a batch size of 50 and using ADAM (Kingma & Ba, 2014). We use a learning rate of 1e-3... Back-propagation through time (BPTT) is unrolled for 30 time frames... (A hedged configuration sketch using these hyperparameters follows below.)
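The parameter-count claim in the Research Type row and the custom-kernel remark in the Software Dependencies row rest on the same piece of linear algebra: a recurrent weight matrix stored as a Kronecker product of small factors can be applied to a hidden state without ever materializing the full matrix. The sketch below is a minimal NumPy illustration of that product, not the authors' C++/CUDA implementation; the function name, the choice of eight 2x2 factors, and the 256-dimensional example are assumptions chosen only to make the "three orders of magnitude" figure concrete.

```python
import numpy as np

def kron_matvec(factors, x):
    """Apply (W_1 ⊗ W_2 ⊗ ... ⊗ W_F) to x without building the full matrix.

    factors: list of small square matrices [W_1, ..., W_F], W_i of shape (n_i, n_i).
    x: vector of length n_1 * n_2 * ... * n_F.
    """
    dims = [f.shape[0] for f in factors]
    t = x.reshape(dims)                            # view x as a tensor, one axis per factor
    for i, f in enumerate(factors):
        t = np.tensordot(f, t, axes=([1], [i]))    # contract factor i with tensor axis i
        t = np.moveaxis(t, 0, i)                   # restore the original axis order
    return t.reshape(-1)

# Toy check against the dense Kronecker product: eight 2x2 factors hold
# 8 * 4 = 32 parameters, while the equivalent dense 256x256 recurrent matrix
# holds 65,536 -- roughly the three-orders-of-magnitude gap quoted above.
rng = np.random.default_rng(0)
factors = [rng.standard_normal((2, 2)) for _ in range(8)]
x = rng.standard_normal(2 ** 8)
dense = factors[0]
for f in factors[1:]:
    dense = np.kron(dense, f)
assert np.allclose(kron_matvec(factors, x), dense @ x)
```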
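For the Experiment Setup row, the quoted hyperparameters can be mapped onto a present-day framework as follows. This is an assumption-laden sketch rather than the authors' training code: the paper's models are implemented in C++ with custom CUDA kernels, so the PyTorch calls, the placeholder model, and the variable names below are illustrative only.

```python
import torch

# Placeholder module standing in for a KRU cell; not the authors' model.
model = torch.nn.RNN(input_size=1, hidden_size=128)

# One reported configuration: RMSprop, learning rate 1e-3, decay 0.9
# (PyTorch exposes the RMSprop decay as `alpha`), batch size 20.
rmsprop = torch.optim.RMSprop(model.parameters(), lr=1e-3, alpha=0.9)

# Another reported configuration: Adam with learning rate 1e-3, batch size 50,
# 50 epochs, and back-propagation through time unrolled for 30 time frames.
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
batch_size, epochs, bptt_steps = 50, 50, 30
```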