Kronecker Recurrent Units
Authors: Cijo Jose, Moustapha Cissé, François Fleuret
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental results on seven standard data-sets reveal that KRU can reduce the number of parameters by three orders of magnitude in the recurrent weight matrix compared to the existing recurrent models, without trading the statistical performance. (A minimal parameter-count sketch follows the table.) |
| Researcher Affiliation | Collaboration | ¹Idiap Research Institute, ²École Polytechnique Fédérale de Lausanne (EPFL), ³Facebook AI Research. |
| Pseudocode | No | The paper describes the model mathematically but does not include a pseudocode block or a clearly labeled algorithm. |
| Open Source Code | No | We will release our library to reproduce all the results which we report in this paper. |
| Open Datasets | Yes | Our experimental results on seven standard data-sets... Copy memory problem (Hochreiter & Schmidhuber, 1997)... Adding problem (Hochreiter & Schmidhuber, 1997)... Pixel by Pixel MNIST... Penn Tree Bank data-set (Marcus et al., 1993)... JSB Chorales and Piano-midi... TIMIT data-set (Garofolo et al., 1993). |
| Dataset Splits | Yes | The size of the MNIST training set is 60K among which we choose 5K as the validation set. The models are trained on the remaining 55K points. ... Penn Tree Bank is composed of 5017K characters in the training set, 393K characters in the validation set and 442K characters in the test set. ... TIMIT contains a training set of 3696 utterances among which we use 184 as the validation set. |
| Hardware Specification | No | The paper mentions "modern BLAS libraries" and "custom CUDA kernels" but does not specify any particular hardware components like CPU/GPU models, memory, or specific computing platforms used for the experiments. |
| Software Dependencies | No | Existing deep learning libraries such as Theano (Bergstra et al., 2011), Tensorflow (Abadi et al., 2016) and Pytorch (Paszke et al., 2017) do not support fast primitives for Kronecker products with arbitrary number of factors. So we wrote custom CUDA kernels for Kronecker forward and backward operations. All our models are implemented in C++. (A sketch of the reshape-based Kronecker matrix-vector product follows the table.) |
| Experiment Setup | Yes | All the models were trained using RMSprop with a learning rate of 1e-3, decay of 0.9 and a batch size of 20. ... All the models were trained using RMSprop with a learning rate of 1e-3 and a batch size of 20 or 50... All our models were trained for 50 epochs with a batch size of 50 and using ADAM (Kingma & Ba, 2014). We use a learning rate of 1e-3... Back-propagation through time (BPTT) is unrolled for 30 time frames... (A hedged optimizer-configuration sketch follows the table.) |
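
The Research Type row quotes a three-orders-of-magnitude reduction in recurrent-weight parameters. That claim follows from replacing a dense recurrent matrix with a Kronecker product of small factors; the NumPy sketch below only illustrates the arithmetic, using hypothetical factor sizes (eight 2×2 factors for a 256×256 matrix) that are not necessarily the paper's exact configuration.

```python
import numpy as np
from functools import reduce

# Hypothetical configuration: a 256x256 recurrent weight matrix expressed
# as a Kronecker product of eight 2x2 factors (2**8 = 256).
factors = [np.random.randn(2, 2) for _ in range(8)]
W = reduce(np.kron, factors)                  # dense 256x256 matrix

dense_params = W.size                         # 256 * 256 = 65,536
kron_params = sum(f.size for f in factors)    # 8 * 4 = 32

print(W.shape, dense_params, kron_params)     # (256, 256) 65536 32
# 65,536 / 32 = 2,048: roughly three orders of magnitude fewer parameters.
```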
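The Software Dependencies row explains that custom CUDA kernels were written because mainstream frameworks lacked fast primitives for Kronecker products with an arbitrary number of factors. The operation such kernels compute, multiplying a vector by W₁ ⊗ … ⊗ W_F without materializing the full matrix, can be expressed through mode-wise reshapes and small matrix products. The NumPy sketch below shows that standard identity; it is not the authors' kernel.

```python
import numpy as np
from functools import reduce

def kron_matvec(factors, x):
    """Compute (W_1 kron W_2 kron ... kron W_F) @ x without ever forming
    the full Kronecker product, by multiplying one tensor mode at a time."""
    dims_in = [W.shape[1] for W in factors]
    z = x.reshape(dims_in)                       # one tensor mode per factor
    for k, W in enumerate(factors):
        z = np.tensordot(W, z, axes=([1], [k]))  # contract mode k with W
        z = np.moveaxis(z, 0, k)                 # put the new mode back at position k
    return z.reshape(-1)

# Sanity check against the dense product on small hypothetical factors.
factors = [np.random.randn(3, 3), np.random.randn(4, 2), np.random.randn(2, 5)]
x = np.random.randn(3 * 2 * 5)
assert np.allclose(kron_matvec(factors, x), reduce(np.kron, factors) @ x)
```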
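The Experiment Setup row collects optimizer settings from several experiments. The paper's implementation is in C++, so the PyTorch mapping below is only an assumption about how the stated hyper-parameters translate to a common API (in `torch.optim.RMSprop` the decay constant corresponds to the `alpha` argument); the model is a stand-in, not the KRU cell.

```python
import torch

# Stand-in model for illustration only; the paper's KRU cell is not reproduced here.
model = torch.nn.RNN(input_size=1, hidden_size=128)

# Synthetic tasks: RMSprop, learning rate 1e-3, decay 0.9, batch size 20 (or 50).
rmsprop = torch.optim.RMSprop(model.parameters(), lr=1e-3, alpha=0.9)

# Music / TIMIT setup: Adam, learning rate 1e-3, batch size 50, 50 epochs,
# with BPTT unrolled for 30 time frames.
adam = torch.optim.Adam(model.parameters(), lr=1e-3)
```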