Kernel RNN Learning (KeRNL)

Authors: Christopher Roth, Ingmar Kanitscheider, Ila Fiete

ICLR 2019

Reproducibility Variable | Result | LLM Response
Research Type: Experimental. We test KeRNL on several benchmark tasks that require memory and computation over time, showing that it is competitive with BPTT across these tasks. We implemented batch learning with KeRNL and BPTT on two tasks: the adding problem (Hochreiter & Schmidhuber, 1997; Hochreiter et al., 2001) and pixel-by-pixel MNIST (LeCun et al., 1998). We implemented an online version of KeRNL with an LSTM network on the An, Bn task (Gruslys et al., 2016) to compare with results from the UORO algorithm (Tallec & Ollivier, 2017).
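For context on the batch-learning benchmarks cited in the row above, a minimal numpy sketch of the adding problem is given here. This follows the standard formulation (two marked values per sequence, target is their sum); the batch shape and marker encoding are assumptions, not details restated from the paper.

```python
import numpy as np

def adding_problem_batch(batch_size, seq_len, rng=None):
    """Generate one batch of the adding problem.

    Each input step is a (value, marker) pair; exactly two steps per
    sequence are marked, and the regression target is the sum of the
    two marked values.
    """
    rng = np.random.default_rng() if rng is None else rng
    values = rng.uniform(0.0, 1.0, size=(batch_size, seq_len))
    markers = np.zeros((batch_size, seq_len))
    for b in range(batch_size):
        # Pick two distinct positions to mark in this sequence.
        i, j = rng.choice(seq_len, size=2, replace=False)
        markers[b, i] = markers[b, j] = 1.0
    inputs = np.stack([values, markers], axis=-1)  # (batch, seq_len, 2)
    targets = (values * markers).sum(axis=1)       # (batch,)
    return inputs, targets
```

With seq_len = 400 (the sequence length tuned for in the paper's Table 1), the task requires remembering information across hundreds of steps, which is what makes it a useful memory benchmark.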
Researcher Affiliation: Collaboration. Christopher Roth [1],[2], Ingmar Kanitscheider [2],[3], and Ila Fiete [2],[4]. [1] Department of Physics, University of Texas at Austin, Austin, TX 78712; [2] Department of Neuroscience, University of Texas at Austin, Austin, TX 78712; [3] OpenAI, San Francisco, CA 94110; [4] Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139.
Pseudocode: Yes. Algorithm 1 is a pseudocode table describing the implementation of Online-KeRNL on an RNN; for Batched-KeRNL, parameters are updated only when t = T.
Open Source Code: No. The paper does not provide access to its source code, either through a repository link or a clear statement of code release. It mentions using the Python numpy package, but this is a third-party library.
Open Datasets: Yes. We implemented batch learning with KeRNL and BPTT on two tasks: the adding problem (Hochreiter & Schmidhuber, 1997; Hochreiter et al., 2001) and pixel-by-pixel MNIST (LeCun et al., 1998). ... We implemented an online version of KeRNL with an LSTM network on the An, Bn task (Gruslys et al., 2016).
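Unlike MNIST, the An, Bn task is generated on the fly rather than downloaded, which is why it can still count as an open dataset. A sketch of one common formulation (a stream of concatenated a^n b^n blocks, read and predicted one symbol at a time) follows; the range of n used here is an assumption, not a value taken from the paper.

```python
import numpy as np

def anbn_stream(length, n_max=32, rng=None):
    """Generate a character stream of concatenated a^n b^n blocks.

    In the online prediction task, the network reads the stream one
    symbol at a time and predicts the next symbol, which is
    deterministic everywhere except where a new block begins (since
    n is drawn at random for each block).
    """
    rng = np.random.default_rng() if rng is None else rng
    chars = []
    while len(chars) < length:
        n = int(rng.integers(1, n_max + 1))
        chars.extend("a" * n + "b" * n)
    return "".join(chars[:length])
```

Because the only uncertainty sits at block boundaries, average prediction loss on this stream directly measures how well the learner counts and remembers n across time.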
Dataset Splits: No. The paper refers to "cross-validation loss" and uses phrases such as "tuned hyperparameters for..." and "performed best on the task with sequence length 400", implying that some form of validation was done. However, it does not provide the specific details on dataset splits (e.g., percentages or sample counts for train/validation/test sets) needed for reproduction.
Hardware Specification: No. The paper does not specify the hardware used to run the experiments (e.g., GPU models, CPU types, or memory specifications).
Software Dependencies: No. The paper mentions using the "Python numpy (Walt et al., 2011) package" but does not provide version numbers for Python, numpy, or any other software dependencies needed for replication.
Experiment Setup: Yes. The tuned hyperparameters for BPTT and KeRNL were the learning rate, η, and the gradient clipping parameter, gc (Pascanu et al., 2013). For KeRNL, we additionally permitted a shared learning rate parameter for the sensitivity weights and kernels, ηm. ... Table 1: Tuned hyperparameters for the adding problem with sequence length 400. (Learning Rule, Network | Algorithm | η | gc | ηm): BPTT, tanh RNN | RMSProp | 10^-3 | 100.0 | — ... Table 3: Hyperparameters for pixel-by-pixel MNIST. ... Table 4: An, Bn hyperparameters. (Algorithm-Optimizer | η | ηm | α): KeRNL-Adam | 10^-3 | 10^-2 | 0.03.
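The gc hyperparameter above refers to the gradient clipping of Pascanu et al. (2013). As a point of reference for reproducers, a minimal numpy sketch of global-norm clipping is given here; the paper does not restate which exact clipping variant it uses, so this is an assumption about the standard scheme.

```python
import numpy as np

def clip_gradient(grad, gc):
    """Rescale `grad` so that its global L2 norm does not exceed `gc`.

    If the norm is already within the threshold, the gradient is
    returned unchanged; otherwise it is scaled down to lie exactly
    on the threshold, preserving its direction.
    """
    norm = np.linalg.norm(grad)
    if norm > gc:
        grad = grad * (gc / norm)
    return grad
```

With gc = 100.0 (the Table 1 value for BPTT on the adding problem), clipping only intervenes on the occasional exploding-gradient step, leaving typical updates untouched.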