Approximating Real-Time Recurrent Learning with Random Kronecker Factors

Authors: Asier Mujika, Florian Meier, Angelika Steger

NeurIPS 2018

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We also confirm these theoretical results experimentally. Further, we show empirically that the KF-RTRL algorithm captures long-term dependencies and almost matches the performance of TBPTT on real world tasks by training Recurrent Highway Networks on a synthetic string memorization task and on the Penn Tree Bank task, respectively.
Researcher Affiliation | Academia | Asier Mujika, Department of Computer Science, ETH Zürich, Switzerland, asierm@inf.ethz.ch; Florian Meier, Department of Computer Science, ETH Zürich, Switzerland, meierflo@inf.ethz.ch; Angelika Steger, Department of Computer Science, ETH Zürich, Switzerland, steger@inf.ethz.ch
Pseudocode | Yes | The detailed algorithmic steps of KF-RTRL are presented in Algorithm 1 and motivated below. Algorithm 1: One step of KF-RTRL (from time t−1 to t). (A hedged sketch of this update step is given after the table.)
Open Source Code | No | The paper does not provide any explicit statement about making the source code available or include a link to a code repository.
Open Datasets | Yes | For this experiment we use the Penn Tree Bank [10] dataset, which is a collection of Wall Street Journal articles.
Dataset Splits | Yes | We split the data following Mikolov et al. [13]. Figure 2: Validation performance on Penn Tree Bank in bits per character (BPC). Table 1: Results on Penn Tree Bank. Merity et al. [12] is currently the state of the art (trained with TBPTT). For simplicity we do not report standard deviations, as all of them are smaller than 0.03. (A short note on the BPC metric follows the table.)
Hardware Specification | No | The paper does not provide specific hardware details (like GPU/CPU models or types of machines) used for running the experiments.
Software Dependencies | No | The paper mentions software like 'Tensorflow' and 'Adam optimizer' but does not specify any version numbers for these dependencies.
Experiment Setup | Yes | We use a RHN with 256 units and a batch size of 256. We optimize the log-likelihood using the Adam optimizer [7] with default Tensorflow [1] parameters, β1 = 0.9 and β2 = 0.999. For each model we pick the optimal learning rate from {10^-2.5, 10^-3, 10^-3.5, 10^-4}. We repeat each experiment 5 times. Apart from that, we reset the hidden state to the all zeros state with probability 0.01 at each time step. (A sketch of this setup follows the table.)
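
The full Algorithm 1 is not reproduced on this page. As a rough, hedged sketch of the update it describes, the NumPy snippet below applies the Kronecker-factored approximation to a plain tanh RNN cell (the paper itself uses Recurrent Highway Networks); the function names (kf_rtrl_step, approx_weight_grad), the cell, and the exact choice of the rescaling factors p1, p2 are illustrative assumptions, not taken verbatim from the paper.

```python
# Minimal sketch of one KF-RTRL step for a vanilla tanh RNN,
# h_t = tanh(W @ z_t) with z_t = [h_{t-1}, x_t, 1].  The influence matrix
# dh_t/dW is kept as a single Kronecker product u_t (x) A_t.
import numpy as np

rng = np.random.default_rng(0)

def kf_rtrl_step(W, h_prev, x, u_prev, A_prev):
    n = h_prev.shape[0]
    z = np.concatenate([h_prev, x, np.ones(1)])   # input to the cell
    h = np.tanh(W @ z)                            # new hidden state
    D = np.diag(1.0 - h ** 2)                     # d h_t / d (W z_t)
    H = D @ W[:, :n]                              # d h_t / d h_{t-1}

    # Exact update is  u_prev (x) (H @ A_prev)  +  z (x) D ; compress this sum of
    # two Kronecker products back into one using independent random signs.
    HA = H @ A_prev
    c1, c2 = rng.choice([-1.0, 1.0], size=2)
    p1 = np.sqrt((np.linalg.norm(HA) + 1e-12) / (np.linalg.norm(u_prev) + 1e-12))
    p2 = np.sqrt((np.linalg.norm(D) + 1e-12) / (np.linalg.norm(z) + 1e-12))
    u = c1 * p1 * u_prev + c2 * p2 * z
    A = (c1 / p1) * HA + (c2 / p2) * D
    return h, u, A

def approx_weight_grad(delta_h, u, A):
    # dL/dW ~ (A^T delta_h) u^T, read off from the Kronecker-factored influence matrix.
    return np.outer(A.T @ delta_h, u)
```

Because the signs c1 and c2 are independent, the cross terms cancel in expectation and the single-factor approximation stays unbiased; the rescaling by p1 and p2 is meant to keep the variance of the estimator small.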
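
The Figure 2 and Table 1 numbers quoted in the Dataset Splits row are reported in bits per character (BPC). As a small reminder of the conversion (not code from the paper), BPC is the average negative log-likelihood per character expressed in base 2:

```python
import math

def bits_per_character(total_nll_nats: float, num_chars: int) -> float:
    """Convert a summed negative log-likelihood (in nats) into bits per character."""
    return total_nll_nats / (num_chars * math.log(2.0))
```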
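
Finally, the Experiment Setup row can be read as a small hyperparameter sweep. The skeleton below restates those settings in code; the helpers maybe_reset and pick_learning_rate, the placeholder train_fn, and the use of a validation score for model selection are assumptions for illustration, not APIs or details stated in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hyperparameters quoted in the Experiment Setup row above.
NUM_UNITS      = 256
BATCH_SIZE     = 256
ADAM_BETAS     = (0.9, 0.999)                                      # beta1, beta2 for Adam
LEARNING_RATES = [10 ** -2.5, 10 ** -3.0, 10 ** -3.5, 10 ** -4.0]  # grid searched per model
NUM_RUNS       = 5                                                  # repetitions per setting
RESET_PROB     = 0.01   # probability of resetting the hidden state at each step

def maybe_reset(h):
    """Reset hidden states to all zeros with probability RESET_PROB
    (applied per example here; the exact granularity is an assumption)."""
    reset = rng.random(h.shape[0]) < RESET_PROB
    h = h.copy()
    h[reset] = 0.0
    return h

def pick_learning_rate(train_fn):
    """Run each learning rate NUM_RUNS times and keep the one with the best
    mean score returned by the placeholder train_fn (e.g. validation BPC)."""
    scores = {lr: np.mean([train_fn(lr) for _ in range(NUM_RUNS)])
              for lr in LEARNING_RATES}
    return min(scores, key=scores.get)
```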