Practical Real Time Recurrent Learning with a Sparse Approximation

Authors: Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves

ICLR 2021 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We carry out experiments on both real-world and synthetic tasks, and demonstrate that the SnAp approximation: (1) works well for language modeling compared to the exact unapproximated gradient; (2) admits learning temporal dependencies on a synthetic copy task and (3) can learn faster than BPTT when updates are done online." From Section 5 (Experiments): "We include experimental results on the real world language-modelling task WikiText-103 (Merity et al., 2017) and the synthetic Copy task (Graves et al., 2016)..."
Researcher Affiliation | Collaboration | Jacob Menick (DeepMind, University College London) jmenick@google.com; Erich Elsen (DeepMind) eriche@google.com; Utku Evci (Google); Simon Osindero (DeepMind); Karen Simonyan (DeepMind); Alex Graves (DeepMind)
Pseudocode | Yes | Appendix D: Code Snippet for SnAp-1 (an illustrative sketch of a SnAp-1-style update appears after the table).
Open Source Code | No | The paper provides a code snippet in Appendix D, but it does not include an explicit statement about open-sourcing the code or a link to a repository for the described methodology.
Open Datasets | Yes | "We include experimental results on the real world language-modelling task WikiText-103 (Merity et al., 2017) and the synthetic Copy task (Graves et al., 2016)"
Dataset Splits | No | The paper mentions training on 'randomly cropped sequences of length 128' and notes that 'Results are reported on the standard validation set' for WikiText-103. For the Copy task, it describes a 'curriculum-learning approach' and sampling sequence lengths. However, it does not provide specific numerical percentages or counts for training, validation, or test splits. (A sketch of the cropping step appears after the table.)
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or cloud computing resources used for the experiments.
Software Dependencies | No | The paper mentions the use of JAX ('implemented in Jax (Bradbury et al., 2018)') and imports `jax.numpy` in the code snippet, but it does not specify version numbers for JAX, Python, or any other software dependencies.
Experiment Setup | Yes | "All of our WikiText-103 experiments... use the Adam optimizer (Kingma & Ba, 2014) with default hyperparameters β1 = 0.9, β2 = 0.999, and ε = 1e-8. We train on randomly cropped sequences of length 128... For each configuration we sweep over learning rates in {10^-2.5, 10^-3, 10^-3.5, 10^-4} and compare average performance over three seeds... The minibatch size was 16." (These settings are summarized in the configuration sketch after the table.)
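
For reference, the following is a minimal, illustrative sketch of a SnAp-1-style update for a vanilla tanh RNN in JAX, the library the paper reports using. It is not the paper's Appendix D snippet: the function names, the single-layer architecture, and the restriction to the recurrent weight matrix are assumptions made for clarity. The key idea is that SnAp-1 keeps only the influence-matrix entries where parameter W_h[i, j] directly affects hidden unit i, so the trace has the same shape as W_h and is updated elementwise using the diagonal of the recurrent Jacobian.

```python
import jax.numpy as jnp

def snap1_step(h_prev, x, W_h, W_x, b, trace):
    """One RNN step plus a SnAp-1-style trace update for the recurrent weights W_h.

    trace[i, j] approximates d h[i] / d W_h[i, j]; the influence of W_h[i, j]
    on units other than i is dropped, which is the SnAp-1 sparsity pattern
    for a dense recurrent matrix. (Illustrative sketch, not the paper's code.)
    """
    pre = W_h @ h_prev + W_x @ x + b
    h = jnp.tanh(pre)
    dh_dpre = 1.0 - h ** 2                        # tanh'(pre), shape [n]
    # Diagonal of the recurrent Jacobian D_t = dh_t / dh_{t-1}.
    D_diag = dh_dpre * jnp.diag(W_h)              # shape [n]
    # Immediate Jacobian: d h[i] / d W_h[i, j] = tanh'(pre[i]) * h_prev[j].
    immediate = dh_dpre[:, None] * h_prev[None, :]
    # SnAp-1 recursion: keep only the diagonal coupling through D_t.
    trace = D_diag[:, None] * trace + immediate
    return h, trace

def snap1_weight_grad(dloss_dh, trace):
    # Approximate dL/dW_h by chaining the loss gradient through the trace.
    return dloss_dh[:, None] * trace
```

In full RTRL the influence matrix is dense and is propagated with the full Jacobian D_t; collapsing it to the shape of W_h as above is what makes the online update tractable.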
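The cropping step referenced in the Dataset Splits row could look like the sketch below. The paper only states the crop length, so the rest of this pipeline (a 1-D array `tokens` of token ids and the helper name `random_crop`) is an assumption for illustration.

```python
import jax
import jax.numpy as jnp

def random_crop(key, tokens, crop_len=128):
    """Randomly crop a length-128 window from a longer 1-D token stream."""
    start = jax.random.randint(key, (), 0, tokens.shape[0] - crop_len + 1)
    return jax.lax.dynamic_slice(tokens, (start,), (crop_len,))

# Example usage with placeholder data:
key = jax.random.PRNGKey(0)
tokens = jnp.arange(1000)
crop = random_crop(key, tokens)   # shape (128,)
```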
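The quoted Experiment Setup details translate into roughly the following configuration. The use of optax for Adam is an assumption (the paper does not name an optimizer library); the hyperparameters, crop length, batch size, seed count, and learning-rate grid are the values quoted above.

```python
import optax

SEQUENCE_LENGTH = 128                 # randomly cropped training sequences
BATCH_SIZE = 16                       # "The minibatch size was 16."
NUM_SEEDS = 3                         # average performance over three seeds
LEARNING_RATES = tuple(10.0 ** e for e in (-2.5, -3.0, -3.5, -4.0))

def make_optimizer(learning_rate):
    # Adam with the default hyperparameters reported in the paper.
    return optax.adam(learning_rate, b1=0.9, b2=0.999, eps=1e-8)
```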