Practical Real Time Recurrent Learning with a Sparse Approximation
Authors: Jacob Menick, Erich Elsen, Utku Evci, Simon Osindero, Karen Simonyan, Alex Graves
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We carry out experiments on both real-world and synthetic tasks, and demonstrate that the SnAp approximation: (1) works well for language modeling compared to the exact unapproximated gradient; (2) admits learning temporal dependencies on a synthetic copy task; and (3) can learn faster than BPTT when updates are done online. (Section 5, Experiments) We include experimental results on the real world language-modelling task WikiText-103 (Merity et al., 2017) and the synthetic Copy task (Graves et al., 2016)... |
| Researcher Affiliation | Collaboration | Jacob Menick (DeepMind, University College London) jmenick@google.com; Erich Elsen (DeepMind) eriche@google.com; Utku Evci (Google); Simon Osindero (DeepMind); Karen Simonyan (DeepMind); Alex Graves (DeepMind) |
| Pseudocode | Yes | Appendix D: Code Snippet for SnAp-1 (an illustrative, independent sketch follows this table) |
| Open Source Code | No | The paper provides a code snippet in Appendix D, but it does not include an explicit statement about open-sourcing the code or a link to a repository for the described methodology. |
| Open Datasets | Yes | We include experimental results on the real world language-modelling task WikiText-103 (Merity et al., 2017) and the synthetic Copy task (Graves et al., 2016) |
| Dataset Splits | No | The paper mentions training on 'randomly cropped sequences of length 128' and reporting 'Results are reported on the standard validation set' for WikiText-103. For the Copy task, it describes a 'curriculum-learning approach' and sampling sequence lengths. However, it does not provide specific numerical percentages or counts for training, validation, or test splits. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory specifications, or cloud computing resources used for the experiments. |
| Software Dependencies | No | The paper mentions the use of Jax ('implemented in Jax (Bradbury et al., 2018)') and imports `jax.numpy` in the code snippet, but it does not specify version numbers for Jax, Python, or any other software dependencies. |
| Experiment Setup | Yes | All of our WikiText-103 experiments... use the Adam optimizer (Kingma & Ba, 2014) with default hyperparameters β1 = 0.9, β2 = 0.999, and ε = 1e-8. We train on randomly cropped sequences of length 128... For each configuration we sweep over learning rates in {10^-2.5, 10^-3, 10^-3.5, 10^-4} and compare average performance over three seeds... The minibatch size was 16. (A configuration sketch based on these values follows the table.) |
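
The pseudocode evidence points to a JAX snippet in Appendix D of the paper, which is not reproduced in this report. As a rough illustration only, below is an independent sketch of a SnAp-1-style influence update for the recurrent weight matrix of a vanilla tanh RNN; the function names, the restriction to the recurrent weights `W`, and the unbatched shapes are assumptions made here, not the authors' code.

```python
import jax.numpy as jnp

def snap1_step(params, h_prev, x, trace_W):
    """One online step of a SnAp-1-style update for a vanilla tanh RNN.

    trace_W[i, j] approximates d h_t[i] / d W[i, j]: SnAp-1 keeps one influence
    entry per parameter (the state unit the parameter touches within one step).
    """
    W, U, b = params
    h = jnp.tanh(W @ h_prev + U @ x + b)
    d = 1.0 - h ** 2                           # tanh' at the pre-activation
    immediate = d[:, None] * h_prev[None, :]   # one-step term d h_t[i] / d W[i, j]
    diag_rec = d * jnp.diag(W)                 # diagonal of d h_t / d h_{t-1}
    trace_W = diag_rec[:, None] * trace_W + immediate
    return h, trace_W

def snap1_grad_W(dL_dh, trace_W):
    """Approximate gradient of an instantaneous loss w.r.t. W by chaining the
    loss gradient at the current state through the retained influence entries."""
    return dL_dh[:, None] * trace_W
```

In this sketch the retained trace has the same shape as `W`, so memory and per-step compute stay proportional to the parameter count, which is the point of the one-step (SnAp-1) sparsity pattern.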
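
The experiment-setup row can also be summarised as a small configuration sketch. The paper does not name an optimizer library, so `optax.adam` and the constant names below are assumptions; the numerical values (Adam defaults, sequence length 128, minibatch size 16, three seeds, and the learning-rate sweep) are taken from the quoted setup.

```python
import optax  # assumed library; the paper only states that Adam is used

SEQUENCE_LENGTH = 128      # randomly cropped sequences of length 128
BATCH_SIZE = 16            # minibatch size 16
NUM_SEEDS = 3              # average performance over three seeds
# Learning-rate sweep reported in the paper: {10^-2.5, 10^-3, 10^-3.5, 10^-4}.
LEARNING_RATES = [10 ** e for e in (-2.5, -3.0, -3.5, -4.0)]

def make_optimizer(learning_rate):
    # Adam with the default hyperparameters stated in the paper.
    return optax.adam(learning_rate, b1=0.9, b2=0.999, eps=1e-8)
```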