Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Fast Parametric Learning with Activation Memorization
Authors: Jack Rae, Chris Dyer, Peter Dayan, Timothy Lillicrap
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate this model adapts quickly to novel classes in a simple image classification task using handwritten characters from Omniglot (Lake et al., 2015). We then show it improves overall test perplexity for two medium-scale language modelling corpora, WikiText-103 (wikipedia articles) from Merity et al. (2016) and Project Gutenberg (books), alongside a large-scale corpus GigaWord v5 (news articles) from Parker et al. (2011). By splitting accuracy over word frequency buckets, we see improved perplexity for less frequent words. |
| Researcher Affiliation | Collaboration | ¹DeepMind, London, UK; ²CoMPLEX, Computer Science, University College London, London, UK; ³Gatsby Computational Neuroscience Unit, University College London, UK. |
| Pseudocode | Yes | Algorithm 1 Hebbian Softmax batched update |
| Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described. |
| Open Datasets | Yes | Omniglot data (Lake et al., 2015), WikiText-103 (wikipedia articles) from Merity et al. (2016) and Project Gutenberg (books), alongside a large-scale corpus GigaWord v5 (news articles) from Parker et al. (2011). |
| Dataset Splits | Yes | "We partition the first 5 examples per class to a test set, and assign the rest for training." and "2017 training books (175,181,505 tokens), 12 validation books (609,545 tokens), and 13 test books (526,646 tokens)" |
| Hardware Specification | Yes | 6 days of training with 8 P100s training synchronously. |
| Software Dependencies | No | The paper mentions optimizers (RMSProp, Adam, AdaGrad) but does not provide specific version numbers for software dependencies or libraries used. |
| Experiment Setup | Yes | "Models were trained with 20% dropout on the final layer and a small amount of data augmentation was applied to training examples (rotation ∈ [−30, 30], translation) to avoid overfitting." and "Hyper-parameters and further training details are described in Appendix A.1." (Appendix A.1 mentions: "The LSTM language models are trained with a learning rate of 0.2 using Adam optimizer with β1 = 0, β2 = 0.999." "We used a batch size of 64.") |
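
The Omniglot split quoted under Dataset Splits (first 5 examples per class to the test set, the rest to training) can be illustrated with a short sketch. This is not the authors' code, which the paper does not release. The function name `split_first_k_per_class`, the `(label, item)` pair representation, and the toy data are all assumptions made for illustration.

```python
from collections import defaultdict


def split_first_k_per_class(examples, k=5):
    """Send the first k examples of each class to the test set, the rest to train.

    `examples` is an iterable of (label, item) pairs in dataset order.
    Hypothetical helper: the name and data layout are not from the paper.
    """
    seen = defaultdict(int)  # examples encountered so far, per class
    train, test = [], []
    for label, item in examples:
        if seen[label] < k:
            test.append((label, item))
        else:
            train.append((label, item))
        seen[label] += 1
    return train, test


if __name__ == "__main__":
    # Toy stand-in for image data: two made-up classes, 20 examples each.
    toy = [(c, f"{c}_{i}") for c in ("a", "b") for i in range(20)]
    train, test = split_first_k_per_class(toy, k=5)
    print(len(train), len(test))
```

Because the split depends only on the order of examples within each class, it is deterministic and needs no random seed, provided the dataset is read in a fixed order.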