Been There, Done That: Meta-Learning with Episodic Recall
Authors: Samuel Ritter, Jane Wang, Zeb Kurth-Nelson, Siddhant Jayakumar, Charles Blundell, Razvan Pascanu, Matthew Botvinick
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We tested the capabilities of L2RL agents equipped with epLSTM (epL2RL agents) in five experiments. Experiments 1-3 use multi-armed bandits, first exploring the basic case where tasks reoccur in their entirety and are identified by exactly reoccurring contexts (Exp. 1), then the more difficult challenge wherein contexts are drawn from Omniglot categories and vary in appearance with each reoccurrence (Exp. 2), and then the more complex scenario where task components reoccur in arbitrary combinations (Exp. 3). Experiment 4 uses a water maze navigation task to assess epL2RL's ability to handle multi-state MDPs, and Experiment 5 uses a task from the neuroscience literature to examine the learning algorithms epL2RL learns to execute. |
| Researcher Affiliation | Collaboration | 1DeepMind, London, UK; 2Princeton Neuroscience Institute, Princeton, NJ; 3MPS-UCL Centre for Computational Psychiatry, London, UK; 4Gatsby Computational Neuroscience Unit, UCL, London, UK. |
| Pseudocode | No | The paper describes the model architecture in text and through a diagram (Figure 2), but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The acknowledgements section mentions the use of an asynchronous RL codebase and a DND library, but there is no explicit statement or link provided indicating that the source code for the methodology described in this paper is publicly available. |
| Open Datasets | Yes | We used pretrained Omniglot embeddings from Kaiser et al. (2017). This is a particularly appropriate method for pretraining because such a contrastive loss optimization procedure (Hadsell et al., 2006) could be run online over the DND's contents, assuming some heuristic for determining neighbor status. |
| Dataset Splits | No | The paper describes training and evaluation episodes, and mentions that weights were frozen during evaluation, but it does not specify explicit dataset splits (e.g., percentages or counts for training, validation, and test sets). |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments. |
| Software Dependencies | No | The paper mentions the use of 'Tensorflow and Torch predecessors' and a 'DND library' in the acknowledgements, but it does not provide specific version numbers for these or any other software dependencies, making replication challenging. |
| Experiment Setup | No | The paper states that 'Hyperparameters were tuned for the basic L2RL model, and held fixed for the other model variations,' but it does not provide specific values for these hyperparameters (e.g., learning rate, batch size, optimizer settings) in the main text. |
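Since the paper provides no pseudocode or public source, the episodic mechanism it describes (an LSTM whose cell states are stored in a differentiable neural dictionary, keyed by context embeddings, and gated back in on recall) can be illustrated with a minimal sketch. This is an illustrative approximation, not the authors' implementation: the `DND` class, the inverse-distance kernel, and the fixed scalar `gate` in `reinstate` are all simplifying assumptions (in the paper the reinstatement gate is learned).

```python
import numpy as np

class DND:
    """Sketch of a differentiable-neural-dictionary-style episodic store
    (an assumption, not DeepMind's DND library): keys are context
    embeddings, values are LSTM cell states; reads return an
    inverse-distance weighted average of the k nearest stored values."""

    def __init__(self, k=2):
        self.k = k
        self.keys, self.values = [], []

    def write(self, key, value):
        self.keys.append(np.asarray(key, dtype=float))
        self.values.append(np.asarray(value, dtype=float))

    def read(self, query):
        query = np.asarray(query, dtype=float)
        dists = np.array([np.linalg.norm(query - k) for k in self.keys])
        idx = np.argsort(dists)[: self.k]
        w = 1.0 / (dists[idx] + 1e-3)  # inverse-distance kernel weights
        w /= w.sum()
        return sum(wi * self.values[i] for wi, i in zip(w, idx))

def reinstate(c_current, c_retrieved, gate):
    """Blend the current cell state with the retrieved episodic one.
    `gate` in [0, 1] stands in for the paper's learned reinstatement gate."""
    return gate * c_retrieved + (1.0 - gate) * c_current
```

With a k of 1, a query near a stored context simply reinstates the cell state saved when that context last occurred, which is the behavior the bandit experiments rely on when tasks reoccur with exactly matching contexts.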