Generalization of Reinforcement Learners with Working and Episodic Memory

Authors: Meire Fortunato, Melissa Tan, Ryan Faulkner, Steven Hansen, Adrià Puigdomènech Badia, Gavin Buttimore, Charles Deck, Joel Z. Leibo, Charles Blundell

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this paper, we aim to develop a comprehensive methodology to test different kinds of memory in an agent and assess how well the agent can apply what it learns in training to a holdout set that differs from the training set along dimensions that we suggest are relevant for evaluating memory-specific generalization. To that end, we first construct a diverse set of memory tasks that allow us to evaluate test-time generalization across multiple dimensions. Second, we develop and perform multiple ablations on an agent architecture that combines multiple memory systems, compare it against baseline models, and investigate its performance on the task suite.
Researcher Affiliation | Industry | {meirefortunato, melissatan, rfaulk, stevenhansen, adriap, buttimore, cdeck, jzl, cblundell}@google.com
Pseudocode | No | The paper describes the MRA architecture and its components with textual explanations and mathematical formulas (e.g., equations 1, 2, 4, 5), but it does not include any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper links to the task suite (https://github.com/deepmind/dm_memorytasks), which provides the evaluation environments, but it does not state that the source code for the proposed MRA agent architecture and its implementation is available there or elsewhere.
Open Datasets | Yes | We define a suite of 13 tasks designed to test different aspects of memory, with train-test splits that test for generalization across multiple dimensions (https://github.com/deepmind/dm_memorytasks). These include cognitive psychology tasks adapted from Psych Lab (Leibo et al., 2018) and DMLab (Beattie et al., 2016), and new tasks built with the Unity 3D game engine (uni) that require the agent to 1) spot the difference between two scenes; 2) remember the location of a goal and navigate to it; or 3) infer an indirect transitive relation between objects.
Dataset Splits | Yes | The training level comprises a small and large scale version of the task. When training the agent we uniformly sample between these two scales. As for the holdout levels, one of them, holdout-interpolate, corresponds to an interpolation between those two scales (call it medium) and the other, holdout-extrapolate, corresponds to an extrapolation beyond the large scale (call it extra-large).
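The split scheme quoted above can be sketched as follows. This is an illustrative sketch only; the scale names come from the quoted passage, while the helper functions and data structures are assumptions, not part of the paper's released code:

```python
import random

# Only "small" and "large" scales are seen during training (per the quoted split).
TRAIN_SCALES = ["small", "large"]

# Holdout levels map to scales never sampled at train time.
HOLDOUT_LEVELS = {
    "holdout-interpolate": "medium",       # between the two training scales
    "holdout-extrapolate": "extra-large",  # beyond the largest training scale
}

def sample_train_scale(rng=random):
    """Uniformly sample a task scale for one training episode."""
    return rng.choice(TRAIN_SCALES)

def holdout_scale(level):
    """Map a holdout level name to the task scale it evaluates."""
    return HOLDOUT_LEVELS[level]
```

Keeping the holdout scales disjoint from the training scales is what lets the evaluation distinguish interpolation from extrapolation along the task-size dimension.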
Hardware Specification | No | The paper mentions that 'the experiments are computationally demanding' but provides no specific details about the hardware used, such as GPU models, CPU types, or memory.
Software Dependencies | No | The paper mentions specific environments (Psych Lab, DMLab, Unity 3D) and refers to external libraries or models (IMPALA, CPC), but it does not provide version numbers for any software dependencies.
Experiment Setup | No | The paper states 'The precise hyper-parameters are given in C.2' and 'we only performed small variations within as part of our hyper-parameter tuning process for each task (see App. D)'. These statements confirm that hyperparameters exist, but the specific values are relegated to the appendices and not provided in the main text of the paper.