Evaluating Long-Term Memory in 3D Mazes
Authors: Jurgis Pašukonis, Timothy P. Lillicrap, Danijar Hafner
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We introduce the Memory Maze, a 3D domain of randomized mazes specifically designed for evaluating long-term memory in agents. Unlike existing benchmarks, Memory Maze measures long-term memory separate from confounding agent abilities and requires the agent to localize itself by integrating information over time. With Memory Maze, we propose an online reinforcement learning benchmark, a diverse offline dataset, and an offline probing evaluation. Recording a human player establishes a strong baseline and verifies the need to build up and retain memories, which is reflected in their gradually increasing rewards within each episode. We find that current algorithms benefit from training with truncated backpropagation through time and succeed on small mazes, but fall short of human performance on the large mazes, leaving room for future algorithmic designs to be evaluated on the Memory Maze. Videos are available on the website: https://github.com/jurgisp/memory-maze |
| Researcher Affiliation | Collaboration | Jurgis Pasukonis (DeepMind, Verses Research Lab); Timothy Lillicrap (DeepMind, University College London); Danijar Hafner (DeepMind, University of Toronto) |
| Pseudocode | No | The paper does not contain any pseudocode or algorithm blocks. |
| Open Source Code | Yes | We open source the environment and make it easy to install and use. ... The environment can be installed as a pip package memory-maze or from the source code, available on the project website: https://github.com/jurgisp/memory-maze (a minimal usage sketch follows the table). |
| Open Datasets | No | The paper states: 'We collect a diverse offline dataset of recorded experience from the Memory Maze environments... We release two datasets: Memory Maze 9x9 (30M) and Memory Maze 15x15 (30M).' However, it does not provide a direct link, DOI, or specific instructions for accessing these datasets beyond mentioning their release. The provided GitHub link is for the environment, not explicitly the datasets themselves. |
| Dataset Splits | Yes | The datasets are split into 29k trajectories for training and 1k for evaluation. ... We introduce the following four Memory Maze offline probing benchmarks: Memory 9x9 Walls, Memory 15x15 Walls, Memory 9x9 Objects, and Memory 15x15 Objects. These use either the maze wall layout (maze_layout) or agent-centric object locations (targets_vec) as the probe prediction target, trained and evaluated on either the Memory Maze 9x9 (30M) or Memory Maze 15x15 (30M) offline dataset (a hedged probing sketch follows the table). |
| Hardware Specification | No | The paper mentions: 'A single Dreamer training run took 14 days to train using one GPU learner and 8 CPU actors. A single IMPALA training run took 20 hours to train using one GPU learner and 128 CPU actors.' However, it does not specify the models or exact specifications of the GPUs or CPUs used, nor does it provide other hardware details. |
| Software Dependencies | No | The paper states: 'Memory Maze is implemented using MuJoCo (Todorov et al., 2012) as the physics and graphics engine and the dm_control (Tunyasuvunakool et al., 2020) library for building RL environments.' However, specific version numbers for MuJoCo or dm_control are not provided, which are necessary for full reproducibility. |
| Experiment Setup | Yes | For tuning hyperparameters we performed one-dimensional grid searches for one parameter at a time. We evaluated parameters on the Memory 11x11 environment, since it is the smallest challenging one, and then used the best values across all environments. For the Dreamer agent we scanned over the following parameter values: recurrent state size (512, 1024, 2048, 4096), KL scale (0.1, 0.3, 1.0, 3.0), entropy scale (3e-4, 1e-3, 3e-3, 1e-2). For IMPALA we tuned the entropy scale (3e-4, 1e-3, 3e-3, 1e-2), learning rate (1e-4, 2e-4, 3e-4, 4e-4), and Adam epsilon (3e-9, 1e-7, 1e-6, 1e-5). The full hyperparameters are listed in Table E.1. ... Table E.1: Hyperparameters used when training Dreamer and Dreamer (TBTT) agents. ... Table E.2: Hyperparameters used when training the IMPALA agent. ... Table E.3: Hyperparameters of the probe decoder used in the offline probing benchmarks. ... Table E.4: Hyperparameters used for the GRU+VAE offline probing baseline (a grid-search sketch follows the table). |
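
Since the environment ships as a pip package, a minimal usage sketch helps ground the Open Source Code entry. The environment ID below follows the `memory_maze:MemoryMaze-NxN-v0` naming pattern from the project repository; the random policy and the classic 4-tuple `gym` step API are assumptions for illustration.

```python
# Minimal random-agent rollout against the Memory Maze environment.
# Assumes `pip install memory-maze gym` and the classic (obs, reward,
# done, info) step API of older gym versions.
import gym

env = gym.make("memory_maze:MemoryMaze-9x9-v0")
obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()  # random policy stands in for a trained agent
    obs, reward, done, info = env.step(action)
    total_reward += reward
print("episode return:", total_reward)
```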
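The offline probing benchmarks train a small decoder to predict `maze_layout` (or `targets_vec`) from a frozen agent representation. The sketch below illustrates that setup; the representation size, decoder width, and shapes are assumptions for illustration, not the paper's Table E.3 values.

```python
# Hedged sketch of offline probing: a probe decoder learns to predict the
# binary maze wall layout from a frozen agent representation. Gradients
# never flow into the agent, so probe accuracy measures what the
# representation already contains.
import torch
import torch.nn as nn

REPR_DIM = 2048   # assumed size of the frozen agent representation
MAZE_SIZE = 9     # Memory Maze 9x9 wall-layout target

probe = nn.Sequential(
    nn.Linear(REPR_DIM, 1024),
    nn.ReLU(),
    nn.Linear(1024, MAZE_SIZE * MAZE_SIZE),  # one logit per maze cell (wall / no wall)
)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-4)
loss_fn = nn.BCEWithLogitsLoss()

def probe_step(representation, maze_layout):
    """One gradient step on the probe; the agent representation stays frozen."""
    logits = probe(representation.detach())  # detach: no gradient into the agent
    loss = loss_fn(logits, maze_layout.flatten(1).float())  # maze_layout: (B, 9, 9)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```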
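Finally, the one-parameter-at-a-time tuning procedure from the Experiment Setup entry can be made concrete. The Dreamer grids below come from the paper; `train_and_evaluate` is a hypothetical stand-in for a full training run on Memory 11x11 that returns mean episode return.

```python
# Sketch of the paper's one-dimensional grid search: scan each
# hyperparameter in isolation, keep the best value, then move on.
dreamer_grid = {
    "recurrent_state_size": [512, 1024, 2048, 4096],
    "kl_scale": [0.1, 0.3, 1.0, 3.0],
    "entropy_scale": [3e-4, 1e-3, 3e-3, 1e-2],
}

def one_dimensional_search(base_config, grid, train_and_evaluate):
    best = dict(base_config)
    for name, values in grid.items():
        scores = {}
        for value in values:
            config = {**best, name: value}  # vary one parameter, hold the rest fixed
            scores[value] = train_and_evaluate(config, env="Memory 11x11")
        best[name] = max(scores, key=scores.get)  # keep the best value for this parameter
    return best
```

This is cheaper than a full factorial search (sum rather than product of grid sizes), at the cost of ignoring interactions between hyperparameters.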