Mastering Memory Tasks with World Models
Authors: Mohammad Reza Samsami, Artem Zholus, Janarthanan Rajendran, Sarath Chandar
ICLR 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Through a diverse set of illustrative tasks, we systematically demonstrate that R2I not only establishes a new state-of-the-art for challenging memory and credit assignment RL tasks, such as BSuite and POPGym, but also showcases superhuman performance in the complex memory domain of Memory Maze. At the same time, it upholds comparable performance in classic RL tasks, such as Atari and DMC, suggesting the generality of our method. |
| Researcher Affiliation | Academia | Mohammad Reza Samsami (1,2), Artem Zholus (1,3), Janarthanan Rajendran (1,2), Sarath Chandar (1,3,4); 1: Mila Quebec AI Institute, 2: Université de Montréal, 3: Polytechnique Montréal, 4: CIFAR AI Chair |
| Pseudocode | Yes | Algorithm 1: Recall to Imagine (R2I), full state policy training |
| Open Source Code | Yes | See our website here: recall2imagine.github.io |
| Open Datasets | Yes | We cover five RL domains: BSuite (Osband et al., 2020), POPGym (Morad et al., 2023), Atari 100K (Łukasz Kaiser et al., 2020), DMC (Tassa et al., 2018), and Memory Maze (Pasukonis et al., 2022). |
| Dataset Splits | No | The paper mentions training on various RL environments (BSuite, POPGym, Atari, DMC, Memory Maze) using a FIFO replay buffer and specific batch sizes and lengths. However, it does not provide explicit training/validation/test dataset splits in terms of percentages or sample counts for the data collected from these environments. |
| Hardware Specification | Yes | The system achieves a total throughput of approximately 350 frames per second (FPS), leveraging two NVIDIA A100 GPUs with 40GB of memory, along with 40 environment workers. |
| Software Dependencies | No | The paper mentions 'JAX implementation' in the acknowledgements and details about the model's architecture and hyperparameters (Table 3), but it does not specify version numbers for Python, deep learning frameworks (like PyTorch or TensorFlow), or other software libraries used. |
| Experiment Setup | Yes | World Model hyperparameters: FIFO replay buffer size: 10^7; batch length L: 1024; batch size: 4; nonlinearity: LayerNorm + SiLU; SSM discretization method: bilinear; SSM nonlinearity: GeLU + GLU + LayerNorm; SSM matrix parameterization: diagonal; SSM dimensionality parameterization: MIMO; SSM matrix blocks (number of HiPPOs): 8; SSM discretization range: (10^-3, 10^-1); latent variable: multi-categorical; number of categorical latent variables: 32; number of classes per categorical: 32; unimix probability: 0.01; learning rate: 10^-4; reconstruction loss weight βpred: 1; dynamics loss weight βdyn: 0.5; representation loss weight βrep: 0.1; world-model gradient clipping: 1000; Adam epsilon: 10^-8. Actor-Critic hyperparameters: imagination horizon: 15; discount γ: 0.997; return λ: 0.95; entropy weight: 3×10^-4; critic EMA decay: 0.98; critic EMA regularizer: 1; return normalization scale: Per(R, 95) - Per(R, 5); return normalization decay: 0.99; Adam epsilon: 10^-5; actor-critic gradient clipping: 100. (Hedged configuration and computation sketches follow the table.) |
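For readers reconstructing the setup, the hyperparameters above map naturally onto a flat configuration. The sketch below is a hypothetical arrangement, not the authors' config schema: the key names are invented for illustration, while the values are copied from the table.

```python
# Hypothetical config sketch: key names are illustrative, values are the
# hyperparameters reported in the Experiment Setup row above.
WORLD_MODEL_CONFIG = {
    "replay_buffer_size": 10**7,       # FIFO replay buffer
    "batch_length": 1024,              # L
    "batch_size": 4,
    "nonlinearity": "LayerNorm+SiLU",
    "ssm": {
        "discretization": "bilinear",
        "nonlinearity": "GeLU+GLU+LayerNorm",
        "matrix_param": "diagonal",
        "dim_param": "MIMO",
        "num_blocks": 8,               # number of HiPPO blocks
        "dt_range": (1e-3, 1e-1),      # discretization timescale range
    },
    "latent": {"type": "multi-categorical", "num_vars": 32, "num_classes": 32},
    "unimix_prob": 0.01,
    "learning_rate": 1e-4,
    "loss_weights": {"pred": 1.0, "dyn": 0.5, "rep": 0.1},
    "grad_clip": 1000,
    "adam_eps": 1e-8,
}

ACTOR_CRITIC_CONFIG = {
    "imagination_horizon": 15,
    "discount": 0.997,
    "return_lambda": 0.95,
    "entropy_weight": 3e-4,
    "critic_ema_decay": 0.98,
    "critic_ema_regularizer": 1.0,
    "return_norm_decay": 0.99,         # EMA decay for Per(R, 95) - Per(R, 5)
    "adam_eps": 1e-5,
    "grad_clip": 100,
}
```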
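The SSM rows above (bilinear discretization, diagonal matrix parameterization, timescale range (10^-3, 10^-1)) follow the standard S4/S5-style recipe. A minimal JAX sketch of the bilinear (Tustin) step for a diagonal state matrix is shown below; the function name and array shapes are our own illustrative choices, not the authors' released code.

```python
import jax.numpy as jnp

def bilinear_discretize(a_diag, b, dt):
    """Bilinear (Tustin) discretization of a diagonal continuous-time SSM.

    a_diag: (N,) complex diagonal of the state matrix A
    b:      (N, d) input matrix B (MIMO: d input channels per state)
    dt:     scalar or (N,) timescale, e.g. log-spaced in [1e-3, 1e-1]

    Returns discrete-time parameters (a_bar, b_bar) such that
        x[k+1] = a_bar * x[k] + b_bar @ u[k]
    """
    dt = jnp.asarray(dt)
    denom = 1.0 - (dt / 2.0) * a_diag            # (N,)
    a_bar = (1.0 + (dt / 2.0) * a_diag) / denom  # elementwise, since A is diagonal
    b_bar = (dt * b.T / denom).T                 # broadcast dt/denom over channels
    return a_bar, b_bar
```

Because A is diagonal, the matrix inverse in the general bilinear formula reduces to elementwise division, which is what makes this parameterization cheap at the large state sizes used here.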
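Likewise, the actor-critic rows (imagination horizon 15, discount γ = 0.997, return λ = 0.95, return normalization by Per(R, 95) - Per(R, 5) with decay 0.99) describe a DreamerV3-style return computation. The sketch below shows that computation under those assumptions; function names and shapes are hypothetical.

```python
import jax.numpy as jnp

def lambda_returns(rewards, values, discounts, lam=0.95):
    """DreamerV3-style lambda-returns over an imagined rollout.

    rewards:   (H,) predicted rewards, H = imagination horizon (15)
    values:    (H + 1,) critic values; values[-1] is the bootstrap
    discounts: (H,) per-step discounts, e.g. 0.997 * continue_prob
    """
    last = values[-1]  # bootstrap from the final critic value
    returns = []
    for t in reversed(range(len(rewards))):
        last = rewards[t] + discounts[t] * ((1 - lam) * values[t + 1] + lam * last)
        returns.append(last)
    return jnp.stack(returns[::-1])

def normalize_returns(returns, scale_ema, decay=0.99):
    """Scale returns by an EMA of the 5th-95th percentile spread, floored at 1."""
    spread = jnp.percentile(returns, 95) - jnp.percentile(returns, 5)
    scale_ema = decay * scale_ema + (1 - decay) * spread
    return returns / jnp.maximum(1.0, scale_ema), scale_ema
```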