Retrieval-Augmented Reinforcement Learning

Authors: Anirudh Goyal, Abram Friesen, Andrea Banino, Theophane Weber, Nan Rosemary Ke, Adrià Puigdomènech Badia, Arthur Guez, Mehdi Mirza, Peter C Humphreys, Ksenia Konyushova, Michal Valko, Simon Osindero, Timothy Lillicrap, Nicolas Heess, Charles Blundell

ICML 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Summary of experimental results: We first show that the performance and sample efficiency of R2D2 (Kapturowski et al., 2018), a state-of-the-art off-policy RL algorithm, on Atari games can be improved by retrieval augmentation. In this setting, we run a series of ablations to demonstrate the benefits of our design decisions and to show how our approach compares with related work. In online Atari, the agent retrieves from its own experiences on the same game; however, retrieval can also query external data from other agents or other tasks. We thus evaluated on three separate multi-task offline RL environments (gridroboman, BabyAI (Chevalier-Boisvert et al., 2018), and CausalWorld (Ahmed et al., 2020), a continuous control benchmark), where the retrieved data is first from a different agent in the same task and then from different agents and includes data from other tasks. In all cases, the retrieval-augmented agent learns faster and achieves higher reward.
Researcher Affiliation | Collaboration | 1 Mila, Université de Montréal; 2 DeepMind. Correspondence to: Anirudh Goyal <anirudhgoyal9119@gmail.com>.
Pseudocode | Yes | Algorithm 1: one timestep of a retrieval-augmented agent (R2A). An illustrative sketch is given after this table.
Open Source Code | No | The paper does not provide an explicit statement or link to open-source code for the methodology described.
Open Datasets | Yes | We thus evaluated on three separate multi-task offline RL environments (gridroboman, BabyAI (Chevalier-Boisvert et al., 2018), and CausalWorld (Ahmed et al., 2020), a continuous control benchmark).
Dataset Splits | No | The paper does not explicitly state training/validation/test splits for the datasets in terms of proportions or sample counts. It describes training data amounts and replay buffer usage but not overall dataset partitioning.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU or CPU models.
Software Dependencies | No | The paper mentions software components such as "Adam", "pycolab", "GRU", "ResNet", "DQN", "R2D2", "BERT", and "VQ-VAE", but does not specify version numbers for any of them.
Experiment Setup | Yes | Appendix A.3 provides further details of the setup, training losses, and computational complexity. Table 3: Hyperparameters used in the Atari R2D2 experiments. Table 5: Hyperparameters used in the gridroboman DQN experiments. Table 6: Hyperparameters used in the BabyAI Recurrent DQN experiments.
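
To make the pseudocode row above concrete, here is a minimal, hypothetical sketch of what one timestep of a retrieval-augmented agent could look like, assuming a buffer of pre-encoded past experience, dot-product nearest-neighbour retrieval, and a recurrent Q-network core. The names (RetrievalAugmentedAgentSketch, encode_obs, agent_core) and the mean-pooling of retrieved items are illustrative assumptions, not the paper's implementation; the paper's Algorithm 1 and Appendix A.3 give the authoritative procedure.

```python
# Hypothetical sketch only: class and function names are illustrative
# assumptions, not the paper's code. See the paper's Algorithm 1 for the
# actual one-timestep procedure of a retrieval-augmented agent (R2A).
import numpy as np


class RetrievalAugmentedAgentSketch:
    def __init__(self, encode_obs, agent_core, retrieval_buffer, num_neighbours=8):
        self.encode_obs = encode_obs      # maps an observation to a query embedding
        self.agent_core = agent_core      # e.g. a recurrent Q-network (R2D2-style)
        self.buffer = retrieval_buffer    # (keys, values): encoded past experience
        self.k = num_neighbours

    def retrieve(self, query):
        """Return the k stored experience embeddings most similar to the query."""
        keys, values = self.buffer
        scores = keys @ query             # dot-product similarity (assumption)
        top = np.argsort(scores)[-self.k:]
        return values[top]

    def step(self, observation, recurrent_state):
        """One timestep: encode, retrieve, condition the agent, act greedily."""
        query = self.encode_obs(observation)
        retrieved = self.retrieve(query)
        context = retrieved.mean(axis=0)  # crude summary of retrieved experience
        q_values, next_state = self.agent_core(observation, context, recurrent_state)
        return int(np.argmax(q_values)), next_state


if __name__ == "__main__":
    # Toy usage with random data, just to show the shapes involved.
    rng = np.random.default_rng(0)
    dim, n_actions = 16, 4
    buffer = (rng.normal(size=(100, dim)), rng.normal(size=(100, dim)))
    agent = RetrievalAugmentedAgentSketch(
        encode_obs=lambda obs: obs,
        agent_core=lambda obs, ctx, state: (rng.normal(size=n_actions), state),
        retrieval_buffer=buffer,
    )
    action, _ = agent.step(rng.normal(size=dim), recurrent_state=None)
    print("selected action:", action)
```

In the paper, the retrieval buffer holds the agent's own replay data in the online Atari setting and data from other agents or tasks in the offline multi-task settings, and the retrieved information is integrated through the paper's own learned mechanism rather than the simple mean pooling used in this sketch.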