Data-Efficient Reinforcement Learning with Self-Predictive Representations

Authors: Max Schwarzer, Ankesh Anand, Rishab Goel, R Devon Hjelm, Aaron Courville, Philip Bachman

ICLR 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our method, which we call Self-Predictive Representations (SPR), on the 26 games in the Atari 100k benchmark (Kaiser et al., 2019)... In our experiments, we augment a modified version of Data-Efficient Rainbow (DER) (van Hasselt et al., 2019) with the SPR loss, and evaluate versions of SPR with and without data augmentation.
Researcher Affiliation | Collaboration | Max Schwarzer (Mila, Université de Montréal); Ankesh Anand (Mila, Université de Montréal; Microsoft Research); Rishab Goel (Mila); R Devon Hjelm (Microsoft Research; Mila, Université de Montréal); Aaron Courville (Mila, Université de Montréal; CIFAR Fellow); Philip Bachman (Microsoft Research)
Pseudocode | Yes | Algorithm 1: Self-Predictive Representations (a hedged sketch of the SPR update appears after this table)
Open Source Code | Yes | We’ve made the code associated with this work available at https://github.com/mila-iqia/spr.
Open Datasets | Yes | We evaluate our method, which we call Self-Predictive Representations (SPR), on the 26 games in the Atari 100k benchmark (Kaiser et al., 2019), where agents are allowed only 100k steps of environment interaction (producing 400k frames of input) per game
Dataset Splits | No | The paper describes interaction steps for training but does not specify a static validation dataset split (e.g., percentages or sample counts for a validation set) as typically seen in supervised learning.
Hardware Specification | Yes | We report wall-clock runtimes for a selection of methods in Table 8. SPR with augmentation for 100K steps on Atari takes around 4 and a half hours to finish a complete training and evaluation run on a single game. We find that using data augmentation adds an overhead, and SPR without augmentation can run in just 3 hours. SPR's wall-clock run-time compares very favorably to previous works such as SimPLe (Kaiser et al., 2019), which requires roughly three weeks to train on a GPU comparable to those used for SPR. Table 8: Wall-clock runtimes for various algorithms for a complete training and evaluation run on a single Atari game using a P100 GPU.
Software Dependencies | No | Our implementation uses rlpyt (Stooke & Abbeel, 2019) and PyTorch (Paszke et al., 2019). We use Kornia (Riba et al., 2020) for efficient GPU-based data augmentations. The paper mentions these software packages but does not provide specific version numbers for them.
Experiment Setup | Yes | Table 3: Hyperparameters for SPR on Atari, with and without augmentation. Parameter Setting (for both variations): ... Minibatch size 32; Optimizer Adam; Optimizer learning rate 0.0001; Max gradient norm 10; Training steps 100K; ...; λ (SPR loss coefficient) 2; K (Prediction Depth) 5. (A configuration sketch based on these quoted values follows after this table.)
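
The Pseudocode row above points to Algorithm 1 of the paper. Below is a minimal sketch of the SPR auxiliary objective that algorithm describes, written in PyTorch-style Python. The module names (online_encoder, target_encoder, transition_model, projection, target_projection, prediction_head), the ema_update helper, and the tau value are illustrative assumptions rather than the authors' exact implementation, which is available at the repository linked above.

```python
import torch
import torch.nn.functional as F

def spr_loss(online_encoder, target_encoder, transition_model,
             projection, target_projection, prediction_head,
             obs_seq, act_seq, K=5):
    """Hedged sketch of the SPR auxiliary loss over a K-step horizon.

    obs_seq: tensor of shape (B, K+1, ...) holding observations s_t .. s_{t+K}
    act_seq: tensor of shape (B, K) holding actions a_t .. a_{t+K-1}
    All module names are placeholders for this illustration.
    """
    z_hat = online_encoder(obs_seq[:, 0])          # online latent for s_t
    loss = 0.0
    for k in range(1, K + 1):
        # Roll the latent forward with the action-conditioned transition model.
        z_hat = transition_model(z_hat, act_seq[:, k - 1])
        with torch.no_grad():                      # stop-gradient target branch
            z_tgt = target_projection(target_encoder(obs_seq[:, k]))
        y_hat = prediction_head(projection(z_hat))
        # Negative cosine similarity between predicted and target projections.
        loss = loss - F.cosine_similarity(y_hat, z_tgt, dim=-1).mean()
    return loss

def ema_update(online_net, target_net, tau=0.99):
    """Exponential moving average update for the target network.

    tau=0.99 is an assumed placeholder, not the paper's reported value.
    """
    for p_online, p_target in zip(online_net.parameters(), target_net.parameters()):
        p_target.data.mul_(tau).add_(p_online.data, alpha=1.0 - tau)

# In the full agent, this auxiliary term is added to the RL loss with
# coefficient lambda = 2 from Table 3: total = rl_loss + 2.0 * spr_loss(...)
```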
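
The Experiment Setup row quotes a subset of Table 3. As a convenience, the same quoted values are collected here in a hedged Python configuration sketch; the `agent` referenced in the commented usage lines is a placeholder, and any hyperparameters elided with "..." above are not reproduced.

```python
import torch

# Subset of Table 3 hyperparameters quoted above; elided entries ("...") are omitted.
SPR_ATARI_CONFIG = {
    "minibatch_size": 32,
    "learning_rate": 1e-4,        # Adam optimizer
    "max_grad_norm": 10.0,
    "training_steps": 100_000,
    "spr_loss_coefficient": 2.0,  # lambda
    "prediction_depth": 5,        # K
}

# Illustrative optimizer and gradient-clipping usage; `agent` is a placeholder model.
# optimizer = torch.optim.Adam(agent.parameters(), lr=SPR_ATARI_CONFIG["learning_rate"])
# torch.nn.utils.clip_grad_norm_(agent.parameters(), SPR_ATARI_CONFIG["max_grad_norm"])
```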