Data-Efficient Reinforcement Learning with Self-Predictive Representations
Authors: Max Schwarzer, Ankesh Anand, Rishab Goel, R Devon Hjelm, Aaron Courville, Philip Bachman
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method, which we call Self-Predictive Representations (SPR), on the 26 games in the Atari 100k benchmark (Kaiser et al., 2019)... In our experiments, we augment a modified version of Data-Efficient Rainbow (DER) (van Hasselt et al., 2019) with the SPR loss, and evaluate versions of SPR with and without data augmentation. |
| Researcher Affiliation | Collaboration | Max Schwarzer (Mila, Université de Montréal); Ankesh Anand (Mila, Université de Montréal; Microsoft Research); Rishab Goel (Mila); R Devon Hjelm (Microsoft Research; Mila, Université de Montréal); Aaron Courville (Mila, Université de Montréal; CIFAR Fellow); Philip Bachman (Microsoft Research) |
| Pseudocode | Yes | Algorithm 1: Self-Predictive Representations (a hedged sketch of this objective appears after the table) |
| Open Source Code | Yes | We’ve made the code associated with this work available at https://github.com/mila-iqia/spr. |
| Open Datasets | Yes | We evaluate our method, which we call Self-Predictive Representations (SPR), on the 26 games in the Atari 100k benchmark (Kaiser et al., 2019), where agents are allowed only 100k steps of environment interaction (producing 400k frames of input) per game... |
| Dataset Splits | No | The paper describes interaction steps for training but does not specify a static validation dataset split (e.g., percentages or sample counts for a validation set) as typically seen in supervised learning. |
| Hardware Specification | Yes | We report wall-clock runtimes for a selection of methods in Table 8. SPR with augmentation for 100K steps on Atari takes around 4 and a half hours to finish a complete training and evaluation run on a single game. We find that using data augmentation adds an overhead, and SPR without augmentation can run in just 3 hours. SPR's wall-clock runtime compares very favorably to previous works such as SimPLe (Kaiser et al., 2019), which requires roughly three weeks to train on a GPU comparable to those used for SPR. Table 8: Wall-clock runtimes for various algorithms for a complete training and evaluation run on a single Atari game using a P100 GPU. |
| Software Dependencies | No | Our implementation uses rlpyt (Stooke & Abbeel, 2019) and PyTorch (Paszke et al., 2019). We use Kornia (Riba et al., 2020) for efficient GPU-based data augmentations. The paper mentions these software packages but does not provide specific version numbers for them. |
| Experiment Setup | Yes | Table 3: Hyperparameters for SPR on Atari, with and without augmentation. Parameter Setting (for both variations): Minibatch size 32; Optimizer Adam; Optimizer learning rate 0.0001; Max gradient norm 10; Training steps 100K; ... λ (SPR loss coefficient) 2; K (Prediction Depth) 5. Hedged sketches of how the SPR objective and these settings enter training follow the table. |
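
The Pseudocode row above points to Algorithm 1, the SPR objective. As a reading aid, below is a minimal PyTorch sketch of that objective as we understand it from the paper: an online encoder and an action-conditioned transition model roll latents forward K steps, and each predicted latent is matched to a stop-gradient target embedding via cosine similarity. All module and argument names here are illustrative assumptions, not the authors' code; the official implementation is at the repository linked above.

```python
import torch
import torch.nn.functional as F

def spr_loss(online_encoder, transition_model, projector, predictor,
             target_encoder, target_projector,
             obs, actions, future_obs):
    """Sketch of the K-step self-predictive loss (all names are illustrative).

    obs:        (B, *obs_shape)    observations at time t
    actions:    (B, K)             actions a_t, ..., a_{t+K-1}
    future_obs: (B, K, *obs_shape) observations at t+1, ..., t+K
    """
    z = online_encoder(obs)   # online latent z_t
    total = obs.new_zeros(())
    K = actions.shape[1]
    for k in range(K):
        # Roll the latent forward with the action-conditioned transition model.
        z = transition_model(z, actions[:, k])
        # Online branch: projection followed by the prediction head.
        y_hat = predictor(projector(z))
        # Target branch: stop-gradient (EMA) encoder and projection.
        with torch.no_grad():
            y = target_projector(target_encoder(future_obs[:, k]))
        # Negative cosine similarity between the normalized embeddings,
        # accumulated over the K prediction steps.
        total = total - F.cosine_similarity(y_hat, y, dim=-1).mean()
    return total
```

A caller would supply the six modules (e.g., the convolutional encoder and MLP heads from the paper) along with a sampled replay batch; with K = 5, as quoted in the Experiment Setup row, the loop unrolls five prediction steps.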
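Likewise, the Experiment Setup row quotes how the SPR term is weighted against the base RL objective (λ = 2) and how gradients are clipped. The following is a hedged sketch of a single optimization step under those quoted hyperparameters; the `agent` stand-in and the loss inputs are placeholders, not the paper's actual Rainbow + SPR network:

```python
import torch
import torch.nn as nn

# Hyperparameters quoted from Table 3 of the paper.
SPR_LAMBDA = 2.0      # λ, SPR loss coefficient
MAX_GRAD_NORM = 10.0  # max gradient norm
LR = 1e-4             # Adam learning rate

agent = nn.Linear(8, 8)  # placeholder for the full Rainbow + SPR network
optimizer = torch.optim.Adam(agent.parameters(), lr=LR)

def training_step(rl_loss: torch.Tensor, aux_loss: torch.Tensor) -> float:
    """One update: base RL loss plus λ-weighted SPR loss, with grad clipping."""
    total = rl_loss + SPR_LAMBDA * aux_loss
    optimizer.zero_grad()
    total.backward()
    torch.nn.utils.clip_grad_norm_(agent.parameters(), MAX_GRAD_NORM)
    optimizer.step()
    return total.item()
```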