Data-Efficient Reinforcement Learning with Self-Predictive Representations
Authors: Max Schwarzer, Ankesh Anand, Rishab Goel, R Devon Hjelm, Aaron Courville, Philip Bachman
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method, which we call Self-Predictive Representations (SPR), on the 26 games in the Atari 100k benchmark (Kaiser et al., 2019)... In our experiments, we augment a modified version of Data-Efficient Rainbow (DER) (van Hasselt et al., 2019) with the SPR loss, and evaluate versions of SPR with and without data augmentation. |
| Researcher Affiliation | Collaboration | Max Schwarzer (Mila, Université de Montréal); Ankesh Anand (Mila, Université de Montréal; Microsoft Research); Rishab Goel (Mila); R Devon Hjelm (Microsoft Research; Mila, Université de Montréal); Aaron Courville (Mila, Université de Montréal; CIFAR Fellow); Philip Bachman (Microsoft Research) |
| Pseudocode | Yes | Algorithm 1: Self-Predictive Representations (a hedged sketch of this objective appears after the table) |
| Open Source Code | Yes | We’ve made the code associated with this work available at https://github.com/mila-iqia/spr. |
| Open Datasets | Yes | We evaluate our method, which we call Self-Predictive Representations (SPR), on the 26 games in the Atari 100k benchmark (Kaiser et al., 2019), where agents are allowed only 100k steps of environment interaction (producing 400k frames of input) per game... |
| Dataset Splits | No | The paper describes interaction steps for training but does not specify a static validation dataset split (e.g., percentages or sample counts for a validation set) as typically seen in supervised learning. |
| Hardware Specification | Yes | We report wall-clock runtimes for a selection of methods in Table 8. SPR with augmentation for 100K steps on Atari takes around 4 and a half hours to finish a complete training and evaluation run on a single game. We find that using data augmentation adds an overhead, and SPR without augmentation can run in just 3 hours. SPR's wall-clock runtime compares very favorably to previous works such as SimPLe (Kaiser et al., 2019), which requires roughly three weeks to train on a GPU comparable to those used for SPR. Table 8: Wall-clock runtimes for various algorithms for a complete training and evaluation run on a single Atari game using a P100 GPU. |
| Software Dependencies | No | Our implementation uses rlpyt (Stooke & Abbeel, 2019) and PyTorch (Paszke et al., 2019). We use Kornia (Riba et al., 2020) for efficient GPU-based data augmentations. The paper mentions these software packages but does not provide specific version numbers for them. |
| Experiment Setup | Yes | Table 3: Hyperparameters for SPR on Atari, with and without augmentation. Parameter Setting (for both variations): Minibatch size 32; Optimizer Adam; Optimizer learning rate 0.0001; Max gradient norm 10; Training steps 100K; ... λ (SPR loss coefficient) 2; K (Prediction Depth) 5. Hedged sketches of how the SPR objective and these settings enter training follow the table. |
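
The Pseudocode row above points to Algorithm 1, the SPR objective. As a reading aid, below is a minimal PyTorch sketch of that objective as we understand it from the paper: an online encoder and an action-conditioned transition model roll latents forward K steps, and each predicted latent is matched to a stop-gradient target embedding via cosine similarity. All module and argument names here are illustrative assumptions, not the authors' code; the official implementation is at the repository linked above.

```python
import torch
import torch.nn.functional as F

def spr_loss(online_encoder, transition_model, projector, predictor,
             target_encoder, target_projector,
             obs, actions, future_obs):
    """Sketch of the K-step self-predictive loss (all names are illustrative).

    obs:        (B, *obs_shape)    observations at time t
    actions:    (B, K)             actions a_t, ..., a_{t+K-1}
    future_obs: (B, K, *obs_shape) observations at t+1, ..., t+K
    """
    z = online_encoder(obs)   # online latent z_t
    total = obs.new_zeros(())
    K = actions.shape[1]
    for k in range(K):
        # Roll the latent forward with the action-conditioned transition model.
        z = transition_model(z, actions[:, k])
        # Online branch: projection followed by the prediction head.
        y_hat = predictor(projector(z))
        # Target branch: stop-gradient (EMA) encoder and projection.
        with torch.no_grad():
            y = target_projector(target_encoder(future_obs[:, k]))
        # Negative cosine similarity between the normalized embeddings,
        # accumulated over the K prediction steps.
        total = total - F.cosine_similarity(y_hat, y, dim=-1).mean()
    return total
```

A caller would supply the six modules (e.g., the convolutional encoder and MLP heads from the paper) along with a sampled replay batch; with K = 5, as quoted in the Experiment Setup row, the loop unrolls five prediction steps.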
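Likewise, the Experiment Setup row quotes how the SPR term is weighted against the base RL objective (λ = 2) and how gradients are clipped. The following is a hedged sketch of a single optimization step under those quoted hyperparameters; the `agent` stand-in and the loss inputs are placeholders, not the paper's actual Rainbow + SPR network:

```python
import torch
import torch.nn as nn

# Hyperparameters quoted from Table 3 of the paper.
SPR_LAMBDA = 2.0      # λ, SPR loss coefficient
MAX_GRAD_NORM = 10.0  # max gradient norm
LR = 1e-4             # Adam learning rate

agent = nn.Linear(8, 8)  # placeholder for the full Rainbow + SPR network
optimizer = torch.optim.Adam(agent.parameters(), lr=LR)

def training_step(rl_loss: torch.Tensor, aux_loss: torch.Tensor) -> float:
    """One update: base RL loss plus λ-weighted SPR loss, with grad clipping."""
    total = rl_loss + SPR_LAMBDA * aux_loss
    optimizer.zero_grad()
    total.backward()
    torch.nn.utils.clip_grad_norm_(agent.parameters(), MAX_GRAD_NORM)
    optimizer.step()
    return total.item()
```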