Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier
Authors: Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G. Bellemare, Aaron Courville
ICLR 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We push the limits of the sample efficiency of carefully-modified algorithms by training them using an order of magnitude more updates than usual, significantly improving their performance in the Atari 100k and DeepMind Control Suite benchmarks. |
| Researcher Affiliation | Collaboration | Pierluca D'Oro (Mila, Université de Montréal); Max Schwarzer (Google Brain; Mila, Université de Montréal); Evgenii Nikishin (Mila, Université de Montréal); Pierre-Luc Bacon (Mila, Université de Montréal); Marc G. Bellemare (Google Brain, Mila); Aaron Courville (Mila, Université de Montréal) |
| Pseudocode | No | The paper does not include pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper states that its implementation is based on existing codebases (jaxrl, Dopamine) but does not explicitly state that its own modified implementations (SR-SAC, SR-SPR) are open-sourced, nor provide a link to them. |
| Open Datasets | Yes | We consider a benchmark based on 15 environments from DeepMind Control Suite (Tassa et al., 2018). Our selection of tasks, reported in Table 6, is a set for which discussing sample efficiency is sensible (i.e., neither immediately solvable nor unsolvable by common deep RL algorithms). |
| Dataset Splits | No | The paper describes interaction budgets and number of seeds for experiments but does not specify explicit train/validation/test dataset splits in terms of data samples or percentages. |
| Hardware Specification | Yes | On an NVIDIA V100 GPU, at this highest replay ratio of RR=128, our code takes about 10.5 hours on acrobot-swingup and about 15 hours on humanoid-run to complete 500k environment steps. |
| Software Dependencies | No | The paper lists various software tools (e.g., JAX, Jupyter, Matplotlib, numpy, pandas, SciPy, jaxrl, Dopamine) and their corresponding citations but does not provide specific version numbers for these software dependencies within the text. |
| Experiment Setup | Yes | Table 5: Hyperparameters for SR-SPR and SR-SAC. The ones introduced by this work are at the bottom of the respective tables. |
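The central quantity in the table above is the replay ratio: the number of gradient updates performed per environment step (RR=128 in the hardware timings). The Python sketch below is a hypothetical illustration, not the paper's SR-SAC/SR-SPR code; the `env`, `agent`, `reset_parameters` interfaces and the `reset_interval` value are assumptions made for clarity. It shows how an off-policy loop scales the update count with the replay ratio and where the paper's periodic parameter resets would slot in.

```python
# Minimal sketch (assumed interfaces, not the authors' released code) of an
# off-policy training loop where `replay_ratio` controls the number of
# gradient updates per environment step.
import random


def train(env, agent, total_env_steps=500_000, replay_ratio=128,
          batch_size=256, warmup_steps=1_000, reset_interval=40_000):
    """Collect experience and run `replay_ratio` updates per environment step."""
    buffer = []                        # simple list-based replay buffer
    obs = env.reset()
    for step in range(total_env_steps):
        action = agent.act(obs)
        next_obs, reward, done, _ = env.step(action)
        buffer.append((obs, action, reward, next_obs, done))
        obs = env.reset() if done else next_obs

        if step >= warmup_steps:
            # The "replay ratio barrier": naively increasing this loop count
            # tends to hurt performance; the paper counteracts it with
            # periodic parameter resets (interval assumed here).
            for _ in range(replay_ratio):
                batch = random.sample(buffer, min(batch_size, len(buffer)))
                agent.update(batch)

        if reset_interval and (step + 1) % reset_interval == 0:
            agent.reset_parameters()   # hypothetical reset hook
    return agent
```

Under this structure the inner update loop dominates wall-clock cost, which is consistent with the reported ~10.5 to 15 hours per 500k environment steps at RR=128 on a V100.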