Sample-Efficient Reinforcement Learning by Breaking the Replay Ratio Barrier

Authors: Pierluca D'Oro, Max Schwarzer, Evgenii Nikishin, Pierre-Luc Bacon, Marc G Bellemare, Aaron Courville

ICLR 2023

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We push the limits of the sample efficiency of carefully-modified algorithms by training them using an order of magnitude more updates than usual, significantly improving their performance in the Atari 100k and DeepMind Control Suite benchmarks. |
| Researcher Affiliation | Collaboration | Pierluca D'Oro (Mila, Université de Montréal); Max Schwarzer (Google Brain; Mila, Université de Montréal); Evgenii Nikishin (Mila, Université de Montréal); Pierre-Luc Bacon (Mila, Université de Montréal); Marc G. Bellemare (Google Brain, Mila); Aaron Courville (Mila, Université de Montréal) |
| Pseudocode | No | The paper does not include pseudocode or a clearly labeled algorithm block. |
| Open Source Code | No | The paper states that its implementation is based on existing codebases (jaxrl, Dopamine) but does not provide an explicit statement or link for the open-sourcing of its own modifications (SR-SAC, SR-SPR). |
| Open Datasets | Yes | We consider a benchmark based on 15 environments from DeepMind Control Suite (Tassa et al., 2018). Our selection of tasks, reported in Table 6, is a set for which discussing sample efficiency is sensible (i.e., neither immediately solvable nor unsolvable by common deep RL algorithms). |
| Dataset Splits | No | The paper describes interaction budgets and numbers of seeds for its experiments but does not specify explicit train/validation/test splits in terms of data samples or percentages. |
| Hardware Specification | Yes | On an NVIDIA V100 GPU, at this highest replay ratio of RR=128, our code takes about 10.5 hours on acrobot-swingup and about 15 hours on humanoid-run to complete 500k environment steps. |
| Software Dependencies | No | The paper lists various software tools (e.g., JAX, Jupyter, Matplotlib, NumPy, pandas, SciPy, jaxrl, Dopamine) with corresponding citations but does not provide specific version numbers for these dependencies. |
| Experiment Setup | Yes | Table 5: Hyperparameters for SR-SPR and SR-SAC. The ones introduced by this work are at the bottom of the respective tables. |
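The replay ratio (RR) mentioned above is the number of gradient updates performed per environment step; "breaking the replay ratio barrier" means scaling RR far beyond the usual 1. A minimal sketch of this training loop follows; the class and function names (`ReplayBuffer`, `DummyAgent`, `train`) are hypothetical stand-ins, not the authors' implementation:

```python
import random
from collections import deque

class ReplayBuffer:
    """Fixed-capacity buffer of transitions sampled uniformly at random."""
    def __init__(self, capacity=10_000):
        self.data = deque(maxlen=capacity)

    def add(self, transition):
        self.data.append(transition)

    def sample(self, batch_size=4):
        return random.sample(self.data, min(batch_size, len(self.data)))

class DummyAgent:
    """Placeholder agent that just counts its gradient updates."""
    def __init__(self):
        self.updates = 0

    def act(self, obs):
        return random.choice([0, 1])

    def update(self, batch):
        self.updates += 1  # stand-in for one gradient step

def train(agent, buffer, env_steps, replay_ratio):
    obs = 0.0  # stand-in for an environment observation
    for _ in range(env_steps):
        action = agent.act(obs)
        next_obs, reward = obs + 1.0, 1.0  # stand-in for env.step(action)
        buffer.add((obs, action, reward, next_obs))
        obs = next_obs
        for _ in range(replay_ratio):  # RR gradient updates per env step
            agent.update(buffer.sample())
    return agent.updates
```

At RR=128, the paper's 500k environment steps imply 64M gradient updates, which is why the V100 timings quoted above are in hours rather than minutes.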