Recurrent Experience Replay in Distributed Reinforcement Learning

Authors: Steven Kapturowski, Georg Ostrovski, John Quan, Rémi Munos, Will Dabney

ICLR 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "We study the effects of parameter lag resulting in representational drift and recurrent state staleness and empirically derive an improved training strategy. Using a single network architecture and fixed set of hyperparameters, the resulting agent, Recurrent Replay Distributed DQN, quadruples the previous state of the art on Atari-57, and matches the state of the art on DMLab-30. It is the first agent to exceed human-level performance in 52 of the 57 Atari games." "In this section we evaluate the empirical performance of R2D2 on two challenging benchmark suites for deep reinforcement learning: Atari-57 (Bellemare et al., 2013) and DMLab-30 (Beattie et al., 2016)." (The training strategy in question is sketched after this table.)
Researcher Affiliation | Industry | Steven Kapturowski, Georg Ostrovski, John Quan, Rémi Munos, Will Dabney. DeepMind, London, UK. {skapturowski,ostrovski,johnquan,munos,wdabney}@google.com
Pseudocode | No | No structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures) were found in the paper.
Open Source Code | No | No concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper was provided. The link "https://bit.ly/r2d2600" is for agent videos, not code.
Open Datasets | Yes | "In this section we evaluate the empirical performance of R2D2 on two challenging benchmark suites for deep reinforcement learning: Atari-57 (Bellemare et al., 2013) and DMLab-30 (Beattie et al., 2016)."
Dataset Splits | No | The paper does not provide dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) for training, validation, or testing data in the traditional supervised-learning sense; it instead uses reinforcement-learning environments in which agents learn through interaction.
Hardware Specification | No | The paper mentions training "with a single GPU-based learner" but does not provide specific hardware details such as GPU model, CPU type, or memory specifications used for running its experiments.
Software Dependencies | No | The paper mentions using "Adam (Kingma & Ba, 2014)" as an optimizer but does not provide specific version numbers for any key software components or libraries required to replicate the experiment.
Experiment Setup | Yes | "We train the R2D2 agent with a single GPU-based learner, performing approximately 5 network updates per second (each update on a mini-batch of 64 length-80 sequences), and each actor performing 260 environment steps per second on Atari (130 per second on DMLab). A full list of hyper-parameters is provided in the Appendix." "Table 2: Hyper-parameter values used in R2D2." (See the configuration sketch below the table.)
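
The improved training strategy referenced in the Research Type row is the paper's "stored state with burn-in" scheme for replaying recurrent sequences. Below is a minimal PyTorch sketch of that idea, not the authors' released code: the batch size (64) and sequence length (80) come from the quoted text, while the burn-in length, network sizes, and all identifiers (unroll_with_burn_in, BURN_IN, etc.) are illustrative assumptions.

```python
# Minimal sketch of stored-state + burn-in recurrent replay (assumptions
# labeled below; not the authors' code). The actor's recurrent state is
# stored alongside each replayed sequence; the learner refreshes it over a
# burn-in prefix before computing the loss on the remaining steps.
import torch
import torch.nn as nn

BATCH = 64       # mini-batch of 64 sequences (stated in the paper)
SEQ_LEN = 80     # length-80 sequences (stated in the paper)
BURN_IN = 40     # assumed burn-in prefix length
OBS_DIM = 128    # placeholder observation-embedding size
HIDDEN = 512     # placeholder LSTM width

lstm = nn.LSTM(OBS_DIM, HIDDEN, batch_first=True)

def unroll_with_burn_in(obs_seq, stored_state):
    """obs_seq: (BATCH, SEQ_LEN, OBS_DIM) replayed observations;
    stored_state: (h, c) recorded by the actor at generation time."""
    # Refresh the stale recurrent state over the burn-in prefix;
    # no gradients flow through this portion.
    with torch.no_grad():
        _, refreshed = lstm(obs_seq[:, :BURN_IN], stored_state)
    # Train only on the remaining steps, starting from the refreshed
    # state, which mitigates recurrent state staleness under parameter lag.
    outputs, _ = lstm(obs_seq[:, BURN_IN:], refreshed)
    return outputs  # would feed a Q-value head, omitted here

# Usage with random stand-in data:
h0 = torch.zeros(1, BATCH, HIDDEN)
c0 = torch.zeros(1, BATCH, HIDDEN)
obs = torch.randn(BATCH, SEQ_LEN, OBS_DIM)
q_inputs = unroll_with_burn_in(obs, (h0, c0))
```

Keeping gradients out of the burn-in unroll is a design choice of this sketch: the prefix only serves to produce a fresher start state than the one stored at collection time.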
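
The Experiment Setup row quotes the figures the paper states directly. One hedged way to collect them is a single configuration mapping; entries marked "stated" appear in the quoted text above, while the remainder are assumptions recalled from the paper's Table 2 and should be verified against the appendix before reuse.

```python
# Hedged R2D2 experiment configuration (a sketch, not an official config).
R2D2_CONFIG = {
    "batch_size": 64,              # stated: mini-batch of 64 sequences
    "sequence_length": 80,         # stated: length-80 sequences
    "learner_updates_per_sec": 5,  # stated: approximate, single GPU learner
    "actor_env_steps_per_sec": {   # stated per-actor throughput
        "atari": 260,
        "dmlab": 130,
    },
    "optimizer": "Adam",           # stated (Kingma & Ba, 2014)
    "num_actors": 256,             # assumed from Table 2
    "burn_in": 40,                 # assumed from Table 2
    "learning_rate": 1e-4,         # assumed from Table 2
    "discount": 0.997,             # assumed from Table 2
}
```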