Recurrent Experience Replay in Distributed Reinforcement Learning
Authors: Steven Kapturowski, Georg Ostrovski, John Quan, Remi Munos, Will Dabney
ICLR 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We study the effects of parameter lag resulting in representational drift and recurrent state staleness and empirically derive an improved training strategy. Using a single network architecture and fixed set of hyperparameters, the resulting agent, Recurrent Replay Distributed DQN, quadruples the previous state of the art on Atari-57, and matches the state of the art on DMLab-30. It is the first agent to exceed human-level performance in 52 of the 57 Atari games. In this section we evaluate the empirical performance of R2D2 on two challenging benchmark suites for deep reinforcement learning: Atari-57 (Bellemare et al., 2013) and DMLab-30 (Beattie et al., 2016). |
| Researcher Affiliation | Industry | Steven Kapturowski, Georg Ostrovski, John Quan, Rémi Munos, Will Dabney DeepMind, London, UK {skapturowski,ostrovski,johnquan,munos,wdabney}@google.com |
| Pseudocode | No | No structured pseudocode or algorithm blocks (clearly labeled algorithm sections or code-like formatted procedures) were found in the paper. |
| Open Source Code | No | No concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described in this paper was provided. The link "https://bit.ly/r2d2600" is for agent videos, not code. |
| Open Datasets | Yes | In this section we evaluate the empirical performance of R2D2 on two challenging benchmark suites for deep reinforcement learning: Atari-57 (Bellemare et al., 2013) and DMLab-30 (Beattie et al., 2016). |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) for training, validation, or testing data in a traditional supervised learning sense. It uses reinforcement learning environments where agents learn through interaction. |
| Hardware Specification | No | The paper mentions training "with a single GPU-based learner" but does not provide specific hardware details such as GPU model, CPU type, or memory specifications used for running its experiments. |
| Software Dependencies | No | The paper mentions using "Adam (Kingma & Ba, 2014)" as an optimizer but does not provide specific version numbers for any key software components or libraries required to replicate the experiment. |
| Experiment Setup | Yes | We train the R2D2 agent with a single GPU-based learner, performing approximately 5 network updates per second (each update on a mini-batch of 64 length-80 sequences), and each actor performing 260 environment steps per second on Atari (130 per second on DMLab). A full list of hyper-parameters is provided in the Appendix. Table 2: Hyper-parameters values used in R2D2. |
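
For reference, below is a minimal Python sketch that collects only the setup values quoted in the Experiment Setup row (batch size, sequence length, learner and actor rates) and the Adam optimizer mentioned under Software Dependencies. The class and field names are illustrative assumptions, not the authors' released code, and entries from the paper's full hyper-parameter list (its Table 2) that are not quoted here are left as explicit placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class R2D2TrainingConfig:
    """Illustrative container for the setup values quoted above; not the authors' code."""
    batch_size: int = 64                      # mini-batch of 64 sequences per update (quoted)
    sequence_length: int = 80                 # length-80 replay sequences (quoted)
    learner_updates_per_sec: float = 5.0      # ~5 network updates per second on a single GPU learner (quoted)
    actor_env_steps_per_sec_atari: int = 260  # per-actor environment steps per second on Atari (quoted)
    actor_env_steps_per_sec_dmlab: int = 130  # per-actor environment steps per second on DMLab (quoted)
    optimizer: str = "adam"                   # Adam (Kingma & Ba, 2014), cited but unversioned in the paper
    # The remaining hyper-parameters from the paper's Table 2 are not restated here;
    # these placeholders mark values a reproduction would still need to source.
    learning_rate: Optional[float] = None
    num_actors: Optional[int] = None


if __name__ == "__main__":
    print(R2D2TrainingConfig())
```

Keeping the quoted values separate from the unsourced placeholders makes explicit which parts of the experiment setup the paper itself pins down and which a reproduction must recover from the appendix or by re-tuning.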