Rainbow: Combining Improvements in Deep Reinforcement Learning

Authors: Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver

AAAI 2018

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance. We evaluated all agents on 57 Atari 2600 games from the arcade learning environment (Bellemare et al. 2013)." |
| Researcher Affiliation | Industry | Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver (DeepMind) |
| Pseudocode | No | The paper describes algorithms and methods but does not include any explicit pseudocode blocks or figures labeled "Algorithm". |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | "We evaluated all agents on 57 Atari 2600 games from the arcade learning environment (Bellemare et al. 2013)." |
| Dataset Splits | No | The paper describes evaluation during training ("evaluating the latest agent for 500K frames") and testing regimes ("no-ops starts", "human starts"), but does not provide details on a distinct validation split with percentages or counts. |
| Hardware Specification | No | The paper mentions running "each agent on a single GPU" but does not specify GPU models, CPU details, or other hardware specifications. |
| Software Dependencies | No | The paper mentions the Adam optimizer and TensorFlow noise generation but does not provide version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | "Hyper-parameter tuning. All Rainbow's components have a number of hyper-parameters. ..." The paper reports the values below. |

| Parameter | Value |
|---|---|
| Min history to start learning | 80K frames |
| Adam learning rate | 0.0000625 |
| Exploration ε | 0.0 |
| Noisy Nets σ₀ | 0.5 |
| Target network period | 32K frames |
| Adam ε | 1.5 × 10⁻⁴ |
| Prioritization type | proportional |
| Prioritization exponent ω | 0.5 |
| Prioritization importance sampling β | 0.4 → 1.0 |
| Multi-step returns n | 3 |
| Distributional atoms | 51 |
| Distributional min/max values | [−10, 10] |
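For reference, the published hyper-parameters can be collected into a plain configuration dictionary. The sketch below is illustrative only: the key names and the `beta_by_frame` helper are our own inventions, and the linear annealing of β from 0.4 to 1.0 over a 200M-frame run is an assumed schedule consistent with the prioritized-replay setup described in the paper, not code from the authors.

```python
# Illustrative sketch: Rainbow hyper-parameters from the paper's table,
# collected into a dict. Key names are our own, not the authors'.
RAINBOW_HPARAMS = {
    "min_history_frames": 80_000,     # frames before learning starts
    "adam_learning_rate": 0.0000625,
    "adam_epsilon": 1.5e-4,
    "exploration_epsilon": 0.0,       # Noisy Nets replace epsilon-greedy
    "noisy_nets_sigma0": 0.5,
    "target_network_period": 32_000,  # frames between target-net swaps
    "prioritization_type": "proportional",
    "prioritization_omega": 0.5,      # priority exponent
    "is_beta_start": 0.4,             # importance-sampling exponent (annealed)
    "is_beta_end": 1.0,
    "multi_step_n": 3,
    "num_atoms": 51,                  # distributional support size
    "v_min": -10.0,                   # distributional support range
    "v_max": 10.0,
}

def beta_by_frame(frame: int, total_frames: int = 200_000_000) -> float:
    """Linearly anneal the importance-sampling exponent beta from its
    start value to 1.0 over training (assumed 200M-frame schedule)."""
    frac = min(frame / total_frames, 1.0)
    start = RAINBOW_HPARAMS["is_beta_start"]
    end = RAINBOW_HPARAMS["is_beta_end"]
    return start + frac * (end - start)
```

For example, `beta_by_frame(0)` returns 0.4 and `beta_by_frame(200_000_000)` returns 1.0, matching the 0.4 → 1.0 range in the table.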