Rainbow: Combining Improvements in Deep Reinforcement Learning
Authors: Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance. We evaluated all agents on 57 Atari 2600 games from the arcade learning environment (Bellemare et al. 2013). |
| Researcher Affiliation | Industry | Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver (DeepMind) |
| Pseudocode | No | The paper describes algorithms and methods but does not include any explicit pseudocode blocks or figures labeled 'Algorithm'. |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We evaluated all agents on 57 Atari 2600 games from the arcade learning environment (Bellemare et al. 2013). |
| Dataset Splits | No | The paper describes evaluation during training ('evaluating the latest agent for 500K frames') and testing regimes ('no-ops starts', 'human starts'), but does not provide specific details on a distinct validation dataset split with percentages or counts. |
| Hardware Specification | No | The paper mentions running 'each agent on a single GPU' but does not provide specific GPU models, CPU details, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and 'Tensorflow noise generation' but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | Hyper-parameter tuning. All Rainbow's components have a number of hyper-parameters. ... Parameter: Value — Min history to start learning: 80K frames; Adam learning rate: 0.0000625; Exploration ϵ: 0.0; Noisy Nets σ0: 0.5; Target network period: 32K frames; Adam ϵ: 1.5 × 10⁻⁴; Prioritization type: proportional; Prioritization exponent ω: 0.5; Prioritization importance sampling β: 0.4 → 1.0; Multi-step returns n: 3; Distributional atoms: 51; Distributional min/max values: [−10, 10] |
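The hyper-parameters above can be sketched as a plain configuration, along with the fixed support implied by the last two rows (51 atoms evenly spaced on [−10, 10], as in the C51 distributional head). This is a minimal illustration assuming those standard interpretations; the names below are ours, not from the paper.

```python
# Hedged sketch of the Rainbow hyper-parameters listed in the table above.
# Key names are illustrative; values are transcribed from the table.
RAINBOW_CONFIG = {
    "min_history_frames": 80_000,
    "adam_learning_rate": 6.25e-5,          # 0.0000625
    "exploration_epsilon": 0.0,             # exploration comes from Noisy Nets
    "noisy_nets_sigma0": 0.5,
    "target_network_period_frames": 32_000,
    "adam_epsilon": 1.5e-4,
    "prioritization_type": "proportional",
    "prioritization_exponent_omega": 0.5,
    "importance_sampling_beta": (0.4, 1.0), # annealed from 0.4 to 1.0
    "multi_step_n": 3,
    "num_atoms": 51,
    "v_min": -10.0,
    "v_max": 10.0,
}

def distributional_support(num_atoms: int, v_min: float, v_max: float) -> list[float]:
    """Fixed support z_i for the distributional value head:
    num_atoms points evenly spaced between v_min and v_max."""
    delta_z = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * delta_z for i in range(num_atoms)]

support = distributional_support(
    RAINBOW_CONFIG["num_atoms"],
    RAINBOW_CONFIG["v_min"],
    RAINBOW_CONFIG["v_max"],
)
print(len(support), support[0], support[-1])  # 51 -10.0 10.0
```

With 51 atoms on [−10, 10] the atom spacing Δz works out to 0.4, which is the granularity at which the agent's return distribution is represented.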