Rainbow: Combining Improvements in Deep Reinforcement Learning
Authors: Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that the combination provides state-of-the-art performance on the Atari 2600 benchmark, both in terms of data efficiency and final performance. We also provide results from a detailed ablation study that shows the contribution of each component to overall performance. We evaluated all agents on 57 Atari 2600 games from the arcade learning environment (Bellemare et al. 2013). |
| Researcher Affiliation | Industry | Matteo Hessel, Joseph Modayil, Hado van Hasselt, Tom Schaul, Georg Ostrovski, Will Dabney, Dan Horgan, Bilal Piot, Mohammad Azar, David Silver (DeepMind) |
| Pseudocode | No | The paper describes algorithms and methods but does not include any explicit pseudocode blocks or figures labeled 'Algorithm'. |
| Open Source Code | No | The paper does not provide any statement or link indicating the availability of open-source code for the described methodology. |
| Open Datasets | Yes | We evaluated all agents on 57 Atari 2600 games from the arcade learning environment (Bellemare et al. 2013). |
| Dataset Splits | No | The paper describes evaluation during training ('evaluating the latest agent for 500K frames') and testing regimes ('no-ops starts', 'human starts'), but does not provide specific details on a distinct validation dataset split with percentages or counts. |
| Hardware Specification | No | The paper mentions running 'each agent on a single GPU' but does not provide specific GPU models, CPU details, or other detailed hardware specifications. |
| Software Dependencies | No | The paper mentions 'Adam optimizer' and 'Tensorflow noise generation' but does not provide specific version numbers for software dependencies or libraries. |
| Experiment Setup | Yes | Hyper-parameter tuning. All Rainbow's components have a number of hyper-parameters. ... Parameter: Value — Min history to start learning: 80K frames; Adam learning rate: 0.0000625; Exploration ϵ: 0.0; Noisy Nets σ0: 0.5; Target network period: 32K frames; Adam ϵ: 1.5 × 10⁻⁴; Prioritization type: proportional; Prioritization exponent ω: 0.5; Prioritization importance sampling β: 0.4 → 1.0; Multi-step returns n: 3; Distributional atoms: 51; Distributional min/max values: [−10, 10] |
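The hyper-parameters above can be sketched as a plain configuration, along with the fixed support implied by the last two rows (51 atoms evenly spaced on [−10, 10], as in the C51 distributional head). This is a minimal illustration assuming those standard interpretations; the names below are ours, not from the paper.

```python
# Hedged sketch of the Rainbow hyper-parameters listed in the table above.
# Key names are illustrative; values are transcribed from the table.
RAINBOW_CONFIG = {
    "min_history_frames": 80_000,
    "adam_learning_rate": 6.25e-5,          # 0.0000625
    "exploration_epsilon": 0.0,             # exploration comes from Noisy Nets
    "noisy_nets_sigma0": 0.5,
    "target_network_period_frames": 32_000,
    "adam_epsilon": 1.5e-4,
    "prioritization_type": "proportional",
    "prioritization_exponent_omega": 0.5,
    "importance_sampling_beta": (0.4, 1.0), # annealed from 0.4 to 1.0
    "multi_step_n": 3,
    "num_atoms": 51,
    "v_min": -10.0,
    "v_max": 10.0,
}

def distributional_support(num_atoms: int, v_min: float, v_max: float) -> list[float]:
    """Fixed support z_i for the distributional value head:
    num_atoms points evenly spaced between v_min and v_max."""
    delta_z = (v_max - v_min) / (num_atoms - 1)
    return [v_min + i * delta_z for i in range(num_atoms)]

support = distributional_support(
    RAINBOW_CONFIG["num_atoms"],
    RAINBOW_CONFIG["v_min"],
    RAINBOW_CONFIG["v_max"],
)
print(len(support), support[0], support[-1])  # 51 -10.0 10.0
```

With 51 atoms on [−10, 10] the atom spacing Δz works out to 0.4, which is the granularity at which the agent's return distribution is represented.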