On Bonus Based Exploration Methods In The Arcade Learning Environment
Authors: Adrien Ali Taiga, William Fedus, Marlos C. Machado, Aaron Courville, Marc G. Bellemare
ICLR 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we reassess popular bonus-based exploration methods within a common evaluation framework. We combine Rainbow (Hessel et al., 2018) with different exploration bonuses and evaluate its performance on MONTEZUMA'S REVENGE, Bellemare et al.'s set of hard-exploration games with sparse rewards, and the whole Atari 2600 suite. (A sketch of this shared reward-augmentation pattern appears below the table.) |
| Researcher Affiliation | Collaboration | Adrien Ali Taïga (MILA, Université de Montréal; Google Brain); William Fedus (MILA, Université de Montréal; Google Brain); Marlos C. Machado (Google Brain); Aaron Courville (MILA, Université de Montréal); Marc G. Bellemare (Google Brain) |
| Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks. |
| Open Source Code | No | The paper mentions using 'the Dopamine framework (Castro et al., 2018)' for its implementation but does not provide a link or explicit statement about the availability of its own specific source code for the described methodology. |
| Open Datasets | Yes | Arcade Learning Environment (ALE; Bellemare et al., 2013). ... the whole Atari 2600 suite. (A sketch of instantiating one of these environments appears below the table.) |
| Dataset Splits | No | The paper discusses 'training' and 'evaluation' and mentions tuning hyperparameters on MONTEZUMA'S REVENGE, but it does not provide specific details on a separate validation dataset split (percentages, sample counts, or clear designation of a validation set) for hyperparameter tuning or early stopping. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU models, or cloud computing instance types. |
| Software Dependencies | No | The paper mentions using the 'Dopamine framework (Castro et al., 2018)' and 'Rainbow implementation', and refers to 'Adam (Kingma & Ba, 2014)', but it does not specify version numbers for any of the software dependencies, which is required for reproducibility. |
| Experiment Setup | Yes | Appendix A.1 'RAINBOW AND ATARI PREPROCESSING' and A.2 'HYPERPARAMETER TUNING ON MONTEZUMA'S REVENGE' provide specific hyperparameter values: 'Discount factor γ = 0.99', 'Adam learning rate 6.25 × 10⁻⁵', 'Adam ϵ = 1.5 × 10⁻⁴', 'Multi-step returns n = 3', 'Distributional atoms 51', 'Distributional min/max values [-10, 10]', and details on tuning 'β' and 'α' for the bonus-based methods. (These values are collected in the configuration sketch below.) |
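
The bonus-based methods the paper compares share one mechanism: an intrinsic bonus, scaled by a coefficient β, is added to the environment reward before the Rainbow agent learns from it. Below is a minimal sketch of that pattern using a simple count-based bonus as a stand-in; the `CountBonus` class, the frame hashing, and the default β are illustrative assumptions, not the pseudo-count, ICM, or RND implementations evaluated in the paper.

```python
import math
from collections import defaultdict

import numpy as np


class CountBonus:
    """Toy count-based exploration bonus: r_total = r_ext + beta / sqrt(N(s)).

    A stand-in for the learned bonuses compared in the paper; the pixel
    hashing and the default beta are illustrative choices, not theirs.
    """

    def __init__(self, beta=0.01):
        self.beta = beta
        self.counts = defaultdict(int)

    def augment(self, extrinsic_reward, frame):
        # Crude state discretization: hash the raw pixels to index a counter.
        key = hash(np.asarray(frame).tobytes())
        self.counts[key] += 1
        return extrinsic_reward + self.beta / math.sqrt(self.counts[key])


# Usage: shape each step's reward before the agent's update.
bonus = CountBonus(beta=0.01)
frame = np.zeros((84, 84), dtype=np.uint8)  # placeholder preprocessed frame
shaped_reward = bonus.augment(1.0, frame)
```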
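
Since the "dataset" here is the interactive Atari 2600 suite rather than a static corpus, access goes through an ALE frontend. The sketch below instantiates MONTEZUMA'S REVENGE through Gym's classic Atari interface; the environment ID and the pre-0.26 Gym step API are assumptions about the installed versions, and the paper itself interfaces with ALE through Dopamine instead.

```python
import gym  # assumes gym with the Atari extras (atari-py or ale-py) installed

# MONTEZUMA'S REVENGE, the hard-exploration game used for hyperparameter tuning.
env = gym.make("MontezumaRevengeNoFrameskip-v4")
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())  # random action
env.close()
```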
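
Finally, the Rainbow hyperparameters quoted from Appendix A.1 can be collected in one place. This is a plain-Python sketch; the dict layout and key names are ours, not the gin bindings of the Dopamine configuration the authors used.

```python
# Rainbow hyperparameters reported in Appendix A.1 of the paper.
# Key names are illustrative, not Dopamine's actual gin binding names.
RAINBOW_HPARAMS = {
    "discount_gamma": 0.99,        # discount factor γ
    "adam_learning_rate": 6.25e-5,
    "adam_epsilon": 1.5e-4,
    "multi_step_n": 3,             # n-step return length
    "num_atoms": 51,               # distributional (C51) support size
    "v_min": -10.0,                # distributional support bounds
    "v_max": 10.0,
}
```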