Unifying Count-Based Exploration and Intrinsic Motivation
Authors: Marc G. Bellemare, Sriram Srinivasan, Georg Ostrovski, Tom Schaul, David Saxton, Rémi Munos
NeurIPS 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels. We transform these pseudo-counts into exploration bonuses and obtain significantly improved exploration in a number of hard games, including the infamously difficult MONTEZUMA'S REVENGE. Figure 2 depicts the result of our experiment, averaged across 5 trials. |
| Researcher Affiliation | Industry | Marc G. Bellemare (bellemare@google.com), Sriram Srinivasan (srsrinivasan@google.com), Georg Ostrovski (ostrovski@google.com), Tom Schaul (schaul@google.com), David Saxton (saxton@google.com), Rémi Munos (munos@google.com); Google DeepMind, London, United Kingdom |
| Pseudocode | No | The paper contains mathematical equations and conceptual descriptions but does not include any explicit pseudocode or algorithm blocks. |
| Open Source Code | No | The paper provides a link to a video of the agent playing, but not to the source code for the methodology described in the paper. |
| Open Datasets | Yes | We apply our ideas to Atari 2600 games, providing sensible pseudo-counts from raw pixels. We use the Arcade Learning Environment (Bellemare et al., 2013). |
| Dataset Splits | No | The paper mentions training frames and performance over training time (e.g., '50 million frames', 'in-training median score') but does not specify explicit training, validation, or test dataset splits (e.g., percentages or counts). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU models, CPU types, or memory specifications). |
| Software Dependencies | No | The paper mentions algorithms such as Double DQN (van Hasselt et al., 2016) and the A3C (Asynchronous Advantage Actor-Critic) algorithm of Mnih et al. (2016), but does not provide specific version numbers for any software libraries, frameworks, or environments used. |
| Experiment Setup | Yes | We used an exploration bonus of the form $R^+_n(x, a) := \beta(\hat{N}_n(x) + 0.01)^{-1/2}$ (Eq. 4), where $\beta = 0.05$ was selected from a coarse parameter sweep. We trained our agents' Q-functions with Double DQN (van Hasselt et al., 2016), with one important modification: we mixed the Double Q-Learning target with the Monte Carlo return. |
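
The Experiment Setup row is the only part of the table that pins down concrete quantities: the bonus of Eq. (4) with β = 0.05, and a Double Q-Learning target mixed with the Monte Carlo return. A minimal Python sketch of those two pieces is given below; the `mc_mix` weight and the function names are illustrative assumptions, not values or APIs taken from the paper.

```python
import math

BETA = 0.05  # bonus scale reported in the paper's coarse parameter sweep


def exploration_bonus(pseudo_count: float, beta: float = BETA) -> float:
    """Bonus from Eq. (4): R+_n(x, a) = beta * (N_hat_n(x) + 0.01)^(-1/2)."""
    return beta / math.sqrt(pseudo_count + 0.01)


def mixed_target(reward: float,
                 bonus: float,
                 gamma: float,
                 next_q_double: float,
                 monte_carlo_return: float,
                 mc_mix: float = 0.1) -> float:
    """Mix a Double Q-Learning bootstrap target with the Monte Carlo return.

    `next_q_double` stands for Q_target(s', argmax_a Q_online(s', a)).
    `mc_mix` is a hypothetical mixing weight; the quoted text says the two
    targets are mixed but does not give the coefficient.
    """
    double_q_target = reward + bonus + gamma * next_q_double
    return (1.0 - mc_mix) * double_q_target + mc_mix * monte_carlo_return


# A rarely visited state (pseudo-count ~1) receives a much larger bonus
# than a frequently visited one (pseudo-count ~10,000).
print(exploration_bonus(1.0))       # ~0.0498
print(exploration_bonus(10_000.0))  # ~0.0005
```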