Count-Based Exploration with Neural Density Models

Authors: Georg Ostrovski, Marc G. Bellemare, Aäron van den Oord, Rémi Munos

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We answer the first question by demonstrating the use of PixelCNN, an advanced neural density model for images, to supply a pseudo-count. In particular, we examine the intrinsic difficulties in adapting Bellemare et al.'s approach when assumptions about the model are violated. The result is a more practical and general algorithm requiring no special apparatus. We combine PixelCNN pseudo-counts with different agent architectures to dramatically improve the state of the art on several hard Atari games.
Researcher Affiliation | Industry | DeepMind, London, UK. Correspondence to: Georg Ostrovski <ostrovski@google.com>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | Yes | We investigate how to best resolve these tensions in the context of the Arcade Learning Environment (Bellemare et al., 2013), a suite of benchmark Atari 2600 games.
Dataset Splits | No | The paper describes training agents in the Arcade Learning Environment and evaluates performance over training steps, but it does not specify explicit train/validation/test splits with percentages or sample counts; the agents learn through continuous interaction with the environment rather than from a statically partitioned dataset.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or cloud resource specifications used to run the experiments.
Software Dependencies | No | The paper mentions using the RMSProp optimizer but does not provide version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | Its core is a stack of 2 gated residual blocks with 16 feature maps (compared to 15 residual blocks with 128 feature maps in vanilla PixelCNN). As was done with the CTS model, images are downsampled to 42×42 and quantized to 3-bit greyscale. ... The lowest final training loss is achieved by a constant learning rate of 0.001 or a decaying learning rate of 0.1·n^(-1/2). ... in fact the constant learning rate 0.001, paired with a PG decay c_n = c·n^(-1/2), obtains the best exploration results on hard exploration games like MONTEZUMA'S REVENGE, see Fig. 2 (right). We find the model to be robust across 1-2 orders of magnitude for the value of c, and informally determine c = 0.1 to be a sensible configuration. ... we perform updates of the PixelCNN model and compute the reward bonus on (randomly chosen) 25% of all steps.
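The Experiment Setup row quotes how the prediction-gain (PG) decay c_n = c·n^(-1/2) with c = 0.1 turns the density model's outputs into an exploration bonus. A minimal sketch of that computation, assuming the pseudo-count-from-prediction-gain form described in the paper; the function and argument names are illustrative, and the small 0.01 stabilizer in the bonus is an assumption borrowed from Bellemare et al. (2016), not confirmed by this paper:

```python
import math

def pseudo_count_bonus(log_prob_before, log_prob_after, n, c=0.1):
    """Illustrative pseudo-count exploration bonus for one frame x.

    log_prob_before: log rho_n(x), model density of x before the update
    log_prob_after:  log rho'_n(x), density after training once on x
    n:               number of density-model updates so far
    c:               PG decay constant (the paper finds c = 0.1 sensible)
    """
    # Prediction gain; clipped at zero so the pseudo-count stays positive.
    pg = max(0.0, log_prob_after - log_prob_before)
    # Decaying scale c_n = c * n^(-1/2), matching the quoted PG decay.
    c_n = c / math.sqrt(max(n, 1))
    # Pseudo-count derived from the scaled prediction gain: frames the
    # model already predicts well (pg ~ 0) get an effectively infinite count.
    if pg == 0.0:
        pseudo_count = float("inf")
    else:
        pseudo_count = 1.0 / (math.exp(c_n * pg) - 1.0)
    # Bonus shrinks as the pseudo-count grows; the 0.01 offset is an
    # assumed numerical stabilizer, not stated in this paper.
    return (pseudo_count + 0.01) ** -0.5
```

In the paper's setup, the model update and bonus computation are performed only on a randomly chosen 25% of all steps, so n here would count model updates rather than environment steps.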