Count-Based Exploration with Neural Density Models

Authors: Georg Ostrovski, Marc G. Bellemare, Aäron van den Oord, Rémi Munos

ICML 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We answer the first question by demonstrating the use of PixelCNN, an advanced neural density model for images, to supply a pseudo-count. In particular, we examine the intrinsic difficulties in adapting Bellemare et al.'s approach when assumptions about the model are violated. The result is a more practical and general algorithm requiring no special apparatus. We combine PixelCNN pseudo-counts with different agent architectures to dramatically improve the state of the art on several hard Atari games.
Researcher Affiliation | Industry | DeepMind, London, UK. Correspondence to: Georg Ostrovski <ostrovski@google.com>.
Pseudocode | No | The paper does not contain any structured pseudocode or algorithm blocks.
Open Source Code | No | The paper does not provide an explicit statement of, or link to, open-source code for the described methodology.
Open Datasets | Yes | We investigate how to best resolve these tensions in the context of the Arcade Learning Environment (Bellemare et al., 2013), a suite of benchmark Atari 2600 games.
Dataset Splits | No | The paper describes training agents in the Arcade Learning Environment and evaluates performance over training steps, but it does not specify explicit train/validation/test splits with percentages or sample counts; the agents learn through continuous interaction with the environment rather than from a statically partitioned dataset.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, or cloud resource specifications used to run the experiments.
Software Dependencies | No | The paper mentions using the RMSProp optimizer but does not provide version numbers for any software dependencies or libraries used in the experiments.
Experiment Setup | Yes | Its core is a stack of 2 gated residual blocks with 16 feature maps (compared to 15 residual blocks with 128 feature maps in vanilla PixelCNN). As was done with the CTS model, images are downsampled to 42×42 and quantized to 3-bit greyscale. ... The lowest final training loss is achieved by a constant learning rate of 0.001 or a decaying learning rate of 0.1·n^(-1/2). ... in fact the constant learning rate 0.001, paired with a PG decay c_n = c·n^(-1/2), obtains the best exploration results on hard exploration games like MONTEZUMA'S REVENGE, see Fig. 2 (right). We find the model to be robust across 1-2 orders of magnitude for the value of c, and informally determine c = 0.1 to be a sensible configuration. ... we perform updates of the PixelCNN model and compute the reward bonus on (randomly chosen) 25% of all steps.
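The Experiment Setup row quotes how the prediction-gain (PG) decay c_n = c·n^(-1/2) with c = 0.1 turns the density model's outputs into an exploration bonus. A minimal sketch of that computation, assuming the pseudo-count-from-prediction-gain form described in the paper; the function and argument names are illustrative, and the small 0.01 stabilizer in the bonus is an assumption borrowed from Bellemare et al. (2016), not confirmed by this paper:

```python
import math

def pseudo_count_bonus(log_prob_before, log_prob_after, n, c=0.1):
    """Illustrative pseudo-count exploration bonus for one frame x.

    log_prob_before: log rho_n(x), model density of x before the update
    log_prob_after:  log rho'_n(x), density after training once on x
    n:               number of density-model updates so far
    c:               PG decay constant (the paper finds c = 0.1 sensible)
    """
    # Prediction gain; clipped at zero so the pseudo-count stays positive.
    pg = max(0.0, log_prob_after - log_prob_before)
    # Decaying scale c_n = c * n^(-1/2), matching the quoted PG decay.
    c_n = c / math.sqrt(max(n, 1))
    # Pseudo-count derived from the scaled prediction gain: frames the
    # model already predicts well (pg ~ 0) get an effectively infinite count.
    if pg == 0.0:
        pseudo_count = float("inf")
    else:
        pseudo_count = 1.0 / (math.exp(c_n * pg) - 1.0)
    # Bonus shrinks as the pseudo-count grows; the 0.01 offset is an
    # assumed numerical stabilizer, not stated in this paper.
    return (pseudo_count + 0.01) ** -0.5
```

In the paper's setup, the model update and bonus computation are performed only on a randomly chosen 25% of all steps, so n here would count model updates rather than environment steps.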