Natural Value Approximators: Learning when to Trust Past Estimates

Authors: Zhongwen Xu, Joseph Modayil, Hado P. van Hasselt, Andre Barreto, David Silver, Tom Schaul

NeurIPS 2017 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility assessment. Each entry below gives the variable, the assessed result, and the LLM response that supports it.

Research Type: Experimental
LLM Response: "We demonstrate that this one change leads to significant improvements on multiple Atari games, when applied to the state-of-the-art A3C algorithm. ... In this section, we integrate our method within A3C (Asynchronous advantage actor-critic [9]) ... We investigate the performance of natural value estimates on a collection of 57 video games from the Atari Learning Environment [1], which has become a standard benchmark for Deep RL methods because of the rich diversity of challenges present in the various games."

Researcher Affiliation: Industry
LLM Response: "Zhongwen Xu, DeepMind, zhongwen@google.com; Joseph Modayil, DeepMind, modayil@google.com; Hado van Hasselt, DeepMind, hado@google.com; Andre Barreto, DeepMind, andrebarreto@google.com; David Silver, DeepMind, davidsilver@google.com; Tom Schaul, DeepMind, schaul@google.com"

Pseudocode: No
LLM Response: The paper does not contain any pseudocode or clearly labeled algorithm blocks.

Open Source Code: No
LLM Response: The paper does not provide any statements or links indicating that open-source code for their method is available.

Open Datasets: Yes
LLM Response: "We investigate the performance of natural value estimates on a collection of 57 video games from the Atari Learning Environment [1], which has become a standard benchmark for Deep RL methods because of the rich diversity of challenges present in the various games."

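Since this row hinges on the Arcade Learning Environment being openly available, here is a minimal sketch of loading one of the 57 games through the open-source ale-py package and its Gymnasium bindings. This packaging postdates the 2017 paper, so the interface below is an assumption for illustration, not what the authors ran.

```python
# Minimal sketch: one of the 57 Atari games via the open-source ALE.
# The ALE/Breakout-v5 id and register_envs call belong to the modern
# ale-py / Gymnasium packaging, assumed here; the paper predates it.
import ale_py
import gymnasium as gym

gym.register_envs(ale_py)            # registers the ALE/* environment ids
env = gym.make("ALE/Breakout-v5")    # any of the 57 games works the same way
obs, info = env.reset(seed=0)
for _ in range(100):
    obs, reward, terminated, truncated, info = env.step(env.action_space.sample())
    if terminated or truncated:
        obs, info = env.reset()
env.close()
```
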
Dataset Splits: No
LLM Response: The paper describes evaluation metrics and conditions ('human starts' and 'no-op starts') but does not specify traditional train/validation/test dataset splits with percentages or counts.

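To make the quoted evaluation conditions concrete: under 'no-op starts', each evaluation episode begins with a random number of no-op actions (up to 30 in the standard Atari protocol) before the agent acts, while 'human starts' resumes episodes from states sampled from human play. A hedged sketch of the no-op condition follows; the helper name and agent-free loop are illustrative, not the authors' code.

```python
# Sketch of the 'no-op starts' evaluation condition (standard Atari
# protocol: up to 30 random no-ops at episode start). Illustrative only.
import random

NOOP_ACTION = 0   # by ALE convention, action 0 is NOOP
MAX_NOOPS = 30    # cap used in the standard evaluation protocol

def noop_start(env, rng=random):
    """Reset env, then apply 1..MAX_NOOPS no-ops before the agent acts."""
    obs, info = env.reset()
    for _ in range(rng.randint(1, MAX_NOOPS)):
        obs, reward, terminated, truncated, info = env.step(NOOP_ACTION)
        if terminated or truncated:   # rare, but restart if no-ops end the episode
            obs, info = env.reset()
    return obs
```
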
Hardware Specification: No
LLM Response: "We train agents for 80 Million agent steps (320 Million Atari game frames) on a single machine with 16 cores." This mentions the number of CPU cores but lacks specific CPU or GPU models, memory details, or other hardware specifications.

Software Dependencies: No
LLM Response: The paper mentions the A3C algorithm and the Adam optimizer but does not provide version numbers for these or any other software components.

Experiment Setup: Yes
LLM Response: "The network architecture is composed of three layers of convolutions, followed by a fully connected layer with output h, which feeds into the two separate heads (π with an additional softmax, and a scalar v...). The updates are done online with a buffer of the past 20 state transitions. The value targets are n-step targets Z_t^n ... We train agents for 80 Million agent steps (320 Million Atari game frames) ... we set k to 50. The networks are trained for 5000 steps using Adam [5] with minibatch size 32."
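
The quoted setup is concrete enough to sketch. Below is a minimal PyTorch rendering of the described network (three convolutions, a fully connected layer producing h, a softmax policy head π, and a scalar value head v) together with the n-step targets Z_t^n. The filter counts, kernel sizes, strides, the 84x84x4 input, the 512-unit hidden layer, and gamma = 0.99 are assumptions filled in from the standard Atari convnet of Mnih et al. (2015), since the quote does not specify them; this is an illustration, not the authors' implementation.

```python
# Sketch of the quoted architecture and n-step targets; layer sizes and
# gamma are assumed (standard Atari convnet values), not from the paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

class A3CNet(nn.Module):
    def __init__(self, num_actions: int, hidden: int = 512):
        super().__init__()
        # Three convolutional layers (sizes assumed from Mnih et al. 2015).
        self.conv1 = nn.Conv2d(4, 32, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(32, 64, kernel_size=4, stride=2)
        self.conv3 = nn.Conv2d(64, 64, kernel_size=3, stride=1)
        self.fc = nn.Linear(64 * 7 * 7, hidden)        # fully connected layer, output h
        self.policy = nn.Linear(hidden, num_actions)   # pi head (softmax applied below)
        self.value = nn.Linear(hidden, 1)              # scalar v head

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.conv3(x))
        h = F.relu(self.fc(x.flatten(start_dim=1)))
        return F.softmax(self.policy(h), dim=-1), self.value(h).squeeze(-1)

def n_step_targets(rewards, bootstrap_value, gamma=0.99):
    """n-step targets Z_t^n = sum_i gamma^i * r_{t+i} + gamma^n * v(s_{t+n}),
    computed backwards over a short rollout buffer (the quote uses the past
    20 transitions). gamma=0.99 is an assumed, conventional discount."""
    targets, ret = [], bootstrap_value
    for r in reversed(rewards):
        ret = r + gamma * ret
        targets.append(ret)
    return list(reversed(targets))
```

For a quick shape check, A3CNet(num_actions=18)(torch.zeros(1, 4, 84, 84)) returns a (1, 18) policy distribution and a (1,) value, matching the two heads described in the quote.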