Deep Exploration via Bootstrapped DQN

Authors: Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy

NeurIPS 2016

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "We demonstrate that bootstrapped DQN can combine deep exploration with deep neural networks for exponentially faster learning than any dithering strategy. In the Arcade Learning Environment bootstrapped DQN substantially improves learning speed and cumulative performance across most games." |
| Researcher Affiliation | Collaboration | Ian Osband (1,2), Charles Blundell (2), Alexander Pritzel (2), Benjamin Van Roy (1); 1 Stanford University, 2 Google DeepMind |
| Pseudocode | Yes | "We present a detailed algorithm for our implementation of bootstrapped DQN in Appendix B." |
| Open Source Code | No | The paper discusses the implementation and performance of the algorithm but does not provide a link or an explicit statement about releasing the source code. |
| Open Datasets | Yes | "We now evaluate our algorithm across 49 Atari games on the Arcade Learning Environment [1]." [1] Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The Arcade Learning Environment: An evaluation platform for general agents. arXiv preprint arXiv:1207.4708, 2012. |
| Dataset Splits | No | The paper mentions following the setup of [25] and providing full details in Appendix D, but does not explicitly state specific train/validation/test split percentages or counts in the main text. |
| Hardware Specification | No | The paper notes that "on a single machine our implementation runs roughly 20% slower than DQN" but does not specify any particular hardware components such as CPU or GPU models. |
| Software Dependencies | No | The paper describes the algorithmic implementation details but does not list specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | "We trained fully-connected 2-layer neural networks with 50 rectified linear units (ReLU) in each layer on 50 bootstrapped samples from the data. ... We choose K = 10. ... in light of this empirical observation for Atari, we chose p = 1 to save on minibatch passes." |
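To make the quoted setup concrete, the sketch below illustrates the core idea of bootstrapped DQN as described in the paper: a shared trunk with K = 10 bootstrap heads, a fully-connected 2-layer ReLU trunk, per-episode head sampling, and a bootstrap mask with p = 1. This is a hypothetical, minimal illustration, not the authors' implementation (which was not released); it assumes PyTorch and a classic `reset()`/`step()` environment interface, and all class, function, and variable names are illustrative. The training step, in which each head regresses on its own masked minibatch against its own target head, is omitted for brevity.

```python
import random
import numpy as np
import torch
import torch.nn as nn

class BootstrappedQNetwork(nn.Module):
    """Shared-input Q-network with K independent bootstrap heads.

    Hypothetical sketch: a 2-layer MLP trunk with 50 ReLU units per layer
    and K = 10 linear heads, mirroring the hyperparameters quoted above.
    """
    def __init__(self, obs_dim, n_actions, hidden=50, n_heads=10):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.heads = nn.ModuleList(
            [nn.Linear(hidden, n_actions) for _ in range(n_heads)]
        )

    def forward(self, obs, head):
        # Q-values from a single bootstrap head.
        return self.heads[head](self.trunk(obs))

def run_episode(env, qnet, n_heads=10):
    """Per-episode deep exploration: sample one head, act greedily with it."""
    head = random.randrange(n_heads)  # k ~ Uniform{1..K}
    obs, done = env.reset(), False    # assumes classic gym-style env API
    transitions = []
    while not done:
        with torch.no_grad():
            q = qnet(torch.as_tensor(obs, dtype=torch.float32), head)
        action = int(q.argmax())
        next_obs, reward, done, _ = env.step(action)
        # Bootstrap mask; with p = 1 every head trains on every transition.
        mask = np.random.binomial(1, 1.0, size=n_heads)
        transitions.append((obs, action, reward, next_obs, done, mask))
        obs = next_obs
    return transitions
```

Committing to a single randomly sampled head for a whole episode is what gives the agent temporally extended (deep) exploration, in contrast to per-step dithering such as epsilon-greedy.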