Deep Exploration via Bootstrapped DQN
Authors: Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy
NeurIPS 2016
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We demonstrate that bootstrapped DQN can combine deep exploration with deep neural networks for exponentially faster learning than any dithering strategy. In the Arcade Learning Environment bootstrapped DQN substantially improves learning speed and cumulative performance across most games. |
| Researcher Affiliation | Collaboration | Ian Osband (1,2), Charles Blundell (2), Alexander Pritzel (2), Benjamin Van Roy (1); 1: Stanford University, 2: Google DeepMind |
| Pseudocode | Yes | We present a detailed algorithm for our implementation of bootstrapped DQN in Appendix B. (A minimal sketch of the episode loop is given below the table.) |
| Open Source Code | No | The paper discusses the implementation and performance of their algorithm but does not provide a link or an explicit statement about releasing the source code. |
| Open Datasets | Yes | We now evaluate our algorithm across 49 Atari games on the Arcade Learning Environment [1]. [1] Marc G. Bellemare, Yavar Naddaf, Joel Veness, and Michael Bowling. The Arcade Learning Environment: An evaluation platform for general agents. arXiv preprint arXiv:1207.4708, 2012. |
| Dataset Splits | No | The paper mentions following the setup of [25] and providing full details in Appendix D, but does not explicitly state specific train/validation/test split percentages or counts in the main text. |
| Hardware Specification | No | The paper mentions that 'on a single machine our implementation runs roughly 20% slower than DQN' but does not specify any particular hardware components like CPU or GPU models. |
| Software Dependencies | No | The paper describes the algorithmic implementation details but does not list specific software dependencies with version numbers (e.g., programming languages, libraries, frameworks). |
| Experiment Setup | Yes | We trained fully-connected 2-layer neural networks with 50 rectified linear units (ReLU) in each layer on 50 bootstrapped samples from the data. ...We choose K = 10. ...in light of this empirical observation for Atari, we chose p = 1 to save on minibatch passes. (A sketch of a matching K-headed network follows the algorithm sketch below.) |
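
As a rough illustration of the Appendix B procedure referenced in the pseudocode row, here is a minimal Python sketch of one bootstrapped DQN episode: a single head is sampled uniformly and followed greedily for the whole episode, and each transition is stored with a Bernoulli bootstrap mask that decides which heads later train on it. The names `env`, `q_heads`, and `replay_buffer` are illustrative assumptions, not the paper's code.

```python
import random
import numpy as np

K = 10   # number of bootstrap heads; the paper chooses K = 10
P = 1.0  # mask probability; the paper chose p = 1 on Atari

def run_episode(env, q_heads, replay_buffer):
    """One bootstrapped DQN episode (sketch of the Appendix B loop).

    `env`, `q_heads`, and `replay_buffer` are stand-ins:
    `q_heads[k](state)` is assumed to return Q-values for head k.
    """
    state = env.reset()
    k = random.randrange(K)  # sample one head, follow it all episode
    done = False
    while not done:
        action = int(np.argmax(q_heads[k](state)))  # greedy w.r.t. head k
        next_state, reward, done = env.step(action)
        # Bernoulli(p) bootstrap mask: which heads train on this transition
        mask = np.random.binomial(1, P, size=K)
        replay_buffer.add((state, action, reward, next_state, done, mask))
        state = next_state
```

Note that with p = 1 every mask is all-ones, so all heads train on every transition and diversity among heads comes only from their random initializations, which is why the paper's choice saves minibatch passes.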
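The experiment-setup row quotes fully-connected 2-layer networks with 50 ReLU units per layer, K = 10 heads, and mask probability p = 1. One plausible rendering of such a K-headed value network is sketched below; the framework (PyTorch) and the exact head layout are assumptions, since the paper neither releases code nor names its software stack.

```python
import torch
import torch.nn as nn

class BootstrappedMLP(nn.Module):
    """K independent heads, each the 2-layer, 50-unit ReLU MLP from
    the experiment setup. A sketch under assumed sizes; the paper
    reports only layer widths, K = 10, and p = 1."""

    def __init__(self, state_dim: int, num_actions: int,
                 k: int = 10, hidden: int = 50):
        super().__init__()
        self.heads = nn.ModuleList(
            nn.Sequential(
                nn.Linear(state_dim, hidden), nn.ReLU(),
                nn.Linear(hidden, hidden), nn.ReLU(),
                nn.Linear(hidden, num_actions),
            )
            for _ in range(k)
        )

    def forward(self, state: torch.Tensor) -> torch.Tensor:
        # Q-values for every head: shape (K, batch, num_actions)
        return torch.stack([head(state) for head in self.heads])
```

For the Atari experiments the heads would instead share a convolutional trunk, with only the final layers duplicated per head; the fully-connected variant above matches the 50-unit setup quoted in the table.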