The Uncertainty Bellman Equation and Exploration
Authors: Brendan O’Donoghue, Ian Osband, Remi Munos, Vlad Mnih
ICML 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Substituting our UBE-exploration strategy for ϵ-greedy improves DQN performance on 51 out of 57 games in the Atari suite. |
| Researcher Affiliation | Industry | DeepMind. Correspondence to: Brendan O'Donoghue <bodonoghue@google.com>. |
| Pseudocode | Yes | Algorithm 1 One-step UBE exploration with linear uncertainty estimates. |
| Open Source Code | No | The paper does not contain any explicit statement about making its source code publicly available, nor does it provide a link to a code repository. |
| Open Datasets | Yes | Here we present results of Algorithm (1) on the Atari suite of games (Bellemare et al., 2012). |
| Dataset Splits | Yes | Every 1M frames the agents were saved and evaluated (without learning) on 0.5M frames, where each episode is started from the random start condition described in (Mnih et al., 2015). |
| Hardware Specification | No | The paper mentions running experiments 'on a GPU' but does not specify any particular model or detailed hardware configuration. |
| Software Dependencies | No | The paper mentions software components like 'DQN' and 'RMSProp optimizer' but does not provide specific version numbers for any of these dependencies. |
| Experiment Setup | Yes | The β constant in (3) was chosen to be 0.01 for all games, by a parameter sweep. We used the exact same network architecture, learning rate, optimizer, pre-processing and replay scheme as described in Mnih et al. (2015). For the uncertainty sub-network we used a single fully connected hidden layer with 512 hidden units followed by the output layer. We trained the uncertainty head using a separate RMSProp optimizer (Tieleman & Hinton, 2012) with learning rate 10^-3. |
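
The Experiment Setup row pins down the uncertainty sub-network (a single 512-unit fully connected hidden layer followed by an output layer, trained with its own RMSProp optimizer at learning rate 10^-3) and the exploration constant β = 0.01. Below is a minimal PyTorch sketch of that configuration. The class and variable names, the feature dimension, the softplus non-negativity transform, and the exact form and sampling frequency of the action-selection rule are illustrative assumptions, not details taken verbatim from the paper.

```python
# Hedged sketch of the uncertainty-head setup described in the table above.
# Names (QWithUncertaintyHead, FEATURE_DIM, etc.) are hypothetical.
import torch
import torch.nn as nn

BETA = 0.01          # exploration constant quoted in the Experiment Setup row
NUM_ACTIONS = 18     # full Atari action set (assumption)
FEATURE_DIM = 3136   # size of a Nature-DQN conv torso output (assumption)


class QWithUncertaintyHead(nn.Module):
    """DQN-style Q head plus a separate uncertainty sub-network.

    The uncertainty head uses one 512-unit fully connected hidden layer
    followed by an output layer, matching the quoted description.
    """

    def __init__(self, feature_dim=FEATURE_DIM, num_actions=NUM_ACTIONS):
        super().__init__()
        # Stand-in for the Q head on top of the Mnih et al. (2015) torso;
        # the paper reuses that architecture unchanged.
        self.q_head = nn.Sequential(
            nn.Linear(feature_dim, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )
        # Uncertainty sub-network: 512-unit hidden layer + output layer.
        self.u_head = nn.Sequential(
            nn.Linear(feature_dim, 512), nn.ReLU(),
            nn.Linear(512, num_actions),
        )

    def forward(self, features):
        q = self.q_head(features)
        # Softplus keeps the uncertainty estimate non-negative
        # (our choice here, not specified in the quoted text).
        u = nn.functional.softplus(self.u_head(features))
        return q, u


def select_action(q, u, beta=BETA):
    """Thompson-style action choice: argmax_a [Q(s,a) + beta * zeta_a * sqrt(u(s,a))].

    This paraphrases the paper's exploration policy (its equation (3));
    treat the exact form as an approximation.
    """
    zeta = torch.randn_like(u)
    return torch.argmax(q + beta * zeta * u.sqrt(), dim=-1)


model = QWithUncertaintyHead()
# Separate RMSProp optimizer for the uncertainty head only, lr = 1e-3,
# as stated in the Experiment Setup row.
u_optimizer = torch.optim.RMSprop(model.u_head.parameters(), lr=1e-3)

features = torch.randn(1, FEATURE_DIM)   # placeholder torso features
q_values, uncertainties = model(features)
action = select_action(q_values, uncertainties)
```

The separate optimizer for the uncertainty head mirrors the quoted setup; the Q-network itself is trained exactly as in Mnih et al. (2015), which is why only the torso and Q head are elided here.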