The Uncertainty Bellman Equation and Exploration

Authors: Brendan O’Donoghue, Ian Osband, Remi Munos, Vlad Mnih

ICML 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Substituting our UBE-exploration strategy for ϵ-greedy improves DQN performance on 51 out of 57 games in the Atari suite.
Researcher Affiliation | Industry | DeepMind. Correspondence to: Brendan O'Donoghue <bodonoghue@google.com>.
Pseudocode | Yes | Algorithm 1: One-step UBE exploration with linear uncertainty estimates. (An illustrative sketch of this procedure appears after the table.)
Open Source Code | No | The paper does not contain any explicit statement about making its source code publicly available, nor does it provide a link to a code repository.
Open Datasets | Yes | Here we present results of Algorithm (1) on the Atari suite of games (Bellemare et al., 2012).
Dataset Splits | Yes | Every 1M frames the agents were saved and evaluated (without learning) on 0.5M frames, where each episode is started from the random start condition described in (Mnih et al., 2015).
Hardware Specification | No | The paper mentions running experiments 'on a GPU' but does not specify any particular model or detailed hardware configuration.
Software Dependencies | No | The paper mentions software components like 'DQN' and 'RMSProp optimizer' but does not provide specific version numbers for any of these dependencies.
Experiment Setup | Yes | The β constant in (3) was chosen to be 0.01 for all games, by a parameter sweep. We used the exact same network architecture, learning rate, optimizer, pre-processing and replay scheme as described in Mnih et al. (2015). For the uncertainty sub-network we used a single fully connected hidden layer with 512 hidden units followed by the output layer. We trained the uncertainty head using a separate RMSProp optimizer (Tieleman & Hinton, 2012) with learning rate 10^-3. (A sketch of this uncertainty sub-network appears after the table.)
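
The Pseudocode row quotes only the title of Algorithm 1. Below is a minimal NumPy sketch of the kind of loop that title suggests: a linear uncertainty signal ν = φᵀΣφ over state features, a Thompson-sampling-style action choice that perturbs Q by β·ζ·√u with ζ ~ N(0,1), and a rank-one covariance update. This is one plausible reading of the β bonus referenced in the Experiment Setup row, not the authors' code; phi, Sigma, q_values, u_values and the function names are hypothetical, and the default beta=0.01 is taken from the quoted parameter sweep.

    # Illustrative sketch (NumPy) of one-step exploration with a linear
    # uncertainty signal; all names are placeholders, not the paper's code.
    import numpy as np

    def local_uncertainty(phi, Sigma):
        """nu = phi^T Sigma phi: larger for rarely visited feature directions."""
        return float(phi @ Sigma @ phi)

    def select_action(q_values, u_values, beta=0.01, rng=np.random):
        """Thompson-sampling-style choice: perturb Q by beta * zeta * sqrt(u),
        with zeta ~ N(0, 1) drawn independently per action."""
        zeta = rng.standard_normal(len(q_values))
        bonus = beta * zeta * np.sqrt(np.maximum(u_values, 0.0))
        return int(np.argmax(q_values + bonus))

    def update_covariance(Sigma, phi):
        """Sherman-Morrison rank-1 shrink of Sigma after observing features phi."""
        s = Sigma @ phi
        return Sigma - np.outer(s, s) / (1.0 + phi @ s)

In such a scheme, a per-action covariance matrix would typically start as a scaled identity and be shrunk with update_covariance each time that action is taken in a state with features phi.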
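The Experiment Setup row pins down the uncertainty sub-network concretely: a single fully connected hidden layer of 512 units, an output per action, and a separate RMSProp optimizer with learning rate 10^-3. Below is a minimal PyTorch sketch of that head under those quoted numbers; the paper does not name a framework, and the attachment point (here the flattened convolutional features of the Mnih et al. (2015) torso) and all identifiers are assumptions.

    # Minimal PyTorch sketch of the quoted uncertainty sub-network: one fully
    # connected hidden layer of 512 units on top of shared torso features,
    # trained with its own RMSProp optimizer at learning rate 1e-3.
    # Framework, class names and the attachment point are assumptions.
    import torch
    import torch.nn as nn

    class UncertaintyHead(nn.Module):
        def __init__(self, feature_dim: int, num_actions: int):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(feature_dim, 512),  # single hidden layer, 512 units
                nn.ReLU(),
                nn.Linear(512, num_actions),  # one uncertainty value per action
            )

        def forward(self, torso_features: torch.Tensor) -> torch.Tensor:
            return self.net(torso_features)

    # Separate optimizer for the uncertainty head only; the Q-network keeps the
    # DQN optimizer and hyper-parameters of Mnih et al. (2015).
    u_head = UncertaintyHead(feature_dim=3136, num_actions=18)  # dims assumed
    u_optimizer = torch.optim.RMSprop(u_head.parameters(), lr=1e-3)

The separate optimizer mirrors the quoted statement that the uncertainty head is trained apart from the main DQN updates; 3136 (the flattened DQN convolutional output) and 18 actions are illustrative values, not figures from the paper.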