Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Authors: David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present Atari 2600 results: SU outperforms Bootstrapped DQN (Osband et al., 2016a) on 36/49 games and the Uncertainty Bellman Equation (O'Donoghue et al., 2018) on 43/49 games. We have tested the SU algorithm on the standard set of 49 games from the Arcade Learning Environment, with the aim of showing that SU can be scaled to complex domains that require generalisation between states.
Researcher Affiliation | Collaboration | David Janz, University of Cambridge (dj343@cam.ac.uk); Jiri Hron, University of Cambridge (jh2084@cam.ac.uk); Przemysław Mazur, Wayve Technologies; Katja Hofmann, Microsoft Research; José Miguel Hernández-Lobato, University of Cambridge, Alan Turing Institute and Microsoft Research; Sebastian Tschiatschek, Microsoft Research
Pseudocode | Yes | For reference, the pseudocode is included in appendix C. (A minimal sketch of the core SU action-selection step appears below the table.)
Open Source Code | Yes | Code for the tabular experiments: https://djanz.org/successor_uncertainties/tabular_code; code for the Atari experiments: djanz.org/successor_uncertainties/atari_code
Open Datasets | Yes | We have tested the SU algorithm on the standard set of 49 games from the Arcade Learning Environment, with the aim of showing that SU can be scaled to complex domains that require generalisation between states. (A sketch of loading these environments appears below the table.)
Dataset Splits | No | The paper states '200M training frames' and describes a 'test protocol', but does not explicitly provide validation-split details such as percentages, sample counts, or specific predefined splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or versions of other libraries).
Experiment Setup | Yes | More detail on our implementation, network architecture and training procedure can be found in appendix C.2. All parameters were kept identical to those in (Mnih et al., 2015), where applicable. (The assumed defaults are sketched below the table.)
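
The pseudocode itself lives in appendix C of the paper. As a rough orientation only, the sketch below illustrates the core idea behind SU as we read it: Q-values are modelled as Q(s, a) = ψ(s, a)ᵀw, with successor features ψ and a Gaussian posterior over reward weights w, and exploration is driven by posterior sampling. The function name, the variable names psi, mu and Sigma, and the use of NumPy are our assumptions, not the authors' implementation.

```python
import numpy as np

def sample_greedy_action(psi, mu, Sigma, rng):
    """Posterior-sampling action selection in the spirit of SU.

    psi:   (n_actions, d) array of successor features psi(s, a) for the
           current state s (how psi is learned is described in the paper).
    mu:    (d,) posterior mean of the reward weight vector w.
    Sigma: (d, d) posterior covariance of w.

    Since Q(s, a) = psi(s, a) @ w, sampling w ~ N(mu, Sigma) induces a
    sampled Q-function; acting greedily with respect to it gives the
    exploratory policy. In the paper the sample is held fixed for a
    whole episode rather than redrawn every step.
    """
    w_sample = rng.multivariate_normal(mu, Sigma)
    q_sample = psi @ w_sample
    return int(np.argmax(q_sample))

# Toy usage with random numbers, purely illustrative.
rng = np.random.default_rng(0)
psi = rng.normal(size=(4, 8))   # 4 actions, 8-dimensional features
mu, Sigma = np.zeros(8), np.eye(8)
print(sample_greedy_action(psi, mu, Sigma, rng))
```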
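
The 49-game ALE suite is publicly available through standard tooling. Below is a minimal, assumption-laden sketch of loading a few of these games with OpenAI Gym; the exact environment IDs, Gym version and wrapper stack used by the authors are not stated in the paper, so treat this only as an indication that the benchmark is open.

```python
import gym  # requires Gym with the Atari extras (ale-py / atari-py) installed

# Three of the 49 classic benchmark games; the remaining IDs follow the
# same "<Game>NoFrameskip-v4" pattern in older Gym releases (newer
# gymnasium releases use "ALE/<Game>-v5" instead).
GAMES = ["Breakout", "Pong", "Seaquest"]

envs = {name: gym.make(f"{name}NoFrameskip-v4") for name in GAMES}
obs = envs["Breakout"].reset()
```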
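
Since the setup defers to (Mnih et al., 2015) "where applicable", the reproduction-relevant defaults are the DQN ones. The dictionary below lists the headline values from that paper as a convenience; it is our summary, not a verbatim extract of SU's appendix C.2, and the exploration entries in particular may not apply because SU replaces ε-greedy with posterior sampling.

```python
# Headline hyperparameters from (Mnih et al., 2015); assumed defaults only.
DQN_DEFAULTS = {
    "minibatch_size": 32,
    "replay_memory_size": 1_000_000,       # transitions
    "target_network_update_freq": 10_000,  # parameter updates
    "discount_factor": 0.99,
    "action_repeat": 4,                    # frame skip
    "learning_rate": 0.00025,              # RMSProp
    "gradient_momentum": 0.95,
    "replay_start_size": 50_000,
    # Epsilon-greedy schedule from DQN; likely *not* applicable to SU,
    # which explores via posterior sampling instead.
    "initial_exploration": 1.0,
    "final_exploration": 0.1,
    "final_exploration_frame": 1_000_000,
}
```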