Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Authors: David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek

NeurIPS 2019 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present Atari 2600 results: SU outperforms Bootstrapped DQN (Osband et al., 2016a) on 36/49 games and the Uncertainty Bellman Equation (O'Donoghue et al., 2018) on 43/49 games. We have tested the SU algorithm on the standard set of 49 games from the Arcade Learning Environment, with the aim of showing that SU can be scaled to complex domains that require generalisation between states.
Researcher Affiliation | Collaboration | David Janz, University of Cambridge (dj343@cam.ac.uk); Jiri Hron, University of Cambridge (jh2084@cam.ac.uk); Przemysław Mazur, Wayve Technologies; Katja Hofmann, Microsoft Research; José Miguel Hernández-Lobato, University of Cambridge, Alan Turing Institute and Microsoft Research; Sebastian Tschiatschek, Microsoft Research
Pseudocode | Yes | For reference, the pseudocode is included in appendix C. (A minimal sketch of the core SU action-selection step appears below the table.)
Open Source Code | Yes | Code for the tabular experiments: https://djanz.org/successor_uncertainties/tabular_code; code for the Atari experiments: djanz.org/successor_uncertainties/atari_code
Open Datasets | Yes | We have tested the SU algorithm on the standard set of 49 games from the Arcade Learning Environment, with the aim of showing that SU can be scaled to complex domains that require generalisation between states. (A sketch of loading these environments appears below the table.)
Dataset Splits | No | The paper states '200M training frames' and describes a 'test protocol', but does not explicitly provide validation-split details such as percentages, sample counts, or specific predefined splits.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch, or TensorFlow versions, or versions of other libraries).
Experiment Setup | Yes | More detail on our implementation, network architecture and training procedure can be found in appendix C.2. All parameters were kept identical to those in (Mnih et al., 2015), where applicable. (The assumed defaults are sketched below the table.)
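
The pseudocode itself lives in appendix C of the paper. As a rough orientation only, the sketch below illustrates the core idea behind SU as we read it: Q-values are modelled as Q(s, a) = ψ(s, a)ᵀw, with successor features ψ and a Gaussian posterior over reward weights w, and exploration is driven by posterior sampling. The function name, the variable names psi, mu and Sigma, and the use of NumPy are our assumptions, not the authors' implementation.

```python
import numpy as np

def sample_greedy_action(psi, mu, Sigma, rng):
    """Posterior-sampling action selection in the spirit of SU.

    psi:   (n_actions, d) array of successor features psi(s, a) for the
           current state s (how psi is learned is described in the paper).
    mu:    (d,) posterior mean of the reward weight vector w.
    Sigma: (d, d) posterior covariance of w.

    Since Q(s, a) = psi(s, a) @ w, sampling w ~ N(mu, Sigma) induces a
    sampled Q-function; acting greedily with respect to it gives the
    exploratory policy. In the paper the sample is held fixed for a
    whole episode rather than redrawn every step.
    """
    w_sample = rng.multivariate_normal(mu, Sigma)
    q_sample = psi @ w_sample
    return int(np.argmax(q_sample))

# Toy usage with random numbers, purely illustrative.
rng = np.random.default_rng(0)
psi = rng.normal(size=(4, 8))   # 4 actions, 8-dimensional features
mu, Sigma = np.zeros(8), np.eye(8)
print(sample_greedy_action(psi, mu, Sigma, rng))
```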
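
The 49-game ALE suite is publicly available through standard tooling. Below is a minimal, assumption-laden sketch of loading a few of these games with OpenAI Gym; the exact environment IDs, Gym version and wrapper stack used by the authors are not stated in the paper, so treat this only as an indication that the benchmark is open.

```python
import gym  # requires Gym with the Atari extras (ale-py / atari-py) installed

# Three of the 49 classic benchmark games; the remaining IDs follow the
# same "<Game>NoFrameskip-v4" pattern in older Gym releases (newer
# gymnasium releases use "ALE/<Game>-v5" instead).
GAMES = ["Breakout", "Pong", "Seaquest"]

envs = {name: gym.make(f"{name}NoFrameskip-v4") for name in GAMES}
obs = envs["Breakout"].reset()
```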
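
Since the setup defers to (Mnih et al., 2015) "where applicable", the reproduction-relevant defaults are the DQN ones. The dictionary below lists the headline values from that paper as a convenience; it is our summary, not a verbatim extract of SU's appendix C.2, and the exploration entries in particular may not apply because SU replaces ε-greedy with posterior sampling.

```python
# Headline hyperparameters from (Mnih et al., 2015); assumed defaults only.
DQN_DEFAULTS = {
    "minibatch_size": 32,
    "replay_memory_size": 1_000_000,       # transitions
    "target_network_update_freq": 10_000,  # parameter updates
    "discount_factor": 0.99,
    "action_repeat": 4,                    # frame skip
    "learning_rate": 0.00025,              # RMSProp
    "gradient_momentum": 0.95,
    "replay_start_size": 50_000,
    # Epsilon-greedy schedule from DQN; likely *not* applicable to SU,
    # which explores via posterior sampling instead.
    "initial_exploration": 1.0,
    "final_exploration": 0.1,
    "final_exploration_frame": 1_000_000,
}
```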