Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].
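For context, validating such a pipeline typically means comparing the LLM's labels against the manually annotated ones and reporting per-variable accuracy. Below is a minimal, purely illustrative sketch of that comparison; the variable names, label values, and data layout are assumptions for illustration, not the actual pipeline or dataset from [1].

```python
from collections import defaultdict

# Hypothetical gold labels and LLM predictions, keyed by
# (paper_id, reproducibility_variable) -> class label.
# These records are placeholders; the real data and schema are in [1].
gold = {
    ("paper_001", "Open Source Code"): "Yes",
    ("paper_001", "Hardware Specification"): "No",
    ("paper_002", "Open Source Code"): "No",
}
predicted = {
    ("paper_001", "Open Source Code"): "Yes",
    ("paper_001", "Hardware Specification"): "Yes",  # one misclassification
    ("paper_002", "Open Source Code"): "No",
}

# Tally per-variable agreement between the LLM and the manual labels.
correct = defaultdict(int)
total = defaultdict(int)
for key, gold_label in gold.items():
    variable = key[1]
    total[variable] += 1
    if predicted.get(key) == gold_label:
        correct[variable] += 1

for variable in sorted(total):
    accuracy = correct[variable] / total[variable]
    print(f"{variable}: {accuracy:.2%} ({correct[variable]}/{total[variable]})")
```

On this toy data the script reports 100% accuracy for Open Source Code and 0% for Hardware Specification, which is the shape of the per-variable accuracy metrics the notice refers to.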

Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning

Authors: David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek

NeurIPS 2019 | Venue PDF | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We present Atari 2600 results: SU outperforms Bootstrapped DQN (Osband et al., 2016a) on 36/49 and Uncertainty Bellman Equation (O'Donoghue et al., 2018) on 43/49 games. We have tested the SU algorithm on the standard set of 49 games from the Arcade Learning Environment, with the aim of showing that SU can be scaled to complex domains that require generalisation between states.
Researcher Affiliation | Collaboration | David Janz (University of Cambridge, EMAIL); Jiri Hron (University of Cambridge, EMAIL); Przemysław Mazur (Wayve Technologies); Katja Hofmann (Microsoft Research); José Miguel Hernández-Lobato (University of Cambridge, Alan Turing Institute, Microsoft Research); Sebastian Tschiatschek (Microsoft Research)
Pseudocode | Yes | For reference, the pseudocode is included in appendix C.
Open Source Code | Yes | Code for the tabular experiments: https://djanz.org/successor_uncertainties/tabular_code; code for the Atari experiments: djanz.org/successor_uncertainties/atari_code
Open Datasets | Yes | We have tested the SU algorithm on the standard set of 49 games from the Arcade Learning Environment, with the aim of showing that SU can be scaled to complex domains that require generalisation between states.
Dataset Splits | No | The paper states '200M training frames' and describes a 'test protocol', but does not explicitly provide details about a validation dataset split, such as percentages, sample counts, or specific predefined splits for validation.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments.
Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions).
Experiment Setup | Yes | More detail on our implementation, network architecture and training procedure can be found in appendix C.2. All parameters were kept identical to those in (Mnih et al., 2015), where applicable.
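The Research Type row above quotes head-to-head results (SU wins on 36/49 games against Bootstrapped DQN and 43/49 against UBE). As a small illustration of how such per-game win counts are tallied from per-game scores, here is a sketch; the game names and score values are hypothetical placeholders, not the paper's actual Atari results.

```python
# Hypothetical mean evaluation scores per Atari game; the real values
# are reported in the paper, not reproduced here.
su_scores = {"Breakout": 412.0, "Pong": 20.1, "Seaquest": 2301.0}
baseline_scores = {"Breakout": 375.0, "Pong": 20.5, "Seaquest": 1800.0}

# A "win" for SU on a game means a strictly higher score than the baseline.
wins = sum(
    1 for game, score in su_scores.items()
    if score > baseline_scores[game]
)
print(f"SU wins on {wins}/{len(su_scores)} games")  # -> SU wins on 2/3 games
```

Run over the full 49-game suite with each baseline's scores, the same tally yields the 36/49 and 43/49 figures the paper reports.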