Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning
Authors: David Janz, Jiri Hron, Przemysław Mazur, Katja Hofmann, José Miguel Hernández-Lobato, Sebastian Tschiatschek
NeurIPS 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We present Atari 2600 results: SU outperforms Bootstrapped DQN (Osband et al., 2016a) on 36/49 games and the Uncertainty Bellman Equation (O'Donoghue et al., 2018) on 43/49 games. We have tested the SU algorithm on the standard set of 49 games from the Arcade Learning Environment, with the aim of showing that SU can be scaled to complex domains that require generalisation between states. |
| Researcher Affiliation | Collaboration | David Janz University of Cambridge dj343@cam.ac.uk Jiri Hron University of Cambridge jh2084@cam.ac.uk Przemysław Mazur Wayve Technologies Katja Hofmann Microsoft Research José Miguel Hernández-Lobato University of Cambridge Alan Turing Institute Microsoft Research Sebastian Tschiatschek Microsoft Research |
| Pseudocode | Yes | For reference, the pseudocode is included in appendix C. |
| Open Source Code | Yes | Code for the tabular experiments: https://djanz.org/successor_uncertainties/tabular_code Code for the Atari experiments: https://djanz.org/successor_uncertainties/atari_code |
| Open Datasets | Yes | We have tested the SU algorithm on the standard set of 49 games from the Arcade Learning Environment, with the aim of showing that SU can be scaled to complex domains that require generalisation between states. |
| Dataset Splits | No | The paper states '200M training frames' and describes a 'test protocol', but does not explicitly provide details about a validation dataset split, such as percentages, sample counts, or specific predefined splits for validation. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory amounts used for running experiments. |
| Software Dependencies | No | The paper does not provide specific software dependency details with version numbers (e.g., Python, PyTorch, TensorFlow versions, or specific library versions). |
| Experiment Setup | Yes | More detail on our implementation, network architecture and training procedure can be found in appendix C.2. All parameters were kept identical to those of Mnih et al. (2015), where applicable. |
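The pseudocode row above defers to appendix C of the paper. As background for readers, the Successor Uncertainties method builds on the classical successor-feature factorisation Q(s, a) = psi(s, a)^T w, where psi is learned by temporal-difference updates. The sketch below is a minimal tabular illustration of that general machinery (Dayan, 1993), not the paper's actual pseudocode; the one-hot features, sizes, and constant reward weights are assumptions made purely for illustration.

```python
import numpy as np

# Illustrative tabular successor-feature TD learning. All sizes and the
# choice of one-hot state features are assumptions for this sketch.
n_states, n_actions = 4, 2
gamma, alpha = 0.9, 0.1

# phi(s): one-hot state features; psi[s, a] estimates the expected
# discounted sum of future features under the current policy.
phi = np.eye(n_states)
psi = np.zeros((n_states, n_actions, n_states))
w = np.ones(n_states)  # reward weights, so r(s) ~= phi(s) @ w

def td_update(s, a, s_next, a_next):
    """One SARSA-style TD step on the successor features."""
    target = phi[s] + gamma * psi[s_next, a_next]
    psi[s, a] += alpha * (target - psi[s, a])

def q_value(s, a):
    """Q factorises as successor features dotted with reward weights."""
    return psi[s, a] @ w

td_update(0, 1, 2, 0)
```

Because psi starts at zero, a single update moves psi[0, 1] to alpha * phi[0]; the Q-value then follows directly from the dot product with w. Placing uncertainty over w (rather than over Q directly) is what enables the posterior-sampling exploration the paper studies.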