Successor-Predecessor Intrinsic Exploration
Authors: Changmin Yu, Neil Burgess, Maneesh Sahani, Samuel J Gershman
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods. We also implement SPIE in deep reinforcement learning agents, and show that the resulting agent achieves stronger empirical performance than existing methods on sparse-reward Atari games. We provide a brief overview of background and relevant literature in Section 2, and formally introduce the novel intrinsic exploration method, Successor-Predecessor Intrinsic Exploration (SPIE), in Section 3. We propose two instantiations of SPIE for discrete and continuous state spaces, with comprehensive empirical examinations of properties of SPIE in discrete state space. We show that SPIE facilitates more efficient exploration, in terms of improved sample efficiency of learning and higher asymptotic return, through empirical evaluations on both discrete and continuous environments in Section 4. Table 1: Evaluations of SARSA-SRR and related baseline agents on River Swim and Six Arms (averaged over 100 seeds, numbers in parentheses represent standard errors). |
| Researcher Affiliation | Academia | ¹Institute of Cognitive Neuroscience and ²Gatsby Computational Neuroscience Unit, UCL, London, United Kingdom; ³Department of Psychology, Harvard University, Cambridge, United States |
| Pseudocode | Yes | The pseudocode for SARSA-SRR can be found in Appendix B.2. |
| Open Source Code | No | The paper does not contain any explicit statement about providing access to the source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We empirically evaluate DQN-SF-PF on 6 Atari games with sparse reward structures [32]: Freeway, Gravitar, Montezuma's Revenge, Private Eye, Solaris, and Venture. |
| Dataset Splits | No | The paper mentions using well-known datasets and following established evaluation protocols (e.g., 'We follow the evaluation protocol as stated in Machado et al. [33]'), which typically define splits. However, it does not explicitly provide specific percentages, sample counts, or detailed splitting methodology (e.g., '80/10/10 split', '40,000 training samples') for training, validation, and test datasets within its own text. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as exact GPU or CPU models, or cloud computing instance specifications. |
| Software Dependencies | No | The paper mentions using 'RMSprop' as the optimizer and 'DQN' as the base model, and some general settings for 'epsilon-annealing scheme', but it does not provide specific version numbers for software dependencies or libraries (e.g., 'PyTorch 1.9' or 'CUDA 11.1'). |
| Experiment Setup | Yes | The complete set of hyperparameters for DQN-SF-PF can be found in the Appendix. |
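
The Research Type and Pseudocode rows above refer to SARSA-SRR, the paper's tabular agent, which derives an intrinsic exploration bonus from learned successor (and retrospective/predecessor) statistics. The snippet below is only a minimal sketch of that general idea, assuming a tabular successor representation learned by one-step TD and a bonus that shrinks as a state becomes more reachable; it is not the paper's exact SARSA-SRR algorithm (the actual pseudocode is in Appendix B.2 of the paper), and all names and constants are illustrative.

```python
import numpy as np

# Hypothetical illustration of a successor-representation (SR) based
# intrinsic bonus in a tabular setting. Not the paper's SARSA-SRR;
# it only sketches deriving an exploration bonus from SR statistics.

n_states, n_actions = 10, 2
gamma, alpha = 0.95, 0.1            # discount factor and SR learning rate

M = np.zeros((n_states, n_states))  # successor representation M[s, s']

def sr_td_update(s, s_next):
    """One-step TD update of the tabular SR after a transition s -> s_next."""
    onehot = np.zeros(n_states)
    onehot[s] = 1.0
    target = onehot + gamma * M[s_next]
    M[s] += alpha * (target - M[s])

def exploration_bonus(s_next, beta=1.0):
    """Bonus that is large for states rarely reached from elsewhere:
    column M[:, s_next] aggregates expected discounted visits to s_next,
    so a small column norm marks an under-explored state."""
    return beta / (1.0 + np.linalg.norm(M[:, s_next]))
```

In a SARSA-style loop, such a bonus would simply be added to the environment reward before the temporal-difference update of the action values, which is the standard way intrinsic exploration bonuses are combined with extrinsic reward.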