Successor-Predecessor Intrinsic Exploration
Authors: Changmin Yu, Neil Burgess, Maneesh Sahani, Samuel J Gershman
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show that SPIE yields more efficient and ethologically plausible exploratory behaviour in environments with sparse rewards and bottleneck states than competing methods. We also implement SPIE in deep reinforcement learning agents, and show that the resulting agent achieves stronger empirical performance than existing methods on sparse-reward Atari games. We provide a brief overview of background and relevant literature in Section 2, and formally introduce the novel intrinsic exploration method, Successor-Predecessor Intrinsic Exploration (SPIE), in Section 3. We propose two instantiations of SPIE for discrete and continuous state spaces, with comprehensive empirical examinations of properties of SPIE in discrete state space. We show that SPIE facilitates more efficient exploration, in terms of improved sample efficiency of learning and higher asymptotic return, through empirical evaluations on both discrete and continuous environments in Section 4. Table 1: Evaluations of SARSA-SRR and related baseline agents on River Swim and Six Arms (averaged over 100 seeds, numbers in parentheses represent standard errors). |
| Researcher Affiliation | Academia | ¹Institute of Cognitive Neuroscience and ²Gatsby Computational Neuroscience Unit, UCL, London, United Kingdom; ³Department of Psychology, Harvard University, Cambridge, United States |
| Pseudocode | Yes | The pseudocode for SARSA-SRR can be found in Appendix B.2. |
| Open Source Code | No | The paper does not contain any explicit statement about providing access to the source code for the methodology described, nor does it provide a link to a code repository. |
| Open Datasets | Yes | We empirically evaluate DQN-SF-PF on 6 Atari games with sparse reward structures [32]: Freeway, Gravitar, Montezuma's Revenge, Private Eye, Solaris, and Venture. |
| Dataset Splits | No | The paper mentions using well-known datasets and following established evaluation protocols (e.g., 'We follow the evaluation protocol as stated in Machado et al. [33]'), which typically define splits. However, it does not explicitly provide specific percentages, sample counts, or detailed splitting methodology (e.g., '80/10/10 split', '40,000 training samples') for training, validation, and test datasets within its own text. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as exact GPU or CPU models, or cloud computing instance specifications. |
| Software Dependencies | No | The paper mentions using 'RMSprop' as the optimizer and 'DQN' as the base model, and some general settings for 'epsilon-annealing scheme', but it does not provide specific version numbers for software dependencies or libraries (e.g., 'PyTorch 1.9' or 'CUDA 11.1'). |
| Experiment Setup | Yes | The complete set of hyperparameters for DQN-SF-PF can be found in the Appendix. |
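
The Research Type and Pseudocode rows above refer to SARSA-SRR, the paper's tabular agent, which derives an intrinsic exploration bonus from learned successor (and retrospective/predecessor) statistics. The snippet below is only a minimal sketch of that general idea, assuming a tabular successor representation learned by one-step TD and a bonus that shrinks as a state becomes more reachable; it is not the paper's exact SARSA-SRR algorithm (the actual pseudocode is in Appendix B.2 of the paper), and all names and constants are illustrative.

```python
import numpy as np

# Hypothetical illustration of a successor-representation (SR) based
# intrinsic bonus in a tabular setting. Not the paper's SARSA-SRR;
# it only sketches deriving an exploration bonus from SR statistics.

n_states, n_actions = 10, 2
gamma, alpha = 0.95, 0.1            # discount factor and SR learning rate

M = np.zeros((n_states, n_states))  # successor representation M[s, s']

def sr_td_update(s, s_next):
    """One-step TD update of the tabular SR after a transition s -> s_next."""
    onehot = np.zeros(n_states)
    onehot[s] = 1.0
    target = onehot + gamma * M[s_next]
    M[s] += alpha * (target - M[s])

def exploration_bonus(s_next, beta=1.0):
    """Bonus that is large for states rarely reached from elsewhere:
    column M[:, s_next] aggregates expected discounted visits to s_next,
    so a small column norm marks an under-explored state."""
    return beta / (1.0 + np.linalg.norm(M[:, s_next]))
```

In a SARSA-style loop, such a bonus would simply be added to the environment reward before the temporal-difference update of the action values, which is the standard way intrinsic exploration bonuses are combined with extrinsic reward.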