Maximum State Entropy Exploration using Predecessor and Successor Representations

Authors: Arnav Kumar Jain, Lucas Lehnert, Irina Rish, Glen Berseth

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 5 we demonstrate through empirical experiments that ηψ-Learning achieves optimal coverage within a single finite-length trajectory. The visualizations in Section 5 further show that ηψ-Learning learns an exploration policy that maneuvers through the state space to cover a task efficiently while minimizing the number of times the same state is revisited. (The entropy objective being maximized is sketched after this table.)
Researcher Affiliation | Collaboration | Arnav Kumar Jain (Mila Quebec AI Institute, Université de Montréal); Lucas Lehnert (Fundamental AI Research at Meta); Irina Rish (Mila Quebec AI Institute, Université de Montréal); Glen Berseth (Mila Quebec AI Institute, Université de Montréal)
Pseudocode | Yes | Algorithm 1, "ηψ-Learning: Dynamic Programming Framework". (A hedged sketch of the core action-selection idea follows this table.)
Open Source Code | Yes | An implementation of the ηψ-Learning algorithm, together with instructions for reproducing the experiments presented in this paper, can be found at https://github.com/arnavkj1995/Eta_Psi_Learning.
Open Datasets | Yes | The Chain MDP and RiverSwim [58] are six-state chains whose transitions are deterministic and stochastic, respectively. (A minimal chain-environment sketch follows this table.)
Dataset Splits | No | The paper does not provide explicit training/validation/test splits. It lists training parameters such as "Length of trajectory from environment" and "Number of episodes", along with evaluation parameters for metrics, but describes no formal data-splitting strategy.
Hardware Specification | Yes | All models were trained on a single NVIDIA V100 GPU with 32 GB memory.
Software Dependencies | No | The paper mentions software components such as the RLHive [46] library, Dreamer V2 [20], and the Adam [29] optimizer, but gives no version numbers for these dependencies, which reproducibility requires. (A small version-recording snippet follows this table.)
Experiment Setup | Yes | Appendix G, "Hyper Parameters" (Tables 3 and 4), explicitly lists values for the experimental setup, including batch size, sequence length, α for the γ-function, encoder layers, learning rate, optimizer, and more. (An illustrative config skeleton follows this table.)
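
For context on the "Research Type" row: the objective behind maximum state entropy exploration, written in its standard form (not copied verbatim from the paper), is the Shannon entropy of the state-visitation distribution d_π induced by a finite-length trajectory:

    \max_{\pi} \; H(d_{\pi}), \qquad H(d_{\pi}) = -\sum_{s} d_{\pi}(s)\,\log d_{\pi}(s)

where d_π(s) is the fraction of the finite-length trajectory spent in state s.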
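On the "Pseudocode" row: Algorithm 1 in the paper gives the authors' dynamic programming framework. The sketch below is only a rough tabular illustration of the underlying action-selection idea, assuming a hypothetical successor-representation table `psi` and running visit counts as a crude stand-in for the predecessor representation; it is not the authors' exact procedure.

    import numpy as np

    def eta_psi_action(counts, psi, state, n_actions):
        """Greedy action choice for maximum state entropy exploration.

        counts: running visit counts over states for the current trajectory
                (a crude stand-in for the predecessor representation).
        psi:    psi[s, a] is a vector of expected discounted future state
                visitations after taking action a in state s (the successor
                representation).
        """
        best_action, best_entropy = 0, -np.inf
        for a in range(n_actions):
            # Combine past visits with predicted future visits to estimate
            # the visitation distribution of the whole trajectory.
            d = counts + psi[state, a]
            d = d / d.sum()
            entropy = -(d * np.log(d + 1e-12)).sum()
            if entropy > best_entropy:
                best_action, best_entropy = a, entropy
        return best_action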
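On the "Open Datasets" row: the six-state chain is simple enough to reimplement directly. A minimal sketch of the deterministic variant follows (RiverSwim additionally makes the rightward transition stochastic; see [58] for the exact dynamics):

    import numpy as np

    class ChainMDP:
        """Deterministic six-state chain: action 1 moves right, action 0
        moves left, with movement clipped at both ends of the chain."""

        def __init__(self, n_states=6):
            self.n_states = n_states
            self.state = 0

        def reset(self):
            self.state = 0
            return self.state

        def step(self, action):
            move = 1 if action == 1 else -1
            self.state = int(np.clip(self.state + move, 0, self.n_states - 1))
            return self.state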
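On the "Software Dependencies" row: one way a reader can pin down the missing versions when rerunning the released code is to record them from the installed environment. The PyPI package names below are assumptions about how the dependencies are distributed, not names confirmed by the paper.

    from importlib import metadata

    # Record installed versions of the stack the paper mentions; the
    # package names here are guesses and may differ from the actual ones.
    for pkg in ("rlhive", "torch", "numpy"):
        try:
            print(pkg, metadata.version(pkg))
        except metadata.PackageNotFoundError:
            print(pkg, "not installed")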
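On the "Experiment Setup" row: the hyperparameters in Tables 3 and 4 translate naturally into a flat config. The values below are illustrative placeholders only; the real numbers must be taken from Appendix G of the paper.

    # Skeleton mirroring the hyperparameter names in Tables 3 and 4.
    # All values are placeholders, not the paper's actual settings.
    config = {
        "batch_size": 64,
        "sequence_length": 10,
        "alpha_gamma_function": 0.5,   # 'α for γ-function'
        "encoder_layers": 2,
        "learning_rate": 1e-4,
        "optimizer": "Adam",           # Adam [29] is stated in the paper
    }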