Maximum State Entropy Exploration using Predecessor and Successor Representations
Authors: Arnav Kumar Jain, Lucas Lehnert, Irina Rish, Glen Berseth
NeurIPS 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Section 5 demonstrates through empirical experiments that ηψ-Learning achieves optimal coverage within a single finite-length trajectory; the visualizations in the same section show that the learned exploration policy maneuvers through the state space to explore a task efficiently while minimizing how often the same state is revisited. |
| Researcher Affiliation | Collaboration | Arnav Kumar Jain (Mila - Quebec AI Institute, Université de Montréal); Lucas Lehnert (Fundamental AI Research at Meta); Irina Rish (Mila - Quebec AI Institute, Université de Montréal); Glen Berseth (Mila - Quebec AI Institute, Université de Montréal) |
| Pseudocode | Yes | Algorithm 1, "ηψ-Learning: Dynamic Programming Framework" (a minimal illustrative sketch of the underlying idea appears after this table). |
| Open Source Code | Yes | An implementation of the ηψ-Learning algorithm together with instructions for reproducing the experiments presented in this paper can be found at https://github.com/arnavkj1995/Eta_Psi_Learning. |
| Open Datasets | Yes | The Chain MDP and RiverSwim [58] are six-state chains whose transitions are deterministic and stochastic, respectively (an illustrative environment sketch follows the table). |
| Dataset Splits | No | The paper does not provide explicit training/validation/test dataset splits. It reports training parameters such as 'Length of trajectory from environment' and 'Number of episodes', along with evaluation parameters for the metrics; as an online reinforcement-learning method, ηψ-Learning learns from environment interaction rather than a fixed dataset, and no formal splitting strategy is described. |
| Hardware Specification | Yes | All models were trained on a single NVIDIA V100 GPU with 32 GB memory. |
| Software Dependencies | No | The paper mentions software components such as the RLHive [46] library, Dreamer V2 [20], and the Adam [29] optimizer, but it does not give version numbers for these dependencies, which reproducibility requires (a sketch of how installed versions could be recorded appears after the table). |
| Experiment Setup | Yes | Tables 3 and 4 ('Hyper Parameters') in Appendix G explicitly list values for the experimental setup, including 'Batch Size', 'Sequence Length', 'α for γ-function', 'Encoder layers', 'Learning rate', and 'Optimizer', among others (an illustrative configuration sketch follows the table). |
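
The pseudocode row above points to Algorithm 1; the authors' dynamic-programming details are in the paper itself. The sketch below is an assumption-laden illustration of the high-level idea only, not the paper's Algorithm 1: it records past visitations of the current trajectory (an η-like term), estimates expected discounted future visitations with a successor representation (ψ), and greedily picks the action whose combined visitation distribution has the highest entropy. The chain dynamics, the repeat-one-action surrogate policy, and all names are illustrative choices.

```python
import numpy as np

n_states, n_actions, gamma, horizon = 6, 2, 0.95, 20

# Deterministic 6-state chain: action 0 moves left, action 1 moves right.
P = np.zeros((n_states, n_actions, n_states))
for s in range(n_states):
    P[s, 0, max(s - 1, 0)] = 1.0
    P[s, 1, min(s + 1, n_states - 1)] = 1.0

def successor_representation(policy, iters=200):
    """psi[s, s'] ~ expected discounted future visitations of s' from s."""
    psi = np.zeros((n_states, n_states))
    for _ in range(iters):
        new_psi = np.eye(n_states)
        for s in range(n_states):
            new_psi[s] += gamma * P[s, policy[s]] @ psi
        psi = new_psi
    return psi

def entropy(counts):
    p = counts / counts.sum()
    return -np.sum(p * np.log(p + 1e-12))

rng = np.random.default_rng(0)
state, past_counts = 0, np.zeros(n_states)
for t in range(horizon):
    past_counts[state] += 1.0  # eta-like record of the trajectory so far
    scores = []
    for a in range(n_actions):
        # Crude surrogate: evaluate action `a` as if repeated forever,
        # then score the entropy of past + predicted future visitations.
        psi = successor_representation(np.full(n_states, a))
        future = P[state, a] @ psi
        scores.append(entropy(past_counts + future))
    action = int(np.argmax(scores))
    state = rng.choice(n_states, p=P[state, action])

print("state visit counts:", past_counts)
```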
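
For the environments in the Open Datasets row, below is an illustrative Gym-style RiverSwim following the common formulation attributed to [58]: LEFT is deterministic while RIGHT is stochastic, so swimming upstream often fails. The transition probabilities (0.6 / 0.3 / 0.1 for RIGHT) are assumptions and may differ from the paper's exact configuration; making RIGHT deterministic recovers the Chain MDP.

```python
import numpy as np

class RiverSwim:
    """Six-state chain; LEFT (0) is deterministic, RIGHT (1) stochastic."""
    LEFT, RIGHT = 0, 1

    def __init__(self, n_states=6, seed=0):
        self.n = n_states
        self.rng = np.random.default_rng(seed)
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        s = self.state
        if action == self.LEFT:      # deterministic drift downstream
            self.state = max(s - 1, 0)
        else:                        # stochastic swim upstream (assumed probs)
            u = self.rng.random()
            if u < 0.6:
                self.state = min(s + 1, self.n - 1)
            elif u < 0.9:
                self.state = s
            else:
                self.state = max(s - 1, 0)
        return self.state

env = RiverSwim()
s = env.reset()
visits = np.zeros(env.n)
for _ in range(100):
    s = env.step(env.RIGHT)
    visits[s] += 1
print("state visits under always-RIGHT:", visits)
```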
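
Since the paper names RLHive [46], Dreamer V2 [20], and the Adam optimizer [29] without version numbers, a small helper like the one below could record the versions actually installed when an experiment runs. The distribution names `rlhive`, `torch`, and `numpy` are assumptions about the environment, not details confirmed by the paper.

```python
from importlib import metadata

def log_versions(packages):
    """Print pinned-style version lines for the given distributions."""
    for name in packages:
        try:
            print(f"{name}=={metadata.version(name)}")
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed")

log_versions(["rlhive", "torch", "numpy"])
```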
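
Finally, the hyperparameters named in Tables 3 and 4 could be collected in a single configuration object, as sketched below. Every value here is a placeholder chosen for illustration; the paper's reported settings are in Appendix G.

```python
from dataclasses import dataclass

@dataclass
class EtaPsiConfig:
    batch_size: int = 32          # "Batch Size" (placeholder value)
    sequence_length: int = 16     # "Sequence Length" (placeholder value)
    gamma_alpha: float = 0.5      # "alpha for gamma-function" (placeholder)
    encoder_layers: int = 2       # "Encoder layers" (placeholder value)
    learning_rate: float = 1e-4   # "Learning rate" (placeholder value)
    optimizer: str = "adam"       # "Optimizer", Adam [29]

config = EtaPsiConfig()
print(config)
```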