Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning
Authors: Christopher Hoang, Sungryull Sohn, Jongwook Choi, Wilka Carvalho, Honglak Lee
NeurIPS 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces and outperforms state-of-the-art baselines on long-horizon GCRL tasks. We evaluate SFL against current graph-based methods in long-horizon goal-reaching RL and visual navigation on MiniGrid [6], a 2D gridworld, and ViZDoom [37], a visual 3D first-person view environment with large mazes. We observe that SFL outperforms state-of-the-art navigation baselines, most notably when goals are furthest away. In a setting where exploration is needed to collect training experience, SFL significantly outperforms the other methods which struggle to scale in ViZDoom's high-dimensional state space. In our experiments, we evaluate the benefits of SFL for exploration and long-horizon GCRL. |
| Researcher Affiliation | Collaboration | Christopher Hoang¹, Sungryull Sohn¹˒², Jongwook Choi¹, Wilka Carvalho¹, Honglak Lee¹˒²; ¹University of Michigan, ²LG AI Research |
| Pseudocode | Yes | Algorithm 1 Training; Algorithm 2 Graph-Update (4.2) |
| Open Source Code | Yes | The demo video and code can be found at https://2016choang.github.io/sfl. |
| Open Datasets | Yes | We use mazes from SPTM in our experiments, with one example shown in Figure 3 [29]. MiniGrid [6], a 2D gridworld. https://github.com/maximecb/gym-minigrid, 2018. |
| Dataset Splits | No | The paper describes experimental setups (random spawn, fixed spawn) and goal sampling strategies, but it does not specify explicit train/validation/test dataset splits with percentages or sample counts for static datasets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used for running the experiments. |
| Software Dependencies | No | The paper mentions 'rlpyt codebase' and 'pretrained ResNet-18 backbone from SPTM' but does not provide specific version numbers for any software components. |
| Experiment Setup | Yes | See Appendix C for more details on feature learning, edge formation, and hyperparameters. |