Successor Feature Landmarks for Long-Horizon Goal-Conditioned Reinforcement Learning

Authors: Christopher Hoang, Sungryull Sohn, Jongwook Choi, Wilka Carvalho, Honglak Lee

NeurIPS 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show in our experiments on MiniGrid and ViZDoom that SFL enables efficient exploration of large, high-dimensional state spaces and outperforms state-of-the-art baselines on long-horizon GCRL tasks. We evaluate SFL against current graph-based methods in long-horizon goal-reaching RL and visual navigation on MiniGrid [6], a 2D gridworld, and ViZDoom [37], a visual 3D first-person-view environment with large mazes. We observe that SFL outperforms state-of-the-art navigation baselines, most notably when goals are furthest away. In a setting where exploration is needed to collect training experience, SFL significantly outperforms the other methods, which struggle to scale in ViZDoom's high-dimensional state space. In our experiments, we evaluate the benefits of SFL for exploration and long-horizon GCRL.
Researcher Affiliation | Collaboration | Christopher Hoang (1), Sungryull Sohn (1, 2), Jongwook Choi (1), Wilka Carvalho (1), Honglak Lee (1, 2); (1) University of Michigan, (2) LG AI Research
Pseudocode | Yes | Algorithm 1 (Training) and Algorithm 2 (Graph-Update) in Section 4.2
Open Source Code | Yes | The demo video and code can be found at https://2016choang.github.io/sfl.
Open Datasets | Yes | We use mazes from SPTM [29] in our experiments, with one example shown in Figure 3. MiniGrid [6], a 2D gridworld: https://github.com/maximecb/gym-minigrid, 2018.
Dataset Splits | No | The paper describes experimental setups (random spawn, fixed spawn) and goal-sampling strategies, but it does not specify explicit train/validation/test dataset splits with percentages or sample counts for static datasets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory) used to run the experiments.
Software Dependencies | No | The paper mentions the 'rlpyt codebase' and a 'pretrained ResNet-18 backbone from SPTM' but does not provide specific version numbers for any software components.
Experiment Setup | Yes | See Appendix C for more details on feature learning, edge formation, and hyperparameters.