Active Hierarchical Exploration with Stable Subgoal Representation Learning

Authors: Siyuan Li, Jin Zhang, Jianhao Wang, Yang Yu, Chongjie Zhang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare the proposed method HESS with state-of-the-art baselines in a number of difficult continuous control tasks with sparse rewards. Note that the environment setting in this paper is much more challenging for exploration than the multi-task and deceptive dense-reward settings used in the baselines. Experimental results demonstrate that HESS significantly outperforms existing baselines. In addition, we perform multiple ablations illustrating the importance of the various components of HESS.
Researcher Affiliation | Academia | Siyuan Li, Jin Zhang, Jianhao Wang, Yang Yu, Chongjie Zhang (Tsinghua University; Nanjing University). {sy-li17,jin-zhan20,wjh19}@mails.tsinghua.edu.cn, yuy@lamda.nju.edu.cn, chongjie@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: HESS algorithm
Open Source Code | Yes | Code is available at https://github.com/SiyuanLee/HESS. For reproducibility, an anonymous downloadable copy of the source code is included in the supplementary material.
Open Datasets | Yes | We evaluate on a suite of MuJoCo (Todorov et al., 2012) tasks that are widely used in the HRL community, including Ant (or Point) Maze, Ant Push, Ant Four Rooms, Cheetah Hurdle, Cheetah Ascending, and two variants with low-resolution image observations. For the Images versions of these environments, we zero out the x, y coordinates in the observation and append a low-resolution 5x5x3 top-down view of the environment, as described in (Nachum et al., 2019a; Li et al., 2021).
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits (e.g., percentages, sample counts, or specific split files); it only describes the evaluation frequency during training.
Hardware Specification | Yes | Experiments are carried out on an NVIDIA RTX 2080 Ti GPU.
Software Dependencies | No | The paper mentions PyTorch but does not specify its version number, nor does it list version numbers for other key software libraries or solvers.
Experiment Setup | Yes | Table 1 (Hyper-parameters for experiments) lists specific values for numerous parameters, including subgoal dimension d = 2, radius r_g of the subgoal-selecting neighborhood = 20, learning rate for both level policies = 0.0002, and batch size for both level policies = 128.
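The hyper-parameter values quoted above could be collected into a small configuration sketch like the one below. Only the four values are taken from the paper's Table 1; the dictionary structure, key names, and helper function are illustrative assumptions, not the authors' code.

```python
# Hypothetical config sketch for the HESS experiment setup.
# The four values are quoted from Table 1 of the paper; the key
# names and structure are assumptions made for illustration.
HESS_HYPERPARAMS = {
    "subgoal_dim": 2,         # subgoal dimension d
    "subgoal_radius": 20,     # radius r_g of subgoal-selecting neighborhood
    "learning_rate": 0.0002,  # shared by both level policies
    "batch_size": 128,        # shared by both level policies
}

def describe(params):
    """Render the hyper-parameters as a single line for a quick sanity check."""
    return ", ".join(f"{k}={v}" for k, v in sorted(params.items()))
```

Keeping such values in one place makes it easy to log the exact configuration alongside each run, which is the kind of detail this reproducibility check looks for.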