Active Hierarchical Exploration with Stable Subgoal Representation Learning
Authors: Siyuan Li, Jin Zhang, Jianhao Wang, Yang Yu, Chongjie Zhang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results show that our approach significantly outperforms state-of-the-art baselines in continuous control tasks with sparse rewards. We compare the proposed method HESS with state-of-the-art baselines in a number of difficult control tasks with sparse rewards. Note that the environment setting in this paper is much more challenging for exploration than the multi-task and deceptive dense-reward ones in the baselines. Experimental results demonstrate that HESS significantly outperforms existing baselines. In addition, we perform multiple ablations illustrating the importance of the various components of HESS. |
| Researcher Affiliation | Academia | Siyuan Li, Jin Zhang, Jianhao Wang, Yang Yu, Chongjie Zhang. Tsinghua University, Nanjing University. {sy-li17,jin-zhan20,wjh19}@mails.tsinghua.edu.cn, {yuy}@lamda.nju.edu.cn, chongjie@tsinghua.edu.cn |
| Pseudocode | Yes | Algorithm 1 HESS algorithm |
| Open Source Code | Yes | Code is available at https://github.com/SiyuanLee/HESS. For reproducibility, we include an anonymous downloadable source code in the supplementary material. |
| Open Datasets | Yes | We evaluate on a suite of MuJoCo (Todorov et al., 2012) tasks that are widely used in the HRL community, including Ant (or Point) Maze, Ant Push, Ant Four Rooms, Cheetah Hurdle, Cheetah Ascending, and two variants with low-resolution image observations. For the Images versions of these environments, we zero-out the x, y coordinates in the observation and append a low-resolution 5 × 5 × 3 top-down view of the environment, as described in (Nachum et al., 2019a; Li et al., 2021). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits (e.g., percentages, sample counts, or specific split files) but describes evaluation frequency during training. |
| Hardware Specification | Yes | Experiments are carried out on NVIDIA GTX 2080 Ti GPU. |
| Software Dependencies | No | The paper mentions 'PyTorch' but does not specify its version number, nor does it list specific version numbers for other key software libraries or solvers. |
| Experiment Setup | Yes | Table 1 (Hyper-parameters for experiments) lists specific values for numerous parameters, including subgoal dimension d = 2, radius r_g of the subgoal-selecting neighborhood = 20, learning rate for both level policies = 0.0002, and batch size for both level policies = 128, among others. |
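
The Open Datasets row describes how the Images variants build their observations: the x, y coordinates are zeroed out and a low-resolution 5 × 5 × 3 top-down view is appended. The following is a minimal, dependency-free sketch of that transformation, not the authors' implementation. The function name `make_image_observation`, the assumption that x and y occupy the first two entries of the observation vector, and the row-major flattening order are all hypothetical choices made for illustration.

```python
from typing import List, Sequence

# Height, width, channels of the top-down view, as quoted in the table above.
IMAGE_SHAPE = (5, 5, 3)


def make_image_observation(
    obs: Sequence[float],
    top_down_view: Sequence[Sequence[Sequence[float]]],
    xy_indices: Sequence[int] = (0, 1),  # assumption: x, y are the first two entries
) -> List[float]:
    """Zero out the x, y entries of `obs` and append the flattened image."""
    height, width, channels = IMAGE_SHAPE

    # Validate the image shape so a wrong-sized view fails loudly.
    if len(top_down_view) != height:
        raise ValueError(f"expected {height} rows, got {len(top_down_view)}")
    for row in top_down_view:
        if len(row) != width:
            raise ValueError(f"expected {width} pixels per row, got {len(row)}")
        for pixel in row:
            if len(pixel) != channels:
                raise ValueError(f"expected {channels} channels, got {len(pixel)}")

    # Copy the observation so the caller's data is not modified.
    masked = [float(v) for v in obs]
    for i in xy_indices:
        masked[i] = 0.0

    # Flatten the image in row-major order (an illustrative choice).
    flat_image = [float(v) for row in top_down_view for pixel in row for v in pixel]
    return masked + flat_image


if __name__ == "__main__":
    raw_obs = [1.5, -2.0, 0.3, 0.4]
    view = [[[1.0, 0.0, 0.0] for _ in range(5)] for _ in range(5)]
    print(len(make_image_observation(raw_obs, view)))
```

With a 4-dimensional raw observation, the result has 4 + 5 × 5 × 3 = 79 entries. Masking the coordinates means the policy has to infer position from the image rather than read it directly from the state vector.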