Active Hierarchical Exploration with Stable Subgoal Representation Learning

Authors: Siyuan Li, Jin Zhang, Jianhao Wang, Yang Yu, Chongjie Zhang

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We compare the proposed method HESS with state-of-the-art baselines in a number of difficult continuous control tasks with sparse rewards. Note that the environment setting in this paper is much more challenging for exploration than the multi-task and deceptive dense-reward settings used in the baselines. Experimental results demonstrate that HESS significantly outperforms existing baselines. In addition, we perform multiple ablations illustrating the importance of the various components of HESS.
Researcher Affiliation | Academia | Siyuan Li, Jin Zhang, Jianhao Wang, Yang Yu, Chongjie Zhang (Tsinghua University; Nanjing University). {sy-li17,jin-zhan20,wjh19}@mails.tsinghua.edu.cn, yuy@lamda.nju.edu.cn, chongjie@tsinghua.edu.cn
Pseudocode | Yes | Algorithm 1: HESS algorithm
Open Source Code | Yes | Code is available at https://github.com/SiyuanLee/HESS. For reproducibility, an anonymous downloadable copy of the source code is included in the supplementary material.
Open Datasets | Yes | We evaluate on a suite of MuJoCo (Todorov et al., 2012) tasks that are widely used in the HRL community, including Ant (or Point) Maze, Ant Push, Ant Four Rooms, Cheetah Hurdle, Cheetah Ascending, and two variants with low-resolution image observations. For the Images versions of these environments, we zero out the x, y coordinates in the observation and append a low-resolution 5x5x3 top-down view of the environment, as described in (Nachum et al., 2019a; Li et al., 2021).
Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits (e.g., percentages, sample counts, or specific split files); it only describes the evaluation frequency during training.
Hardware Specification | Yes | Experiments are carried out on an NVIDIA RTX 2080 Ti GPU.
Software Dependencies | No | The paper mentions PyTorch but does not specify its version number, nor does it list version numbers for other key software libraries or solvers.
Experiment Setup | Yes | Table 1 (Hyper-parameters for experiments) lists specific values for numerous parameters, including subgoal dimension d = 2, radius r_g of the subgoal-selecting neighborhood = 20, learning rate for both level policies = 0.0002, and batch size for both level policies = 128.
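The hyper-parameter values quoted above could be collected into a small configuration sketch like the one below. Only the four values are taken from the paper's Table 1; the dictionary structure, key names, and helper function are illustrative assumptions, not the authors' code.

```python
# Hypothetical config sketch for the HESS experiment setup.
# The four values are quoted from Table 1 of the paper; the key
# names and structure are assumptions made for illustration.
HESS_HYPERPARAMS = {
    "subgoal_dim": 2,         # subgoal dimension d
    "subgoal_radius": 20,     # radius r_g of subgoal-selecting neighborhood
    "learning_rate": 0.0002,  # shared by both level policies
    "batch_size": 128,        # shared by both level policies
}

def describe(params):
    """Render the hyper-parameters as a single line for a quick sanity check."""
    return ", ".join(f"{k}={v}" for k, v in sorted(params.items()))
```

Keeping such values in one place makes it easy to log the exact configuration alongside each run, which is the kind of detail this reproducibility check looks for.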