Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning

Authors: Tianren Zhang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experimental results on discrete and continuous control tasks show that incorporating the adjacency constraint improves the performance of state-of-the-art HRL approaches in both deterministic and stochastic environments.
Researcher Affiliation | Academia | Tianren Zhang^1, Shangqi Guo^1, Tian Tan^2, Xiaolin Hu^3,4,5, Feng Chen^1,6,7; 1 Department of Automation, Tsinghua University; 2 Department of Civil and Environmental Engineering, Stanford University; 3 Department of Computer Science and Technology, Tsinghua University; 4 Beijing National Research Center for Information Science and Technology; 5 State Key Laboratory of Intelligent Technology and Systems; 6 Beijing Innovation Center for Future Chip; 7 LSBDPA Beijing Key Laboratory
Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks.
Open Source Code | Yes | Code is available at https://github.com/trzhang0116/HRAC.
Open Datasets | Yes | Continuous tasks include Ant Gather, Ant Maze and Ant Maze Sparse, where the first two tasks are widely used benchmarks in the HRL community [6, 11, 26, 25, 22], and the third task is a more challenging navigation task with sparse rewards.
Dataset Splits | No | The paper describes training and testing procedures but does not explicitly give dataset split percentages or sample counts for training, validation, and test sets.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models or memory amounts) used for running its experiments.
Software Dependencies | No | The paper mentions the MuJoCo simulator but does not provide its version number or any other software dependencies with version numbers required to replicate the experiments.
Experiment Setup | Yes | Given the current state s and the subgoal generation frequency k, the high-level only needs to explore in a subset of subgoals covering states that the low-level can possibly reach within k steps. And H(x, k) = max(x/k − 1, 0) is a hinge loss function and η is a balancing coefficient. And where g_i = ϕ(s_i), g_j = ϕ(s_j), and a hyper-parameter δ > 0 is used to create a gap between the embeddings.
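
The quoted setup names two pieces of the adjacency machinery: a hinge penalty H(x, k) = max(x/k − 1, 0), weighted by η, that discourages subgoals lying outside the k-step reachable region, and a margin δ that separates the embeddings of adjacent and non-adjacent state pairs. The snippet below is a minimal PyTorch-style sketch of both terms under stated assumptions; the adjacency threshold being the divisor k, the use of Euclidean distance in embedding space, and the contrastive form of the margin loss are illustrative choices inferred from the quotes, not the authors' released implementation.

```python
import torch

def hinge_penalty(dist, k, eta=1.0):
    """Quoted hinge H(x, k) = max(x/k - 1, 0), scaled by the balancing
    coefficient eta. Zero while the (assumed embedding-space) distance
    stays within the k-step adjacency region; grows linearly beyond it."""
    return eta * torch.clamp(dist / k - 1.0, min=0.0)

def embedding_margin_loss(g_i, g_j, adjacent, k, delta=0.1):
    """Illustrative contrastive-style loss over embedding pairs
    g_i = phi(s_i), g_j = phi(s_j). Pairs labelled adjacent (1.0) are
    pulled within distance k; non-adjacent pairs (0.0) are pushed beyond
    k + delta, so delta creates the gap mentioned in the quote."""
    dist = torch.norm(g_i - g_j, p=2, dim=-1)
    pull = adjacent * torch.clamp(dist - k, min=0.0)
    push = (1.0 - adjacent) * torch.clamp(k + delta - dist, min=0.0)
    return (pull + push).mean()

# Example usage with random embeddings and labels (batch of 8 pairs).
g_i, g_j = torch.randn(8, 32), torch.randn(8, 32)
adjacent = torch.randint(0, 2, (8,)).float()
loss = embedding_margin_loss(g_i, g_j, adjacent, k=10.0, delta=0.1)
```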