Generating Adjacency-Constrained Subgoals in Hierarchical Reinforcement Learning
Authors: Tianren Zhang, Shangqi Guo, Tian Tan, Xiaolin Hu, Feng Chen
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental results on discrete and continuous control tasks show that incorporating the adjacency constraint improves the performance of state-of-the-art HRL approaches in both deterministic and stochastic environments. |
| Researcher Affiliation | Academia | Tianren Zhang^1, Shangqi Guo^1, Tian Tan^2, Xiaolin Hu^{3,4,5}, Feng Chen^{1,6,7} — 1 Department of Automation, Tsinghua University; 2 Department of Civil and Environmental Engineering, Stanford University; 3 Department of Computer Science and Technology, Tsinghua University; 4 Beijing National Research Center for Information Science and Technology; 5 State Key Laboratory of Intelligent Technology and Systems; 6 Beijing Innovation Center for Future Chip; 7 LSBDPA Beijing Key Laboratory |
| Pseudocode | No | The paper does not contain structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Code is available at https://github.com/trzhang0116/HRAC. |
| Open Datasets | Yes | Continuous tasks include Ant Gather, Ant Maze and Ant Maze Sparse, where the first two tasks are widely used benchmarks in the HRL community [6, 11, 26, 25, 22], and the third task is a more challenging navigation task with sparse rewards. |
| Dataset Splits | No | The paper describes training and testing procedures but does not explicitly provide specific details on dataset split percentages or sample counts for training, validation, and testing sets. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions the MuJoCo simulator but does not provide its version number or any other specific software dependencies with their version numbers required to replicate the experiment. |
| Experiment Setup | Yes | Given the current state s and the subgoal generation frequency k, the high-level policy only needs to explore a subset of subgoals covering states that the low-level policy can possibly reach within k steps. H(x, k) = max(x/k − 1, 0) is a hinge loss function and η is a balancing coefficient. Here gi = ϕ(si), gj = ϕ(sj), and a hyper-parameter δ > 0 is used to create a gap between the embeddings. |
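The setup excerpt above can be sketched in code. This is a minimal, illustrative sketch of the hinge penalty H(x, k) = max(x/k − 1, 0) and of a contrastive-style loss with an embedding gap δ; the function names, the η and δ defaults, and the exact form of the pairwise loss are assumptions for illustration, not the authors' implementation (see their repository for the real one).

```python
import numpy as np

def hinge(x, k):
    """Hinge loss H(x, k) = max(x/k - 1, 0): zero while the distance x
    stays within the step budget k, growing linearly beyond it."""
    return np.maximum(x / k - 1.0, 0.0)

def pairwise_embedding_loss(g_i, g_j, adjacent, delta=0.2):
    """Illustrative pairwise loss on subgoal embeddings g = phi(s).
    Adjacent pairs are pulled together; non-adjacent pairs are pushed
    at least delta apart (delta creates the gap between embeddings).
    The exact form here is an assumption, not the paper's verbatim loss."""
    d = np.linalg.norm(g_i - g_j)
    if adjacent:
        return d                   # pull adjacent embeddings together
    return max(delta - d, 0.0)     # push non-adjacent embeddings apart

def adjacency_penalty(g_state, g_subgoal, k, eta=1.0):
    """Adjacency penalty added to the high-level objective: the hinge on
    the embedding distance between current state and subgoal, scaled by
    the balancing coefficient eta."""
    d = np.linalg.norm(g_state - g_subgoal)
    return eta * hinge(d, k)
```

For example, a subgoal whose embedding lies within k of the current state's embedding incurs zero penalty, so the high-level policy is steered toward k-step-adjacent subgoals without being forced onto them.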