Learning Subgoal Representations with Slow Dynamics

Authors: Siyuan Li, Lulu Zheng, Jianhao Wang, Chongjie Zhang

ICLR 2021

Reproducibility Variable | Result | LLM Response
--- | --- | ---
Research Type | Experimental | We conduct experiments to compare our approach to existing state-of-the-art methods in HRL and in efficient exploration.
Researcher Affiliation | Academia | Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China
Pseudocode | Yes | Algorithm 1: LESSON algorithm
Open Source Code | Yes | Open-source code is available at https://github.com/SiyuanLee/LESSON
Open Datasets | Yes | We compare LESSON with state-of-the-art HRL and exploration methods on complex MuJoCo tasks (Todorov et al., 2012).
Dataset Splits | No | The paper evaluates performance during training but does not specify a separate validation split or a strategy for hyperparameter tuning.
Hardware Specification | No | The paper does not provide hardware details such as GPU/CPU models, processor types, or memory specifications used for the experiments.
Software Dependencies | No | The paper mentions software such as SAC, the Adam optimizer, and MuJoCo, but does not give version numbers for these or any other libraries.
Experiment Setup | Yes | Discount factor γ = 0.99 for both levels; Adam optimizer with learning rate 0.0002; soft target-update rate τ = 0.005 for both levels; replay buffer of size 1e6 for both levels; reward scaling of 0.1 for both levels; SAC entropy coefficient α = 0.2 for both levels; low-level policy length c = 10 for the Point robot and c = 20 for the Ant robot, except c = 50 in the Ant Push task; subgoal dimension of 2 (collected in the config sketch below).
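
To make the reported setup easier to reuse, here is a minimal sketch that collects the hyperparameters listed above into a single configuration. The function name, dictionary keys, and task-name strings are illustrative choices made here, not identifiers from the authors' code; only the numeric values come from the paper.

```python
def lesson_config(task: str = "AntMaze") -> dict:
    """Return the hyperparameters reported for LESSON on a given MuJoCo task.

    Task names like "PointMaze" or "AntPush" are hypothetical labels used
    only to select the task-dependent low-level policy length c.
    """
    # Low-level policy length c depends on the robot/task (per the paper):
    # c = 10 for the Point robot, c = 20 for the Ant robot, c = 50 in Ant Push.
    if task.startswith("Point"):
        c = 10
    elif task == "AntPush":
        c = 50
    else:
        c = 20

    return {
        "discount_gamma": 0.99,          # both levels
        "optimizer": "Adam",
        "learning_rate": 2e-4,
        "soft_update_tau": 0.005,        # both levels
        "replay_buffer_size": int(1e6),  # both levels
        "reward_scale": 0.1,             # both levels
        "sac_entropy_alpha": 0.2,        # both levels
        "low_level_policy_length": c,
        "subgoal_dim": 2,
    }


if __name__ == "__main__":
    # Example: the Ant Push task uses the longer low-level policy length.
    print(lesson_config("AntPush")["low_level_policy_length"])  # -> 50
```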