Learning Subgoal Representations with Slow Dynamics
Authors: Siyuan Li, Lulu Zheng, Jianhao Wang, Chongjie Zhang
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to compare our approach to existing state-of-the-art methods in HRL and in efficient exploration. |
| Researcher Affiliation | Academia | Institute for Interdisciplinary Information Sciences, Tsinghua University, Beijing, China |
| Pseudocode | Yes | Algorithm 1 LESSON algorithm |
| Open Source Code | Yes | Find open-source code at https://github.com/SiyuanLee/LESSON |
| Open Datasets | Yes | We compare LESSON with state-of-the-art HRL and exploration methods on complex MuJoCo tasks (Todorov et al., 2012). |
| Dataset Splits | No | The paper evaluates performance during training but does not specify a separate validation dataset split or strategy for hyperparameter tuning. |
| Hardware Specification | No | The paper does not provide specific hardware details such as GPU/CPU models, processor types, or memory specifications used for running the experiments. |
| Software Dependencies | No | The paper mentions software like SAC, Adam optimizer, and MuJoCo, but does not provide specific version numbers for any of these components or other libraries. |
| Experiment Setup | Yes | Discount factor γ = 0.99 for both levels; Adam optimizer with learning rate 0.0002; soft target-update coefficient τ = 0.005 for both levels; replay buffer of size 1e6 for both levels; reward scaling of 0.1 for both levels; SAC entropy coefficient α = 0.2 for both levels; low-level policy length c = 10 for the Point robot and c = 20 for the Ant robot, except c = 50 in the Ant Push task; subgoal dimension of size 2. |
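
The Experiment Setup row above lists the reported hyperparameters in prose. As a minimal sketch, the snippet below collects those values into a Python configuration function; the function name `lesson_config`, the task-name strings, and the dictionary keys are illustrative assumptions and do not come from the authors' released code.

```python
def lesson_config(task: str = "AntMaze") -> dict:
    """Collect the hyperparameters reported in the Experiment Setup row.

    Hypothetical helper: key names and task labels are assumptions made
    for illustration, not identifiers from the LESSON codebase.
    """
    # Low-level policy length c depends on the robot and task:
    # 10 for the Point robot, 20 for the Ant robot, 50 for Ant Push.
    if task.startswith("Point"):
        c = 10
    elif task == "AntPush":
        c = 50
    else:
        c = 20

    return {
        "gamma": 0.99,                    # discount factor, both levels
        "optimizer": "Adam",
        "learning_rate": 2e-4,
        "tau": 0.005,                     # soft target-update coefficient, both levels
        "replay_buffer_size": int(1e6),   # both levels
        "reward_scale": 0.1,              # both levels
        "sac_alpha": 0.2,                 # SAC entropy coefficient, both levels
        "low_level_policy_length": c,
        "subgoal_dim": 2,
    }


if __name__ == "__main__":
    print(lesson_config("AntPush"))
```

Grouping the settings this way simply makes it easy to see which values are shared across both hierarchy levels and which vary per task (only the low-level policy length c).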