Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning
Authors: Jiaheng Hu, Zizhao Wang, Peter Stone, Roberto Martín-Martín
NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Evaluated in a set of challenging environments, DUSDi successfully learns disentangled skills and significantly outperforms previous skill discovery methods when it comes to applying the learned skills to solve downstream tasks. ... In the evaluation of DUSDi, we aim to answer the following questions: Q1: Are skills learned by DUSDi truly disentangled (Sec. 4.2)? Q2: Can Q-decomposition improve skill learning efficiency (Sec. 4.3)? Q3: Do our disentangled skills perform better when solving downstream tasks compared to other unsupervised reinforcement learning methods (Sec. 4.4)? Q4: Can DUSDi be extended to image observation environments (Sec. 4.5)? Q5: Can we leverage the structured skill space of DUSDi to further improve downstream task learning efficiency (Sec. 4.6)? |
| Researcher Affiliation | Collaboration | Jiaheng Hu, University of Texas at Austin, jiahengh@utexas.edu; Zizhao Wang, University of Texas at Austin, zizhao.wang@utexas.edu; Peter Stone, University of Texas at Austin and Sony AI, pstone@cs.utexas.edu; Roberto Martín-Martín, University of Texas at Austin, robertomm@cs.utexas.edu |
| Pseudocode | Yes | We present the entire DUSDi pipeline in Fig. 2, and the pseudo-code in Alg. 1. ... Algorithm 1 DUSDi Skill Learning |
| Open Source Code | Yes | Code and skills visualization at jiahenghu.github.io/DUSDi-site/. |
| Open Datasets | Yes | We test DUSDi on four environments... DMC [48] and OpenAI Fetch [4] ... 2D Gunner, Multi-Particle [30], and iGibson [26]. ... We provide visualizations and additional information about each of the environments in Appendix C. |
| Dataset Splits | No | The paper describes training phases (pretraining and downstream learning) in RL environments, but does not provide explicit training/test/validation dataset splits in the typical sense for a fixed dataset, as data is generated dynamically through interaction. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments with specific models or types (e.g., GPU/CPU models, memory amounts, or cloud instance types). |
| Software Dependencies | No | Table 2: Hyperparameters of Skill Learning — optimizer: Adam; activation functions: ReLU. ... All skill learning methods in our baselines use SAC to optimize for the intrinsic reward... |
| Experiment Setup | Yes | We present the hyperparameters for SAC in Table 2. All methods use a low-level step size of L = 50. ... Downstream Hierarchical Learning: For all skill discovery methods, downstream learning of the skill selection policy is implemented with PPO. We used the same hyperparameters for all methods across all tasks, as specified in Table 3. (See the configuration and rollout sketches after this table.) |
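
The experiment-setup evidence quoted above can be collected into a single configuration for reproduction attempts. The sketch below is a minimal, hypothetical Python reconstruction: only the Adam optimizer, ReLU activations, the low-level step size L = 50, SAC for skill pretraining, and PPO for downstream skill-selection learning come from the quoted text; all other values (learning rate, batch size) are placeholder assumptions, not numbers reported by the paper.

```python
# Hypothetical reconstruction of the reported training configuration.
# Only the fields marked "from paper" are quoted in the table above;
# the remaining values are placeholder assumptions.
from dataclasses import dataclass


@dataclass
class SkillLearningConfig:
    """Skill-discovery (pretraining) phase, optimized with SAC."""
    algorithm: str = "SAC"        # from paper: skill learning methods use SAC
    optimizer: str = "Adam"       # from paper (Table 2)
    activation: str = "ReLU"      # from paper (Table 2)
    low_level_steps: int = 50     # from paper: low-level step size L = 50
    learning_rate: float = 3e-4   # placeholder, not reported in the quoted text
    batch_size: int = 256         # placeholder


@dataclass
class DownstreamConfig:
    """Downstream hierarchical learning of the skill-selection policy."""
    algorithm: str = "PPO"               # from paper: skill selection trained with PPO
    skill_horizon: int = 50              # each selected skill runs for L = 50 steps
    shared_hyperparameters: bool = True  # from paper: same settings across all tasks (Table 3)


config = {"skill_learning": SkillLearningConfig(), "downstream": DownstreamConfig()}
```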
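The "Downstream Hierarchical Learning" description implies the standard two-level control loop of skill-based HRL: a high-level policy (trained with PPO) selects a skill, and the frozen pretrained skill policy executes it for L = 50 environment steps. The sketch below illustrates only that loop; the names `high_level_policy`, `skill_policy`, and `env` are hypothetical stand-ins, not the released DUSDi API.

```python
# Minimal sketch of the two-level rollout implied by the downstream setup.
# Names and interfaces are hypothetical; only the L = 50 skill horizon and
# the PPO-over-skills structure come from the quoted evidence.
L = 50  # low-level step size reported in the paper


def hierarchical_rollout(env, high_level_policy, skill_policy, max_steps=1000):
    """Roll out one episode: pick a skill, execute it for L steps, repeat."""
    obs = env.reset()
    total_reward, step = 0.0, 0
    while step < max_steps:
        skill = high_level_policy(obs)            # one high-level (PPO) action = one skill
        for _ in range(L):                        # execute the skill for L low-level steps
            action = skill_policy(obs, skill)     # frozen, pretrained skill-conditioned policy
            obs, reward, done, info = env.step(action)
            total_reward += reward
            step += 1
            if done or step >= max_steps:
                return total_reward
    return total_reward
```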