Disentangled Unsupervised Skill Discovery for Efficient Hierarchical Reinforcement Learning

Authors: Jiaheng Hu, Zizhao Wang, Peter Stone, Roberto Martín-Martín

NeurIPS 2024

Each reproducibility variable is listed below with its assessed result, followed by the LLM response that supports the assessment.
Research Type: Experimental
"Evaluated in a set of challenging environments, DUSDi successfully learns disentangled skills, and significantly outperforms previous skill discovery methods when it comes to applying the learned skills to solve downstream tasks. ... In the evaluation of DUSDi, we aim to answer the following questions: Q1: Are skills learned by DUSDi truly disentangled (Sec. 4.2)? Q2: Can Q-decomposition improve skill learning efficiency (Sec. 4.3)? Q3: Do our disentangled skills perform better when solving downstream tasks compared to other unsupervised reinforcement learning methods (Sec. 4.4)? Q4: Can DUSDi be extended to image observation environments (Sec. 4.5)? Q5: Can we leverage the structured skill space of DUSDi to further improve downstream task learning efficiency (Sec. 4.6)?"
Researcher Affiliation: Collaboration
"Jiaheng Hu, University of Texas at Austin, jiahengh@utexas.edu; Zizhao Wang, University of Texas at Austin, zizhao.wang@utexas.edu; Peter Stone, University of Texas at Austin and Sony AI, pstone@cs.utexas.edu; Roberto Martín-Martín, University of Texas at Austin, robertomm@cs.utexas.edu"
Pseudocode: Yes
"We present the entire DUSDi pipeline in Fig. 2, and the pseudo-code in Alg. 1. ... Algorithm 1 DUSDi Skill Learning"
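The excerpt does not reproduce Alg. 1 itself. Purely as an illustration of the kind of loop it describes, below is a minimal Python sketch of disentangled skill pretraining with one intrinsic reward term per state factor; the environment interface, the `agent`, and the per-factor `discriminators` are hypothetical stand-ins rather than the authors' code, and the disentanglement penalty terms of the paper's mutual-information objective are omitted for brevity.

```python
# Hypothetical sketch of a DUSDi-style skill-learning loop (not the authors' code).
# Assumes the environment exposes factored states s = (s_1, ..., s_N), the skill
# z = (z_1, ..., z_N) has one component per factor, and the SAC agent keeps one
# critic per reward term, in the spirit of the paper's Q-decomposition.
import numpy as np

def pretrain_skills(env, agent, discriminators, num_epochs=1000, horizon=50):
    for _ in range(num_epochs):
        # Sample one discrete skill component per state factor.
        z = [np.random.randint(agent.skills_per_factor) for _ in range(env.num_factors)]
        obs = env.reset()
        for _ in range(horizon):
            action = agent.act(obs, z)
            next_obs, _, done, _ = env.step(action)
            # One intrinsic reward term per factor: high when factor i is
            # predictable from skill component z_i (a hedged stand-in for the
            # paper's mutual-information objective; penalty terms omitted).
            rewards = [discriminators[i].log_prob(next_obs.factor(i), z[i])
                       for i in range(env.num_factors)]
            agent.store(obs, z, action, rewards, next_obs, done)
            obs = next_obs
            if done:
                break
        agent.update()  # SAC update; one critic per reward term (assumed)
        for i, d in enumerate(discriminators):
            d.update(agent.replay_buffer, factor=i)  # fit q(z_i | s_i)
    return agent
```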
Open Source Code: Yes
"Code and skills visualization at jiahenghu.github.io/DUSDi-site/."
Open Datasets: Yes
"We test DUSDi on four environments... DMC [48] and OpenAI Fetch [4] ... 2D Gunner, Multi-Particle [30], and iGibson [26]. ... We provide visualizations and additional information about each of the environments in Appendix C."
Dataset Splits: No
"The paper describes training phases (pretraining and downstream learning) in RL environments, but does not provide explicit training/test/validation dataset splits in the typical sense for a fixed dataset, as data is generated dynamically through interaction."
Hardware Specification: No
"The paper does not explicitly describe the hardware used to run its experiments with specific models or types (e.g., GPU/CPU models, memory amounts, or cloud instance types)."
Software Dependencies: No
"Table 2: Hyperparameters of Skill Learning. optimizer: Adam; activation functions: ReLU ... All skill learning methods in our baselines use SAC to optimize for the intrinsic reward..."
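No package versions are given. Purely as an illustration of the quoted optimizer and activation choices, here is a minimal PyTorch sketch of a ReLU network paired with an Adam optimizer; PyTorch itself, the layer sizes, and the learning rate are assumptions, not values taken from the paper.

```python
# Illustrative only: a ReLU MLP with an Adam optimizer, matching the optimizer /
# activation choices quoted from Table 2. Layer sizes and learning rate are
# placeholders, not values from the paper.
import torch
import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, in_dim, out_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

policy = MLP(in_dim=32, out_dim=8)                          # e.g. a SAC actor head
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4)  # placeholder learning rate
```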
Experiment Setup: Yes
"We present the hyperparameters for SAC in Table 2. All methods use a low-level step size of L = 50. ... Downstream Hierarchical Learning: For all skill discovery methods, downstream learning of the skill selection policy is implemented with PPO. We used the same hyperparameters for all methods across all tasks, as specified in Table 3."
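To make the quoted setup concrete, a hedged sketch of the downstream hierarchical loop: a high-level policy (trained with PPO in the paper) emits a skill, and a frozen pretrained skill policy executes it for L = 50 low-level steps. The `env`, `high_level`, and `skill_policy` interfaces below are hypothetical stand-ins, not the authors' implementation.

```python
# Hypothetical sketch of downstream hierarchical control: the high-level policy
# picks a skill z, the frozen skill policy executes it for L low-level steps,
# and only the high-level policy is updated on the accumulated task reward.
L = 50  # low-level step size quoted from the paper

def rollout_episode(env, high_level, skill_policy, max_steps=1000):
    obs = env.reset()
    total_reward, t, done = 0.0, 0, False
    while t < max_steps and not done:
        start_obs = obs
        z = high_level.select_skill(start_obs)    # high-level action = skill choice
        skill_return = 0.0
        for _ in range(L):
            action = skill_policy.act(obs, z)     # frozen policy from pretraining
            obs, reward, done, _ = env.step(action)
            skill_return += reward
            t += 1
            if done or t >= max_steps:
                break
        # One high-level (PPO) transition per executed skill.
        high_level.store(start_obs, z, skill_return, obs, done)
        total_reward += skill_return
    return total_reward
```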