Unsupervised Skill Discovery for Learning Shared Structures across Changing Environments

Authors: Sang-Hyun Lee, Seung-Woo Seo

ICML 2023

Reproducibility assessment. Each item below gives the assessed variable, the result, and the supporting LLM response excerpt.

Research Type: Experimental
LLM Response: "Our experimental results show that our algorithm acquires skills that represent shared structures across changing maze navigation and locomotion environments. Furthermore, we demonstrate that our skills are more useful than baselines on downstream tasks."

Researcher Affiliation: Academia
LLM Response: "Department of Electrical and Computer Engineering, Seoul National University, Seoul, South Korea. Correspondence to: Seung-Woo Seo <sseo@snu.ac.kr>."

Pseudocode: Yes
LLM Response: "Algorithm 1 describes the overall procedure by which our algorithm learns a new skill policy π_m(a|s) in an incremental fashion. ... Algorithm 2. Skill Evaluator(π_{1:m}, ψ_{π_{1:m}}, H_m(S))"
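
The paper's algorithms are not reproduced in this report. Purely as an illustration of the incremental structure the quote describes (skills learned one at a time, then scored by a skill evaluator), here is a minimal, self-contained Python sketch. Every name in it (SkillPolicy, train_skill, skill_evaluator) is a toy stand-in for illustration, not the paper's method.

```python
import numpy as np

rng = np.random.default_rng(0)

class SkillPolicy:
    """Toy linear stand-in for a skill policy pi_m(a|s); NOT the paper's policy class."""
    def __init__(self, state_dim=4, action_dim=2):
        self.W = rng.normal(scale=0.1, size=(action_dim, state_dim))

    def act(self, state):
        # Linear action with small Gaussian noise.
        return self.W @ state + rng.normal(scale=0.05, size=len(self.W))

def train_skill(policy, prior_skills):
    # Stand-in for optimizing the new skill's intrinsic objective
    # (the paper trains skill policies with SAC).
    policy.W += rng.normal(scale=0.01, size=policy.W.shape)
    return policy

def skill_evaluator(skills, states):
    # Stand-in for Algorithm 2 ("Skill Evaluator"): score the skill set by
    # the spread of the actions it produces, a crude proxy for the
    # state-entropy term H_m(S) referenced in the paper.
    actions = np.stack([pi.act(s) for pi in skills for s in states])
    return float(actions.var())

states = rng.normal(size=(32, 4))  # dummy states standing in for env rollouts
skills = []                        # previously discovered skills pi_1 .. pi_{m-1}
for m in range(1, 5):              # add skills one at a time, as in Algorithm 1
    pi_m = train_skill(SkillPolicy(), skills)
    skills.append(pi_m)
    print(f"skill {m}: evaluator score = {skill_evaluator(skills, states):.4f}")
```
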
Open Source Code: No
LLM Response: "The paper does not provide an explicit statement or link to the open-source code for the described methodology."

Open Datasets: Yes
LLM Response: "maze2d-umaze-continual-v1 is an extension of maze2d-umaze-v1 from D4RL (Fu et al., 2020). ... HalfCheetah-continual-v3 is a variant of HalfCheetah-v3 provided by OpenAI Gym (Brockman et al., 2016)."
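
The "-continual" variants appear to be the authors' own extensions and are not publicly released, but the base environments they build on are. A minimal sketch of loading those bases, assuming d4rl, mujoco-py, and a pre-0.26 gym (the API current when the paper was written):

```python
import gym
import d4rl  # importing d4rl registers the maze2d-* environments with gym

maze_env = gym.make("maze2d-umaze-v1")    # D4RL base of maze2d-umaze-continual-v1
cheetah_env = gym.make("HalfCheetah-v3")  # Gym/MuJoCo base of the continual variant

obs = maze_env.reset()                    # old (pre-0.26) gym reset signature
obs, reward, done, info = maze_env.step(maze_env.action_space.sample())
```
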
Dataset Splits: No
LLM Response: "The paper does not provide specific training/test/validation dataset splits (percentages or absolute counts) or references to predefined splits needed for reproduction."

Hardware Specification: No
LLM Response: "The paper does not provide specific hardware details (e.g., GPU/CPU models, memory, or cloud instance types) used for running the experiments."

Software Dependencies: No
LLM Response: "The paper mentions software components like 'Soft Actor-Critic (SAC)' and the 'Adam optimizer' but does not provide specific version numbers for these or any other software dependencies."

Experiment Setup: Yes
LLM Response: "Table 2 describes the hyperparameters used in our experiments. We used a coarse grid search to tune the hyperparameters (e.g., policy learning rate over 0.0001, 0.0003, and 0.001; mini-batch size for the master policy over 32, 64, 128, and 256; and mini-batch size for the skill policy over 128, 256, 512, and 1024)."
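
As a concrete reading of that quote, the sweep below enumerates exactly the listed values with itertools.product. The variable names and the config dictionary are assumptions for illustration, and the actual training call is elided:

```python
from itertools import product

# Search values are taken verbatim from the quote above.
policy_lrs = [0.0001, 0.0003, 0.001]
master_batch_sizes = [32, 64, 128, 256]
skill_batch_sizes = [128, 256, 512, 1024]

# 3 * 4 * 4 = 48 candidate configurations in the coarse grid.
for lr, mb, sb in product(policy_lrs, master_batch_sizes, skill_batch_sizes):
    config = {"policy_lr": lr, "master_batch_size": mb, "skill_batch_size": sb}
    print(config)  # a real sweep would launch one training run per config here
```
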