Unsupervised Skill Discovery with Bottleneck Option Learning

Authors: Jaekyeom Kim, Seohong Park, Gunhee Kim

ICML 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We empirically demonstrate that IBOL outperforms multiple state-of-the-art unsupervised skill discovery methods on the information-theoretic evaluations and downstream tasks in MuJoCo environments, including Ant, HalfCheetah, Hopper and DKitty.
Researcher Affiliation | Academia | Department of Computer Science and Engineering, Seoul National University, South Korea.
Pseudocode | Yes | Algorithm 1 (Phase 1): Training Linearizer; Algorithm 2 (Phase 2): Skill Discovery.
Open Source Code | Yes | Our code is available at https://vision.snu.ac.kr/projects/ibol.
Open Datasets | Yes | We experiment with MuJoCo environments (Todorov et al., 2012) for multiple tasks: Ant, HalfCheetah, Hopper and Humanoid from OpenAI Gym (Brockman et al., 2016) with the setups by Sharma et al. (2020b), and DKitty from ROBEL (Ahn et al., 2020) adopting the configurations by Sharma et al. (2020a).
Dataset Splits | No | The paper mentions training and evaluation but does not provide specific percentages or counts for training, validation, or test dataset splits.
Hardware Specification | No | The paper does not specify any hardware details such as CPU/GPU models or memory.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., PyTorch or Python versions).
Experiment Setup | Yes | For experiments, we use pre-trained linearizers with two different random seeds on each environment. When training the linearizers, we sample a goal g at the beginning of each roll-out and fix it within that episode to learn consistent behaviors, as in SNN4HRL (Florensa et al., 2016). We consider continuous priors for skill discovery methods. Specifically, we use the standard normal distribution for p(u) and r(z) in IBOL and for p(z) in other methods. We set ℓm = 5 for AntMultiGoals and ℓm = 20 for the others.
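The quoted setup describes a sampling convention rather than full code: a continuous latent is drawn from the standard normal prior once at the start of each roll-out and held fixed for the whole episode. A minimal sketch of that convention is below; the toy `env_step` and `policy` callables are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

SKILL_DIM = 2        # assumed latent dimension, for illustration only
EPISODE_STEPS = 200  # assumed episode horizon

def sample_skill(rng, dim=SKILL_DIM):
    """Draw u ~ N(0, I), the continuous standard normal prior described above."""
    return rng.standard_normal(dim)

def rollout(env_step, policy, rng, steps=EPISODE_STEPS):
    """Run one episode with the skill sampled once and fixed, as in SNN4HRL."""
    u = sample_skill(rng)       # sampled at the beginning of the roll-out
    obs = np.zeros(4)           # placeholder initial observation
    trajectory = []
    for _ in range(steps):
        action = policy(obs, u)  # the same fixed u conditions every step
        obs = env_step(obs, action)
        trajectory.append(obs)
    return u, trajectory

# Toy stand-ins for an environment and a skill-conditioned policy.
rng = np.random.default_rng(0)
u, traj = rollout(
    env_step=lambda obs, a: obs + 0.01 * a,      # hypothetical dynamics
    policy=lambda obs, u: np.tile(u, 2),         # hypothetical policy
    rng=rng,
)
```

Fixing the latent within an episode is what makes the discovered behaviors consistent: the policy cannot switch skills mid-trajectory, so each u maps to one coherent behavior.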