Unsupervised Skill Discovery with Bottleneck Option Learning
Authors: Jaekyeom Kim, Seohong Park, Gunhee Kim
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that IBOL outperforms multiple state-of-the-art unsupervised skill discovery methods on the information-theoretic evaluations and downstream tasks in MuJoCo environments, including Ant, HalfCheetah, Hopper and D'Kitty. |
| Researcher Affiliation | Academia | Department of Computer Science and Engineering, Seoul National University, South Korea. |
| Pseudocode | Yes | Algorithm 1 (Phase 1) Training Linearizer; Algorithm 2 (Phase 2) Skill Discovery |
| Open Source Code | Yes | Our code is available at https://vision.snu.ac.kr/projects/ibol. |
| Open Datasets | Yes | We experiment with MuJoCo environments (Todorov et al., 2012) for multiple tasks: Ant, HalfCheetah, Hopper and Humanoid from OpenAI Gym (Brockman et al., 2016) with the setups by Sharma et al. (2020b) and D'Kitty from ROBEL (Ahn et al., 2020) adopting the configurations by Sharma et al. (2020a). |
| Dataset Splits | No | The paper mentions training and evaluation but does not provide specific percentages or counts for training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware details such as CPU/GPU models or memory. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., PyTorch, Python versions). |
| Experiment Setup | Yes | For experiments, we use pre-trained linearizers with two different random seeds on each environment. When training the linearizers, we sample a goal g at the beginning of each roll-out and fix it within that episode to learn consistent behaviors, as in SNN4HRL (Florensa et al., 2016). We consider continuous priors for skill discovery methods. Specifically, we use the standard normal distribution for p(u) and r(z) in IBOL and for p(z) in other methods. We set ℓ_m = 5 for AntMultiGoals and ℓ_m = 20 for the others. |
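
The Pseudocode and Experiment Setup rows above together outline a two-phase pipeline: a linearizer trained first, with one goal g sampled per rollout and held fixed within the episode, then a skill policy trained on top of it with skills drawn from a standard normal prior. Below is a minimal, self-contained Python sketch of those sampling conventions only; every class, function, and dimension in it is a hypothetical stand-in, not the authors' released implementation (see the linked code for that).

```python
import numpy as np

class StubPolicy:
    """Placeholder for a learnable policy; update() is a no-op stand-in."""
    def __init__(self, obs_dim):
        self.obs_dim = obs_dim

    def act(self, obs, cond):
        # Stand-in action: drift toward the conditioning vector.
        return 0.1 * (cond - obs)

    def update(self, trajectory):
        pass  # real training on the IBOL objectives goes here

def rollout(policy, cond, horizon=200):
    """Collect one episode, holding the conditioning vector fixed throughout."""
    obs = np.zeros(policy.obs_dim)
    trajectory = []
    for _ in range(horizon):
        action = policy.act(obs, cond)
        obs = obs + action  # stand-in dynamics
        trajectory.append((obs.copy(), action))
    return trajectory

def phase1_train_linearizer(rng, epochs=10, obs_dim=2):
    """Algorithm 1 (Phase 1): train the linearizer; a goal g is sampled at
    the start of each rollout and fixed within that episode."""
    linearizer = StubPolicy(obs_dim)
    for _ in range(epochs):
        g = rng.uniform(-1.0, 1.0, size=obs_dim)  # fixed for this episode
        linearizer.update(rollout(linearizer, g))
    return linearizer

def phase2_skill_discovery(rng, linearizer, epochs=10, skill_dim=2):
    """Algorithm 2 (Phase 2): train a skill policy with skills drawn from
    the standard normal prior r(z) = N(0, I). (In IBOL the skill policy
    acts through the frozen linearizer; that coupling is omitted here.)"""
    skill_policy = StubPolicy(linearizer.obs_dim)
    for _ in range(epochs):
        z = rng.standard_normal(skill_dim)  # z ~ N(0, I), continuous skills
        skill_policy.update(rollout(skill_policy, z))
    return skill_policy

rng = np.random.default_rng(0)
phase2_skill_discovery(rng, phase1_train_linearizer(rng))
```

This sketch only mirrors the goal-per-rollout and z ~ N(0, I) conventions quoted in the table; the actual objectives, the information bottleneck, and the linearizer coupling are specified in the paper's Algorithms 1 and 2.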