Unsupervised Skill Discovery with Bottleneck Option Learning
Authors: Jaekyeom Kim, Seohong Park, Gunhee Kim
ICML 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We empirically demonstrate that IBOL outperforms multiple state-of-the-art unsupervised skill discovery methods on the information-theoretic evaluations and downstream tasks in MuJoCo environments, including Ant, HalfCheetah, Hopper and D'Kitty. |
| Researcher Affiliation | Academia | Department of Computer Science and Engineering, Seoul National University, South Korea. |
| Pseudocode | Yes | Algorithm 1 (Phase 1) Training Linearizer; Algorithm 2 (Phase 2) Skill Discovery |
| Open Source Code | Yes | Our code is available at https://vision.snu.ac.kr/projects/ibol. |
| Open Datasets | Yes | We experiment with MuJoCo environments (Todorov et al., 2012) for multiple tasks: Ant, HalfCheetah, Hopper and Humanoid from OpenAI Gym (Brockman et al., 2016) with the setups by Sharma et al. (2020b) and D'Kitty from ROBEL (Ahn et al., 2020) adopting the configurations by Sharma et al. (2020a). |
| Dataset Splits | No | The paper mentions training and evaluation but does not provide specific percentages or counts for training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not specify any hardware details such as CPU/GPU models or memory. |
| Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., PyTorch, Python versions). |
| Experiment Setup | Yes | For experiments, we use pre-trained linearizers with two different random seeds on each environment. When training the linearizers, we sample a goal g at the beginning of each roll-out and fix it within that episode to learn consistent behaviors, as in SNN4HRL (Florensa et al., 2016). We consider continuous priors for skill discovery methods. Specifically, we use the standard normal distribution for p(u) and r(z) in IBOL and for p(z) in other methods. We set ℓ_m = 5 for AntMultiGoals and ℓ_m = 20 for the others. |
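
The Pseudocode and Experiment Setup rows above together outline a two-phase pipeline: a linearizer trained first, with one goal g sampled per rollout and held fixed within the episode, then a skill policy trained on top of it with skills drawn from a standard normal prior. Below is a minimal, self-contained Python sketch of those sampling conventions only; every class, function, and dimension in it is a hypothetical stand-in, not the authors' released implementation (see the linked code for that).

```python
import numpy as np

class StubPolicy:
    """Placeholder for a learnable policy; update() is a no-op stand-in."""
    def __init__(self, obs_dim):
        self.obs_dim = obs_dim

    def act(self, obs, cond):
        # Stand-in action: drift toward the conditioning vector.
        return 0.1 * (cond - obs)

    def update(self, trajectory):
        pass  # real training on the IBOL objectives goes here

def rollout(policy, cond, horizon=200):
    """Collect one episode, holding the conditioning vector fixed throughout."""
    obs = np.zeros(policy.obs_dim)
    trajectory = []
    for _ in range(horizon):
        action = policy.act(obs, cond)
        obs = obs + action  # stand-in dynamics
        trajectory.append((obs.copy(), action))
    return trajectory

def phase1_train_linearizer(rng, epochs=10, obs_dim=2):
    """Algorithm 1 (Phase 1): train the linearizer; a goal g is sampled at
    the start of each rollout and fixed within that episode."""
    linearizer = StubPolicy(obs_dim)
    for _ in range(epochs):
        g = rng.uniform(-1.0, 1.0, size=obs_dim)  # fixed for this episode
        linearizer.update(rollout(linearizer, g))
    return linearizer

def phase2_skill_discovery(rng, linearizer, epochs=10, skill_dim=2):
    """Algorithm 2 (Phase 2): train a skill policy with skills drawn from
    the standard normal prior r(z) = N(0, I). (In IBOL the skill policy
    acts through the frozen linearizer; that coupling is omitted here.)"""
    skill_policy = StubPolicy(linearizer.obs_dim)
    for _ in range(epochs):
        z = rng.standard_normal(skill_dim)  # z ~ N(0, I), continuous skills
        skill_policy.update(rollout(skill_policy, z))
    return skill_policy

rng = np.random.default_rng(0)
phase2_skill_discovery(rng, phase1_train_linearizer(rng))
```

This sketch only mirrors the goal-per-rollout and z ~ N(0, I) conventions quoted in the table; the actual objectives, the information bottleneck, and the linearizer coupling are specified in the paper's Algorithms 1 and 2.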