Unsupervised Skill Discovery via Recurrent Skill Training

Authors: Zheyuan Jiang, Jingyue Gao, Jianyu Chen

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments on a number of challenging 2D navigation environments and robotic locomotion environments. Evaluation results show that our proposed approach outperforms previous parallel training approaches in terms of state coverage and skill diversity.
Researcher Affiliation | Academia | Zheyuan Jiang (1), Jingyue Gao (2), Jianyu Chen (1, 3); 1: Institute for Interdisciplinary Information Sciences, Tsinghua University; 2: Department of Computer Science, Tsinghua University; 3: Shanghai Qizhi Institute
Pseudocode | Yes | Algorithm 1: Recurrent Skill Training
Open Source Code | No | The code would be released immediately after acceptance.
Open Datasets | Yes | We conduct experiments on several challenging 2D navigation environments and several robotic locomotion tasks... We use the OpenAI Gym [16] settings of the three tasks, where HalfCheetah is trained with fixed episode length whereas Hopper and Walker2d terminate when the agents fall during training. (See the environment sketch after the table.)
Dataset Splits | No | The paper describes collecting 'on-policy samples' for training and using rollouts for evaluation, but does not provide specific training/validation/test dataset splits with percentages or sample counts in the main text.
Hardware Specification | No | The paper states that hardware specifications are in the appendix, but no specific hardware details (like GPU/CPU models) are present in the main text provided.
Software Dependencies | No | The paper mentions software like 'MuJoCo [15]' and 'OpenAI Gym [16]' but does not provide specific version numbers for these or other software dependencies.
Experiment Setup | Yes | Algorithm 1 details 'N is the number of skills' and 'M is the number of training epochs for each skill'. In Section 4.1, it specifies 'We train N = 10 skills for each of the algorithms'. The paper also states 'In this paper we choose Proximal Policy Optimization (PPO) [13] with generalized advantage estimation (GAE) [14]'. (See the training-loop sketch after the table.)
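
For reference, the three locomotion tasks quoted in the Open Datasets row can be instantiated through OpenAI Gym roughly as follows. This is a minimal sketch assuming the "-v3" MuJoCo task IDs; the paper does not state which Gym or MuJoCo versions were used, and only the termination behavior quoted above is taken from the paper.

```python
# Minimal sketch of the three MuJoCo locomotion tasks via OpenAI Gym.
# Assumption: the "-v3" task IDs; the paper does not specify Gym/MuJoCo versions.
import gym

envs = {
    # HalfCheetah has no early-termination condition, so episodes run for the
    # fixed time limit, matching "trained with fixed episode length".
    "HalfCheetah": gym.make("HalfCheetah-v3"),
    # Hopper and Walker2d end the episode when the agent falls during training.
    "Hopper": gym.make("Hopper-v3"),
    "Walker2d": gym.make("Walker2d-v3"),
}

for name, env in envs.items():
    env.reset()
    print(name, env.observation_space.shape, env.action_space.shape)
```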
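
The Experiment Setup row points at the core loop of Algorithm 1: N skills, each trained for M epochs at a time with PPO and GAE on on-policy samples, one skill after another rather than all in parallel. The sketch below only illustrates that recurrent structure; the number of outer cycles, the reward computation, and every helper (make_policy, collect_rollout, skill_reward, ppo_update) are placeholders of ours, not the paper's method.

```python
# Structural sketch of a recurrent (one-skill-at-a-time) training loop in the
# spirit of Algorithm 1. All helpers are placeholders; only N, M, and the use
# of PPO with GAE come from the paper's main text.

N_SKILLS = 10          # N: number of skills (the paper trains N = 10)
EPOCHS_PER_SKILL = 10  # M: training epochs per skill (illustrative value)
OUTER_CYCLES = 50      # how many times to cycle over all skills (assumed)


def make_policy():
    """Placeholder: build one skill's policy (e.g., a PPO actor-critic)."""
    raise NotImplementedError


def collect_rollout(policy, skill_id):
    """Placeholder: gather on-policy samples with the current skill's policy."""
    raise NotImplementedError


def skill_reward(trajectory, other_policies):
    """Placeholder: intrinsic reward encouraging diversity across skills."""
    raise NotImplementedError


def ppo_update(policy, trajectory, rewards):
    """Placeholder: one PPO + GAE update on the relabeled on-policy samples."""
    raise NotImplementedError


def train():
    policies = [make_policy() for _ in range(N_SKILLS)]
    for cycle in range(OUTER_CYCLES):
        # Skills are visited one at a time ("recurrently"), unlike parallel
        # training approaches that update all skills together.
        for k in range(N_SKILLS):
            others = [p for i, p in enumerate(policies) if i != k]
            for epoch in range(EPOCHS_PER_SKILL):
                traj = collect_rollout(policies[k], k)
                rewards = skill_reward(traj, others)
                ppo_update(policies[k], traj, rewards)
    return policies
```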