Unsupervised Skill Discovery via Recurrent Skill Training
Authors: Zheyuan Jiang, Jingyue Gao, Jianyu Chen
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on a number of challenging 2D navigation environments and robotic locomotion environments. Evaluation results show that our proposed approach outperforms previous parallel training approaches in terms of state coverage and skill diversity. |
| Researcher Affiliation | Academia | Zheyuan Jiang (1), Jingyue Gao (2), Jianyu Chen (1,3); (1) Institute for Interdisciplinary Information Sciences, Tsinghua University; (2) Department of Computer Science, Tsinghua University; (3) Shanghai Qizhi Institute |
| Pseudocode | Yes | Algorithm 1 Recurrent Skill Training |
| Open Source Code | No | The code would be released immediately after acceptance. |
| Open Datasets | Yes | We conduct experiments on several challenging 2D navigation environments and several robotic locomotion tasks... We use the OpenAI Gym [16] settings of the three tasks, where HalfCheetah is trained with fixed episode length whereas Hopper and Walker2d terminate when the agents fall during training. |
| Dataset Splits | No | The paper describes collecting 'on-policy samples' for training and using rollouts for evaluation, but does not provide specific training/validation/test dataset splits with percentages or sample counts in the main text. |
| Hardware Specification | No | The paper states that hardware specifications are given in the appendix; no specific hardware details (e.g., GPU/CPU models) appear in the main text provided. |
| Software Dependencies | No | The paper mentions software like 'MuJoCo [15]' and 'OpenAI Gym [16]' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Algorithm 1 details 'N is the number of skills' and 'M is the number of training epochs for each skill'. Section 4.1 specifies 'We train N = 10 skills for each of the algorithms'. The paper also states 'In this paper we choose Proximal Policy Optimization (PPO) [13] with generalized advantage estimation (GAE) [14]'. A minimal sketch of this training loop appears after the table. |
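
To make the structure of Algorithm 1 concrete, here is a minimal sketch of a recurrent (sequential) skill-training loop: N skills are trained one after another, each for M epochs, with earlier skills frozen while the new skill learns. The function names (`make_policy`, `train_one_epoch`), the value of `M_EPOCHS`, and the intrinsic-reward comment are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of Algorithm 1 ("Recurrent Skill Training").
# Only N = 10 is taken from the paper (Section 4.1); everything else
# here is an assumed scaffold around the sequential training idea.

N_SKILLS = 10   # N: number of skills (Section 4.1)
M_EPOCHS = 100  # M: training epochs per skill (value assumed)

def recurrent_skill_training(make_policy, train_one_epoch):
    """Train N skills sequentially: each new skill is optimized for
    M epochs while all previously trained skills stay frozen."""
    trained_skills = []
    for skill_id in range(N_SKILLS):
        policy = make_policy(skill_id)
        for _ in range(M_EPOCHS):
            # The per-epoch update (PPO in the paper) would use an
            # intrinsic reward that pushes the new skill toward states
            # the frozen skills do not already cover.
            train_one_epoch(policy, frozen_skills=trained_skills)
        trained_skills.append(policy)
    return trained_skills
```

The key contrast with parallel skill-discovery methods is that the inner PPO update only ever optimizes one policy at a time, conditioning its reward on the fixed behavior of the skills trained before it.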
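
The setup row also names PPO [13] with generalized advantage estimation (GAE) [14]. As a point of reference, a minimal GAE computation is sketched below; the discount `gamma=0.99` and smoothing `lam=0.95` are common defaults, not values taken from the paper.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation (Schulman et al.):
    A_t = sum_l (gamma * lam)**l * delta_{t+l}, where
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    rewards: array of length T for one rollout
    values:  array of length T + 1 (V(s_T) bootstraps the final step)
    """
    T = len(rewards)
    advantages = np.zeros(T)
    last_adv = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv
    return advantages
```

Under PPO, these advantage estimates would then weight the clipped surrogate objective for each skill's policy update.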