Unsupervised Skill Discovery via Recurrent Skill Training
Authors: Zheyuan Jiang, Jingyue Gao, Jianyu Chen
NeurIPS 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on a number of challenging 2D navigation environments and robotic locomotion environments. Evaluation results show that our proposed approach outperforms previous parallel training approaches in terms of state coverage and skill diversity. |
| Researcher Affiliation | Academia | Zheyuan Jiang (1), Jingyue Gao (2), Jianyu Chen (1,3); (1) Institute for Interdisciplinary Information Sciences, Tsinghua University; (2) Department of Computer Science, Tsinghua University; (3) Shanghai Qizhi Institute |
| Pseudocode | Yes | Algorithm 1 Recurrent Skill Training |
| Open Source Code | No | The code would be released immediately after acceptance. |
| Open Datasets | Yes | We conduct experiments on several challenging 2D navigation environments and several robotic locomotion tasks... We use the OpenAI Gym [16] settings of the three tasks, where HalfCheetah is trained with fixed episode length whereas Hopper and Walker2d terminate when the agents fall during training. |
| Dataset Splits | No | The paper describes collecting 'on-policy samples' for training and using rollouts for evaluation, but does not provide specific training/validation/test dataset splits with percentages or sample counts in the main text. |
| Hardware Specification | No | The paper states that hardware specifications are given in the appendix; no specific hardware details (e.g., GPU/CPU models) appear in the main text provided. |
| Software Dependencies | No | The paper mentions software like 'MuJoCo [15]' and 'OpenAI Gym [16]' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Algorithm 1 details 'N is the number of skills' and 'M is the number of training epochs for each skill'. Section 4.1 specifies 'We train N = 10 skills for each of the algorithms'. The paper also states 'In this paper we choose Proximal Policy Optimization (PPO) [13] with generalized advantage estimation (GAE) [14]'. A minimal sketch of this training loop appears after the table. |
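
To make the structure of Algorithm 1 concrete, here is a minimal sketch of a recurrent (sequential) skill-training loop: N skills are trained one after another, each for M epochs, with earlier skills frozen while the new skill learns. The function names (`make_policy`, `train_one_epoch`), the value of `M_EPOCHS`, and the intrinsic-reward comment are assumptions for illustration, not the authors' code.

```python
# Hypothetical sketch of Algorithm 1 ("Recurrent Skill Training").
# Only N = 10 is taken from the paper (Section 4.1); everything else
# here is an assumed scaffold around the sequential training idea.

N_SKILLS = 10   # N: number of skills (Section 4.1)
M_EPOCHS = 100  # M: training epochs per skill (value assumed)

def recurrent_skill_training(make_policy, train_one_epoch):
    """Train N skills sequentially: each new skill is optimized for
    M epochs while all previously trained skills stay frozen."""
    trained_skills = []
    for skill_id in range(N_SKILLS):
        policy = make_policy(skill_id)
        for _ in range(M_EPOCHS):
            # The per-epoch update (PPO in the paper) would use an
            # intrinsic reward that pushes the new skill toward states
            # the frozen skills do not already cover.
            train_one_epoch(policy, frozen_skills=trained_skills)
        trained_skills.append(policy)
    return trained_skills
```

The key contrast with parallel skill-discovery methods is that the inner PPO update only ever optimizes one policy at a time, conditioning its reward on the fixed behavior of the skills trained before it.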
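
The setup row also names PPO [13] with generalized advantage estimation (GAE) [14]. As a point of reference, a minimal GAE computation is sketched below; the discount `gamma=0.99` and smoothing `lam=0.95` are common defaults, not values taken from the paper.

```python
import numpy as np

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized advantage estimation (Schulman et al.):
    A_t = sum_l (gamma * lam)**l * delta_{t+l}, where
    delta_t = r_t + gamma * V(s_{t+1}) - V(s_t).

    rewards: array of length T for one rollout
    values:  array of length T + 1 (V(s_T) bootstraps the final step)
    """
    T = len(rewards)
    advantages = np.zeros(T)
    last_adv = 0.0
    for t in reversed(range(T)):
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv
    return advantages
```

Under PPO, these advantage estimates would then weight the clipped surrogate objective for each skill's policy update.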