Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," Under Review, 2026).
Unsupervised Skill Discovery via Recurrent Skill Training
Authors: Zheyuan Jiang, Jingyue Gao, Jianyu Chen
NeurIPS 2022 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments on a number of challenging 2D navigation environments and robotic locomotion environments. Evaluation results show that our proposed approach outperforms previous parallel training approaches in terms of state coverage and skill diversity. |
| Researcher Affiliation | Academia | Zheyuan Jiang1 Jingyue Gao2 Jianyu Chen1,3 1 Institute for Interdisciplinary Information Sciences, Tsinghua University 2 Department of Computer Science, Tsinghua University 3 Shanghai Qizhi Institute |
| Pseudocode | Yes | Algorithm 1 Recurrent Skill Training |
| Open Source Code | No | The code would be released immediately after acceptance. |
| Open Datasets | Yes | We conduct experiments on several challenging 2D navigation environments and several robotic locomotion tasks... We use the OpenAI Gym [16] settings of the three tasks, where HalfCheetah is trained with fixed episode length whereas Hopper and Walker2d terminate when the agents fall during training. |
| Dataset Splits | No | The paper describes collecting 'on-policy samples' for training and using rollouts for evaluation, but does not provide specific training/validation/test dataset splits with percentages or sample counts in the main text. |
| Hardware Specification | No | The paper states that hardware specifications are in the appendix, but no specific hardware details (like GPU/CPU models) are present in the main text provided. |
| Software Dependencies | No | The paper mentions software like 'MuJoCo [15]' and 'OpenAI Gym [16]' but does not provide specific version numbers for these or other software dependencies. |
| Experiment Setup | Yes | Algorithm 1 details 'N is the number of skills' and 'M is the number of training epochs for each skill'. In Section 4.1, it specifies 'We train N = 10 skills for each of the algorithms'. The paper also states 'In this paper we choose Proximal Policy Optimization (PPO) [13] with generalized advantage estimation (GAE) [14]'. |
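
The Experiment Setup row names two loop parameters from Algorithm 1: N, the number of skills, and M, the number of training epochs for each skill. As a rough illustration only, the sketch below shows one way such a schedule could be ordered, with skills visited one at a time and each trained for M epochs before moving on. The function names, the `num_rounds` parameter, and the `train_one_epoch` callback are hypothetical and do not come from the paper; the actual update inside each epoch (PPO with GAE, per the row above) is deliberately left out.

```python
from typing import Callable, List, Tuple


def recurrent_schedule(
    num_skills: int, epochs_per_skill: int, num_rounds: int
) -> List[Tuple[int, int, int]]:
    """Return (round, skill, epoch) triples in training order.

    num_skills corresponds to N and epochs_per_skill to M in the table above.
    num_rounds is a hypothetical outer-loop count, not a value from the paper.
    """
    schedule = []
    for round_idx in range(num_rounds):
        for skill in range(num_skills):
            # Each skill gets M consecutive epochs before the next skill starts.
            for epoch in range(epochs_per_skill):
                schedule.append((round_idx, skill, epoch))
    return schedule


def run_schedule(
    schedule: List[Tuple[int, int, int]],
    train_one_epoch: Callable[[int], None],
) -> None:
    """Call a user-supplied (hypothetical) training step once per scheduled epoch."""
    for _, skill, _ in schedule:
        train_one_epoch(skill)


# Example: N = 10 skills as in Section 4.1; M = 3 and 2 rounds are made-up values.
example = recurrent_schedule(num_skills=10, epochs_per_skill=3, num_rounds=2)
```

With these example values the schedule contains 10 × 3 × 2 = 60 entries, and every skill is revisited once per round rather than all skills being updated in parallel.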