reproducibilityindex.ai

Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills

Authors: Seongun Kim, Kyowoon Lee, Jaesik Choi

ICML 2023

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	We validate the effectiveness of our approach on complex navigation and robotic manipulation tasks in terms of sample efficiency and state coverage speed. We also demonstrate that the skills discovered by our method successfully complete a real-world robot navigation task in a zero-shot setup and that incorporating these skills with a global planner further increases the performance.
Researcher Affiliation	Academia	¹Kim Jaechul Graduate School of AI, KAIST; ²Department of Computer Science and Engineering, UNIST.
Pseudocode	No	The paper describes algorithms and methods in text and mathematical formulations but does not include any explicit pseudocode blocks or algorithm listings.
Open Source Code	Yes	Codes are available at github.com/seongun-kim/vcrl.
Open Datasets	No	The paper mentions various simulated environments and tasks (e.g., 'Point Maze', 'Fetch environments', 'Sawyer environments', 'Husky Navigate simulator'). While some of these are based on existing frameworks, they are referred to as environments for training RL agents rather than standard public datasets with clear download links or citations for data access. For the Husky Navigate simulator, it states: 'Simulator is available at github.com/leekwoon/nav-gym', which is a custom simulator, not a publicly available dataset.
Dataset Splits	No	The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or exact counts) for reproduction. Reinforcement learning experiments typically do not use fixed dataset splits in the same way supervised learning does.
Hardware Specification	Yes	The training time on a single NVIDIA Quadro 8000 GPU can range from 6 to 30 hours depending on the task and the situation.
Software Dependencies	No	The paper mentions software components like SAC, β-VAE, and ROS but does not provide specific version numbers for these or other ancillary software dependencies, which would be necessary for full reproducibility.
Experiment Setup	Yes	Appendix E, titled 'Implementation Details and Hyperparameters', includes several tables (Tables 3, 4, 5, and 6) that list specific hyperparameter values such as 'Discount factor', 'Replay buffer size', 'RL batch size', 'Policy learning rate', 'Q-Function learning rate', 'VAE batch size', and 'β for β-VAE'.
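
The Software Dependencies row above notes that version numbers are missing. As a minimal sketch of how someone re-running the released code might record the versions they actually used, the Python snippet below queries installed package metadata through the standard library. The helper names `snapshot_versions` and `format_pins` are hypothetical and are not part of the paper or its repository, and the packages to query are for the reader to choose, since the paper does not list them.

```python
import platform
from importlib import metadata


def snapshot_versions(packages):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None instead of raising.
    (Hypothetical helper for illustration; not from the paper's code.)
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def format_pins(versions):
    """Render installed packages as requirements-style 'name==version' lines."""
    return [
        f"{name}=={ver}"
        for name, ver in sorted(versions.items())
        if name != "python" and ver is not None
    ]
```

Calling `format_pins(snapshot_versions([...]))` with the package names of interest yields lines that can be saved alongside experiment logs. Packages that are not installed are skipped rather than guessed.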