Variational Curriculum Reinforcement Learning for Unsupervised Discovery of Skills
Authors: Seongun Kim, Kyowoon Lee, Jaesik Choi
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We validate the effectiveness of our approach on complex navigation and robotic manipulation tasks in terms of sample efficiency and state coverage speed. We also demonstrate that the skills discovered by our method successfully complete a real-world robot navigation task in a zero-shot setup and that incorporating these skills with a global planner further increases the performance. |
| Researcher Affiliation | Academia | 1 Kim Jaechul Graduate School of AI, KAIST; 2 Department of Computer Science and Engineering, UNIST. |
| Pseudocode | No | The paper describes algorithms and methods in text and mathematical formulations but does not include any explicit pseudocode blocks or algorithm listings. |
| Open Source Code | Yes | Codes are available at github.com/seongun-kim/vcrl. |
| Open Datasets | No | The paper evaluates on simulated environments and tasks (e.g., 'Point Maze', 'Fetch environments', 'Sawyer environments', 'Husky Navigate simulator') rather than on datasets. Some of these build on existing frameworks, but the paper presents them as environments for training RL agents, not as standard public datasets with download links or data citations. For the Husky Navigate simulator, it states: 'Simulator is available at github.com/leekwoon/nav-gym'; this is a custom simulator, not a publicly available dataset. |
| Dataset Splits | No | The paper does not explicitly provide specific training/validation/test dataset splits (e.g., percentages or exact counts) for reproduction. Reinforcement learning experiments typically do not use fixed dataset splits in the same way supervised learning does. |
| Hardware Specification | Yes | The training time on a single NVIDIA Quadro 8000 GPU can range from 6 to 30 hours depending on the task and the situation. |
| Software Dependencies | No | The paper mentions software components like SAC, β-VAE, and ROS but does not provide specific version numbers for these or other ancillary software dependencies, which would be necessary for full reproducibility. |
| Experiment Setup | Yes | Appendix E, titled 'Implementation Details and Hyperparameters', includes Tables 3, 4, 5, and 6, which list specific hyperparameter values such as 'Discount factor', 'Replay buffer size', 'RL batch size', 'Policy learning rate', 'Q-Function learning rate', 'VAE batch size', and 'β for β-VAE'. |