Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts
Authors: Onur Celik, Aleksandar Taranovic, Gerhard Neumann
ICML 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills. |
| Researcher Affiliation | Academia | (1) Autonomous Learning Robots, Karlsruhe Institute of Technology, Karlsruhe, Germany; (2) FZI Research Center for Information Technology, Karlsruhe, Germany. |
| Pseudocode | Yes | Algorithm 1 Di-SkilL Training and Algorithm 2 Di-SkilL Inference are provided in Appendix C.3 (an illustrative mixture-of-experts sketch follows the table). |
| Open Source Code | Yes | Videos and code are available on the project webpage: https://alrhub.github.io/di-skill-website/ |
| Open Datasets | No | The paper uses several simulated robotic environments (Table Tennis, Hopper Jump, Box Pushing, 5-Link Reacher, Robot Mini Golf) but does not provide access information (link, DOI, citation) for publicly available datasets generated from these environments or used as input for training. |
| Dataset Splits | No | The paper mentions evaluating methods 'on at least 4 seeds' and reporting a '95% stratified bootstrap confidence interval' (a sketch of such an interval follows the table), but does not specify training, validation, and test dataset splits in terms of percentages or absolute counts. |
| Hardware Specification | No | The paper mentions 'robot simulation tasks' and acknowledges support from 'bwHPC, as well as the HoreKa supercomputer' but does not provide specific hardware details such as CPU/GPU models or memory specifications used for running experiments. |
| Software Dependencies | No | The paper mentions software components like 'ProDMPs (Li et al., 2023a)', 'PPO (Schulman et al., 2017)', and the 'Adam' optimizer, but does not provide specific version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | Appendix F contains detailed 'Hyperparameters' tables (Tables 1-7) for all algorithms (Di-SkilL, BBRL, Lin Di-SkilL, PPO) and environments (TT, 5LR, TT-H, HJ, BPO, MG). These tables specify concrete values for critic activation, hidden sizes, learning rates, epochs, batch sizes, alpha, beta, number of components, covariance bounds, mean bounds, and trust region coefficients. |
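
The table above notes that results are aggregated over at least 4 seeds and reported with a 95% stratified bootstrap confidence interval. Below is a minimal sketch of how such an interval could be computed, assuming one array of evaluation returns per seed; the function and variable names are illustrative and not taken from the paper's code.

```python
import numpy as np

def stratified_bootstrap_ci(returns_per_seed, n_boot=2000, ci=0.95, rng=None):
    """Percentile bootstrap CI of the mean return, resampling within each seed (stratum).

    `returns_per_seed`: list of 1-D arrays, one array of evaluation returns per
    training seed. All names/shapes are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng
    boot_means = np.empty(n_boot)
    for b in range(n_boot):
        # Resample evaluation episodes independently inside each seed,
        # then average the per-seed means to form one bootstrap replicate.
        seed_means = [
            rng.choice(r, size=len(r), replace=True).mean() for r in returns_per_seed
        ]
        boot_means[b] = np.mean(seed_means)
    lo, hi = np.percentile(boot_means, [(1 - ci) / 2 * 100, (1 + ci) / 2 * 100])
    point_estimate = np.mean([r.mean() for r in returns_per_seed])
    return point_estimate, (lo, hi)

# Example with 4 seeds of dummy evaluation returns.
rng = np.random.default_rng(0)
returns = [rng.normal(loc=10 + s, scale=2.0, size=50) for s in range(4)]
mean, (low, high) = stratified_bootstrap_ci(returns, rng=rng)
print(f"mean return {mean:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```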
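
Algorithm 2 (Di-SkilL Inference) is only referenced above, not reproduced here. The sketch below illustrates the generic per-context mixture-of-experts mechanism such an inference step relies on: a gating distribution selects an expert for the observed context, and the chosen expert emits motion-primitive parameters. The class, the linear-Gaussian experts, and all interfaces are assumptions for illustration, not the paper's pseudocode.

```python
import numpy as np

class MoEPolicySketch:
    """Illustrative per-context mixture-of-experts skill policy (all names hypothetical)."""

    def __init__(self, n_experts, ctx_dim, param_dim, rng=None):
        self.rng = np.random.default_rng() if rng is None else rng
        # Linear gating and linear-Gaussian experts stand in for learned networks;
        # the weights here are random placeholders, not trained parameters.
        self.gate_w = self.rng.normal(size=(n_experts, ctx_dim))
        self.expert_w = self.rng.normal(size=(n_experts, param_dim, ctx_dim))
        self.expert_log_std = np.zeros((n_experts, param_dim))

    def act(self, context):
        # 1) Gating: softmax over per-expert scores for this context.
        scores = self.gate_w @ context
        probs = np.exp(scores - scores.max())
        probs /= probs.sum()
        # 2) Sample one expert (skill) for the context.
        k = self.rng.choice(len(probs), p=probs)
        # 3) The chosen expert outputs motion-primitive parameters, sampled from a
        #    Gaussian whose mean is a linear function of the context.
        mean = self.expert_w[k] @ context
        params = mean + np.exp(self.expert_log_std[k]) * self.rng.normal(size=mean.shape)
        return k, params

policy = MoEPolicySketch(n_experts=4, ctx_dim=3, param_dim=5, rng=np.random.default_rng(1))
expert_idx, mp_params = policy.act(np.array([0.2, -0.1, 0.5]))
print(expert_idx, mp_params.shape)
```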