Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts

Authors: Onur Celik, Aleksandar Taranovic, Gerhard Neumann

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills." |
| Researcher Affiliation | Academia | ¹Autonomous Learning Robots, Karlsruhe Institute of Technology, Karlsruhe, Germany; ²FZI Research Center for Information Technology, Karlsruhe, Germany. |
| Pseudocode | Yes | Algorithm 1 (Di-SkilL Training) and Algorithm 2 (Di-SkilL Inference) are provided in Appendix C.3. |
| Open Source Code | Yes | Videos and code are available on the project webpage: https://alrhub.github.io/di-skill-website/ |
| Open Datasets | No | The paper uses several simulated robotic environments (Table Tennis, Hopper Jump, Box Pushing, 5-Link Reacher, Robot Mini Golf) but does not provide access information (link, DOI, citation) for any publicly available dataset generated from these environments or used as training input. |
| Dataset Splits | No | The paper reports evaluation "on at least 4 seeds" with a "95% stratified bootstrap confidence interval" but does not specify training, validation, and test splits in percentages or absolute counts. |
| Hardware Specification | No | The paper mentions robot simulation tasks and acknowledges support from bwHPC and the HoreKa supercomputer, but does not provide specific hardware details such as CPU/GPU models or memory specifications used for the experiments. |
| Software Dependencies | No | The paper names software components such as ProDMPs (Li et al., 2023a), PPO (Schulman et al., 2017), and the Adam optimizer, but does not provide version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | Appendix F contains detailed hyperparameter tables (Tables 1-7) for all algorithms (Di-SkilL, BBRL, Lin-Di-SkilL, PPO) and environments (TT, 5LR, TT-H, HJ, BPO, MG). These tables specify concrete values for critic activation, hidden sizes, learning rates, epochs, batch sizes, alpha, beta, number of components, covariance bounds, mean bounds, and trust region coefficients. |