Acquiring Diverse Skills using Curriculum Reinforcement Learning with Mixture of Experts

Authors: Onur Celik, Aleksandar Taranovic, Gerhard Neumann

ICML 2024

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "We show on challenging robot simulation tasks that Di-SkilL can learn diverse and performant skills." |
| Researcher Affiliation | Academia | ¹Autonomous Learning Robots, Karlsruhe Institute of Technology, Karlsruhe, Germany; ²FZI Research Center for Information Technology, Karlsruhe, Germany. |
| Pseudocode | Yes | Algorithm 1 (Di-SkilL Training) and Algorithm 2 (Di-SkilL Inference) are provided in Appendix C.3. |
| Open Source Code | Yes | Videos and code are available on the project webpage: https://alrhub.github.io/di-skill-website/ |
| Open Datasets | No | The paper uses several simulated robotic environments (Table Tennis, Hopper Jump, Box Pushing, 5-Link Reacher, Robot Mini Golf) but does not provide access information (link, DOI, citation) for any publicly available dataset generated from these environments or used as training input. |
| Dataset Splits | No | The paper reports evaluation "on at least 4 seeds" with a "95% stratified bootstrap confidence interval" but does not specify training, validation, and test splits in percentages or absolute counts. |
| Hardware Specification | No | The paper mentions robot simulation tasks and acknowledges support from bwHPC and the HoreKa supercomputer, but does not provide specific hardware details such as CPU/GPU models or memory specifications used for the experiments. |
| Software Dependencies | No | The paper names software components such as ProDMPs (Li et al., 2023a), PPO (Schulman et al., 2017), and the Adam optimizer, but does not provide version numbers for any software dependencies or libraries. |
| Experiment Setup | Yes | Appendix F contains detailed hyperparameter tables (Tables 1-7) for all algorithms (Di-SkilL, BBRL, Lin-Di-SkilL, PPO) and environments (TT, 5LR, TT-H, HJ, BPO, MG). These tables specify concrete values for critic activation, hidden sizes, learning rates, epochs, batch sizes, alpha, beta, number of components, covariance bounds, mean bounds, and trust region coefficients. |