Collaborative Evolutionary Reinforcement Learning

Authors: Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, Kagan Tumer

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in a range of continuous control benchmarks demonstrate that the emergent learner significantly outperforms its composite learners while remaining overall more sample-efficient, notably solving the Mujoco Humanoid benchmark where all of its composite learners (TD3) fail entirely in isolation.
Researcher Affiliation | Collaboration | Intel AI Lab; Collaborative Robotics and Intelligent Systems Institute, Oregon State University. Correspondence to: Shauharda Khadka <shauharda.khadka@intel.com>, Somdeb Majumdar <somdeb.majumdar@intel.com>.
Pseudocode | Yes | Algorithms 2, 3, and 4 provide detailed pseudocode of the CERL algorithm using a portfolio of TD3 learners.
Open Source Code | Yes | Additionally, the source code is available online at github.com/intelai/cerl.
Open Datasets | Yes | Domain: CERL is evaluated on 5 continuous control tasks on Mujoco (Todorov et al., 2012). These benchmarks are used widely in the field (Khadka & Tumer, 2018; Such et al., 2017; Schulman et al., 2017) and are hosted on OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper does not provide train/validation/test split information (percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning for evaluation. While it mentions TD3's use of target networks for stability, that is a training-stability mechanism, not a dataset split.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It only mentions GPUs and CPUs in general terms for parallelization.
Software Dependencies | No | The paper mentions software such as OpenAI Gym and Mujoco but does not provide version numbers for these or any other key software components or libraries.
Experiment Setup | Yes | The 4 TD3 learners are identical to each other apart from their discount rates, which are 0.9, 0.99, 0.997, and 0.9995. The computational budget of b workers was set to 10 to match the evolutionary population size. The UCB exploration coefficient was set to 0.9.
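The UCB-based resource allocation implied by the setup above can be sketched with the standard UCB1 rule. This is a minimal illustrative sketch, not the paper's implementation: the exact UCB variant, reward normalization, and function names (`ucb_scores`, `allocate_workers`) are assumptions; only the worker budget (10), the exploration coefficient (0.9), and the four discount rates come from the reported setup.

```python
import math

# Discount rates of the four otherwise-identical TD3 learners (from the paper).
DISCOUNTS = [0.9, 0.99, 0.997, 0.9995]


def ucb_scores(mean_returns, pulls, c=0.9):
    """Standard UCB1 score per learner: mean return plus an exploration bonus.

    Assumes every learner has been allocated at least once (pulls > 0).
    The paper's exact UCB formulation may differ; c=0.9 matches the
    reported exploration coefficient.
    """
    total = sum(pulls)
    return [m + c * math.sqrt(math.log(total) / n)
            for m, n in zip(mean_returns, pulls)]


def allocate_workers(mean_returns, pulls, budget=10, c=0.9):
    """Greedily assign the worker budget (10, matching the population size)
    one worker at a time to the learner with the highest UCB score,
    updating pull counts so the exploration bonus shrinks as a learner
    receives more workers."""
    pulls = list(pulls)
    allocation = [0] * len(pulls)
    for _ in range(budget):
        scores = ucb_scores(mean_returns, pulls, c)
        best = scores.index(max(scores))
        allocation[best] += 1
        pulls[best] += 1
    return allocation
```

With well-separated mean returns the bonus term is too small to overcome the gap, so the whole budget goes to the best learner; as returns converge, the bonus spreads workers across the portfolio.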