Collaborative Evolutionary Reinforcement Learning
Authors: Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, Kagan Tumer
ICML 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experiments in a range of continuous control benchmarks demonstrate that the emergent learner significantly outperforms its composite learners while remaining overall more sample-efficient, notably solving the Mujoco Humanoid benchmark where all of its composite learners (TD3) fail entirely in isolation. Domain: CERL is evaluated on 5 continuous control tasks on Mujoco (Todorov et al., 2012). |
| Researcher Affiliation | Collaboration | 1Intel AI Lab 2Collaborative Robotics and Intelligent Systems Institute, Oregon State University. Correspondence to: Shauharda Khadka <shauharda.khadka@intel.com>, Somdeb Majumdar <somdeb.majumdar@intel.com>. |
| Pseudocode | Yes | Algorithms 2, 3, and 4 provide detailed pseudocode of the CERL algorithm using a portfolio of TD3 learners. |
| Open Source Code | Yes | Additionally, our source code¹ is available online. [¹ github.com/intelai/cerl] |
| Open Datasets | Yes | Domain: CERL is evaluated on 5 continuous control tasks on Mujoco (Todorov et al., 2012). These benchmarks are used widely in the field (Khadka & Tumer, 2018; Such et al., 2017; Schulman et al., 2017) and are hosted on OpenAI gym (Brockman et al., 2016). (A hedged gym-loading sketch follows the table.) |
| Dataset Splits | No | The paper does not provide specific train/validation/test dataset split information (percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning for evaluation. While it mentions TD3's use of target networks for stability, this is not a dataset validation split. |
| Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It only mentions GPUs and CPUs in general terms for parallelization. |
| Software Dependencies | No | The paper mentions software like "OpenAI gym" and "Mujoco" but does not provide specific version numbers for these or any other key software components or libraries. |
| Experiment Setup | Yes | The 4 TD3 learners are identical to each other apart from their discount rates, which are 0.9, 0.99, 0.997, and 0.9995. The computational budget of b workers was set to 10 to match the evolutionary population size. The UCB exploration coefficient was set to 0.9. (A hedged sketch of the UCB worker allocation follows the table.) |
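
The experiment-setup row gives the portfolio hyperparameters: four TD3 learners differing only in discount rate, a rollout budget of b = 10 workers, and a UCB exploration coefficient of 0.9. The minimal Python sketch below shows how such a budget could be split across learners with a generic UCB1 rule; the function names, the value-update scheme, and the toy return numbers are assumptions for illustration, not the authors' implementation (see github.com/intelai/cerl for that).

```python
import math

# Hyperparameters quoted in the experiment-setup row.
LEARNER_GAMMAS = [0.9, 0.99, 0.997, 0.9995]  # discount rates of the 4 TD3 learners
NUM_WORKERS = 10                             # computational budget b (= evolutionary population size)
UCB_COEFF = 0.9                              # UCB exploration coefficient

def ucb_pick(values, counts, c=UCB_COEFF):
    """Return the index of the learner with the highest UCB1 score.

    `values[i]` is a running estimate of learner i's rollout return and
    `counts[i]` is how many rollout workers it has received so far.
    This is a generic UCB1 rule; the paper's exact value-update scheme may differ.
    """
    total = sum(counts)
    scores = [
        v + c * math.sqrt(math.log(total + 1) / (n + 1e-8))
        for v, n in zip(values, counts)
    ]
    return max(range(len(scores)), key=scores.__getitem__)

def allocate_workers(values, counts, num_workers=NUM_WORKERS):
    """Distribute the rollout budget across learners for one generation."""
    allocation = [0] * len(values)
    for _ in range(num_workers):
        i = ucb_pick(values, counts)
        allocation[i] += 1
        counts[i] += 1
    return allocation

if __name__ == "__main__":
    # Toy example: hypothetical return estimates, one prior rollout per learner.
    values = [100.0, 250.0, 300.0, 150.0]
    counts = [1, 1, 1, 1]
    print(allocate_workers(values, counts))  # e.g. most workers go to the best learner
```

The benchmark rows note that the evaluation tasks are hosted on OpenAI gym but give no version numbers. The snippet below is a sanity-check sketch for loading such environments; the task IDs, the "-v2" suffix, and the pre-0.26 gym reset/step API are assumptions based on what was standard around 2019, not versions confirmed by the paper.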
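```python
import gym  # requires OpenAI gym with a MuJoCo backend (e.g. mujoco-py) installed

# Task IDs are illustrative; the quoted rows only say "5 continuous control
# tasks on Mujoco" without naming specific environments or versions.
TASKS = ["Hopper-v2", "HalfCheetah-v2", "Swimmer-v2", "Walker2d-v2", "Humanoid-v2"]

for task in TASKS:
    env = gym.make(task)
    obs = env.reset()  # old gym API: reset() returns only the observation
    obs, reward, done, info = env.step(env.action_space.sample())  # one random step
    print(task, env.observation_space.shape, env.action_space.shape)
    env.close()
```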