Collaborative Evolutionary Reinforcement Learning

Authors: Shauharda Khadka, Somdeb Majumdar, Tarek Nassar, Zach Dwiel, Evren Tumer, Santiago Miret, Yinyin Liu, Kagan Tumer

ICML 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Experiments in a range of continuous control benchmarks demonstrate that the emergent learner significantly outperforms its composite learners while remaining overall more sample-efficient, notably solving the Mujoco Humanoid benchmark where all of its composite learners (TD3) fail entirely in isolation.
Researcher Affiliation | Collaboration | Intel AI Lab; Collaborative Robotics and Intelligent Systems Institute, Oregon State University. Correspondence to: Shauharda Khadka <shauharda.khadka@intel.com>, Somdeb Majumdar <somdeb.majumdar@intel.com>.
Pseudocode | Yes | Algorithms 2, 3, and 4 provide detailed pseudocode of the CERL algorithm using a portfolio of TD3 learners.
Open Source Code | Yes | Additionally, the source code is available online at github.com/intelai/cerl.
Open Datasets | Yes | Domain: CERL is evaluated on 5 continuous control tasks on Mujoco (Todorov et al., 2012). These benchmarks are used widely in the field (Khadka & Tumer, 2018; Such et al., 2017; Schulman et al., 2017) and are hosted on OpenAI Gym (Brockman et al., 2016).
Dataset Splits | No | The paper does not provide train/validation/test split information (percentages, sample counts, or citations to predefined splits) needed to reproduce the data partitioning for evaluation. While it mentions TD3's use of target networks for stability, that is a training-stability mechanism, not a dataset split.
Hardware Specification | No | The paper does not provide specific hardware details (exact GPU/CPU models, processor types, or memory amounts) used for running its experiments. It only mentions GPUs and CPUs in general terms for parallelization.
Software Dependencies | No | The paper mentions software such as OpenAI Gym and Mujoco but does not provide version numbers for these or any other key software components or libraries.
Experiment Setup | Yes | The 4 TD3 learners are identical to each other apart from their discount rates, which are 0.9, 0.99, 0.997, and 0.9995. The computational budget of b workers was set to 10 to match the evolutionary population size. The UCB exploration coefficient was set to 0.9.
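The UCB-based resource allocation implied by the setup above can be sketched with the standard UCB1 rule. This is a minimal illustrative sketch, not the paper's implementation: the exact UCB variant, reward normalization, and function names (`ucb_scores`, `allocate_workers`) are assumptions; only the worker budget (10), the exploration coefficient (0.9), and the four discount rates come from the reported setup.

```python
import math

# Discount rates of the four otherwise-identical TD3 learners (from the paper).
DISCOUNTS = [0.9, 0.99, 0.997, 0.9995]


def ucb_scores(mean_returns, pulls, c=0.9):
    """Standard UCB1 score per learner: mean return plus an exploration bonus.

    Assumes every learner has been allocated at least once (pulls > 0).
    The paper's exact UCB formulation may differ; c=0.9 matches the
    reported exploration coefficient.
    """
    total = sum(pulls)
    return [m + c * math.sqrt(math.log(total) / n)
            for m, n in zip(mean_returns, pulls)]


def allocate_workers(mean_returns, pulls, budget=10, c=0.9):
    """Greedily assign the worker budget (10, matching the population size)
    one worker at a time to the learner with the highest UCB score,
    updating pull counts so the exploration bonus shrinks as a learner
    receives more workers."""
    pulls = list(pulls)
    allocation = [0] * len(pulls)
    for _ in range(budget):
        scores = ucb_scores(mean_returns, pulls, c)
        best = scores.index(max(scores))
        allocation[best] += 1
        pulls[best] += 1
    return allocation
```

With well-separated mean returns the bonus term is too small to overcome the gap, so the whole budget goes to the best learner; as returns converge, the bonus spreads workers across the portfolio.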