Cooperative Heterogeneous Deep Reinforcement Learning

Authors: Han Zheng, Pengfei Wei, Jing Jiang, Guodong Long, Qinghua Lu, Chengqi Zhang

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental studies on a range of continuous control tasks from the MuJoCo benchmark show that CHDRL achieves better performance than state-of-the-art baselines. We conducted an empirical evaluation to verify the performance superiority of CSPC over other baselines, and ablation studies to show the effectiveness of each mechanism used in CHDRL. |
| Researcher Affiliation | Academia | Han Zheng (AAII, University of Technology Sydney) Han.Zheng-1@student.uts.edu.au; Pengfei Wei (National University of Singapore) wpf89928@gmail.com; Jing Jiang (AAII, University of Technology Sydney) jing.jiang@uts.edu.au; Guodong Long (AAII, University of Technology Sydney) guodong.long@uts.edu.au; Qinghua Lu (Data61, CSIRO) qinghua.lu@data61.csiro.au; Chengqi Zhang (AAII, University of Technology Sydney) Chengqi.Zhang@uts.edu.au |
| Pseudocode | Yes | Algorithm 1 (CSPC), Algorithm 2 (TRAIN), and Algorithm 3 (UPDATE) |
| Open Source Code | No | The paper provides links to third-party codebases (OpenAI Spinning Up, CEM-RL) that were used, but does not provide concrete access to the source code for the authors' own implemented methodology (CHDRL/CSPC). |
| Open Datasets | Yes | All the evaluations were done on a continuous control benchmark: MuJoCo [30]. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) for training, validation, and testing; it reports only time steps of interaction with the continuous control environments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or other machine specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions the software used (OpenAI Spinning Up, CEM-RL) but does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | For CSPC, we set the gap f to 100, the global agent's initial learning steps Tg to 5e4, the iteration time steps T to 1e4, the global memory size Mg to 1e6, the local memory size Ml to 2e4, and the sampling probability from local memory p to 0.3. |