Cooperative Heterogeneous Deep Reinforcement Learning

Authors: Han Zheng, Pengfei Wei, Jing Jiang, Guodong Long, Qinghua Lu, Chengqi Zhang

NeurIPS 2020

| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental studies on a range of continuous control tasks from the MuJoCo benchmark show that CHDRL achieves better performance than state-of-the-art baselines. We conducted an empirical evaluation to verify the performance superiority of CSPC over other baselines, and ablation studies to show the effectiveness of each mechanism used in CHDRL. |
| Researcher Affiliation | Academia | Han Zheng (AAII, University of Technology Sydney) Han.Zheng-1@student.uts.edu.au; Pengfei Wei (National University of Singapore) wpf89928@gmail.com; Jing Jiang (AAII, University of Technology Sydney) jing.jiang@uts.edu.au; Guodong Long (AAII, University of Technology Sydney) guodong.long@uts.edu.au; Qinghua Lu (Data61, CSIRO) qinghua.lu@data61.csiro.au; Chengqi Zhang (AAII, University of Technology Sydney) Chengqi.Zhang@uts.edu.au |
| Pseudocode | Yes | Algorithm 1 (CSPC), Algorithm 2 (TRAIN), and Algorithm 3 (UPDATE) |
| Open Source Code | No | The paper provides links to third-party codebases (OpenAI Spinning Up, CEM-RL) that were used, but does not provide concrete access to the source code for the authors' own implemented methodology (CHDRL/CSPC). |
| Open Datasets | Yes | All the evaluations were done on a continuous control benchmark: MuJoCo [30]. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) for training, validation, and testing; it reports only time steps of interaction with the continuous control environments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or other machine specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions the software used (OpenAI Spinning Up, CEM-RL) but does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | For CSPC, we set the gap f to 100, the global agent's initial learning steps Tg to 5e4, the iteration time steps T to 1e4, the global memory size Mg to 1e6, the local memory size Ml to 2e4, and the sampling probability from local memory p to 0.3. |