Cooperative Heterogeneous Deep Reinforcement Learning
Authors: Han Zheng, Pengfei Wei, Jing Jiang, Guodong Long, Qinghua Lu, Chengqi Zhang
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Experimental studies on a range of continuous control tasks from the Mujoco benchmark show that CHDRL achieves better performance compared with state-of-the-art baselines. We conducted an empirical evaluation to verify the performance superiority of CSPC to other baselines, and ablation studies to show the effectiveness of each mechanism used in CHDRL. |
| Researcher Affiliation | Academia | Han Zheng, AAII, University of Technology Sydney, Han.Zheng-1@student.uts.edu.au; Pengfei Wei, National University of Singapore, wpf89928@gmail.com; Jing Jiang, AAII, University of Technology Sydney, jing.jiang@uts.edu.au; Guodong Long, AAII, University of Technology Sydney, guodong.long@uts.edu.au; Qinghua Lu, Data61, CSIRO, qinghua.lu@data61.csiro.au; Chengqi Zhang, AAII, University of Technology Sydney, Chengqi.Zhang@uts.edu.au |
| Pseudocode | Yes | Algorithm 1 CSPC; Algorithm 2 TRAIN; Algorithm 3 UPDATE |
| Open Source Code | No | The paper provides links to third-party codebases (OpenAI Spinning Up, CEM-RL) that were used, but does not provide concrete access to the source code for the authors' own implemented methodology (CHDRL/CSPC). |
| Open Datasets | Yes | All the evaluations were done on a continuous control benchmark: Mujoco [30]. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, or citations to predefined splits) for training, validation, and testing. It discusses time steps of interaction with continuous control environments. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions software used (OpenAI Spinning Up, CEM-RL) but does not provide specific version numbers for these or other ancillary software components. |
| Experiment Setup | Yes | For CSPC, we set the gap f as 100, global agent initial learning steps Tg as 5e4, iteration time steps T as 1e4, global memory size Mg as 1e6, local memory size Ml as 2e4, and sample probability from local memory p as 0.3. |
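The hyperparameters quoted in the Experiment Setup row can be collected into a small configuration sketch. This is a minimal illustration, not the authors' code: the dictionary key names and the `choose_replay_buffer` helper are invented here for clarity, and the reading of `p` as the per-batch probability of sampling from local memory is an assumption based on the row's wording.

```python
import random

# Illustrative config sketch of the reported CSPC hyperparameters.
# Key names are hypothetical; values are the ones quoted in the table above.
CSPC_CONFIG = {
    "gap_f": 100,                                # gap f
    "global_initial_learning_steps": int(5e4),   # Tg
    "iteration_time_steps": int(1e4),            # T
    "global_memory_size": int(1e6),              # Mg
    "local_memory_size": int(2e4),               # Ml
    "local_sample_prob": 0.3,                    # p
}

def choose_replay_buffer(rng: random.Random, p_local: float) -> str:
    """Illustrative mixed-replay rule (an assumption, not the paper's code):
    with probability p_local draw the next batch from the local memory,
    otherwise from the global memory."""
    return "local" if rng.random() < p_local else "global"
```

Under this reading, roughly 30% of sampled batches would come from the smaller, more recent local memory and the rest from the large global memory.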