Settling Decentralized Multi-Agent Coordinated Exploration by Novelty Sharing
Authors: Haobin Jiang, Ziluo Ding, Zongqing Lu
AAAI 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Empirically, we show that MACE achieves superior performance in three multi-agent environments with sparse rewards. |
| Researcher Affiliation | Academia | Haobin Jiang (1), Ziluo Ding (1,2), Zongqing Lu (1); (1) School of Computer Science, Peking University; (2) Beijing Academy of Artificial Intelligence; {haobin.jiang, ziluo, zongqing.lu}@pku.edu.cn |
| Pseudocode | No | The paper does not contain a clearly labeled pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement or a link regarding the availability of its source code. |
| Open Datasets | No | The paper states 'We design three tasks in Grid World', indicating a custom environment, and provides no access information for it. Overcooked and SMAC are cited and publicly available, but because the custom Grid World environment is not released, not all environments used are publicly available. |
| Dataset Splits | No | The paper mentions 'Each curve shows the mean reward of several runs with different random seeds (5 runs in Pass, 8 runs in Secret Room and Multi Room) and shaded regions indicate standard error,' but does not specify train/validation/test dataset splits. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments (e.g., GPU model, CPU model, memory). |
| Software Dependencies | No | The paper mentions 'we implement PPO leveraging GRU' and uses RND, but it does not provide specific version numbers for the software libraries, frameworks, or languages used (a hedged RND sketch follows the table). |
| Experiment Setup | No | The paper states that 'λ is a hyperparameter' but does not provide its specific value, nor other detailed hyperparameters (e.g., learning rate, batch size) or system-level training settings (see the illustrative mixing sketch after the table). |
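
Since the paper names RND as its novelty estimator but releases no code or version details (see the Software Dependencies row), the following is a minimal sketch of an RND-style novelty module, assuming a PyTorch implementation. The class name `RNDNovelty` and all layer sizes are illustrative choices, not the authors' configuration.

```python
import torch
import torch.nn as nn

class RNDNovelty(nn.Module):
    """Random Network Distillation: novelty is the prediction error of a
    trained predictor network against a fixed, randomly initialized target."""

    def __init__(self, obs_dim: int, feat_dim: int = 64):
        super().__init__()
        # Fixed random target network; its parameters are never trained.
        self.target = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim)
        )
        for p in self.target.parameters():
            p.requires_grad = False
        # Predictor network, trained to match the target's output.
        self.predictor = nn.Sequential(
            nn.Linear(obs_dim, 128), nn.ReLU(), nn.Linear(128, feat_dim)
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Per-observation novelty: mean squared prediction error.
        return (self.predictor(obs) - self.target(obs)).pow(2).mean(dim=-1)
```

Training the predictor on visited observations drives the error toward zero in familiar states, so a high prediction error flags rarely visited ones.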
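The Experiment Setup row notes that λ is left unspecified. The sketch below shows one plausible reading of novelty sharing, in which each agent mixes its own local novelty with a λ-weighted aggregate of novelties broadcast by its teammates. This is an assumption-laden illustration, not the paper's algorithm; the mixing rule and the name `mixed_novelty` are ours.

```python
import torch

def mixed_novelty(local: torch.Tensor, shared: torch.Tensor,
                  lam: float) -> torch.Tensor:
    """Hypothetical intrinsic reward for one agent: its own novelty plus a
    lambda-weighted average of novelties received from the other agents.

    local:  shape (T,),      this agent's per-timestep novelty (e.g., from RND)
    shared: shape (N-1, T),  novelties broadcast by the other N-1 agents
    lam:    mixing weight; the paper leaves its value unspecified
    """
    return local + lam * shared.mean(dim=0)
```

Under this reading, λ controls how strongly an agent's exploration is steered by what its teammates find novel, which is consistent with the coordinated-exploration framing but remains a guess absent the released hyperparameters.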