Communication in Multi-Agent Reinforcement Learning: Intention Sharing
Authors: Woojun Kim, Jongeui Park, Youngchul Sung
ICLR 2021
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Numerical results show that the proposed IS scheme significantly outperforms other existing communication schemes for MARL, including state-of-the-art algorithms such as ATOC and TarMAC. In order to evaluate the proposed algorithm and compare it with other communication schemes fairly, we implemented existing baselines on top of the same MADDPG used for the proposed scheme. Fig. 3 shows the performance of the proposed IS scheme and the considered baselines on the PP, CN, and TJ environments. All performance is averaged over 10 different seeds. |
| Researcher Affiliation | Academia | Woojun Kim, Jongeui Park, Youngchul Sung; School of Electrical Engineering, KAIST, Daejeon, South Korea; {woojun.kim, jongeui.park, ycsung}@kaist.ac.kr |
| Pseudocode | Yes | Algorithm 1: Intention Sharing (IS) Communication Scheme (see the first sketch after this table) |
| Open Source Code | No | The paper does not provide any concrete access to source code (e.g., specific repository link, explicit code release statement) for the methodology described. |
| Open Datasets | No | The paper describes custom multi-agent environments (Predator-prey, Cooperative-navigation, Traffic-junction) which were modified for the experiments. It does not provide concrete access information (specific link, DOI, repository name, formal citation with authors/year) for any publicly available or open dataset. |
| Dataset Splits | No | The paper describes multi-agent environments and states that 'All performance is averaged over 10 different seeds' but does not specify exact train/validation/test dataset splits with percentages, sample counts, or references to predefined splits. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running its experiments. |
| Software Dependencies | No | The paper mentions optimizers like 'ADAM' and architectures like 'deep neural networks' and 'LSTM', but it does not list specific software components with their version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1). |
| Experiment Setup | Yes | Table 2 (hyperparameters of all algorithms): replay buffer size 2 × 10^5, discount factor 0.99, mini-batch size 128, optimizer Adam, learning rate 0.0005, number of hidden layers (all networks) 2, number of hidden units per layer 128, hidden-layer activation ReLU, imagined trajectory length H = 5 (see the second sketch after this table). |
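The first sketch below illustrates the kind of imagined-trajectory, attention-weighted message generation that Algorithm 1 (the IS communication scheme) describes. It is a minimal sketch assuming PyTorch: the module names (`IntentionMessage`, `step_model`, `policy_head`), the tensor shapes, and the single-agent wiring are illustrative assumptions, not the authors' implementation; only the imagined horizon H = 5 and the 2 × 128 ReLU networks come from the quoted Table 2.

```python
# Minimal sketch of the message-generation step named in "Algorithm 1:
# Intention Sharing (IS) Communication Scheme". Module names, shapes, and
# wiring are assumptions for illustration; H = 5 and the 2-layer / 128-unit
# ReLU networks follow the hyperparameters quoted from Table 2.
import torch
import torch.nn as nn


def mlp(in_dim, out_dim, hidden=128, layers=2):
    """MLP with 2 hidden layers of 128 ReLU units (Table 2)."""
    dims = [in_dim] + [hidden] * layers
    mods = []
    for a, b in zip(dims[:-1], dims[1:]):
        mods += [nn.Linear(a, b), nn.ReLU()]
    mods.append(nn.Linear(dims[-1], out_dim))
    return nn.Sequential(*mods)


class IntentionMessage(nn.Module):
    """Hypothetical imagined-trajectory + attention module for one agent."""

    def __init__(self, obs_dim, act_dim, msg_dim, horizon=5):
        super().__init__()
        self.horizon = horizon
        # Rolls the agent's observation/action one imagined step forward.
        self.step_model = mlp(obs_dim + act_dim, obs_dim)
        # Proposes an action for an imagined observation.
        self.policy_head = mlp(obs_dim, act_dim)
        # Scores each of the H imagined steps for the attention weights.
        self.score = mlp(obs_dim + act_dim, 1)
        # Projects the attention-weighted summary into the message.
        self.out = nn.Linear(obs_dim + act_dim, msg_dim)

    def forward(self, obs):
        """obs: (batch, obs_dim) -> message: (batch, msg_dim)."""
        steps, o = [], obs
        for _ in range(self.horizon):  # imagined trajectory of length H
            a = torch.tanh(self.policy_head(o))
            steps.append(torch.cat([o, a], dim=-1))
            o = self.step_model(torch.cat([o, a], dim=-1))
        traj = torch.stack(steps, dim=1)            # (batch, H, obs+act)
        w = torch.softmax(self.score(traj).squeeze(-1), dim=1)  # (batch, H)
        summary = (w.unsqueeze(-1) * traj).sum(dim=1)
        return self.out(summary)


if __name__ == "__main__":
    msg = IntentionMessage(obs_dim=16, act_dim=2, msg_dim=8)(torch.randn(4, 16))
    print(msg.shape)  # torch.Size([4, 8])
```

In the paper the resulting message is shared with the other agents and conditions their policies; that multi-agent plumbing (and the MADDPG critic) is omitted here to keep the sketch self-contained.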
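The second sketch shows how the Table 2 hyperparameters quoted above could be wired into a MADDPG-style training setup. Only the listed values (buffer size, discount, batch size, Adam, learning rate, network width/depth, H) are taken from the report; the `ReplayBuffer` class, `make_network` helper, and the placeholder input/output dimensions are hypothetical.

```python
# Sketch wiring the Table 2 hyperparameters into a training setup.
# Values in HPARAMS are quoted from the report; everything else (class names,
# placeholder dimensions) is an assumption for illustration.
from collections import deque
import random

import torch
import torch.nn as nn

HPARAMS = {
    "replay_buffer_size": 200_000,    # 2 x 10^5
    "discount_factor": 0.99,
    "mini_batch_size": 128,
    "learning_rate": 5e-4,            # 0.0005, Adam
    "hidden_units": 128,              # 2 hidden layers, ReLU
    "imagined_trajectory_length": 5,  # H
}


class ReplayBuffer:
    """FIFO buffer holding up to 2 x 10^5 transitions (Table 2)."""

    def __init__(self, capacity=HPARAMS["replay_buffer_size"]):
        self.storage = deque(maxlen=capacity)

    def add(self, transition):
        self.storage.append(transition)

    def sample(self, batch_size=HPARAMS["mini_batch_size"]):
        return random.sample(list(self.storage), batch_size)


def make_network(in_dim, out_dim, hidden=HPARAMS["hidden_units"]):
    """2 hidden layers of 128 ReLU units, as listed in Table 2."""
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )


# Placeholder dimensions; per-agent actors/critics would be built the same way.
actor = make_network(in_dim=16, out_dim=2)
optimizer = torch.optim.Adam(actor.parameters(), lr=HPARAMS["learning_rate"])
buffer = ReplayBuffer()
```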