Communication in Multi-Agent Reinforcement Learning: Intention Sharing

Authors: Woojun Kim, Jongeui Park, Youngchul Sung

ICLR 2021

Reproducibility assessment (variable, result, and LLM response):
Research Type: Experimental. Numerical results show that the proposed IS scheme significantly outperforms other existing communication schemes for MARL, including state-of-the-art algorithms such as ATOC and TarMAC. To evaluate the proposed algorithm and compare it fairly with other communication schemes, the existing baselines were implemented on top of the same MADDPG used for the proposed scheme. Fig. 3 shows the performance of the proposed IS scheme and the considered baselines on the PP, CN, and TJ environments. All performance is averaged over 10 different seeds.
Researcher Affiliation: Academia. Woojun Kim, Jongeui Park, Youngchul Sung; School of Electrical Engineering, KAIST, Daejeon, South Korea; {woojun.kim, jongeui.park, ycsung}@kaist.ac.kr
Pseudocode: Yes. Algorithm 1: Intention Sharing (IS) Communication Scheme.
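To make the flow of Algorithm 1 concrete, here is a heavily simplified sketch of the IS message-generation step as described in the paper: each agent rolls out an imagined trajectory of length H with its policy and a learned dynamics model, then compresses the trajectory into a fixed-size message via an attention-weighted sum. The `policy`, `dynamics_model`, and attention scoring below are toy stand-ins, not the paper's learned networks; all names are illustrative.

```python
import math

H = 5           # imagined trajectory length (Table 2)
OBS_DIM = 4     # toy observation size, chosen for illustration

def policy(obs):
    # stand-in for the agent's learned policy network
    return math.tanh(sum(obs) / len(obs))

def dynamics_model(obs, action):
    # stand-in for the learned environment dynamics model
    return [0.9 * o + 0.1 * action for o in obs]

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def generate_message(obs, score_vec):
    """Roll out an imagined trajectory of length H, then compress it
    into a fixed-size message with an attention-style weighted sum."""
    imagined = []
    for _ in range(H):
        a = policy(obs)
        imagined.append(obs + [a])        # imagined (observation, action) pair
        obs = dynamics_model(obs, a)      # imagine the next observation
    # toy attention: score each imagined step, normalize, take weighted sum
    scores = [sum(s * x for s, x in zip(score_vec, step)) for step in imagined]
    weights = softmax(scores)
    dim = OBS_DIM + 1
    return [sum(w * step[i] for w, step in zip(weights, imagined))
            for i in range(dim)]

msg = generate_message([0.5, -0.5, 0.25, -0.25], [1.0, 0.0, 0.0, 0.0, 1.0])
print(len(msg))  # 5
```

In the actual scheme the attention module and dynamics model are trained jointly with the MADDPG agents, and other agents' messages also condition the rollout; this sketch only illustrates the rollout-then-attend structure.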
Open Source Code: No. The paper does not provide concrete access to source code (e.g., a specific repository link or an explicit code-release statement) for the described methodology.
Open Datasets: No. The paper describes custom multi-agent environments (predator-prey, cooperative navigation, traffic junction) that were modified for the experiments. It does not provide concrete access information (a specific link, DOI, repository name, or formal citation with authors/year) for any publicly available or open dataset.
Dataset Splits: No. The paper describes multi-agent environments and states that 'All performance is averaged over 10 different seeds', but it does not specify exact train/validation/test splits with percentages, sample counts, or references to predefined splits.
Hardware Specification: No. The paper does not provide specific hardware details (e.g., exact GPU/CPU models, processor speeds, or memory amounts) used to run its experiments.
Software Dependencies: No. The paper mentions the Adam optimizer and architectures such as deep neural networks and LSTMs, but it does not list specific software components with version numbers (e.g., Python 3.8, PyTorch 1.9, CUDA 11.1).
Experiment Setup: Yes. Table 2 (hyperparameters of all algorithms): replay buffer size 2 × 10^5; discount factor 0.99; mini-batch size 128; optimizer Adam; learning rate 0.0005; number of hidden layers (all networks) 2; hidden units per layer 128; hidden-layer activation ReLU; imagined trajectory length H = 5.
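For anyone attempting a reproduction, the Table 2 values can be collected into a single configuration object. The values below are taken directly from the report; the key names are illustrative, since the paper only lists the values.

```python
# Hyperparameters reported in Table 2 of the paper, gathered as a config
# dict. Key names are hypothetical; only the values come from the source.
config = {
    "replay_buffer_size": 2 * 10**5,
    "discount_factor": 0.99,
    "mini_batch_size": 128,
    "optimizer": "Adam",
    "learning_rate": 0.0005,
    "num_hidden_layers": 2,        # all networks
    "hidden_units_per_layer": 128,
    "hidden_activation": "relu",
    "imagined_trajectory_length_H": 5,
}
print(config["imagined_trajectory_length_H"])  # 5
```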