DACOM: Learning Delay-Aware Communication for Multi-Agent Reinforcement Learning

Authors: Tingting Yuan, Hwei-Ming Chung, Jie Yuan, Xiaoming Fu

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify DACOM's effectiveness, we conduct extensive experiments in different environments: particle games, traffic control in autonomous driving, and StarCraft II. We modify the original environments to support delay-aware actions. Furthermore, we apply DACOM and baselines in different communication channels with delays ranging from 10% to 90% of step intervals. Our experiments show that DACOM outperforms other baseline mechanisms.
Researcher Affiliation | Collaboration | Tingting Yuan (1*), Hwei-Ming Chung (2,3), Jie Yuan (4), Xiaoming Fu (1); 1 University of Göttingen, 2 University of Oslo, 3 NOOT Tech. Co., Ltd., 4 Beijing University of Posts and Telecommunications
Pseudocode | No | The paper describes the model architecture and training process but does not include formal pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "Code avaialbe at https://github.com/openai/maddpg." [sic], which refers to a baseline algorithm (MADDPG), not the authors' own method (DACOM). No other explicit statement or link to DACOM's source code is provided.
Open Datasets | Yes | The evaluation environments are the multi-agent particle games (https://github.com/openai/multiagent-particle-envs), autonomous driving via highway-env (https://github.com/eleurent/highway-env), and the StarCraft Multi-Agent Challenge (SMAC, https://github.com/oxwhirl/smac); see the environment-loading sketch after the table.
Dataset Splits | No | The paper describes the environments (Particle Game, Traffic Control, StarCraft II) and experimental settings, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) for reproduction.
Hardware Specification | Yes | We train our models on Intel Core i7-8700K CPUs.
Software Dependencies | No | The paper mentions an 'Adam optimizer', a 'three-layer multilayer perceptron (MLP)', and 'ReLU as activation functions', but it does not specify concrete software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions) needed for replication.
Experiment Setup | Yes | In the experiments, we use an Adam optimizer with a learning rate of 0.005. The discount factor for reward, γ, is 0.95. For the soft update of target networks, we set ξ = 0.01. We use a three-layer multilayer perceptron (MLP) with 64 units for the Encoder and a four-layer MLP with 64 units to implement the TimeNet, the Actor Net, the Critic Net, and other networks in the baselines, such as the weight generators of SchedNet and the gates in GACML. The neural networks use ReLU as activation functions. We initialize the parameters with random initialization. ... The capacity of the replay buffer is 10^5, and we take a minibatch of 1024 to update the network parameters. (See the configuration sketch after the table.)
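
For the Open Datasets row, the snippet below is a minimal sketch of how the three stock evaluation suites can be instantiated from the linked repositories. The scenario and map names ("simple_spread", "highway-v0", "8m") are illustrative placeholders, and the delay-aware modifications described in the paper are not part of these unmodified environments.

```python
# Environment-loading sketch for the three evaluation suites cited above.
# Scenario/map names are placeholders; the paper's delay-aware action
# modifications are NOT included in these stock environments.

# 1) Multi-agent particle games: requires openai/multiagent-particle-envs
#    to be on the Python path (it exposes make_env.py at the repo root).
from make_env import make_env
particle_env = make_env("simple_spread")

# 2) Autonomous driving: importing highway_env registers its scenarios with gym.
import gym
import highway_env  # noqa: F401
driving_env = gym.make("highway-v0")

# 3) StarCraft Multi-Agent Challenge: install SMAC from oxwhirl/smac and
#    a local StarCraft II client.
from smac.env import StarCraft2Env
smac_env = StarCraft2Env(map_name="8m")
```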
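For the Experiment Setup row, the following is a minimal PyTorch sketch that collects the reported hyperparameters in one place. It is not the authors' implementation: the input/output dimensions (OBS_DIM, MSG_DIM, ACT_DIM) are hypothetical placeholders, and "three-layer"/"four-layer" is read here as the number of Linear layers.

```python
import torch
import torch.nn as nn

# Hyperparameters reported in the paper.
LR = 0.005            # Adam learning rate
GAMMA = 0.95          # reward discount factor
XI = 0.01             # soft-update coefficient for target networks
BUFFER_CAPACITY = 10**5
BATCH_SIZE = 1024

# Hypothetical sizes, not given in the paper.
OBS_DIM, MSG_DIM, ACT_DIM = 16, 8, 4

def mlp(sizes):
    """Stack of Linear layers with ReLU activations between them."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

# Three-layer MLP with 64 hidden units for the Encoder.
encoder = mlp([OBS_DIM, 64, 64, MSG_DIM])
# Four-layer MLPs with 64 hidden units for the TimeNet, Actor Net, and Critic Net.
time_net = mlp([OBS_DIM + MSG_DIM, 64, 64, 64, 1])
actor_net = mlp([OBS_DIM + MSG_DIM, 64, 64, 64, ACT_DIM])
critic_net = mlp([OBS_DIM + MSG_DIM + ACT_DIM, 64, 64, 64, 1])

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(time_net.parameters())
    + list(actor_net.parameters()) + list(critic_net.parameters()),
    lr=LR,
)

def soft_update(target: nn.Module, source: nn.Module, xi: float = XI):
    """Polyak averaging of target-network parameters: theta' <- xi*theta + (1 - xi)*theta'."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.copy_(xi * s.data + (1.0 - xi) * t.data)
```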