DACOM: Learning Delay-Aware Communication for Multi-Agent Reinforcement Learning

Authors: Tingting Yuan, Hwei-Ming Chung, Jie Yuan, Xiaoming Fu

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | To verify DACOM's effectiveness, we conduct extensive experiments in different environments: particle games, traffic control in autonomous driving, and StarCraft II. We modify the original environments to support delay-aware actions. Furthermore, we apply DACOM and baselines in different communication channels with delays ranging from 10% to 90% of step intervals. Our experiments show that DACOM outperforms other baseline mechanisms.
Researcher Affiliation | Collaboration | Tingting Yuan (1*), Hwei-Ming Chung (2,3), Jie Yuan (4), Xiaoming Fu (1); 1 University of Göttingen, 2 University of Oslo, 3 NOOT Tech. Co., Ltd., 4 Beijing University of Posts and Telecommunications
Pseudocode | No | The paper describes the model architecture and training process but does not include formal pseudocode or algorithm blocks.
Open Source Code | No | The paper states: "Code avaialbe at https://github.com/openai/maddpg." [sic], which refers to a baseline algorithm (MADDPG), not the authors' own method (DACOM). No other explicit statement or link to DACOM's source code is provided.
Open Datasets | Yes | The evaluation environments are the multi-agent particle games (https://github.com/openai/multiagent-particle-envs), autonomous driving via highway-env (https://github.com/eleurent/highway-env), and the StarCraft Multi-Agent Challenge (SMAC, https://github.com/oxwhirl/smac); see the environment-loading sketch after the table.
Dataset Splits | No | The paper describes the environments (Particle Game, Traffic Control, StarCraft II) and experimental settings, but it does not specify explicit training, validation, or test dataset splits (e.g., percentages, sample counts, or citations to predefined splits) for reproduction.
Hardware Specification | Yes | We train our models on Intel Core i7-8700K CPUs.
Software Dependencies | No | The paper mentions an 'Adam optimizer', a 'three-layer multilayer perceptron (MLP)', and 'ReLU as activation functions', but it does not specify concrete software dependencies with version numbers (e.g., Python, PyTorch, or TensorFlow versions) needed for replication.
Experiment Setup | Yes | In the experiments, we use an Adam optimizer with a learning rate of 0.005. The discount factor for reward, γ, is 0.95. For the soft update of target networks, we set ξ = 0.01. We use a three-layer multilayer perceptron (MLP) with 64 units for the Encoder and a four-layer MLP with 64 units to implement the TimeNet, the Actor Net, the Critic Net, and other networks in the baselines, such as the weight generators of SchedNet and the gates in GACML. The neural networks use ReLU as activation functions. We initialize the parameters with random initialization. ... The capacity of the replay buffer is 10^5, and we take a minibatch of 1024 to update the network parameters. (See the configuration sketch after the table.)
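
For the Open Datasets row, the snippet below is a minimal sketch of how the three stock evaluation suites can be instantiated from the linked repositories. The scenario and map names ("simple_spread", "highway-v0", "8m") are illustrative placeholders, and the delay-aware modifications described in the paper are not part of these unmodified environments.

```python
# Environment-loading sketch for the three evaluation suites cited above.
# Scenario/map names are placeholders; the paper's delay-aware action
# modifications are NOT included in these stock environments.

# 1) Multi-agent particle games: requires openai/multiagent-particle-envs
#    to be on the Python path (it exposes make_env.py at the repo root).
from make_env import make_env
particle_env = make_env("simple_spread")

# 2) Autonomous driving: importing highway_env registers its scenarios with gym.
import gym
import highway_env  # noqa: F401
driving_env = gym.make("highway-v0")

# 3) StarCraft Multi-Agent Challenge: install SMAC from oxwhirl/smac and
#    a local StarCraft II client.
from smac.env import StarCraft2Env
smac_env = StarCraft2Env(map_name="8m")
```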
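For the Experiment Setup row, the following is a minimal PyTorch sketch that collects the reported hyperparameters in one place. It is not the authors' implementation: the input/output dimensions (OBS_DIM, MSG_DIM, ACT_DIM) are hypothetical placeholders, and "three-layer"/"four-layer" is read here as the number of Linear layers.

```python
import torch
import torch.nn as nn

# Hyperparameters reported in the paper.
LR = 0.005            # Adam learning rate
GAMMA = 0.95          # reward discount factor
XI = 0.01             # soft-update coefficient for target networks
BUFFER_CAPACITY = 10**5
BATCH_SIZE = 1024

# Hypothetical sizes, not given in the paper.
OBS_DIM, MSG_DIM, ACT_DIM = 16, 8, 4

def mlp(sizes):
    """Stack of Linear layers with ReLU activations between them."""
    layers = []
    for i in range(len(sizes) - 1):
        layers.append(nn.Linear(sizes[i], sizes[i + 1]))
        if i < len(sizes) - 2:
            layers.append(nn.ReLU())
    return nn.Sequential(*layers)

# Three-layer MLP with 64 hidden units for the Encoder.
encoder = mlp([OBS_DIM, 64, 64, MSG_DIM])
# Four-layer MLPs with 64 hidden units for the TimeNet, Actor Net, and Critic Net.
time_net = mlp([OBS_DIM + MSG_DIM, 64, 64, 64, 1])
actor_net = mlp([OBS_DIM + MSG_DIM, 64, 64, 64, ACT_DIM])
critic_net = mlp([OBS_DIM + MSG_DIM + ACT_DIM, 64, 64, 64, 1])

optimizer = torch.optim.Adam(
    list(encoder.parameters()) + list(time_net.parameters())
    + list(actor_net.parameters()) + list(critic_net.parameters()),
    lr=LR,
)

def soft_update(target: nn.Module, source: nn.Module, xi: float = XI):
    """Polyak averaging of target-network parameters: theta' <- xi*theta + (1 - xi)*theta'."""
    for t, s in zip(target.parameters(), source.parameters()):
        t.data.copy_(xi * s.data + (1.0 - xi) * t.data)
```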