Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games

Authors: Dingyang Chen, Yile Li, Qi Zhang

ICLR 2022

Reproducibility checklist. Each variable below is listed with its assessed result and the supporting LLM response.

Research Type: Experimental
Evidence: "Our empirical results demonstrate the effectiveness of these innovations when instantiated with a state-of-the-art CTDE algorithm, achieving competitive policy performance with only a fraction of communication during training." Section 5 (Experiments) adds: "Our experiments aim to answer the following questions in Sections 5.1-5.3, respectively: 1) How communication-efficient is our algorithm proposed in Section 4 against baselines and ablations? 2) How empirically effective is policy consensus? 3) What are the qualitative properties of the learned communication rules?"

Researcher Affiliation: Academia
Evidence: "Dingyang Chen, Yile Li, Qi Zhang; Artificial Intelligence Institute, University of South Carolina; dingyang@email.sc.edu, qz5@cse.sc.edu"

Pseudocode: Yes
Evidence: Appendix E.2 (Pseudocode) presents Algorithm 1, "Pseudocode of our communication-efficient actor-critic algorithm".

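Algorithm 1 itself is not reproduced in this report. Purely as a hypothetical illustration of the kind of loop such an algorithm describes, the sketch below gates per-agent message passing inside a shared actor-critic forward pass; every name, shape, and the gating rule are our assumptions, not the authors' method.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical sketch: N homogeneous agents share one actor and one
# centralized critic, and a learned gate decides whether each agent
# broadcasts a message this step. Not the paper's Algorithm 1.
N, obs_dim, msg_dim, act_dim = 4, 8, 4, 5

actor = nn.Linear(obs_dim + msg_dim, act_dim)  # shared policy head
critic = nn.Linear(N * obs_dim, 1)             # centralized value head
gate = nn.Linear(obs_dim, 2)                   # send / don't-send logits
encoder = nn.Linear(obs_dim, msg_dim)          # message encoder

obs = torch.randn(N, obs_dim)

# Differentiable discrete gating via straight-through Gumbel-Softmax
# (Gumbel-Softmax is listed in the paper's Table 2).
send = F.gumbel_softmax(gate(obs), tau=1.0, hard=True)[:, :1]  # (N, 1)
messages = send * encoder(obs)  # message is zeroed out when gated off

# Mean-pool received messages; the aggregation choice is assumed.
pooled = messages.mean(dim=0, keepdim=True).expand(N, -1)

action_logits = actor(torch.cat([obs, pooled], dim=-1))  # per-agent
value = critic(obs.reshape(1, -1))  # centralized critic (CTDE-style)
```
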
Open Source Code: No
Evidence: No explicit statement or link indicating the release of open-source code for the methodology described in the paper.

Open Datasets: Yes
Evidence: "Environments. We evaluate our algorithm on three tasks in the Multi-Agent Particle Environment (MPE) with the efficient implementation by Liu et al. (2020), each of which has a version with N = 15 agents and another with N = 30 agents. As described in Section 3.1, these MPE environments can be cast as homogeneous MGs provided full observability and permutation-preserving observation functions."

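MPE is openly available; for reference, a task of this kind can be instantiated through the PettingZoo port of MPE. This is an assumption for illustration: the paper uses the Liu et al. (2020) implementation, and the task name below is our own choice.

```python
# Minimal sketch using the PettingZoo port of MPE, not the Liu et al.
# (2020) implementation the paper uses; the task name is illustrative.
from pettingzoo.mpe import simple_spread_v3

# N = 15 and max_cycles = 25 mirror the paper's agent count and
# episode length; recent PettingZoo versions return (obs, infos).
env = simple_spread_v3.parallel_env(N=15, max_cycles=25)
observations, infos = env.reset(seed=0)

while env.agents:
    # Random actions stand in for the learned policy.
    actions = {a: env.action_space(a).sample() for a in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)
env.close()
```
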
Dataset Splits: No
Evidence: No explicit train/validation/test splits are provided; the experiments run in a simulation environment (MPE) where data is generated during training rather than drawn from a static dataset.

Hardware Specification: No
Evidence: No specific hardware details (e.g., GPU/CPU models, memory specifications) are provided for the experimental setup.

Software Dependencies: No
Evidence: Table 2 mentions the Adam optimizer and components such as GCN and Gumbel-Softmax, but it does not specify version numbers for any software dependencies (e.g., Python, PyTorch/TensorFlow, or specific library versions).

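Although no versions are pinned, the components named in Table 2 map onto standard deep-learning tooling. As one example, assuming PyTorch as the framework (the paper does not name it), straight-through Gumbel-Softmax sampling is a single library call:

```python
import torch
import torch.nn.functional as F

# Logits for a binary communicate / don't-communicate choice per agent;
# the two-way decision is an assumption for illustration.
logits = torch.randn(4, 2)  # 4 agents, 2 options

# hard=True: one-hot samples in the forward pass, soft gradients in
# the backward pass (the straight-through estimator).
sample = F.gumbel_softmax(logits, tau=1.0, hard=True)
print(sample)  # one-hot rows, e.g. tensor([[0., 1.], ...])
```
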
Experiment Setup: Yes
Evidence: Appendix E.3 (Hyperparameters), Table 2, lists detailed settings such as episode length 25, number of training episodes 40000, discount factor 0.95, batch size from replay buffer 256, and optimizer learning rates, among others.

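To make the reported settings concrete, the quoted Table 2 values can be collected into a single config object. This is a minimal sketch with field names of our own choosing; values not quoted in this report are left unset rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ExperimentConfig:
    # Values quoted from Table 2 of the paper; field names are ours.
    episode_length: int = 25
    num_training_episodes: int = 40000
    discount_factor: float = 0.95
    replay_batch_size: int = 256
    # Table 2 also lists Adam learning rates, which this report does
    # not quote; left unset rather than guessed.
    actor_lr: Optional[float] = None
    critic_lr: Optional[float] = None

config = ExperimentConfig()
print(config)
```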