Communication-Efficient Actor-Critic Methods for Homogeneous Markov Games
Authors: Dingyang Chen, Yile Li, Qi Zhang
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | "Our empirical results demonstrate the effectiveness of these innovations when instantiated with a state-of-the-art CTDE algorithm, achieving competitive policy performance with only a fraction of communication during training." From Section 5 (Experiments): "Our experiments aim to answer the following questions in Sections 5.1-5.3, respectively: 1) How communication-efficient is our algorithm proposed in Section 4 against baselines and ablations? 2) How empirically effective is policy consensus? 3) What are the qualitative properties of the learned communication rules?" |
| Researcher Affiliation | Academia | Dingyang Chen, Yile Li, Qi Zhang; Artificial Intelligence Institute, University of South Carolina; dingyang@email.sc.edu, qz5@cse.sc.edu |
| Pseudocode | Yes | E.2 PSEUDOCODE: Algorithm 1, "Pseudocode of our communication-efficient actor-critic algorithm" (a hedged sketch of such a training loop follows this table). |
| Open Source Code | No | No explicit statement or link indicating the release of open-source code for the methodology described in the paper. |
| Open Datasets | Yes | Environments. We evaluate our algorithm on three tasks in Multi-Agent Particle Environment (MPE) with the efficient implementation by Liu et al. (2020), each of which has a version with N = 15 agents and another with N = 30 agents. As described in Section 3.1, these MPE environments can be cast as homogeneous MGs provided full observability and the permutation preserving observation functions. |
| Dataset Splits | No | No explicit train/validation/test dataset splits are provided, as the experiments are conducted in a simulation environment (MPE) where data is generated during training rather than from a static dataset. |
| Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory specifications) are provided for the experimental setup. |
| Software Dependencies | No | Table 2 mentions the Adam optimizer and components such as GCN and Gumbel-Softmax, but it does not specify version numbers for any software dependencies (e.g., Python, PyTorch/TensorFlow, or specific library versions). A hedged illustration of a Gumbel-Softmax communication gate follows this table. |
| Experiment Setup | Yes | E.3 HYPERPARAMETERS, Table 2 lists detailed settings such as "Episode length 25", "Number of training episodes 40000", "Discount factor 0.95", "Batch size from replay buffer 256", and optimizer learning rates, among others (collected into a config sketch below). |
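The Pseudocode row above refers to the paper's Algorithm 1 (Appendix E.2), which is not reproduced in this report. Purely to illustrate the kind of loop such an algorithm describes, here is a minimal Python sketch of a communication-efficient actor-critic training loop with per-step communication gating and policy consensus among homogeneous agents. The objects `agents`, `env`, and `gate` and all of their methods are hypothetical stand-ins, not the paper's actual interfaces; only the episode length, episode count, and the consensus/gating ideas come from the quoted excerpts.

```python
import torch

def consensus(agents):
    # Policy consensus for homogeneous agents: average the policy
    # parameters so every agent ends up with the same shared policy.
    with torch.no_grad():
        for group in zip(*(a.policy.parameters() for a in agents)):
            mean = torch.stack([p.data for p in group]).mean(dim=0)
            for p in group:
                p.data.copy_(mean)

def train(agents, env, gate, num_episodes=40000, episode_length=25):
    # Hypothetical training loop: at each step, a learned gate decides
    # whether an agent broadcasts a message, so many steps require no
    # communication at all during training.
    for _ in range(num_episodes):
        obs = env.reset()
        for _ in range(episode_length):
            send = [gate.decide(o) for o in obs]           # learned communication rule
            msgs = [a.message(o) if s else None
                    for a, o, s in zip(agents, obs, send)]
            acts = [a.act(o, msgs) for a, o in zip(agents, obs)]
            obs, rewards, done = env.step(acts)
            # (critic/actor updates from a replay buffer would go here)
            if done:
                break
        consensus(agents)  # periodic parameter averaging across agents
```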
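Table 2's mention of Gumbel-Softmax suggests that the discrete communicate/skip decision is sampled with the Gumbel-Softmax trick so it remains differentiable. Below is a minimal sketch using PyTorch's built-in `torch.nn.functional.gumbel_softmax`; the tensor shapes and the two-way send/skip framing are assumptions for illustration, not details taken from the paper.

```python
import torch
import torch.nn.functional as F

# A binary "communicate or not" gate via Gumbel-Softmax. The two logits
# per agent score the send/skip options; hard=True returns a discrete
# one-hot sample in the forward pass while gradients flow through the
# soft relaxation in the backward pass.
logits = torch.randn(8, 2, requires_grad=True)  # 8 agents, 2 options (hypothetical shapes)
gate = F.gumbel_softmax(logits, tau=1.0, hard=True, dim=-1)
send_mask = gate[:, 0]  # 1.0 where the agent chooses to communicate
```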
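The Table 2 values quoted in the Experiment Setup row can be collected into a small configuration; the sketch below does only that, plus an Adam optimizer as mentioned in the Software Dependencies row. The learning-rate value and the `torch.nn.Linear` stand-in network are placeholders, since neither is quoted in this report.

```python
import torch

# Hyperparameters quoted from Table 2 (Appendix E.3) in the row above.
config = {
    "episode_length": 25,             # steps per episode
    "num_training_episodes": 40000,
    "discount_factor": 0.95,          # gamma
    "batch_size": 256,                # samples drawn from the replay buffer
}

# Table 2 also lists Adam learning rates, but the exact values are not
# quoted here; lr=1e-3 below is a placeholder, not the paper's value.
policy = torch.nn.Linear(16, 4)       # stand-in for the actual actor network
optimizer = torch.optim.Adam(policy.parameters(), lr=1e-3)
```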