$\rm E(3)$-Equivariant Actor-Critic Methods for Cooperative Multi-Agent Reinforcement Learning

Authors: Dingyang Chen, Qi Zhang

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

| Reproducibility Variable | Result | LLM Response |
| --- | --- | --- |
| Research Type | Experimental | "As a result, our method achieves superior sample efficiency and generalization performance in a range of benchmark MARL tasks that exhibit continuous E(3)-symmetries but were not accommodated by prior work." and, from Section 6 (Experiments): "Environments. We choose the popular cooperative MARL benchmarks of MPE, MuJoCo continuous control tasks (MuJoCo tasks), including the 2D ones from Tassa et al. (2018) and 3D ones from Chen et al. (2023) with single- and multi-agent variations, and StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019) to evaluate the effectiveness of our E(3)-equivariant multi-agent actor-critic methods described in Section 5." |
| Researcher Affiliation | Academia | "Artificial Intelligence Institute, University of South Carolina, Columbia, SC, USA. Correspondence to: Dingyang Chen <dingyang@email.sc.edu>, Qi Zhang <qz5@cse.sc.edu>." |
| Pseudocode | No | The paper describes its methods and architectures using text and diagrams (e.g., Figure 3), but it does not include any explicitly labeled 'Pseudocode' or 'Algorithm' blocks. |
| Open Source Code | Yes | "Our code is publicly available at https://github.com/dchen48/E3AC." |
| Open Datasets | Yes | "Environments. We choose the popular cooperative MARL benchmarks of MPE, MuJoCo continuous control tasks (MuJoCo tasks), including the 2D ones from Tassa et al. (2018) and 3D ones from Chen et al. (2023) with single- and multi-agent variations, and StarCraft Multi-Agent Challenge (SMAC) (Samvelyan et al., 2019) to evaluate the effectiveness of our E(3)-equivariant multi-agent actor-critic methods described in Section 5." |
| Dataset Splits | No | The paper reports training and evaluation settings typical for reinforcement learning (e.g., 'Number of training episodes' and '#episodes per evaluation' in the hyperparameter tables), but it does not specify explicit train/validation/test dataset splits with percentages or sample counts, as would be used in a supervised learning context. |
| Hardware Specification | Yes | "The code is implemented by PyTorch, and runs on NVIDIA Tesla V100 GPUs with 32 CPU cores." |
| Software Dependencies | No | The paper mentions "The code is implemented by PyTorch" but does not provide specific version numbers for PyTorch or any other software libraries or dependencies. |
| Experiment Setup | Yes | "Table 1: Hyperparameters for MPE" excerpt: Batch size from replay buffer for [MLP, MLP] and [GCN, MLP]: 1024; Actor's learning rate for [MLP, MLP] and [GCN, MLP]: 1e-4; Critic's learning rate for [MLP, MLP] and [GCN, MLP]: 1e-3. |
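
To make the quoted experiment setup concrete, the sketch below wires the reported MPE hyperparameters (batch size 1024, actor learning rate 1e-4, critic learning rate 1e-3) into a PyTorch actor-critic training skeleton. The placeholder MLP networks, their layer sizes, and the use of Adam are illustrative assumptions only; the paper's actual actor and critic are the E(3)-equivariant architectures described in its Section 5, which are not reproduced here.

```python
import torch
import torch.nn as nn

# Hyperparameters reported for MPE in Table 1 of the paper.
BATCH_SIZE = 1024   # batch size sampled from the replay buffer
ACTOR_LR = 1e-4     # actor learning rate for [MLP, MLP] and [GCN, MLP]
CRITIC_LR = 1e-3    # critic learning rate for [MLP, MLP] and [GCN, MLP]

# Placeholder networks (assumed shapes for illustration only); the paper's
# actual actor/critic are E(3)-equivariant and structured per its Section 5.
actor = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 4))
critic = nn.Sequential(nn.Linear(36, 64), nn.ReLU(), nn.Linear(64, 1))

# Optimizer choice (Adam) is an assumption; the paper reports only the rates.
actor_optimizer = torch.optim.Adam(actor.parameters(), lr=ACTOR_LR)
critic_optimizer = torch.optim.Adam(critic.parameters(), lr=CRITIC_LR)

# A training step would then draw BATCH_SIZE transitions from the replay
# buffer before computing the critic loss and the actor's policy gradient.
```

This mirrors the standard off-policy actor-critic setup implied by the "batch size from replay buffer" entry: two separate optimizers, with the critic trained at a higher learning rate than the actor.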