Multi-Agent MDP Homomorphic Networks
Authors: Elise van der Pol, Herke van Hoof, Frans A. Oliehoek, Max Welling
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We show empirically on symmetric multi-agent problems that globally symmetric distributable policies improve data efficiency compared to non-equivariant baselines. Results for this task are shown in Figure 5, with the average return on the y-axis and the number of time steps on the x-axis. |
| Researcher Affiliation | Academia | Elise van der Pol, UvA-Bosch Delta Lab, University of Amsterdam (e.e.vanderpol@uva.nl); Herke van Hoof, UvA-Bosch Delta Lab, University of Amsterdam (h.c.vanhoof@uva.nl); Frans A. Oliehoek, Department of Intelligent Systems, Delft University of Technology (f.a.oliehoek@tudelft.nl); Max Welling, UvA-Bosch Delta Lab, University of Amsterdam (m.welling@uva.nl) |
| Pseudocode | No | The paper includes 'Listing' blocks (e.g., Listing 1: Equivariant Network Architecture for Centralized Drones), but these describe model components and their connections rather than structured pseudocode for an algorithm or procedure with logical steps. (An illustrative sketch of such an architecture appears after this table.) |
| Open Source Code | Yes | Our code is available at https://github.com/ElisevanderPol/marl_homomorphic_networks. |
| Open Datasets | No | The paper describes custom simulation environments for "wildlife monitoring" and "traffic light control" rather than experiments on publicly available datasets. |
| Dataset Splits | No | The paper describes training duration in terms of time steps (e.g., "We train for 500k time steps") but does not specify explicit train/validation/test dataset splits. Evaluation is conducted by observing performance on the simulated environments over the course of training. |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | Yes | NumPy (Harris et al., 2020) 1.19.2: BSD 3-Clause "New" or "Revised" License; PyTorch (Paszke et al., 2017) 1.2.0: Modified BSD license; rlpyt (Stooke & Abbeel, 2019): MIT license; MDP Homomorphic Networks & Symmetrizer (van der Pol et al., 2020): MIT license. |
| Experiment Setup | Yes | For all approaches, including baselines, we run at least 15 random seeds for 6 different learning rates, {0.001, 0.003, 0.0001, 0.0003, 0.00001, 0.00003}, and report the best learning rate for each. Other hyperparameters are taken as default in the codebase (Stooke & Abbeel, 2019; van der Pol et al., 2020). We train in a centralized fashion with PPO (Schulman et al., 2017). See Table 1 for best learning rates. Architectures are given in the paper and were chosen to be as similar as possible between approaches, keeping the number of trainable parameters comparable. (A sketch of this seed/learning-rate sweep appears after the table.) |
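
The paper's Listing blocks describe equivariant network architectures rather than algorithms. As a point of reference, below is a minimal sketch (not the authors' code) of one way to respect a discrete symmetry: a critic made invariant to the C4 group of 90-degree grid rotations by averaging its output over all four rotated copies of the observation. The class name, layer sizes, and pooling choice are illustrative assumptions.

```python
import torch
import torch.nn as nn

class C4InvariantCritic(nn.Module):
    """Sketch of a value network that is invariant to 90-degree grid
    rotations: rotating the observation cannot change the value estimate,
    because the output is averaged over the full C4 orbit of the input."""

    def __init__(self, in_channels: int, hidden: int = 64):
        super().__init__()
        self.base = nn.Sequential(
            nn.Conv2d(in_channels, hidden, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling over the grid
            nn.Flatten(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # Average over the group {0, 90, 180, 270 degrees}.
        values = [self.base(torch.rot90(obs, k, dims=(-2, -1)))
                  for k in range(4)]
        return torch.stack(values, dim=0).mean(dim=0)
```

Note that the paper's actual networks instead constrain the weights themselves via the symmetrizer of van der Pol et al. (2020), so that policy outputs transform along with the state (equivariance) rather than being pooled over the group at every forward pass.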
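
The experiment-setup row reports a sweep over at least 15 random seeds and six learning rates with PPO. A minimal sketch of such a sweep, assuming a hypothetical `train_ppo(lr, seed)` entry point (the authors' actual runs go through the rlpyt-based codebase), is:

```python
# Learning rates as reported in the paper; the seed count is the stated minimum.
LEARNING_RATES = [1e-3, 3e-3, 1e-4, 3e-4, 1e-5, 3e-5]
NUM_SEEDS = 15

def sweep(train_ppo):
    """train_ppo(lr, seed) -> average return; hypothetical stand-in for
    launching one PPO training run in the rlpyt-based codebase."""
    results = {}
    for lr in LEARNING_RATES:
        returns = [train_ppo(lr=lr, seed=seed) for seed in range(NUM_SEEDS)]
        results[lr] = sum(returns) / len(returns)
    # Report the best learning rate per method, as in Table 1 of the paper.
    best_lr = max(results, key=results.get)
    return best_lr, results
```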