Multi-Agent MDP Homomorphic Networks

Authors: Elise van der Pol, Herke van Hoof, Frans A. Oliehoek, Max Welling

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show empirically on symmetric multi-agent problems that globally symmetric distributable policies improve data efficiency compared to non-equivariant baselines. We evaluate Multi-Agent MDP Homomorphic Networks on symmetric multi-agent problems and show improved data efficiency compared to non-equivariant baselines. Results for this task are shown in Figure 5, with the average return on the y-axis and the number of time steps on the x-axis.
Researcher Affiliation | Academia | Elise van der Pol, UvA-Bosch Delta Lab, University of Amsterdam, e.e.vanderpol@uva.nl; Herke van Hoof, UvA-Bosch Delta Lab, University of Amsterdam, h.c.vanhoof@uva.nl; Frans A. Oliehoek, Department of Intelligent Systems, Delft University of Technology, f.a.oliehoek@tudelft.nl; Max Welling, UvA-Bosch Delta Lab, University of Amsterdam, m.welling@uva.nl
Pseudocode | No | The paper includes 'Listing' blocks (e.g., Listing 1: Equivariant Network Architecture for Centralized Drones) that describe network architectures. However, these are descriptions of model components and their connections, not structured pseudocode for an algorithm or procedure with logical steps. A minimal sketch of the equivariant-layer idea behind such listings is given after this table.
Open Source Code | Yes | Our code is available at https://github.com/ElisevanderPol/marl_homomorphic_networks.
Open Datasets | No | The paper describes custom simulation environments for "wildlife monitoring" and "traffic light control"; these are bespoke simulated tasks rather than publicly available datasets.
Dataset Splits | No | The paper describes training duration in time steps (e.g., "We train for 500k time steps") but does not specify explicit train/validation/test splits; evaluation is conducted by tracking performance on the simulated environments over the course of training.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | Yes | NumPy (Harris et al., 2020) 1.19.2: BSD 3-Clause "New" or "Revised" License; PyTorch (Paszke et al., 2017) 1.2.0: modified BSD license; rlpyt (Stooke & Abbeel, 2019): MIT license; MDP Homomorphic Networks & Symmetrizer (van der Pol et al., 2020): MIT license. A hedged requirements sketch based on these pins appears after this table.
Experiment Setup | Yes | For all approaches, including baselines, we run at least 15 random seeds for 6 different learning rates, {0.001, 0.003, 0.0001, 0.0003, 0.00001, 0.00003}, and report the best learning rate for each. Other hyperparameters are taken as the defaults in the codebase (Stooke & Abbeel, 2019; van der Pol et al., 2020). We train in a centralized fashion with PPO (Schulman et al., 2017). See Table 1 for the best learning rates. Architectures were chosen to be as similar as possible between approaches, keeping the number of trainable parameters comparable. A minimal sketch of the learning-rate/seed sweep structure follows this table.
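
The Pseudocode entry above notes that the paper's listings describe equivariant network architectures rather than step-by-step algorithms. The block below is a minimal sketch of the symmetrizer-style equivariant linear layer that such architectures build on (van der Pol et al., 2020): the weights are averaged over a small discrete group so that the layer commutes with the group action. The group (a two-element half-swap), the layer size, and the names symmetrize/SymmetrizedLinear are illustrative assumptions, not the authors' exact code.

```python
# Illustrative sketch only: a group-averaged ("symmetrized") linear layer.
# Group, representations, and sizes are assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn


def symmetrize(weight, reps_in, reps_out):
    """Average W over the group: W_sym = (1/|G|) * sum_g rho_out(g)^-1 @ W @ rho_in(g)."""
    terms = [rho_out.inverse() @ weight @ rho_in
             for rho_in, rho_out in zip(reps_in, reps_out)]
    return torch.stack(terms).mean(dim=0)


class SymmetrizedLinear(nn.Module):
    """Linear layer constrained to be equivariant: f(rho_in(g) x) = rho_out(g) f(x)."""

    def __init__(self, reps_in, reps_out):
        super().__init__()
        self.reps_in = reps_in    # list of |G| input representation matrices
        self.reps_out = reps_out  # list of |G| output representation matrices
        d_in, d_out = reps_in[0].shape[0], reps_out[0].shape[0]
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.1)

    def forward(self, x):
        w = symmetrize(self.weight, self.reps_in, self.reps_out)
        return x @ w.t()


# Toy example: the cyclic group C2 acting by swapping the two halves of a 4-dim vector.
identity = torch.eye(4)
swap = torch.eye(4)[[2, 3, 0, 1]]
reps = [identity, swap]
layer = SymmetrizedLinear(reps_in=reps, reps_out=reps)

x = torch.randn(1, 4)
# Equivariance check: transforming the input transforms the output in the same way.
assert torch.allclose(layer(x @ swap.t()), layer(x) @ swap.t(), atol=1e-5)
```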
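
The Software Dependencies entry pins NumPy 1.19.2 and PyTorch 1.2.0 and names rlpyt and the Symmetrizer codebase. A requirements-style summary of those pins is sketched below; the comments about installing the research codebases from source are assumptions about typical practice, not commands taken from the paper.

```text
# Hypothetical requirements pin, reconstructed from the versions quoted above.
numpy==1.19.2
torch==1.2.0
# rlpyt (Stooke & Abbeel, 2019) and the MDP Homomorphic Networks / Symmetrizer code
# (van der Pol et al., 2020) are research codebases usually installed from their
# GitHub repositories rather than from PyPI.
```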
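
The Experiment Setup entry describes running at least 15 random seeds for each of 6 learning rates and reporting the best rate per approach. The sketch below only illustrates that sweep structure; run_ppo_training is a hypothetical placeholder for the actual rlpyt/PPO training call, and selecting by mean return is an assumption about the selection criterion.

```python
# Minimal sketch of the reported sweep: >= 15 seeds per learning rate, best rate kept per method.
# run_ppo_training is a hypothetical stand-in, not the paper's actual training entry point.
import statistics

LEARNING_RATES = [0.001, 0.003, 0.0001, 0.0003, 0.00001, 0.00003]
NUM_SEEDS = 15


def run_ppo_training(method, lr, seed):
    """Placeholder: would train `method` with PPO for 500k steps and return its average return."""
    raise NotImplementedError("Replace with the actual rlpyt-based training call.")


def sweep(method):
    """Run the learning-rate/seed grid for one method and return the best rate and its score."""
    mean_returns = {}
    for lr in LEARNING_RATES:
        returns = [run_ppo_training(method, lr, seed) for seed in range(NUM_SEEDS)]
        mean_returns[lr] = statistics.mean(returns)
    best_lr = max(mean_returns, key=mean_returns.get)
    return best_lr, mean_returns[best_lr]
```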