Multi-Agent MDP Homomorphic Networks

Authors: Elise van der Pol, Herke van Hoof, Frans A. Oliehoek, Max Welling

ICLR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We show empirically on symmetric multi-agent problems that globally symmetric distributable policies improve data efficiency compared to non-equivariant baselines. We evaluate Multi-Agent MDP Homomorphic Networks on symmetric multi-agent problems and show improved data efficiency compared to non-equivariant baselines. Results for this task are shown in Figure 5, with the average return on the y-axis and the number of time steps on the x-axis.
Researcher Affiliation | Academia | Elise van der Pol, UvA-Bosch Delta Lab, University of Amsterdam, e.e.vanderpol@uva.nl; Herke van Hoof, UvA-Bosch Delta Lab, University of Amsterdam, h.c.vanhoof@uva.nl; Frans A. Oliehoek, Department of Intelligent Systems, Delft University of Technology, f.a.oliehoek@tudelft.nl; Max Welling, UvA-Bosch Delta Lab, University of Amsterdam, m.welling@uva.nl
Pseudocode | No | The paper includes 'Listing' blocks (e.g., Listing 1: Equivariant Network Architecture for Centralized Drones) that describe network architectures. However, these are descriptions of model components and their connections, not structured pseudocode for an algorithm or procedure with logical steps. A minimal sketch of the equivariant-layer idea behind such listings is given after this table.
Open Source Code | Yes | Our code is available at https://github.com/ElisevanderPol/marl_homomorphic_networks.
Open Datasets | No | The paper describes custom simulation environments for "wildlife monitoring" and "traffic light control"; these are bespoke simulated tasks rather than publicly available datasets.
Dataset Splits | No | The paper describes training duration in time steps (e.g., "We train for 500k time steps") but does not specify explicit train/validation/test splits; evaluation is conducted by tracking performance on the simulated environments over the course of training.
Hardware Specification | No | The paper does not provide specific details about the hardware used for the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | Yes | NumPy (Harris et al., 2020) 1.19.2: BSD 3-Clause "New" or "Revised" License; PyTorch (Paszke et al., 2017) 1.2.0: modified BSD license; rlpyt (Stooke & Abbeel, 2019): MIT license; MDP Homomorphic Networks & Symmetrizer (van der Pol et al., 2020): MIT license. A hedged requirements sketch based on these pins appears after this table.
Experiment Setup | Yes | For all approaches, including baselines, we run at least 15 random seeds for 6 different learning rates, {0.001, 0.003, 0.0001, 0.0003, 0.00001, 0.00003}, and report the best learning rate for each. Other hyperparameters are taken as the defaults in the codebase (Stooke & Abbeel, 2019; van der Pol et al., 2020). We train in a centralized fashion with PPO (Schulman et al., 2017). See Table 1 for the best learning rates. Architectures were chosen to be as similar as possible between approaches, keeping the number of trainable parameters comparable. A minimal sketch of the learning-rate/seed sweep structure follows this table.
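
The Pseudocode entry above notes that the paper's listings describe equivariant network architectures rather than step-by-step algorithms. The block below is a minimal sketch of the symmetrizer-style equivariant linear layer that such architectures build on (van der Pol et al., 2020): the weights are averaged over a small discrete group so that the layer commutes with the group action. The group (a two-element half-swap), the layer size, and the names symmetrize/SymmetrizedLinear are illustrative assumptions, not the authors' exact code.

```python
# Illustrative sketch only: a group-averaged ("symmetrized") linear layer.
# Group, representations, and sizes are assumptions, not the paper's exact architecture.
import torch
import torch.nn as nn


def symmetrize(weight, reps_in, reps_out):
    """Average W over the group: W_sym = (1/|G|) * sum_g rho_out(g)^-1 @ W @ rho_in(g)."""
    terms = [rho_out.inverse() @ weight @ rho_in
             for rho_in, rho_out in zip(reps_in, reps_out)]
    return torch.stack(terms).mean(dim=0)


class SymmetrizedLinear(nn.Module):
    """Linear layer constrained to be equivariant: f(rho_in(g) x) = rho_out(g) f(x)."""

    def __init__(self, reps_in, reps_out):
        super().__init__()
        self.reps_in = reps_in    # list of |G| input representation matrices
        self.reps_out = reps_out  # list of |G| output representation matrices
        d_in, d_out = reps_in[0].shape[0], reps_out[0].shape[0]
        self.weight = nn.Parameter(torch.randn(d_out, d_in) * 0.1)

    def forward(self, x):
        w = symmetrize(self.weight, self.reps_in, self.reps_out)
        return x @ w.t()


# Toy example: the cyclic group C2 acting by swapping the two halves of a 4-dim vector.
identity = torch.eye(4)
swap = torch.eye(4)[[2, 3, 0, 1]]
reps = [identity, swap]
layer = SymmetrizedLinear(reps_in=reps, reps_out=reps)

x = torch.randn(1, 4)
# Equivariance check: transforming the input transforms the output in the same way.
assert torch.allclose(layer(x @ swap.t()), layer(x) @ swap.t(), atol=1e-5)
```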
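
The Software Dependencies entry pins NumPy 1.19.2 and PyTorch 1.2.0 and names rlpyt and the Symmetrizer codebase. A requirements-style summary of those pins is sketched below; the comments about installing the research codebases from source are assumptions about typical practice, not commands taken from the paper.

```text
# Hypothetical requirements pin, reconstructed from the versions quoted above.
numpy==1.19.2
torch==1.2.0
# rlpyt (Stooke & Abbeel, 2019) and the MDP Homomorphic Networks / Symmetrizer code
# (van der Pol et al., 2020) are research codebases usually installed from their
# GitHub repositories rather than from PyPI.
```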
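
The Experiment Setup entry describes running at least 15 random seeds for each of 6 learning rates and reporting the best rate per approach. The sketch below only illustrates that sweep structure; run_ppo_training is a hypothetical placeholder for the actual rlpyt/PPO training call, and selecting by mean return is an assumption about the selection criterion.

```python
# Minimal sketch of the reported sweep: >= 15 seeds per learning rate, best rate kept per method.
# run_ppo_training is a hypothetical stand-in, not the paper's actual training entry point.
import statistics

LEARNING_RATES = [0.001, 0.003, 0.0001, 0.0003, 0.00001, 0.00003]
NUM_SEEDS = 15


def run_ppo_training(method, lr, seed):
    """Placeholder: would train `method` with PPO for 500k steps and return its average return."""
    raise NotImplementedError("Replace with the actual rlpyt-based training call.")


def sweep(method):
    """Run the learning-rate/seed grid for one method and return the best rate and its score."""
    mean_returns = {}
    for lr in LEARNING_RATES:
        returns = [run_ppo_training(method, lr, seed) for seed in range(NUM_SEEDS)]
        mean_returns[lr] = statistics.mean(returns)
    best_lr = max(mean_returns, key=mean_returns.get)
    return best_lr, mean_returns[best_lr]
```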