Scalable Safe Policy Improvement for Factored Multi-Agent MDPs

Authors: Federico Bianchi, Edoardo Zorzi, Alberto Castellini, Thiago D. Simão, Matthijs T. J. Spaan, Alessandro Farinelli

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An empirical evaluation on multi-agent Sys Admin and multi-UAV Delivery shows that the approach scales to very large domains where state-of-the-art methods cannot work.
Researcher Affiliation | Academia | (1) Department of Computer Science, University of Verona, Verona, Italy; (2) Department of Software Science, Eindhoven University of Technology, Eindhoven, Netherlands; (3) Department of Intelligent Systems, Delft University of Technology, Delft, Netherlands.
Pseudocode | Yes | Algorithm 1: Factored-Value MCTS-SPIBB
Open Source Code | Yes | Code available at https://github.com/Isla-lab/fv-mcts-spibb
Open Datasets | Yes | Multi-agent Sys Admin is a standard MMDP benchmark (Guestrin et al., 2003). Multi-UAV Delivery was proposed by Choudhury et al. (2021).
Dataset Splits | No | The paper does not explicitly provide information about a validation dataset split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | For FV-MCTS-SPIBB-Max-Plus and FV-MCTS-SPIBB-Var-El, we use the following parameters: 100 simulations; an exploration constant c = n, where n is the number of agents (found empirically to work best); an MCTS tree depth of 20 steps; γ = 0.9; and 8 iterations of message passing in Max-Plus.
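The Experiment Setup entry lists the planner's hyperparameters but not where they enter the search. The snippet below is a minimal sketch that assumes a standard UCB1-style exploration bonus and the generic Max-Plus algorithm on a coordination graph with tabular local payoffs. It shows where c = n and the 8 message-passing iterations would plug in. It is not the authors' Algorithm 1 or their code, which lives in the linked repository. `PlannerConfig`, `ucb_bonus`, `max_plus` and the toy payoffs are hypothetical names introduced only for illustration.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class PlannerConfig:
    """Hyperparameters from the Experiment Setup entry (field names are hypothetical)."""
    n_agents: int
    simulations: int = 100
    tree_depth: int = 20
    gamma: float = 0.9
    maxplus_iterations: int = 8

    @property
    def exploration_c(self) -> float:
        # Reported setting: exploration constant c equals the number of agents n.
        return float(self.n_agents)


def ucb_bonus(c: float, parent_visits: int, child_visits: int) -> float:
    """Assumed UCB1-style bonus; the paper's exact exploration term may differ."""
    if child_visits == 0:
        return math.inf  # unvisited actions are tried first
    return c * math.sqrt(math.log(parent_visits) / child_visits)


def max_plus(unary, pairwise, n_actions, iterations):
    """Generic Max-Plus on a coordination graph (not the paper's Algorithm 1).

    unary[i][a]: local payoff of agent i for action a.
    pairwise[(i, j)][ai][aj]: payoff on edge (i, j), stored once per edge.
    Returns one action index per agent.
    """
    n = len(unary)
    neighbors = {i: [] for i in range(n)}
    for i, j in pairwise:
        neighbors[i].append(j)
        neighbors[j].append(i)

    def edge(i, j, ai, aj):
        # Look up the edge payoff regardless of the stored orientation.
        if (i, j) in pairwise:
            return pairwise[(i, j)][ai][aj]
        return pairwise[(j, i)][aj][ai]

    # msg[(i, j)][aj]: message agent i sends to neighbour j about j's action.
    msg = {(i, j): [0.0] * n_actions for i in range(n) for j in neighbors[i]}
    for _ in range(iterations):
        new_msg = {}
        for i, j in msg:
            out = [
                max(
                    unary[i][ai] + edge(i, j, ai, aj)
                    + sum(msg[(k, i)][ai] for k in neighbors[i] if k != j)
                    for ai in range(n_actions)
                )
                for aj in range(n_actions)
            ]
            mean = sum(out) / n_actions  # normalise so messages stay bounded on cycles
            new_msg[(i, j)] = [v - mean for v in out]
        msg = new_msg

    # Each agent picks the action maximising its local payoff plus incoming messages.
    return [
        max(range(n_actions),
            key=lambda a: unary[i][a] + sum(msg[(k, i)][a] for k in neighbors[i]))
        for i in range(n)
    ]


# Toy usage: three agents on a chain 0-1-2, rewarded for agreeing with neighbours.
cfg = PlannerConfig(n_agents=3)
unary = [[0.0, 0.5], [0.0, 0.0], [0.0, 0.0]]  # agent 0 slightly prefers action 1
agree = [[1.0, 0.0], [0.0, 1.0]]
pairwise = {(0, 1): agree, (1, 2): agree}
joint = max_plus(unary, pairwise, n_actions=2, iterations=cfg.maxplus_iterations)
print(joint)  # [1, 1, 1]
```

Subtracting the mean from each message does not change which action maximises it, but it keeps message values bounded when the coordination graph has cycles. On a tree-shaped graph such as this three-agent chain, Max-Plus recovers the optimal joint action (all agents choose action 1, total payoff 2.5).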