Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Scalable Safe Policy Improvement for Factored Multi-Agent MDPs

Authors: Federico Bianchi, Edoardo Zorzi, Alberto Castellini, Thiago D. Simão, Matthijs T. J. Spaan, Alessandro Farinelli

ICML 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | An empirical evaluation on multi-agent Sys Admin and multi-UAV Delivery shows that the approach scales to very large domains where state-of-the-art methods cannot work.
Researcher Affiliation | Academia | (1) Department of Computer Science, University of Verona, Verona, Italy; (2) Department of Software Science, Eindhoven University of Technology, Eindhoven, Netherlands; (3) Department of Intelligent Systems, Delft University of Technology, Delft, Netherlands.
Pseudocode | Yes | Algorithm 1: Factored-Value MCTS-SPIBB
Open Source Code | Yes | Code available at https://github.com/Isla-lab/fv-mcts-spibb
Open Datasets | Yes | Multi-agent Sys Admin is a standard MMDP benchmark (Guestrin et al., 2003). Multi-UAV Delivery was proposed by Choudhury et al. (2021).
Dataset Splits | No | The paper does not explicitly provide information about a validation dataset split.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., GPU/CPU models, memory specifications) used for running the experiments.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python 3.8, PyTorch 1.9).
Experiment Setup | Yes | For FV-MCTS-SPIBB-Max-Plus and FV-MCTS-SPIBB-Var-El, we use the following parameters: 100 simulations, an exploration constant empirically found to be best at c = n (with n the number of agents), an MCTS tree depth of 20 steps, γ = 0.9, and 8 iterations of message passing in Max-Plus.
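
For readers who want to carry the reported experiment setup into their own runs, the parameters can be collected in a small configuration object. The sketch below is illustrative only: the class name `FVMCTSSPIBBConfig`, its field names, and the helper `discount_at_max_depth` are hypothetical and are not taken from the authors' repository. Only the numeric values (100 simulations, c = n, depth 20, γ = 0.9, 8 Max-Plus iterations) come from the setup quoted above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FVMCTSSPIBBConfig:
    """Hypothetical container for the reported hyperparameters.

    Field names are assumptions made for illustration; the values are the
    ones quoted in the Experiment Setup entry.
    """

    n_agents: int                  # n, the number of agents in the domain
    n_simulations: int = 100       # MCTS simulations
    tree_depth: int = 20           # MCTS tree depth, in steps
    gamma: float = 0.9             # discount factor
    max_plus_iterations: int = 8   # message-passing rounds (Max-Plus variant only)

    @property
    def exploration_constant(self) -> float:
        # The setup reports c = n as the empirically best exploration constant.
        return float(self.n_agents)

    def discount_at_max_depth(self) -> float:
        # Weight gamma ** depth applied to a reward at the deepest tree level.
        return self.gamma ** self.tree_depth


# Example with an arbitrary agent count (4 is an assumption, not a value from the paper).
config = FVMCTSSPIBBConfig(n_agents=4)
print(config.exploration_constant)
print(config.discount_at_max_depth())
```

With γ = 0.9 and a depth of 20, a reward at the deepest level is weighted by roughly 0.12, which gives a rough sense of how much of the discounted return the search tree can capture.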