Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Scalable Online Planning for Multi-Agent MDPs

Authors: Shushman Choudhury, Jayesh K. Gupta, Peter Morales, Mykel J. Kochenderfer

JAIR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on the benchmark Sys Admin domain with static coordination graphs and achieve comparable performance with much lower computation cost than our MCTS baselines. We also introduce a multi-drone delivery domain with dynamic coordination graphs, and demonstrate how our approach scales to large problems on this domain that are intractable for other MCTS methods. [...] We used cumulative discounted return as the primary metric to evaluate our approach, Factored Value MCTS with Max-Plus (FV-MCTS-MP). Our most relevant baseline is Factored Value MCTS with Variable Elimination (FV-MCTS-Var-El). We also compared against standard MCTS (with no factorization), independent Q-learning (IQL), and a random policy.
Researcher Affiliation | Collaboration | Shushman Choudhury EMAIL Stanford University; Jayesh K. Gupta EMAIL Microsoft; Mykel J. Kochenderfer EMAIL Stanford University
Pseudocode | Yes | Algorithm 1 Monte Carlo Tree Search; Algorithm 2 Factored Value MCTS with Max-Plus; Algorithm 3 Max Plus Action Selection
Open Source Code | Yes | We provide an open-source implementation of our algorithm at https://github.com/JuliaPOMDP/FactoredValueMCTS.jl. [...] Source code for experiments is available at https://sites.google.com/stanford.edu/fvmcts/
Open Datasets | No | Our first domain is a standard MMDP benchmark: Sys Admin (Guestrin et al., 2003). [...] We introduce and use a truly distinct domain for our second set of experiments. It simulates a team of delivery drones navigating a shared operation space to reach their assigned goal regions.
Dataset Splits | No | The paper uses simulation environments (Sys Admin, Multi-Drone Delivery) and describes how agents start in randomly sampled cells for the Multi-Drone Delivery domain, rather than using predefined datasets with specified train/test/validation splits.
Hardware Specification | Yes | However, with more agents, standard MCTS runs out of memory even on our 128 GiB RAM machine as expected for large joint action spaces.
Software Dependencies | No | All implementation and simulations are in Julia with the POMDPs.jl library (Bezanson et al., 2017; Egorov et al., 2017)
Experiment Setup | Yes | For the same tree search hyperparameters with number of iterations fixed as 16000, exploration constant as 20 and tree search depth as 20, we compared the average time taken for each action for different number of agents in the coordination graphs. [...] Table 1: Multi-Drone Delivery hyperparameters. (Agents: 8, XY axis res.: 0.20, Noise: 0.10, Expl. const.: 5, Expl. depth: 10, Iterations: 4000; ... and so on for other agent counts).
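The Max-Plus action selection routine named above (Algorithm 3) can be illustrated with a minimal sketch. This is not the authors' Julia implementation: the function name `max_plus`, the per-edge payoff tables, and the fixed iteration count are illustrative assumptions, and the paper's version runs inside MCTS with edge statistics estimated from search rather than given up front.

```python
def max_plus(num_agents, num_actions, edges, payoffs, iters=10):
    """Pick a joint action via max-plus message passing on a coordination graph.

    edges:   list of (i, j) pairs with i < j
    payoffs: dict mapping (i, j) -> 2D table q[a_i][a_j] of edge payoffs
    """
    # mu[(i, j)][a_j] is the message from agent i to neighbor j about j's action
    mu = {}
    neighbors = {i: set() for i in range(num_agents)}
    for (i, j) in edges:
        mu[(i, j)] = [0.0] * num_actions
        mu[(j, i)] = [0.0] * num_actions
        neighbors[i].add(j)
        neighbors[j].add(i)

    def edge_q(i, j, ai, aj):
        # payoff tables are stored once per (min, max) oriented edge
        if (i, j) in payoffs:
            return payoffs[(i, j)][ai][aj]
        return payoffs[(j, i)][aj][ai]

    for _ in range(iters):
        new_mu = {}
        for (i, j) in mu:
            row = [0.0] * num_actions
            for aj in range(num_actions):
                # maximize over sender's action: edge payoff plus messages
                # flowing into the sender from its other neighbors
                row[aj] = max(
                    edge_q(i, j, ai, aj)
                    + sum(mu[(k, i)][ai] for k in neighbors[i] if k != j)
                    for ai in range(num_actions)
                )
            # subtract the mean to keep messages bounded on cyclic graphs
            mean = sum(row) / num_actions
            new_mu[(i, j)] = [v - mean for v in row]
        mu = new_mu

    # each agent maximizes the sum of its incoming messages
    joint = []
    for i in range(num_agents):
        scores = [sum(mu[(k, i)][a] for k in neighbors[i])
                  for a in range(num_actions)]
        joint.append(max(range(num_actions), key=lambda a: scores[a]))
    return joint
```

On tree-structured coordination graphs max-plus converges to the exact joint maximizer; on cyclic graphs it is an anytime approximation, which is what makes it cheaper than variable elimination for the dynamic graphs in the drone domain.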