Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in [1].

Scalable Online Planning for Multi-Agent MDPs

Authors: Shushman Choudhury, Jayesh K. Gupta, Peter Morales, Mykel J. Kochenderfer

JAIR 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We evaluate our approach on the benchmark Sys Admin domain with static coordination graphs and achieve comparable performance with much lower computation cost than our MCTS baselines. We also introduce a multi-drone delivery domain with dynamic coordination graphs, and demonstrate how our approach scales to large problems on this domain that are intractable for other MCTS methods. [...] We used cumulative discounted return as the primary metric to evaluate our approach, Factored Value MCTS with Max-Plus (FV-MCTS-MP). Our most relevant baseline is Factored Value MCTS with Variable Elimination (FV-MCTS-Var-El). We also compared against standard MCTS (with no factorization), independent Q-learning (IQL), and a random policy.
Researcher Affiliation | Collaboration | Shushman Choudhury EMAIL Stanford University; Jayesh K. Gupta EMAIL Microsoft; Mykel J. Kochenderfer EMAIL Stanford University
Pseudocode | Yes | Algorithm 1 Monte Carlo Tree Search; Algorithm 2 Factored Value MCTS with Max-Plus; Algorithm 3 Max Plus Action Selection
Open Source Code | Yes | We provide an open-source implementation of our algorithm at https://github.com/JuliaPOMDP/FactoredValueMCTS.jl. [...] Source code for experiments is available at https://sites.google.com/stanford.edu/fvmcts/
Open Datasets | No | Our first domain is a standard MMDP benchmark: Sys Admin (Guestrin et al., 2003). [...] We introduce and use a truly distinct domain for our second set of experiments. It simulates a team of delivery drones navigating a shared operation space to reach their assigned goal regions.
Dataset Splits | No | The paper uses simulation environments (Sys Admin, Multi-Drone Delivery) and describes how agents start in randomly sampled cells for the Multi-Drone Delivery domain, rather than using predefined datasets with specified train/test/validation splits.
Hardware Specification | Yes | However, with more agents, standard MCTS runs out of memory even on our 128 GiB RAM machine as expected for large joint action spaces.
Software Dependencies | No | All implementation and simulations are in Julia with the POMDPs.jl library (Bezanson et al., 2017; Egorov et al., 2017)
Experiment Setup | Yes | For the same tree search hyperparameters with number of iterations fixed as 16000, exploration constant as 20 and tree search depth as 20, we compared the average time taken for each action for different number of agents in the coordination graphs. [...] Table 1: Multi-Drone Delivery hyperparameters. (Agents: 8, XY axis res.: 0.20, Noise: 0.10, Expl. const.: 5, Expl. depth: 10, Iterations: 4000; ... and so on for other agent counts).
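The Max-Plus action selection routine named above (Algorithm 3) can be illustrated with a minimal sketch. This is not the authors' Julia implementation: the function name `max_plus`, the per-edge payoff tables, and the fixed iteration count are illustrative assumptions, and the paper's version runs inside MCTS with edge statistics estimated from search rather than given up front.

```python
def max_plus(num_agents, num_actions, edges, payoffs, iters=10):
    """Pick a joint action via max-plus message passing on a coordination graph.

    edges:   list of (i, j) pairs with i < j
    payoffs: dict mapping (i, j) -> 2D table q[a_i][a_j] of edge payoffs
    """
    # mu[(i, j)][a_j] is the message from agent i to neighbor j about j's action
    mu = {}
    neighbors = {i: set() for i in range(num_agents)}
    for (i, j) in edges:
        mu[(i, j)] = [0.0] * num_actions
        mu[(j, i)] = [0.0] * num_actions
        neighbors[i].add(j)
        neighbors[j].add(i)

    def edge_q(i, j, ai, aj):
        # payoff tables are stored once per (min, max) oriented edge
        if (i, j) in payoffs:
            return payoffs[(i, j)][ai][aj]
        return payoffs[(j, i)][aj][ai]

    for _ in range(iters):
        new_mu = {}
        for (i, j) in mu:
            row = [0.0] * num_actions
            for aj in range(num_actions):
                # maximize over sender's action: edge payoff plus messages
                # flowing into the sender from its other neighbors
                row[aj] = max(
                    edge_q(i, j, ai, aj)
                    + sum(mu[(k, i)][ai] for k in neighbors[i] if k != j)
                    for ai in range(num_actions)
                )
            # subtract the mean to keep messages bounded on cyclic graphs
            mean = sum(row) / num_actions
            new_mu[(i, j)] = [v - mean for v in row]
        mu = new_mu

    # each agent maximizes the sum of its incoming messages
    joint = []
    for i in range(num_agents):
        scores = [sum(mu[(k, i)][a] for k in neighbors[i])
                  for a in range(num_actions)]
        joint.append(max(range(num_actions), key=lambda a: scores[a]))
    return joint
```

On tree-structured coordination graphs max-plus converges to the exact joint maximizer; on cyclic graphs it is an anytime approximation, which is what makes it cheaper than variable elimination for the dynamic graphs in the drone domain.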