Recursive Small-Step Multi-Agent A* for Dec-POMDPs
Authors: Wietze Koops, Nils Jansen, Sebastian Junges, Thiago D. Simão
IJCAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 7 Empirical Evaluation This section provides an empirical evaluation of RS-MAA* on a set of standard benchmarks, in comparison with GMAA*-ICE [Oliehoek et al., 2013], the current state-of-the-art exact solver for Dec-POMDPs. Furthermore, we provide an ablation study, showing the effects of clustering, computing the best response for the last agent in the last stage directly, the choice of heuristic (Q^d), and abandoning the computation of the recursive heuristic early based on a maximum number of iterations (M) or a threshold (α). |
| Researcher Affiliation | Academia | Wietze Koops, Nils Jansen, Sebastian Junges and Thiago D. Simão, Radboud University, Nijmegen, The Netherlands {wietze.koops, nils.jansen, sebastian.junges, thiago.simao}@ru.nl |
| Pseudocode | No | The paper describes algorithmic concepts and definitions but does not provide structured pseudocode or algorithm blocks, nor does it label any section as 'Pseudocode' or 'Algorithm'. |
| Open Source Code | Yes | Proofs and source code are given in the supplementary material available at https://zenodo.org/record/7949016 |
| Open Datasets | Yes | We used the standard benchmarks from the literature: DECTIGER [Nair et al., 2003], FIREFIGHTING [Oliehoek et al., 2008] (3 fire levels, 3 or 4 houses), GRID with two observations [Amato et al., 2006], BOXPUSHING [Seuken and Zilberstein, 2007], GRID3X3 [Amato et al., 2009], MARS [Amato and Zilberstein, 2009], HOTEL [Spaan and Melo, 2008], RECYCLING [Amato et al., 2007], and BROADCAST [Hansen et al., 2004]. |
| Dataset Splits | No | The paper evaluates an exact algorithm for Dec-POMDPs on standard benchmarks. These benchmarks define problem instances rather than datasets with typical train/validation/test splits used in supervised learning. Therefore, the paper does not provide information about dataset splits for training, validation, or testing its algorithm. |
| Hardware Specification | Yes | All experiments were run on a 3.7GHz Intel Core i9 CPU running Ubuntu 22.04.1 LTS. We limited each process to 16GB of memory and 1 hour of CPU time. |
| Software Dependencies | No | The paper mentions 'Python 3' and 'PyPy' for RS-MAA, and 'C++ implementation in the MADP toolbox [Oliehoek et al., 2017]' for GMAA-ICE. However, it does not provide specific version numbers for PyPy, C++, or the MADP toolbox. |
| Experiment Setup | Yes | Hyperparameter selection. Our algorithm has three hyperparameters, d, M, α, discussed in Sec. 6. For most problems, Q1 and Q2 are too expensive to compute, but Q3 is a good compromise. For easier problems, Q also works well. In a preliminary evaluation using M ∈ {100, 200, 400} and α ∈ {0.1, 0.2}, we observed that M = 100 sometimes expands many more nodes than M = 200, but increasing M to 400 did not reduce the number of nodes (significantly) compared to M = 200, and hence typically increased running times. Hence, we set M = 200. The threshold α had a small effect, so we conservatively set it to 0.2. |
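The selection procedure quoted above can be sketched as a small grid search: run the solver for each (M, α) pair, then keep the smallest M that does not inflate the number of expanded nodes, since a larger iteration cap only adds heuristic-computation cost when it does not shrink the search tree. The sketch below is illustrative only; `nodes_expanded` is a hypothetical stand-in for the authors' RS-MAA* solver (their actual code is in the Zenodo artifact), and the node counts are made up to mirror the reported trend (M = 100 expands many more nodes; M = 400 matches M = 200).

```python
from itertools import product

def nodes_expanded(M, alpha):
    """Hypothetical stand-in for one RS-MAA* run on a benchmark.

    Returns the number of A* nodes expanded for a given setting.
    The counts are illustrative, not from the paper.
    """
    counts = {(100, 0.1): 5400, (100, 0.2): 5100,
              (200, 0.1): 1900, (200, 0.2): 1800,
              (400, 0.1): 1850, (400, 0.2): 1800}
    return counts[(M, alpha)]

def pick_hyperparameters(Ms=(100, 200, 400), alphas=(0.1, 0.2)):
    # Evaluate the full preliminary grid, as described in the quote.
    results = {(M, a): nodes_expanded(M, a) for M, a in product(Ms, alphas)}
    best = min(results.values())
    # Prefer the smallest M (and then the largest alpha tried) whose
    # node count is within 5% of the best observed: raising M further
    # only increases heuristic-computation time.
    for M in sorted(Ms):
        for a in alphas:
            if results[(M, a)] <= 1.05 * best:
                return M, a
    return max(Ms), alphas[-1]

print(pick_hyperparameters())  # → (200, 0.2) on the illustrative counts
```

On these made-up counts the procedure lands on M = 200, α = 0.2, matching the values the authors report choosing; the 5% tolerance is an assumption of this sketch, not a threshold stated in the paper.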