Reproducibility Index

Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).

Approximate Dec-POMDP Solving Using Multi-Agent A*

Authors: Wietze Koops, Sebastian Junges, Nils Jansen

IJCAI 2024

Reproducibility Variable	Result	LLM Response
Research Type	Experimental	Our experiments show competitive performance to the state-of-the-art. Moreover, for multiple benchmarks, we achieve superior performance. In addition, we provide an A* algorithm that finds upper bounds for the optimum, tailored towards problems with long horizons. The main ingredient is a new heuristic that periodically reveals the state, thereby limiting the number of reachable beliefs. Our experiments demonstrate the efficacy and scalability of the approach.
Researcher Affiliation	Academia	Radboud University, Nijmegen, The Netherlands; Ruhr-University Bochum, Germany
Pseudocode	No	The paper describes algorithms but does not provide structured pseudocode or algorithm blocks.
Open Source Code	Yes	Supplementary material and source code are available at https://arxiv.org/abs/2405.05662 and https://zenodo.org/records/11160648.
Open Datasets	Yes	We used the standard benchmarks from the literature: DECTIGER [Nair et al., 2003], FIREFIGHTING [Oliehoek et al., 2008b] (3 fire levels, 3 houses), GRID with two observations [Amato et al., 2006], BOXPUSHING [Seuken and Zilberstein, 2007], GRID3X3 [Amato et al., 2009], MARS [Amato and Zilberstein, 2009], HOTEL [Spaan and Melo, 2008], RECYCLING [Amato et al., 2007], and BROADCAST [Hansen et al., 2004].
Dataset Splits	No	The paper mentions using standard benchmarks but does not specify how the data within these benchmarks was split into training, validation, or test sets, nor does it refer to predefined splits for these benchmarks.
Hardware Specification	Yes	All experiments ran on a system with an Apple M1 Ultra using the PyPy environment.
Software Dependencies	No	The paper mentions the 'PyPy environment' but does not provide specific version numbers for PyPy or any other software dependencies (libraries, frameworks, etc.) used.
Experiment Setup	Yes	For PF-MAA*, the main hyperparameters are the window size k, which of the heuristics Qmaxr,r or QMDP,r to use, the depth r of the heuristic, and the iteration limit L per stage. We report the configurations that we used in Table 1. Setting r ≥ 4 or k ≥ 4 is not feasible for all benchmarks. For TR-MAA*, we use the heuristic from Section 6 with r = 3 and r = 5 respectively. In each case, the Q3 heuristic is used to solve Dec-POMDPs.
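Because the Experiment Setup entry names four PF-MAA* hyperparameters (window size k, heuristic choice, heuristic depth r, iteration limit L per stage), a reproduction attempt benefits from recording them together with each run. The Python sketch below is a hypothetical record for that purpose only: the class, field, and enum names are assumptions for illustration and do not come from the authors' code release, and the example values are placeholders, not the configurations reported in Table 1.

```python
from dataclasses import dataclass, asdict
from enum import Enum


class Heuristic(Enum):
    # The two heuristic families named in the setup description.
    # These labels are hypothetical, not identifiers from the authors' code.
    QMAXR = "Qmaxr,r"
    QMDP = "QMDP,r"


@dataclass(frozen=True)
class PFMAAConfig:
    """Hypothetical record of the four PF-MAA* hyperparameters."""
    window_size: int        # k
    heuristic: Heuristic    # which heuristic family is used
    depth: int              # r, depth of the heuristic
    iteration_limit: int    # L, iteration limit per stage

    def __post_init__(self) -> None:
        # Only reject non-positive integers; the paper's actual admissible
        # ranges are not assumed here.
        for name in ("window_size", "depth", "iteration_limit"):
            value = getattr(self, name)
            if not isinstance(value, int) or value < 1:
                raise ValueError(f"{name} must be a positive integer, got {value!r}")

    def as_record(self) -> dict:
        # Flatten to plain types so the configuration can be logged with results.
        record = asdict(self)
        record["heuristic"] = self.heuristic.value
        return record


if __name__ == "__main__":
    # Placeholder values for illustration only; Table 1 of the paper lists the real ones.
    cfg = PFMAAConfig(window_size=2, heuristic=Heuristic.QMDP, depth=3, iteration_limit=1000)
    print(cfg.as_record())
```

Freezing the dataclass keeps a configuration immutable once a run starts, so the logged record cannot drift from the settings actually used.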