Approximate Dec-POMDP Solving Using Multi-Agent A*

Authors: Wietze Koops, Sebastian Junges, Nils Jansen

IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable Result LLM Response
Research Type Experimental Our experiments show competitive performance to the state-of-the-art. Moreover, for multiple benchmarks, we achieve superior performance. In addition, we provide an A algorithm that finds upper bounds for the optimum, tailored towards problems with long horizons. The main ingredient is a new heuristic that periodically reveals the state, thereby limiting the number of reachable beliefs. Our experiments demonstrate the efficacy and scalability of the approach.
Researcher Affiliation Academia Radboud University, Nijmegen, The Netherlands 2Ruhr-University Bochum, Germany
Pseudocode No The paper describes algorithms but does not provide structured pseudocode or algorithm blocks.
Open Source Code Yes Supplementary material and source code are available at https:// arxiv.org/abs/2405.05662 and https://zenodo.org/records/11160648.
Open Datasets Yes We used the standard benchmarks from the literature: DECTIGER [Nair et al., 2003], FIREFIGHTING [Oliehoek et al., 2008b] (3 fire levels, 3 houses), GRID with two observations [Amato et al., 2006], BOXPUSHING [Seuken and Zilberstein, 2007], GRID3X3 [Amato et al., 2009], MARS [Amato and Zilberstein, 2009], HOTEL [Spaan and Melo, 2008], RECYCLING [Amato et al., 2007], and BROADCAST [Hansen et al., 2004].
Dataset Splits No The paper mentions using standard benchmarks but does not specify how the data within these benchmarks was split into training, validation, or test sets, nor does it refer to predefined splits for these benchmarks.
Hardware Specification Yes All experiments ran on a system with an Apple M1 Ultra using the Py Py environment.
Software Dependencies No The paper mentions 'Py Py environment' but does not provide specific version numbers for PyPy or any other software dependencies (libraries, frameworks, etc.) used.
Experiment Setup Yes For PF-MAA , the main hyperparameters are the window size k, which of the heuristics Qmaxr,r or QMDP,r to use, the depth r of the heuristic, and the iteration limit L per stage. We report the configurations that we used in Table 1. Setting r 4 or k 4 is not feasible for all benchmarks. For TR-MAA , we use the heuristic from Section 6 with r = 3 and r = 5 respectively. In each case, the Q3 heuristic is used to solve Dec-POMDPs.