Approximate Dec-POMDP Solving Using Multi-Agent A*
Authors: Wietze Koops, Sebastian Junges, Nils Jansen
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show competitive performance to the state-of-the-art. Moreover, for multiple benchmarks, we achieve superior performance. In addition, we provide an A algorithm that finds upper bounds for the optimum, tailored towards problems with long horizons. The main ingredient is a new heuristic that periodically reveals the state, thereby limiting the number of reachable beliefs. Our experiments demonstrate the efficacy and scalability of the approach. |
| Researcher Affiliation | Academia | Radboud University, Nijmegen, The Netherlands 2Ruhr-University Bochum, Germany |
| Pseudocode | No | The paper describes algorithms but does not provide structured pseudocode or algorithm blocks. |
| Open Source Code | Yes | Supplementary material and source code are available at https:// arxiv.org/abs/2405.05662 and https://zenodo.org/records/11160648. |
| Open Datasets | Yes | We used the standard benchmarks from the literature: DECTIGER [Nair et al., 2003], FIREFIGHTING [Oliehoek et al., 2008b] (3 fire levels, 3 houses), GRID with two observations [Amato et al., 2006], BOXPUSHING [Seuken and Zilberstein, 2007], GRID3X3 [Amato et al., 2009], MARS [Amato and Zilberstein, 2009], HOTEL [Spaan and Melo, 2008], RECYCLING [Amato et al., 2007], and BROADCAST [Hansen et al., 2004]. |
| Dataset Splits | No | The paper mentions using standard benchmarks but does not specify how the data within these benchmarks was split into training, validation, or test sets, nor does it refer to predefined splits for these benchmarks. |
| Hardware Specification | Yes | All experiments ran on a system with an Apple M1 Ultra using the Py Py environment. |
| Software Dependencies | No | The paper mentions 'Py Py environment' but does not provide specific version numbers for PyPy or any other software dependencies (libraries, frameworks, etc.) used. |
| Experiment Setup | Yes | For PF-MAA , the main hyperparameters are the window size k, which of the heuristics Qmaxr,r or QMDP,r to use, the depth r of the heuristic, and the iteration limit L per stage. We report the configurations that we used in Table 1. Setting r 4 or k 4 is not feasible for all benchmarks. For TR-MAA , we use the heuristic from Section 6 with r = 3 and r = 5 respectively. In each case, the Q3 heuristic is used to solve Dec-POMDPs. |