Factored Online Planning in Many-Agent POMDPs

Authors: Maris F. L. Galesloot, Thiago D. Simão, Sebastian Junges, Nils Jansen

AAAI 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental evaluation against several state-of-the-art baselines shows that our methods (1) are competitive in settings with only a few agents and (2) improve over the baselines in the presence of many agents.
Researcher Affiliation | Academia | (1) Radboud University Nijmegen, The Netherlands; (2) Eindhoven University of Technology, The Netherlands; (3) Ruhr-University Bochum, Germany
Pseudocode | Yes | We provide the pseudo-code for the SIR filter that updates b in Appendix F.2.
Open Source Code | Yes | All algorithm variants are implemented in the same Python prototype, published online (https://zenodo.org/records/10409525).
Open Datasets | Yes | Benchmarks. FIREFIGHTINGGRAPH (FFG, Oliehoek et al. 2008) has been used to evaluate factored POMCP (Amato and Oliehoek 2015). Agents stand in a line, and houses are located to the left and right of each agent. Multi-agent ROCKSAMPLE (MARS, Cai et al. 2021) extends single-agent Rock Sample (Smith and Simmons 2004). MARS environments are defined by their size m, the number of agents n, and the number of rocks k, with k = m = 15. In CAPTURETARGET (CT), agents are tasked with capturing a moving target. We depict results for CT in Appendix A.2. Detailed benchmark descriptions are in Appendix G.
Dataset Splits | No | The paper describes simulations and episodes for evaluation (“averaged over 100 episodes”), but it does not specify traditional train/validation/test dataset splits, as it is an online planning paper rather than one using static datasets.
Hardware Specification | Yes | All code ran on a machine with an Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz and 256 GB RAM (8 x 32GB DDR4-3200).
Software Dependencies | No | The paper mentions “All algorithm variants are implemented in the same Python prototype,” but it does not provide specific version numbers for Python or any associated libraries.
Experiment Setup | Yes | We did not run an extensive hyperparameter optimization for any algorithm, and we list the most important parameters in Tab. 2 of Appendix A.1. All algorithms ran with a maximum of 5s and 15s per step on FFG/CT and MARS, respectively. If the particle filter belief is deprived at any point in time during the episode, the policy defaults to a random policy. We set the number $K$ of particles in the joint filters such that $K = \sum_e K_e$ in the factored filters, e.g., if we have three edges with $K_e = 100$, then the joint counterpart has $K = 300$.
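
The particle-budget matching and the random-policy fallback described in the Experiment Setup entry can be illustrated with a minimal Python sketch. This is not taken from the authors' prototype: the function names, the edge labels, and the `planner` callable are hypothetical, and a "deprived" belief is assumed here to mean that no particles remain in the filter.

```python
import random


def joint_particle_budget(edge_budgets):
    """Return the joint filter size K as the sum of the per-edge sizes K_e."""
    return sum(edge_budgets.values())


def select_action(particles, actions, planner, rng=random):
    """Pick an action, falling back to a random one on a deprived belief.

    Assumption: the belief counts as deprived when the particle list is empty.
    `planner` is a hypothetical callable mapping particles to an action.
    """
    if not particles:
        # Deprived belief: default to a uniformly random action.
        return rng.choice(actions)
    return planner(particles)


# Three edges with K_e = 100 each, as in the example from the setup text.
edge_budgets = {"e1": 100, "e2": 100, "e3": 100}
K = joint_particle_budget(edge_budgets)

# Hypothetical action set and a trivial stand-in planner.
actions = ["left", "right"]
fallback_action = select_action([], actions, planner=lambda p: "left")
planned_action = select_action([0, 1, 2], actions, planner=lambda p: "right")
```

With three edges of 100 particles each, the joint filter in this sketch receives the same total budget of 300 particles, so the comparison between joint and factored filters uses an equal number of particles.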