Notice: The reproducibility variables underlying each score are classified using an automated LLM-based pipeline, validated against a manually labeled dataset. LLM-based classification introduces uncertainty and potential bias; scores should be interpreted as estimates. Full accuracy metrics and methodology are described in Coakley et al. (K. L. Coakley, T. Snelleman, H. Hoos, and O. E. Gundersen, "The embrace of open science: An analysis of a decade of AI research and 56 800 conference papers," under review, 2026).
Factored Online Planning in Many-Agent POMDPs
Authors: Maris F. L. Galesloot, Thiago D. Simão, Sebastian Junges, Nils Jansen
AAAI 2024 | Venue PDF | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experimental evaluation against several state-of-the-art baselines shows that our methods (1) are competitive in settings with only a few agents and (2) improve over the baselines in the presence of many agents. |
| Researcher Affiliation | Academia | (1) Radboud University Nijmegen, The Netherlands; (2) Eindhoven University of Technology, The Netherlands; (3) Ruhr-University Bochum, Germany |
| Pseudocode | Yes | We provide the pseudo-code for the SIR filter that updates b in Appendix F.2. |
| Open Source Code | Yes | All algorithm variants are implemented in the same Python prototype, published online (https://zenodo.org/records/10409525). |
| Open Datasets | Yes | Benchmarks. FIREFIGHTINGGRAPH (FFG, Oliehoek et al. 2008) has been used to evaluate factored POMCP (Amato and Oliehoek 2015). Agents stand in a line, and houses are located to the left and right of each agent. Multi-agent ROCKSAMPLE (MARS, Cai et al. 2021) extends single-agent Rock Sample (Smith and Simmons 2004). MARS environments are defined by their size m, the number of agents n, and the number of rocks k, with k = m = 15. In CAPTURETARGET (CT), agents are tasked with capturing a moving target. We depict results for CT in Appendix A.2. Detailed benchmark descriptions are in Appendix G. |
| Dataset Splits | No | The paper describes simulations and episodes for evaluation (“averaged over 100 episodes”), but it does not specify traditional train/validation/test dataset splits as it’s an online planning paper rather than one using static datasets. |
| Hardware Specification | Yes | All code ran on a machine with an Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz and 256 GB RAM (8 x 32GB DDR4-3200). |
| Software Dependencies | No | The paper mentions “All algorithm variants are implemented in the same Python prototype,” but it does not provide specific version numbers for Python or any associated libraries. |
| Experiment Setup | Yes | We did not run an extensive hyperparameter optimization for any algorithm, and we list the most important parameters in Tab. 2 of Appendix A.1. All algorithms ran with a maximum of 5s and 15s per step on FFG/CT and MARS, respectively. If the particle filter belief is deprived at any point in time during the episode, the policy defaults to a random policy. We set the number $K$ of particles in the joint filters such that $K = \sum_e K_e$ in the factored filters, e.g., if we have three edges with $K_e = 100$, then the joint counterpart has $K = 300$. |
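
The particle budget and the random-policy fallback quoted in the Experiment Setup row can be illustrated with a minimal Python sketch. It is not taken from the authors' prototype: the function names, the edge labels, and the `planner` callable are hypothetical, and treating a "deprived" belief as an empty particle list is an assumption made here for illustration.

```python
import random


def joint_particle_count(edge_particles):
    """Particle budget for the joint filter: the sum of per-edge counts K_e."""
    return sum(edge_particles.values())


def select_action(particles, actions, planner, rng=random):
    """Return the planner's action, or a random one if the belief is deprived.

    Assumption: "deprived" is modelled as an empty particle collection.
    """
    if not particles:
        return rng.choice(actions)
    return planner(particles)


# Three edges with K_e = 100 each, as in the quoted example.
edge_particles = {"e1": 100, "e2": 100, "e3": 100}
print(joint_particle_count(edge_particles))
```

Matching the total number of particles in this way keeps the comparison between the joint and factored filters on an equal particle budget.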