Scalable Planning and Learning for Multiagent POMDPs
Authors: Christopher Amato, Frans Oliehoek
AAAI 2015
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Here, we empirically investigate the effectiveness of our factorization methods by comparing them to non-factored methods in the planning and learning settings. Experimental results show that we are able to provide high quality solutions to large multiagent planning and learning problems. |
| Researcher Affiliation | Academia | Christopher Amato: CSAIL, MIT, Cambridge, MA 02139, camato@csail.mit.edu. Frans A. Oliehoek: Informatics Institute, University of Amsterdam; Dept. of CS, University of Liverpool, frans.oliehoek@liverpool.ac.uk |
| Pseudocode | No | The paper describes the algorithms conceptually and references modifications to existing functions but does not provide any formal pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not contain any explicit statements about releasing code or links to a code repository. |
| Open Datasets | No | The paper describes custom problem settings ('firefighting problems', 'sensor network problems') used in the experiments but does not provide access information (links, citations) for publicly available datasets. |
| Dataset Splits | No | The paper mentions 'Each experiment was run for a given number of simulations, the number of samples used at each step to choose an action, and averaged over a number of episodes.' but does not specify any training/validation/test dataset splits. |
| Hardware Specification | Yes | Experiments were run on a single core of a 2.5 GHz machine with 8GB of memory. |
| Software Dependencies | No | The paper mentions comparing 'factored representations to the flat version using POMCP' and using 'the same code base', but it does not specify any software names with version numbers. |
| Experiment Setup | Yes | Each experiment was run for a given number of simulations, the number of samples used at each step to choose an action, and averaged over a number of episodes. We report undiscounted return with the standard error. For the BA-MPOMDPs, H = 10, 50. |
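
The reporting convention quoted in the Experiment Setup row (undiscounted return averaged over episodes, with the standard error) can be illustrated with a minimal sketch. This is not the authors' code: the function names `undiscounted_return` and `mean_and_standard_error` are hypothetical, and the sketch assumes the standard error is the sample standard deviation (with Bessel's correction) divided by the square root of the number of episodes, which the paper does not spell out.

```python
import math
from typing import Sequence, Tuple


def undiscounted_return(rewards: Sequence[float]) -> float:
    """Sum the per-step rewards of one episode without discounting."""
    return float(sum(rewards))


def mean_and_standard_error(returns: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, standard error of the mean) over per-episode returns.

    Assumption: standard error = sample std (n - 1 denominator) / sqrt(n).
    """
    n = len(returns)
    if n < 2:
        raise ValueError("need at least two episodes to estimate a standard error")
    mean = sum(returns) / n
    sample_variance = sum((r - mean) ** 2 for r in returns) / (n - 1)
    return mean, math.sqrt(sample_variance / n)


# Illustrative, made-up reward sequences for four episodes (not data from the paper).
episodes = [[1.0, 0.0], [1.0, 1.0], [2.0, 1.0], [2.0, 2.0]]
episode_returns = [undiscounted_return(e) for e in episodes]
mean_return, std_err = mean_and_standard_error(episode_returns)
```

Here `episode_returns` holds one undiscounted sum per episode, and `mean_return` with `std_err` correspond to the kind of "return ± standard error" figure the row describes.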