Multi-agent active perception with prediction rewards

Authors: Mikko Lauri, Frans Oliehoek

NeurIPS 2020

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate the empirical usefulness of our results by applying a standard Dec-POMDP algorithm to multi-agent active perception problems, showing increased scalability in the planning horizon.
Researcher Affiliation | Academia | Mikko Lauri, Department of Computer Science, Universität Hamburg, Hamburg, Germany, lauri@informatik.uni-hamburg.de; Frans A. Oliehoek, Department of Computer Science, TU Delft, Delft, the Netherlands, f.a.oliehoek@tudelft.nl
Pseudocode | Yes | The pseudocode for APAS is shown in Algorithm 1.
Open Source Code | Yes | A reference implementation is available at https://github.com/laurimi/multiagent-prediction-reward.
Open Datasets | No | The paper mentions evaluating on "the Dec-ρPOMDP domains from [11]: the micro air vehicle (MAV) domain and information gathering rovers domain." However, it does not provide concrete access information (e.g., a URL, DOI, or a formal citation for the dataset itself) for these domains, which are described as problem setups rather than downloadable datasets.
Dataset Splits | No | The paper mentions running "100 repetitions" and reporting the "average policy value and its standard error" but does not specify any training, validation, or test dataset splits. The problem domains are typically reinforcement learning environments, not datasets with explicit splits.
Hardware Specification | No | The paper does not provide any specific hardware details such as GPU models, CPU types, or memory specifications used for running the experiments.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python version, specific library versions, or solver versions).
Experiment Setup | Yes | For the MAV problem, we use K = 2, and for the rovers problem K = 5 individual prediction actions. For APAS, we initialize Γ by randomly sampling linearization points b_k ∈ Δ(S), k = 1, ..., K, using [23]. The corresponding α-vectors for the negative entropy are α_k(s) = ln b_k(s). We run 100 repetitions using APAS and report the average policy value and its standard error.
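
The α-vector initialization quoted above corresponds to tangent hyperplanes of the (convex) negative entropy at the sampled belief points, so the maximum over k of the inner products α_k · b lower-bounds the negative entropy of any belief b. Below is a minimal Python sketch of this initialization; it assumes the linearization points are drawn uniformly from the simplex via a Dirichlet(1, ..., 1) distribution as a stand-in for the sampling method of [23], and the function names are illustrative rather than taken from the authors' reference implementation.

import numpy as np

def sample_belief_points(num_states, K, rng):
    # Draw K linearization points uniformly from the probability simplex Delta(S).
    return rng.dirichlet(np.ones(num_states), size=K)   # shape (K, |S|)

def negative_entropy_alpha_vectors(belief_points, eps=1e-12):
    # Tangent hyperplane of the negative entropy at each point b_k: alpha_k(s) = ln b_k(s).
    return np.log(belief_points + eps)                   # shape (K, |S|)

def pwlc_lower_bound(b, alphas):
    # Piecewise-linear lower bound max_k <alpha_k, b> on the negative entropy of belief b.
    return np.max(alphas @ b)

# Illustrative check: the bound never exceeds the true negative entropy sum_s b(s) ln b(s).
rng = np.random.default_rng(0)
B = sample_belief_points(num_states=4, K=5, rng=rng)
alphas = negative_entropy_alpha_vectors(B)
b = rng.dirichlet(np.ones(4))
assert pwlc_lower_bound(b, alphas) <= np.sum(b * np.log(b)) + 1e-9

Because the negative entropy is convex, each tangent hyperplane lies below it, so the piecewise-linear approximation is a valid lower bound that tightens as the number of linearization points K grows.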