Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing
Authors: Xiangyu Liu, Kaiqing Zhang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We hope our study can open up the possibilities of leveraging and even designing different information structures, for developing both sample- and computation-efficient partially observable MARL. ... Finally, we also provide experiments to validate: i) the benefit of information sharing as we considered in partially observable MARL; ii) the implementability of our theoretically supported algorithms. |
| Researcher Affiliation | Academia | 1University of Maryland, College Park. Correspondence to: Kaiqing Zhang <kaiqing@umd.edu>. |
| Pseudocode | Yes | Here we collect both our planning and learning algorithms as in Algorithms 1, 2, 3, 4, 5, 6. For a high-level overview of our algorithmic framework, we refer to Figure 2. ... Algorithm 1 Value iteration with common information ... Algorithm 7 LACI(G, {Ĉh}h∈[H+1], {φ̂h+1}h∈[H], Γ, L̂, ϵ, δ2, ζ1, ζ2, θ1, θ2, δ1, N2, ϵe): Learning with Approximate Common Information. (An illustrative sketch of the value-iteration step appears after the table.) |
| Open Source Code | No | The paper does not provide a direct link to the source code for the methodology described, nor does it explicitly state that the code is being released. |
| Open Datasets | Yes | We consider the popular deep MARL benchmarks, multi-agent particle-world environment (MPE) (Lowe et al., 2017). ... We compare our approaches with two baselines, FM-E and RNN-E, which are also common information-based approaches in (Mao et al., 2020). The final rewards are reported in Table 1. In both domains with various horizons, our methods consistently outperform the baselines. ... To further validate the tractability of our approaches, we test our learning algorithm on two popular partially observable benchmarks Dectiger (Nair et al., 2003) and Boxpushing (Seuken & Zilberstein, 2012). (An environment-loading sketch follows the table.) |
| Dataset Splits | No | The paper mentions training on benchmarks but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions the benchmarks and algorithms it uses, but it does not specify the software libraries or version numbers required to reproduce the experiments. |
| Experiment Setup | Yes | Both our algorithm and baselines are trained with 80000 time steps. ... For the planning oracles used in Algorithm 7, we choose to use Q-learning instead of backward-induction style algorithms as in Algorithm 3, for which we found working very well empirically. Finally, for constructing approximate common information, we used finite memory with a length of 4. (A sketch of this setup follows the table.) |
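
The Pseudocode row quotes Algorithm 1, "Value iteration with common information," which performs backward induction over common-information states while optimizing over joint prescriptions. The following is a minimal sketch of that idea under a finite, tabular setting; it is not the authors' code, and `COMMON_STATES`, `PRESCRIPTIONS`, `reward`, `transition`, and `H` are hypothetical placeholders.

```python
# Illustrative sketch: backward-induction value iteration over common information.
# All model components below are placeholders, not the paper's actual models.

H = 5                      # horizon (assumed)
COMMON_STATES = range(10)  # approximate common-information states c_h (assumed finite)
PRESCRIPTIONS = range(4)   # joint prescriptions gamma_h: private info -> actions (assumed finite)

def reward(h, c, gamma):
    """Expected team reward at step h under common info c and prescription gamma (placeholder)."""
    return 0.0

def transition(h, c, gamma):
    """Distribution over next common-information states (placeholder: uniform)."""
    return {c_next: 1.0 / len(COMMON_STATES) for c_next in COMMON_STATES}

def value_iteration_common_info():
    V = {(H, c): 0.0 for c in COMMON_STATES}  # terminal values
    policy = {}
    for h in reversed(range(H)):              # backward induction over the horizon
        for c in COMMON_STATES:
            best_val, best_gamma = float("-inf"), None
            for gamma in PRESCRIPTIONS:       # coordinator optimizes over joint prescriptions
                q = reward(h, c, gamma) + sum(
                    p * V[(h + 1, c_next)]
                    for c_next, p in transition(h, c, gamma).items()
                )
                if q > best_val:
                    best_val, best_gamma = q, gamma
            V[(h, c)] = best_val
            policy[(h, c)] = best_gamma
    return V, policy
```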
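The Open Datasets row cites the multi-agent particle-world environment (MPE) of Lowe et al. (2017). A maintained port is available in PettingZoo (`pettingzoo.mpe`); whether the paper used this particular port is an assumption, and the sketch below only shows how such an environment can be instantiated and stepped with the parallel API of recent PettingZoo releases (>=1.24).

```python
# Illustrative only: interacting with an MPE task via the PettingZoo port.
# The paper reports using MPE (Lowe et al., 2017); the specific task and API
# version here are assumptions for demonstration.
from pettingzoo.mpe import simple_spread_v3

env = simple_spread_v3.parallel_env(max_cycles=25)
observations, infos = env.reset(seed=0)

while env.agents:
    # Random joint action as a stand-in for a learned common-information policy.
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)

env.close()
```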
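The Experiment Setup row states that Q-learning replaces the backward-induction planning oracle and that approximate common information is built from a finite memory of length 4, with 80000 training time steps. The sketch below illustrates that combination only; the environment interface (`env.reset`, `env.step`), the `joint_actions` set, and the learning-rate, discount, and exploration constants are assumptions, not values from the paper.

```python
# Illustrative sketch: tabular Q-learning over approximate common information
# formed by a sliding window of the last 4 joint observations.
import random
from collections import defaultdict, deque

MEMORY_LEN = 4                      # finite-memory length quoted in the paper
TOTAL_STEPS = 80_000                # training budget quoted in the paper
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1  # assumed hyperparameters

def q_learning(env, joint_actions):
    Q = defaultdict(float)                 # Q[(common_info, joint_action)]
    memory = deque(maxlen=MEMORY_LEN)      # sliding window of joint observations
    memory.append(env.reset())
    for _ in range(TOTAL_STEPS):
        c = tuple(memory)                  # approximate common information
        if random.random() < EPS:          # epsilon-greedy exploration
            a = random.choice(joint_actions)
        else:
            a = max(joint_actions, key=lambda act: Q[(c, act)])
        next_obs, r, done = env.step(a)    # assumed environment interface
        memory.append(next_obs)
        c_next = tuple(memory)
        target = r + (0.0 if done else GAMMA * max(Q[(c_next, act)] for act in joint_actions))
        Q[(c, a)] += ALPHA * (target - Q[(c, a)])
        if done:
            memory.clear()
            memory.append(env.reset())
    return Q
```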