Partially Observable Multi-agent RL with (Quasi-)Efficiency: The Blessing of Information Sharing
Authors: Xiangyu Liu, Kaiqing Zhang
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We hope our study can open up the possibilities of leveraging and even designing different information structures, for developing both sample- and computation-efficient partially observable MARL. ... Finally, we also provide experiments to validate: i) the benefit of information sharing as we considered in partially observable MARL; ii) the implementability of our theoretically supported algorithms. |
| Researcher Affiliation | Academia | 1University of Maryland, College Park. Correspondence to: Kaiqing Zhang <kaiqing@umd.edu>. |
| Pseudocode | Yes | Here we collect both our planning and learning algorithms as in Algorithms 1, 2, 3, 4, 5, 6. For a high-level overview of our algorithmic framework, we refer to Figure 2. ... Algorithm 1 Value iteration with common information ... Algorithm 7 LACI(G, {Ĉh}h∈[H+1], {φ̂h+1}h∈[H], Γ, L̂, ϵ, δ2, ζ1, ζ2, θ1, θ2, δ1, N2, ϵe): Learning with Approximate Common Information. (An illustrative sketch of the value-iteration step appears after the table.) |
| Open Source Code | No | The paper does not provide a direct link to the source code for the methodology described, nor does it explicitly state that the code is being released. |
| Open Datasets | Yes | We consider the popular deep MARL benchmarks, multi-agent particle-world environment (MPE) (Lowe et al., 2017). ... We compare our approaches with two baselines, FM-E and RNN-E, which are also common information-based approaches in (Mao et al., 2020). The final rewards are reported in Table 1. In both domains with various horizons, our methods consistently outperform the baselines. ... To further validate the tractability of our approaches, we test our learning algorithm on two popular partially observable benchmarks Dectiger (Nair et al., 2003) and Boxpushing (Seuken & Zilberstein, 2012). (An environment-loading sketch follows the table.) |
| Dataset Splits | No | The paper mentions training on benchmarks but does not specify explicit training, validation, or test dataset splits (e.g., percentages or sample counts). |
| Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions the benchmarks and algorithms it uses, but it does not specify the software libraries or version numbers required to reproduce the experiments. |
| Experiment Setup | Yes | Both our algorithm and baselines are trained with 80000 time steps. ... For the planning oracles used in Algorithm 7, we choose to use Q-learning instead of backward-induction style algorithms as in Algorithm 3, for which we found working very well empirically. Finally, for constructing approximate common information, we used finite memory with a length of 4. (A sketch of this setup follows the table.) |
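
The Pseudocode row quotes Algorithm 1, "Value iteration with common information," which performs backward induction over common-information states while optimizing over joint prescriptions. The following is a minimal sketch of that idea under a finite, tabular setting; it is not the authors' code, and `COMMON_STATES`, `PRESCRIPTIONS`, `reward`, `transition`, and `H` are hypothetical placeholders.

```python
# Illustrative sketch: backward-induction value iteration over common information.
# All model components below are placeholders, not the paper's actual models.

H = 5                      # horizon (assumed)
COMMON_STATES = range(10)  # approximate common-information states c_h (assumed finite)
PRESCRIPTIONS = range(4)   # joint prescriptions gamma_h: private info -> actions (assumed finite)

def reward(h, c, gamma):
    """Expected team reward at step h under common info c and prescription gamma (placeholder)."""
    return 0.0

def transition(h, c, gamma):
    """Distribution over next common-information states (placeholder: uniform)."""
    return {c_next: 1.0 / len(COMMON_STATES) for c_next in COMMON_STATES}

def value_iteration_common_info():
    V = {(H, c): 0.0 for c in COMMON_STATES}  # terminal values
    policy = {}
    for h in reversed(range(H)):              # backward induction over the horizon
        for c in COMMON_STATES:
            best_val, best_gamma = float("-inf"), None
            for gamma in PRESCRIPTIONS:       # coordinator optimizes over joint prescriptions
                q = reward(h, c, gamma) + sum(
                    p * V[(h + 1, c_next)]
                    for c_next, p in transition(h, c, gamma).items()
                )
                if q > best_val:
                    best_val, best_gamma = q, gamma
            V[(h, c)] = best_val
            policy[(h, c)] = best_gamma
    return V, policy
```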
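The Open Datasets row cites the multi-agent particle-world environment (MPE) of Lowe et al. (2017). A maintained port is available in PettingZoo (`pettingzoo.mpe`); whether the paper used this particular port is an assumption, and the sketch below only shows how such an environment can be instantiated and stepped with the parallel API of recent PettingZoo releases (>=1.24).

```python
# Illustrative only: interacting with an MPE task via the PettingZoo port.
# The paper reports using MPE (Lowe et al., 2017); the specific task and API
# version here are assumptions for demonstration.
from pettingzoo.mpe import simple_spread_v3

env = simple_spread_v3.parallel_env(max_cycles=25)
observations, infos = env.reset(seed=0)

while env.agents:
    # Random joint action as a stand-in for a learned common-information policy.
    actions = {agent: env.action_space(agent).sample() for agent in env.agents}
    observations, rewards, terminations, truncations, infos = env.step(actions)

env.close()
```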
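The Experiment Setup row states that Q-learning replaces the backward-induction planning oracle and that approximate common information is built from a finite memory of length 4, with 80000 training time steps. The sketch below illustrates that combination only; the environment interface (`env.reset`, `env.step`), the `joint_actions` set, and the learning-rate, discount, and exploration constants are assumptions, not values from the paper.

```python
# Illustrative sketch: tabular Q-learning over approximate common information
# formed by a sliding window of the last 4 joint observations.
import random
from collections import defaultdict, deque

MEMORY_LEN = 4                      # finite-memory length quoted in the paper
TOTAL_STEPS = 80_000                # training budget quoted in the paper
ALPHA, GAMMA, EPS = 0.1, 0.99, 0.1  # assumed hyperparameters

def q_learning(env, joint_actions):
    Q = defaultdict(float)                 # Q[(common_info, joint_action)]
    memory = deque(maxlen=MEMORY_LEN)      # sliding window of joint observations
    memory.append(env.reset())
    for _ in range(TOTAL_STEPS):
        c = tuple(memory)                  # approximate common information
        if random.random() < EPS:          # epsilon-greedy exploration
            a = random.choice(joint_actions)
        else:
            a = max(joint_actions, key=lambda act: Q[(c, act)])
        next_obs, r, done = env.step(a)    # assumed environment interface
        memory.append(next_obs)
        c_next = tuple(memory)
        target = r + (0.0 if done else GAMMA * max(Q[(c_next, act)] for act in joint_actions))
        Q[(c, a)] += ALPHA * (target - Q[(c, a)])
        if done:
            memory.clear()
            memory.append(env.reset())
    return Q
```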