STAR: Spatio-Temporal State Compression for Multi-Agent Tasks with Rich Observations
Authors: Chao Li, Yujing Hu, Shangdong Yang, Tangjie Lv, Changjie Fan, Wenbin Li, Chongjie Zhang, Yang Gao
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method by developing decentralized policies on 12 maps of the StarCraft Multi-Agent Challenge benchmark, and the superior performance demonstrates its effectiveness. |
| Researcher Affiliation | Collaboration | Chao Li (1), Yujing Hu (2), Shangdong Yang (3,1), Tangjie Lv (2), Changjie Fan (2), Wenbin Li (1), Chongjie Zhang (4), and Yang Gao (1). (1) State Key Laboratory for Novel Software Technology, Nanjing University; (2) NetEase Fuxi AI Lab; (3) School of Computer Science, Nanjing University of Posts and Telecommunications; (4) Department of Computer Science & Engineering, Washington University in St. Louis |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We evaluate our method on 12 maps of the StarCraft Multi-Agent Challenge (SMAC) [Samvelyan et al., 2019]. |
| Dataset Splits | No | The paper uses maps from the SMAC benchmark and evaluates performance (win rate), but it does not specify explicit train/validation/test splits (e.g., percentages or absolute counts); it implicitly follows the SMAC benchmark's evaluation methodology. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Specifically, we instantiate STAR by following the same paradigm as MAPPO, where agents' decentralized policies undergo updates based on advantage functions derived from the state value function. Furthermore, we maximize the representation entropy (H(X^i) in Eq. (2)) by proposing a combined value function that guides agents' policies, defined as V^i = α V_l(o^i) + (1 − α) V(x^i), where V^i represents the ultimate value function for agent i, and V_l(o^i) denotes a local value function conditioned on the local observation o^i in addition to V(x^i). α is a diminishing factor that progressively reduces the influence of V_l(o^i) on per-agent policy optimization. |
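
For a concrete picture of the combined value function quoted in the Experiment Setup row, the sketch below shows one way it could be realized. It is a minimal, hypothetical PyTorch rendering, not the authors' released code: the module names, network sizes, and the linear annealing schedule for the diminishing factor α are assumptions made only for illustration.

```python
# Minimal sketch (not the authors' code) of the combined value function
# V^i = alpha * V_l(o^i) + (1 - alpha) * V(x^i) quoted in the table above.
# Module names, network sizes, and the alpha schedule are assumptions.
import torch
import torch.nn as nn


class CombinedValue(nn.Module):
    def __init__(self, obs_dim: int, repr_dim: int, hidden: int = 64):
        super().__init__()
        # Local value head V_l conditioned on the raw local observation o^i.
        self.v_local = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        # Value head V conditioned on the compressed state representation x^i.
        self.v_repr = nn.Sequential(
            nn.Linear(repr_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, obs: torch.Tensor, x: torch.Tensor, alpha: float) -> torch.Tensor:
        # Combined value V^i = alpha * V_l(o^i) + (1 - alpha) * V(x^i).
        return alpha * self.v_local(obs) + (1.0 - alpha) * self.v_repr(x)


def diminishing_alpha(step: int, total_steps: int) -> float:
    # One plausible schedule: linearly anneal alpha from 1 to 0 so that the
    # local value V_l gradually stops influencing per-agent policy updates.
    return max(0.0, 1.0 - step / total_steps)


if __name__ == "__main__":
    value_fn = CombinedValue(obs_dim=32, repr_dim=16)
    obs, x = torch.randn(4, 32), torch.randn(4, 16)
    v = value_fn(obs, x, alpha=diminishing_alpha(step=5_000, total_steps=50_000))
    print(v.shape)  # torch.Size([4, 1])
```

Annealing α toward zero mirrors the paper's description of a diminishing factor that progressively reduces the influence of V_l(o^i) on per-agent policy optimization, leaving V(x^i) as the dominant critic late in training.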