STAR: Spatio-Temporal State Compression for Multi-Agent Tasks with Rich Observations
Authors: Chao Li, Yujing Hu, Shangdong Yang, Tangjie Lv, Changjie Fan, Wenbin Li, Chongjie Zhang, Yang Gao
IJCAI 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We evaluate our method by developing decentralized policies on 12 maps of the StarCraft Multi-Agent Challenge benchmark, and the superior performance demonstrates its effectiveness. |
| Researcher Affiliation | Collaboration | Chao Li (1), Yujing Hu (2), Shangdong Yang (3,1), Tangjie Lv (2), Changjie Fan (2), Wenbin Li (1), Chongjie Zhang (4), and Yang Gao (1). (1) State Key Laboratory for Novel Software Technology, Nanjing University; (2) NetEase Fuxi AI Lab; (3) School of Computer Science, Nanjing University of Posts and Telecommunications; (4) Department of Computer Science & Engineering, Washington University in St. Louis |
| Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks. |
| Open Source Code | No | The paper does not provide an explicit statement about releasing source code or a link to a code repository. |
| Open Datasets | Yes | We evaluate our method on 12 maps of the StarCraft Multi-Agent Challenge (SMAC) [Samvelyan et al., 2019]. |
| Dataset Splits | No | The paper uses maps from the SMAC benchmark and evaluates performance (win rate), but it does not specify explicit train/validation/test splits (e.g., percentages or absolute counts); it implicitly follows the SMAC benchmark's evaluation methodology. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., CPU, GPU models, memory). |
| Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA versions). |
| Experiment Setup | Yes | Specifically, we instantiate STAR by following the same paradigm as MAPPO, where agents' decentralized policies undergo updates based on advantage functions derived from the state value function. Furthermore, we maximize the representation entropy (H(X^i) in Eq. (2)) by proposing a combined value function that guides agents' policies, defined as V^i = α V_l(o^i) + (1 − α) V(x^i), where V^i represents the ultimate value function for agent i, and V_l(o^i) denotes a local value function conditioned on the local observation o^i in addition to V(x^i). α is a diminishing factor that progressively reduces the influence of V_l(o^i) on per-agent policy optimization. |
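
For a concrete picture of the combined value function quoted in the Experiment Setup row, the sketch below shows one way it could be realized. It is a minimal, hypothetical PyTorch rendering, not the authors' released code: the module names, network sizes, and the linear annealing schedule for the diminishing factor α are assumptions made only for illustration.

```python
# Minimal sketch (not the authors' code) of the combined value function
# V^i = alpha * V_l(o^i) + (1 - alpha) * V(x^i) quoted in the table above.
# Module names, network sizes, and the alpha schedule are assumptions.
import torch
import torch.nn as nn


class CombinedValue(nn.Module):
    def __init__(self, obs_dim: int, repr_dim: int, hidden: int = 64):
        super().__init__()
        # Local value head V_l conditioned on the raw local observation o^i.
        self.v_local = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )
        # Value head V conditioned on the compressed state representation x^i.
        self.v_repr = nn.Sequential(
            nn.Linear(repr_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, obs: torch.Tensor, x: torch.Tensor, alpha: float) -> torch.Tensor:
        # Combined value V^i = alpha * V_l(o^i) + (1 - alpha) * V(x^i).
        return alpha * self.v_local(obs) + (1.0 - alpha) * self.v_repr(x)


def diminishing_alpha(step: int, total_steps: int) -> float:
    # One plausible schedule: linearly anneal alpha from 1 to 0 so that the
    # local value V_l gradually stops influencing per-agent policy updates.
    return max(0.0, 1.0 - step / total_steps)


if __name__ == "__main__":
    value_fn = CombinedValue(obs_dim=32, repr_dim=16)
    obs, x = torch.randn(4, 32), torch.randn(4, 16)
    v = value_fn(obs, x, alpha=diminishing_alpha(step=5_000, total_steps=50_000))
    print(v.shape)  # torch.Size([4, 1])
```

Annealing α toward zero mirrors the paper's description of a diminishing factor that progressively reduces the influence of V_l(o^i) on per-agent policy optimization, leaving V(x^i) as the dominant critic late in training.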