Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks

Authors: Pei Xu, Junge Zhang, Qiyue Yin, Chao Yu, Yaodong Yang, Kaiqi Huang

AAAI 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Under the sparse-reward setting, we show that the proposed algorithm significantly outperforms the state-of-the-art algorithms in the multiple-particle environment, the Google Research Football and StarCraft II micromanagement tasks.
Researcher Affiliation | Academia | (1) School of Artificial Intelligence, University of Chinese Academy of Sciences; (2) CRISE, Institute of Automation, Chinese Academy of Sciences; (3) CAS, Center for Excellence in Brain Science and Intelligence Technology; (4) School of Computer Science and Engineering, Sun Yat-sen University; (5) Beijing Institute for General AI; (6) Institute for AI, Peking University. Contact: xupei2018@ia.ac.cn, {jgzhang,qyyin,kqhuang}@nlpr.ia.ac.cn, yuchao3@mail.sysu.edu.cn, yaodong.yang@pku.edu.cn
Pseudocode | No | The paper describes algorithms and formulations but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block.
Open Source Code | No | The paper neither links to open-source code for the methodology nor states that the code is publicly available.
Open Datasets | Yes | We evaluate SAME on three challenging environments: a discrete version of the multiple-particle environment (MPE) (Wang et al. 2019), the Google Research Football (GRF) (Kurach et al. 2020) and StarCraft II micromanagement (SMAC) (Samvelyan et al. 2019).
Dataset Splits | No | The paper does not provide training/validation/test splits. It evaluates on reinforcement learning environments, where data splits in the supervised learning sense do not apply.
Hardware Specification | No | The paper does not give hardware details (e.g., CPU or GPU models, memory) for the experiments. It mentions a continuous state space for SMAC and GRF, which describes the environments rather than the hardware.
Software Dependencies | No | The paper mentions using RND (Burda et al. 2019b) to calculate `b_{full}` and states that experiments follow the training settings of CDS (Chenghao et al. 2021) and use TD(λ). However, it does not give version numbers for these or for any other software libraries or dependencies.
Experiment Setup | Yes | To calculate `\hat{b}^{wt}_{sub}` in the continuous state space (such as SMAC and GRF), we discretize each dimension of the state space into B equally spaced atomic states. ... All experiments run with five random seeds. Details for environments and training are given in Appendix.
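
The discretization step quoted in the Experiment Setup row can be illustrated with a minimal sketch. It is not the authors' code: the function name `discretize_state`, the per-dimension bounds `low` and `high`, and the clamping of out-of-range values are all assumptions made for illustration; only the idea of splitting each dimension into B equally spaced atomic states comes from the paper.

```python
import math


def discretize_state(state, low, high, num_bins):
    """Map each dimension of a continuous state to one of num_bins equal-width bins.

    Hypothetical helper: `low` and `high` are assumed per-dimension bounds,
    and `num_bins` plays the role of B in the quoted setup.
    """
    indices = []
    for x, lo, hi in zip(state, low, high):
        # Rescale the value to the range [0, num_bins].
        scaled = (x - lo) / (hi - lo) * num_bins
        # Clamp so that x == hi (or out-of-range values) still maps to a valid bin.
        idx = min(max(math.floor(scaled), 0), num_bins - 1)
        indices.append(idx)
    return tuple(indices)


# Three dimensions, each bounded by [0, 1], split into B = 4 atomic states.
print(discretize_state([0.0, 0.5, 1.0], [0.0] * 3, [1.0] * 3, 4))  # (0, 2, 3)
```

The returned tuple is hashable, so it could serve as a key for visit counts over the discretized states; how such counts enter the bonus is specified in the paper, not here.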