Subspace-Aware Exploration for Sparse-Reward Multi-Agent Tasks
Authors: Pei Xu, Junge Zhang, Qiyue Yin, Chao Yu, Yaodong Yang, Kaiqi Huang
AAAI 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Under the sparse-reward setting, we show that the proposed algorithm significantly outperforms the state-of-the-art algorithms in the multiple-particle environment, the Google Research Football and StarCraft II micromanagement tasks. |
| Researcher Affiliation | Academia | (1) School of Artificial Intelligence, University of Chinese Academy of Sciences; (2) CRISE, Institute of Automation, Chinese Academy of Sciences; (3) CAS Center for Excellence in Brain Science and Intelligence Technology; (4) School of Computer Science and Engineering, Sun Yat-sen University; (5) Beijing Institute for General AI; (6) Institute for AI, Peking University. Emails: xupei2018@ia.ac.cn, {jgzhang,qyyin,kqhuang}@nlpr.ia.ac.cn, yuchao3@mail.sysu.edu.cn, yaodong.yang@pku.edu.cn |
| Pseudocode | No | The paper describes algorithms and formulations but does not include a clearly labeled 'Pseudocode' or 'Algorithm' block. |
| Open Source Code | No | The paper does not provide a link to open-source code for the methodology or explicitly state that the code is publicly available. |
| Open Datasets | Yes | We evaluate SAME on three challenging environments: a discrete version of the multiple-particle environment (MPE) (Wang et al. 2019), the Google Research Football (GRF) (Kurach et al. 2020) and StarCraft II micromanagement (SMAC) (Samvelyan et al. 2019). |
| Dataset Splits | No | The paper does not explicitly provide training/test/validation dataset splits. It mentions using specific environments for evaluation in a reinforcement learning context but no data splits in the traditional supervised learning sense. |
| Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, memory) used for running the experiments. Its only related remark concerns the 'continuous state space' of SMAC and GRF, which is an environment property rather than a hardware detail. |
| Software Dependencies | No | The paper mentions using RND (Burda et al. 2019b) for calculating $b_{\text{full}}$ and states that experiments follow the training settings of CDS (Chenghao et al. 2021) and use TD(λ). However, it does not provide specific version numbers for these or other software libraries/dependencies. |
| Experiment Setup | Yes | To calculate $\hat{b}^{wt}_{\text{sub}}$ in the continuous state space (such as SMAC and GRF), we discretize each dimension of the state space into B equally spaced atomic states. ... All experiments run with five random seeds. Details for environments and training are given in Appendix. |
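
The discretization quoted in the Experiment Setup row can be illustrated with a minimal sketch. This is not the authors' code: the function name `discretize`, the per-dimension bounds `lows` and `highs`, and the use of a `Counter` to tally visits are all assumptions made for illustration. The quoted text states only that each dimension is split into B equally spaced atomic states.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def discretize(
    state: Sequence[float],
    lows: Sequence[float],
    highs: Sequence[float],
    num_bins: int,
) -> Tuple[int, ...]:
    """Map a continuous state to a tuple of bin indices in [0, num_bins - 1].

    Each dimension is split into `num_bins` equally wide bins between its
    assumed lower and upper bound (hypothetical parameters, not from the paper).
    """
    indices = []
    for x, lo, hi in zip(state, lows, highs):
        if hi <= lo:
            raise ValueError("each upper bound must exceed its lower bound")
        frac = (x - lo) / (hi - lo)
        idx = math.floor(frac * num_bins)
        # Clamp so out-of-range values and x == hi fall into the edge bins.
        idx = min(max(idx, 0), num_bins - 1)
        indices.append(idx)
    return tuple(indices)


# Hypothetical usage: tally visits to discretized states.
counts = Counter()
for s in [(0.05, -0.9), (0.07, -0.95), (0.99, 0.5)]:
    key = discretize(s, lows=(0.0, -1.0), highs=(1.0, 1.0), num_bins=10)
    counts[key] += 1
```

With B = 10 and the assumed bounds above, the first two sample states fall into the same atomic state, so a visitation tally treats them as identical. How such tallies enter the exploration bonus is defined in the paper and is not reproduced here.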