Maximum Entropy Heterogeneous-Agent Reinforcement Learning

Authors: Jiarong Liu, Yifan Zhong, Siyi Hu, Haobo Fu, Qiang Fu, Xiaojun Chang, Yaodong Yang

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate HASAC on six benchmarks: Bi-DexHands, Multi-Agent MuJoCo, StarCraft Multi-Agent Challenge, Google Research Football, Multi-Agent Particle Environment, and Light Aircraft Game. Results show that HASAC consistently outperforms strong baselines, exhibiting better sample efficiency, robustness, and sufficient exploration.
Researcher Affiliation Collaboration Jiarong Liu1, Yifan Zhong1,2, Siyi Hu3, Haobo Fu4, Qiang Fu4, Xiaojun Chang3, Yaodong Yang1 — 1Institute for AI, Peking University; 2National Key Laboratory of General AI, BIGAI; 3University of Technology Sydney; 4Tencent AI Lab
Pseudocode Yes We refer to the above procedure as HASAC and Appendix F for its full pseudocode. [...] Algorithm 2: Heterogeneous-Agent Soft Actor-Critic
Open Source Code No The paper states: 'See our page at https://sites.google.com/view/meharl.' However, visiting this page reveals a 'Code' section with the text 'Coming soon', indicating the code is not yet available.
Open Datasets Yes We test HASAC on six benchmarks: Multi-Agent MuJoCo (MAMuJoCo) (4), Bi-DexHands (2), StarCraft Multi-Agent Challenge (SMAC) (25), Google Research Football (GRF) (16), Multi-Agent Particle Environment (MPE) (19), and Light Aircraft Game (LAG) (23).
Dataset Splits No The paper does not provide specific details on how validation sets were created or used, such as percentages or sample counts for train/validation/test splits, or details of cross-validation setups. It mentions 'evaluating' performance and 'training' using random seeds, but not explicit validation splits.
Hardware Specification No The paper does not specify the hardware used for experiments, such as specific GPU or CPU models, memory configurations, or cloud computing instance types.
Software Dependencies No The paper states: 'We implement the HASAC based on the HARL framework (45) and employ the existing implementations of other algorithms'. However, it does not provide specific version numbers for any software components, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes Experimental results (see full experimental details and hyperparameters in Appendix H) show the following advantages of stochastic policies: [...] Next, we offer the hyperparameters used for HASAC in Table 4 across all environments, which are kept comparable with the HATD3 for fairness purposes.
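The pseudocode row above refers to HASAC's full algorithm in Appendix F of the paper. As background for the maximum-entropy framing, a minimal sketch of the core idea is given below: in maximum-entropy RL the optimal policy is a Boltzmann distribution over Q-values, pi(a) ∝ exp(Q(a)/alpha), with soft value V = alpha * logsumexp(Q/alpha). This is a generic illustration of the objective family, not the paper's implementation; the function name and temperature value are illustrative.

```python
import numpy as np

def soft_policy_and_value(q_values, alpha=0.2):
    """Soft (maximum-entropy) policy and soft value for one agent's Q-values.

    pi(a) = exp(Q(a)/alpha) / sum_b exp(Q(b)/alpha)
    V     = alpha * log sum_a exp(Q(a)/alpha)
    alpha is the temperature trading off return against policy entropy.
    """
    q = np.asarray(q_values, dtype=float)
    m = (q / alpha).max()                 # shift for numerical stability
    z = np.exp(q / alpha - m)
    pi = z / z.sum()                      # Boltzmann policy over actions
    v = alpha * (m + np.log(z.sum()))     # soft value, >= max(Q)
    return pi, v
```

As alpha shrinks the policy approaches the greedy argmax; larger alpha spreads probability mass and encourages exploration, which is the behavior the review's "sufficient exploration" claim refers to.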