Maximum Entropy Heterogeneous-Agent Reinforcement Learning

Authors: Jiarong Liu, Yifan Zhong, Siyi Hu, Haobo Fu, Qiang Fu, Xiaojun Chang, Yaodong Yang

ICLR 2024

Reproducibility Variable Result LLM Response
Research Type Experimental We evaluate HASAC on six benchmarks: Bi-DexHands, Multi-Agent MuJoCo, StarCraft Multi-Agent Challenge, Google Research Football, Multi-Agent Particle Environment, and Light Aircraft Game. Results show that HASAC consistently outperforms strong baselines, exhibiting better sample efficiency, robustness, and sufficient exploration.
Researcher Affiliation Collaboration Jiarong Liu1, Yifan Zhong1,2, Siyi Hu3, Haobo Fu4, Qiang Fu4, Xiaojun Chang3, Yaodong Yang1 — 1Institute for AI, Peking University; 2National Key Laboratory of General AI, BIGAI; 3University of Technology Sydney; 4Tencent AI Lab
Pseudocode Yes We refer to the above procedure as HASAC and Appendix F for its full pseudocode. [...] Algorithm 2: Heterogeneous-Agent Soft Actor-Critic
Open Source Code No The paper states: 'See our page at https://sites.google.com/view/meharl.' However, visiting this page reveals a 'Code' section with the text 'Coming soon', indicating the code is not yet available.
Open Datasets Yes We test HASAC on six benchmarks: Multi-Agent MuJoCo (MAMuJoCo) (4), Bi-DexHands (2), StarCraft Multi-Agent Challenge (SMAC) (25), Google Research Football (GRF) (16), Multi-Agent Particle Environment (MPE) (19), and Light Aircraft Game (LAG) (23).
Dataset Splits No The paper does not provide specific details on how validation sets were created or used, such as percentages or sample counts for train/validation/test splits, or details of cross-validation setups. It mentions 'evaluating' performance and 'training' using random seeds, but not explicit validation splits.
Hardware Specification No The paper does not specify the hardware used for experiments, such as specific GPU or CPU models, memory configurations, or cloud computing instance types.
Software Dependencies No The paper states: 'We implement the HASAC based on the HARL framework (45) and employ the existing implementations of other algorithms'. However, it does not provide specific version numbers for any software components, libraries, or frameworks used (e.g., Python, PyTorch, TensorFlow versions).
Experiment Setup Yes Experimental results (see full experimental details and hyperparameters in Appendix H) show the following advantages of stochastic policies: [...] Next, we offer the hyperparameters used for HASAC in Table 4 across all environments, which are kept comparable with the HATD3 for fairness purposes.
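The pseudocode row above refers to HASAC's full algorithm in Appendix F of the paper. As background for the maximum-entropy framing, a minimal sketch of the core idea is given below: in maximum-entropy RL the optimal policy is a Boltzmann distribution over Q-values, pi(a) ∝ exp(Q(a)/alpha), with soft value V = alpha * logsumexp(Q/alpha). This is a generic illustration of the objective family, not the paper's implementation; the function name and temperature value are illustrative.

```python
import numpy as np

def soft_policy_and_value(q_values, alpha=0.2):
    """Soft (maximum-entropy) policy and soft value for one agent's Q-values.

    pi(a) = exp(Q(a)/alpha) / sum_b exp(Q(b)/alpha)
    V     = alpha * log sum_a exp(Q(a)/alpha)
    alpha is the temperature trading off return against policy entropy.
    """
    q = np.asarray(q_values, dtype=float)
    m = (q / alpha).max()                 # shift for numerical stability
    z = np.exp(q / alpha - m)
    pi = z / z.sum()                      # Boltzmann policy over actions
    v = alpha * (m + np.log(z.sum()))     # soft value, >= max(Q)
    return pi, v
```

As alpha shrinks the policy approaches the greedy argmax; larger alpha spreads probability mass and encourages exploration, which is the behavior the review's "sufficient exploration" claim refers to.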