Efficient Multi-agent Reinforcement Learning by Planning

Authors: Qihan Liu, Jianing Ye, Xiaoteng Ma, Jun Yang, Bin Liang, Chongjie Zhang

ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on the SMAC benchmark demonstrate that MAZero outperforms model-free approaches in sample efficiency and achieves comparable or better sample and computational efficiency than existing model-based methods.
Researcher Affiliation | Academia | Qihan Liu¹, Jianing Ye², Xiaoteng Ma¹, Jun Yang¹, Bin Liang¹, Chongjie Zhang³. ¹Department of Automation, Tsinghua University; ²Institute for Interdisciplinary Information Sciences, Tsinghua University; ³Department of Computer Science & Engineering, Washington University in St. Louis.
Pseudocode | No | The paper describes the MAZero algorithm in detail using text and mathematical equations, but it does not include formal pseudocode blocks or algorithm listings.
Open Source Code | Yes | Our code is available at https://github.com/liuqh16/MAZero.
Open Datasets | Yes | The experiments use the publicly accessible SMAC benchmark, and the code is released at https://github.com/liuqh16/MAZero (see the environment sketch following this table).
Dataset Splits | No | The paper uses SMAC benchmark environments and reports win rates, as is typical in reinforcement learning. It does not specify explicit train/validation/test splits of a fixed dataset, as is common in supervised learning; it mentions evaluation episodes but no separate validation split.
Hardware Specification | No | The paper does not provide specific details about the hardware used for its experiments, such as GPU models, CPU types, or cloud computing instance specifications.
Software Dependencies | No | The paper mentions the Adam and RMSProp optimizers along with their learning rates, but it does not specify versions of the software libraries (e.g., PyTorch, TensorFlow) or other software dependencies used in the implementation.
Experiment Setup | Yes | The paper provides specific experimental setup details, including the network architecture (e.g., a hidden state size of 128 and a Transformer with three stacked layers), training parameters (e.g., a batch size of 256 and a learning rate of 10^-4), and the hyperparameters listed in Table 1, such as a discount factor of 0.99, 100 MCTS simulations (N), and an exponential factor α = 3 in the Weighted-Advantage operator (see the configuration sketch following this table).
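
To make the reported setup concrete, the following minimal configuration sketch collects the hyperparameters quoted above. The key names are hypothetical illustrations, not identifiers taken from the released MAZero codebase.

# Hypothetical hyperparameter configuration assembled from the values
# reported in the paper (text and Table 1). Key names are illustrative,
# not the ones used in https://github.com/liuqh16/MAZero.
mazero_config = {
    "hidden_state_size": 128,     # latent state width
    "transformer_layers": 3,      # stacked Transformer layers
    "batch_size": 256,
    "learning_rate": 1e-4,        # Adam learning rate
    "discount_factor": 0.99,
    "num_mcts_simulations": 100,  # N in the paper
    "advantage_alpha": 3,         # exponential factor in Weighted-Advantage
}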
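
Likewise, since the Open Datasets row points to the SMAC benchmark, here is a minimal sketch of interacting with a SMAC environment via the public smac package (https://github.com/oxwhirl/smac). The map name and the random action selection are illustrative placeholders, not details from the paper.

import numpy as np
from smac.env import StarCraft2Env

# Requires a local StarCraft II installation, as documented by the
# smac package. The map name below is an illustrative choice.
env = StarCraft2Env(map_name="3m")
env_info = env.get_env_info()
n_agents = env_info["n_agents"]

env.reset()
terminated = False
episode_return = 0.0
while not terminated:
    actions = []
    for agent_id in range(n_agents):
        # Mask out unavailable actions; a random policy stands in
        # for the learned MAZero policy here.
        avail = env.get_avail_agent_actions(agent_id)
        actions.append(int(np.random.choice(np.nonzero(avail)[0])))
    reward, terminated, _ = env.step(actions)
    episode_return += reward
env.close()
print("Episode return:", episode_return)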